[plt-scheme] help about regexp or read-line
At Fri, 27 Apr 2007 11:16:37 +0400, wwall wrote:
> I have a problem with this code:
> (define rx #rx"[_A-Za-zА-Яа-я0-9]+")
> (define zz "function яя(z){ret?rn 1+z;} zzz(2);")
> (regexp-match-positions rx (open-input-string zz))
> returns ((0 . 8)).
> This is right, but if I define zz as
> (define zz "функция яя(z){ret?rn 1+z;} zzz(2);")
> then (regexp-match-positions rx (open-input-string zz)) returns ((0 . 14)).
> I think it is because I use UTF, but I have a question: how do I correct this
> error?
Yes, it's a limitation of regexps on strings. The position results of
`regexp-match-positions' are always in terms of bytes, and strings are
implicitly encoded via UTF-8 to obtain bytes.
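To make the byte/character difference concrete, here is a minimal sketch. It assumes a current Racket (hence the `#lang racket/base` line, which the original post did not use), and the names `word`, `char-count` and `byte-count` are made up for illustration. It only shows why the seven-letter word at the start of the second string is reported as 14 bytes.

```scheme
#lang racket/base
;; Illustrative only: `word` is a name chosen for this sketch.
(define word "функция")

;; Number of characters in the string.
(define char-count (string-length word))

;; Number of bytes in its UTF-8 encoding; each of these Cyrillic
;; letters is encoded as two bytes.
(define byte-count (bytes-length (string->bytes/utf-8 word)))
```

Here `char-count` is 7 and `byte-count` is 14, which matches the ((0 . 14)) result above.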
Here's one way to get the answer in terms of characters:
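(define (regexp-match-string-positions rx port)
  ;; Peek first, so the port is not consumed by the match itself;
  ;; the positions in m are byte offsets from the current position.
  (let ([m (regexp-match-peek-positions rx port)])
    (and m
         ;; Read the bytes before the match and count their characters.
         (let ([start (bytes-utf-8-length (read-bytes (caar m) port))])
           (list
            (cons start
                  ;; Read the matched bytes and add their character count.
                  (+ start
                     (bytes-utf-8-length
                      (read-bytes (- (cdar m) (caar m)) port)))))))))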
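When the whole input is also available as a string, another possible approach is to convert the byte ranges after the fact. The sketch below assumes a current Racket, where `bytes-utf-8-length` accepts optional start and end offsets. The helper `byte-range->char-range` is a hypothetical name, and the input string is shortened for illustration.

```scheme
#lang racket/base
;; Hypothetical helper: convert one (start . end) byte range, given in
;; terms of the UTF-8 encoding of str, into a character range.
(define (byte-range->char-range str range)
  (let ([bs (string->bytes/utf-8 str)])
    (cons (bytes-utf-8-length bs #f 0 (car range))
          (bytes-utf-8-length bs #f 0 (cdr range)))))

(define rx #rx"[_A-Za-zА-Яа-я0-9]+")
(define zz "функция яя(z)") ; shortened sample input for this sketch

;; Matching on a port reports byte offsets.
(define byte-positions
  (regexp-match-positions rx (open-input-string zz)))

;; Convert every reported range to character offsets.
(define char-positions
  (map (lambda (r) (byte-range->char-range zz r)) byte-positions))
```

With this input, `byte-positions` is ((0 . 14)) and `char-positions` is ((0 . 7)). Unlike the port-based function above, this version needs the complete string in memory, so it is not suitable for input that is only available as a stream.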
Matthew