[racket] regexp operations on character input ports returning bytes
When doing a regexp match on a character input port, what's the best way to
get string results instead of byte-string results?
For example, this is documented behavior, but it's not what I want,
because I don't want to have to decode the bytes into a string (plus, I
would have to query the input port to find out its character
encoding, if I don't know it a priori):
(regexp-match #rx"^a*" (open-input-string "aaab"))
;;==> '(#"aaa")
Should I decode the result into a string (and is UTF-8 the right encoding?),
or use "regexp-match-peek-positions" to peek and then "read-string"
(which *does* apply a character decoding)?
(bytes->string/utf-8
 (car (regexp-match #rx"^a*" (open-input-string "aaab"))))
;;==> "aaa"
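For what it's worth, as far as I know Racket's character operations (read-char, read-string) always decode a port's bytes as UTF-8, and ports in other encodings are normally wrapped with reencode-input-port first, so UTF-8 is probably the right choice. A minimal sketch of wrapping the decoding up, assuming UTF-8 input; regexp-match/string is a hypothetical name, not an existing Racket function. The #\uFFFD argument makes invalid byte sequences decode to a replacement character instead of raising an error, roughly like read-char treats them.

```racket
#lang racket/base

;; Hypothetical helper (not part of Racket): match `rx` against the port
;; `in` and decode every matched byte string as UTF-8.
;; Assumes the port's bytes are UTF-8.
(define (regexp-match/string rx in)
  (let ([m (regexp-match rx in)])
    (and m                             ; no match => #f, like regexp-match
         (map (lambda (b)
                ;; Unmatched sub-groups are #f; keep them as #f.
                ;; Invalid UTF-8 bytes become #\uFFFD instead of an error.
                (and b (bytes->string/utf-8 b #\uFFFD)))
              m))))

(regexp-match/string #rx"^a*" (open-input-string "aaab"))
;; ==> '("aaa")
```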
(let ((in (open-input-string "aaab")))
  (read-string (cdr (car (regexp-match-peek-positions #rx"^a*" in))) in))
;;==> "aaa"
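One caveat with the peek approach: on a port, regexp-match-peek-positions reports byte offsets, while read-string counts characters, so the two agree only for ASCII input. A sketch that stays in bytes until the end, assuming UTF-8; read-regexp-match/string is a hypothetical name:

```racket
#lang racket/base

;; Hypothetical helper: peek for a match of `rx` on `in`, then consume
;; exactly the matched bytes and decode them (assumed UTF-8).
;; The peeked positions are byte offsets, so read-bytes (not read-string)
;; is used to consume them.
(define (read-regexp-match/string rx in)
  (let ([pos (regexp-match-peek-positions rx in)])
    (and pos                           ; no match => #f, nothing consumed
         (let* ([start (car (car pos))]
                [end (cdr (car pos))]
                ;; Consume everything up to the end of the match, as
                ;; regexp-match itself would.
                [bs (read-bytes end in)])
           (bytes->string/utf-8 (subbytes bs start) #\uFFFD)))))

(read-regexp-match/string #rx"^[^x]*" (open-input-string "ééx"))
;; ==> "éé"
```

Here the match ends at byte offset 4, so the read-string version would read up to four characters and return "ééx", consuming the "x" as well.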
Is there a better way to do this with regexp operations on input ports?
Or should I just scan the port manually instead of using regexps?
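For a simple pattern like ^a*, the manual route is short. A sketch using peek-char and read-char, which do the UTF-8 decoding themselves; read-while is a hypothetical helper and keep? is a caller-supplied predicate:

```racket
#lang racket/base

;; Hypothetical helper: read characters from `in` while `keep?` holds and
;; return them as a string. The first non-matching character (or EOF) is
;; left unread in the port.
(define (read-while keep? in)
  (let ([out (open-output-string)])
    (let loop ()
      (let ([c (peek-char in)])
        ;; peek-char returns eof at the end of the port, hence char?.
        (when (and (char? c) (keep? c))
          (write-char (read-char in) out)
          (loop))))
    (get-output-string out)))

(read-while (lambda (c) (char=? c #\a)) (open-input-string "aaab"))
;; ==> "aaa"
```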
--
http://www.neilvandyke.org/