[racket] regexp operations on character input ports returning bytes

From: Neil Van Dyke (neil at neilvandyke.org)
Date: Sat Dec 25 10:23:54 EST 2010

When doing a regexp on a character input port, what's the best way to 
get string results out instead of bytes results?

For example, this is documented behavior, but not actually what I want, 
because I don't want to have to re-encode the bytes as a string (plus, I 
would have to query the input port to find out its character 
encoding, if I don't know it a priori):

(regexp-match #rx"^a*" (open-input-string "aaab"))
;;==> '(#"aaa")

I could re-encode the result as a string (and is UTF-8 correct?), or try 
to do "regexp-match-peek-positions" as a peek and then use "read-string" 
(which *does* get a character encoding)?

(bytes->string/utf-8
 (car (regexp-match #rx"^a*" (open-input-string "aaab"))))
;;==> "aaa"

(let ((in (open-input-string "aaab")))
  (read-string (cdr (car (regexp-match-peek-positions #rx"^a*" in))) in))
;;==> "aaa"
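
The two approaches above can be folded into one helper. This is only a 
sketch, and "regexp-match-string" is a hypothetical name, not something 
Racket provides. It assumes the port's bytes are UTF-8, which is the 
encoding that Racket's built-in character reads (such as "read-string") 
use on ports. One caveat it works around: for a port, the positions from 
"regexp-match-peek-positions" are byte offsets, so passing them to 
"read-string" as a character count only lines up for ASCII input. The 
sketch reads bytes instead and decodes the matched range:

```racket
#lang racket/base

;; Hypothetical helper (not part of Racket): match `rx` against the
;; input port `in` and return the whole match as a string, or #f if
;; there is no match. Assumes the bytes on the port are UTF-8.
(define (regexp-match-string rx in)
  ;; Peek first, so nothing is consumed when the match fails.
  (define positions (regexp-match-peek-positions rx in))
  (and positions
       (let* ([start (car (car positions))]  ; byte offset of match start
              [end   (cdr (car positions))]  ; byte offset of match end
              ;; Consume everything up to the end of the match.
              [consumed (read-bytes end in)])
         ;; Decode only the matched byte range as a string.
         (bytes->string/utf-8 consumed #f start))))

(regexp-match-string #rx"^a*" (open-input-string "aaab"))
;;==> "aaa"
```

Like "regexp-match", this consumes the input up to the end of the match, 
including any bytes skipped before the match starts.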

Is there a better way using regexp operations on input ports?  Or 
perhaps I should do it manually rather than use regexps?

-- 
http://www.neilvandyke.org/



Posted on the users mailing list.