[plt-scheme] Bug in regexp-match*, regexp-split, regexp-match-positions*, maybe others...?

From: Jon Zeppieri (zeppieri at gmail.com)
Date: Sat Sep 26 02:53:39 EDT 2009

Byte string regexp patterns containing bytes with the high bit set (#x80 and
above) don't seem to work properly with any of the regexp procedures that
match multiple times.  For example:

This works as expected:

> (regexp-split #rx#"\x7f" #"hello\x7fworld")
(#"hello" #"world")

But this does not:

> (regexp-split #rx#"\x80" #"hello\x80world")
(#"hello\200world")


Similarly:

> (regexp-match* #rx#"\x7f" #"hello\x7fworld")
(#"\177")

> (regexp-match* #rx#"\x80" #"hello\x80world")
()


This doesn't affect the procedures that only match once.  For example, this
works fine:

> (regexp-match #rx#"\x80" #"hello\x80world")
(#"\200")


I can reproduce this behavior in 4.1.5 and 4.2.2.1, both on OS X 10.5.8.
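
To make it easier to check the other multi-match procedures (and other byte values), here is a rough sketch of a reproducer script. The helper name `try-byte` and the choice of test bytes are just assumptions for illustration, not anything from the library. It builds a one-byte pattern with `byte-regexp` and `regexp-quote` and prints what each procedure returns, so the single-match and multi-match results can be compared side by side. I have deliberately left out expected output, since that is exactly what is in question.

```scheme
#lang scheme

;; Hypothetical helper (illustration only): build a one-byte pattern for
;; byte value b, embed the same byte between "hello" and "world", and print
;; the result of the single-match and multi-match procedures.
(define (try-byte b)
  (let ([rx    (byte-regexp (regexp-quote (bytes b)))]
        [input (bytes-append #"hello" (bytes b) #"world")])
    (printf "byte ~a\n" b)
    (printf "  regexp-match:            ~s\n" (regexp-match rx input))
    (printf "  regexp-match*:           ~s\n" (regexp-match* rx input))
    (printf "  regexp-match-positions*: ~s\n" (regexp-match-positions* rx input))
    (printf "  regexp-split:            ~s\n" (regexp-split rx input))))

;; #x7f has the high bit clear; #x80 and #xff have it set.
(for-each try-byte '(#x7f #x80 #xff))
```

If the problem is what it looks like, the #x7f case should agree across all four procedures, while the #x80 and #xff cases should show `regexp-match` finding the byte and the starred procedures and `regexp-split` missing it.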

-Jon

Posted on the users mailing list.