[plt-scheme] Bug in regexp-match*, regexp-split, regexp-match-positions*, maybe others...?

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Sat Sep 26 14:51:49 EDT 2009

This is fixed in SVN. It was a problem with the way `regexp-split' and
`regexp-match*' convert a byte regexp to avoid empty start and end
matches.

Thanks for the report!

At Sat, 26 Sep 2009 02:53:39 -0400, Jon Zeppieri wrote:
> Byte string regexp patterns containing bytes with the high bit set don't
> seem to work properly with any of the regexp procedures that match multiple
> times.  For example...
> 
> This works as expected:
> 
> > (regexp-split #rx#"\x7f" #"hello\x7fworld")
> (#"hello" #"world")
> 
> But this does not:
> 
> > (regexp-split #rx#"\x80" #"hello\x80world")
> (#"hello\200world")
> 
> 
> Similarly:
> 
> > (regexp-match* #rx#"\x7f" #"hello\x7fworld")
> (#"\177")
> 
> > (regexp-match* #rx#"\x80" #"hello\x80world")
> ()
> 
> 
> This doesn't affect the procedures that only match once.  For example, this
> works fine:
> 
> > (regexp-match #rx#"\x80" #"hello\x80world")
> (#"\200")
> 
> 
> I can reproduce this behavior in 4.1.5 and 4.2.2.1, both on OS X 10.5.8.
> 
> -Jon



Posted on the users mailing list.