[plt-scheme] patch to escape R6RS library names [was: SXML for R6RS]

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Tue Jul 8 09:34:44 EDT 2008

At Fri, 04 Jul 2008 02:22:14 -0700, Derick Eddington wrote:
> On Thu, 2008-07-03 at 07:47 -0600, Matthew Flatt wrote:
> > At Thu, 03 Jul 2008 03:43:02 -0700, Derick Eddington wrote:
> > > I've been meaning to bring this up and ask: would it be possible for PLT
> > > to support any symbols for library names?  Ikarus currently does it by
> > > encoding filename-unfriendly characters like * to %2A.  I'd be willing
> > > to make a patch to do this if it's not a dead end.
> > 
> > That would be great.
> 
> OK, here's my patch, below and attached.  Comments:
> 
> I made it escape all characters that are not valid for a PLT Scheme
> unquoted module path component because that seemed the most
> conservative.
> 
> I made the %-escape encoding have a terminating delimiter of the ;
> character so that library names like (foo \x3BB;) and (foo \x3B;B) do
> not resolve to the same filename.
> 
> Because the (lib rel-string) require form does not allow the % nor ;
> characters in the rel-string, I changed `convert-library-reference' to
> use the (file string) require form and changed `parse-library-reference'
> to return a platform-specific absolute path string.

Thanks!

But I'm uneasy with mixing %-escape encoding with R6RS-style semicolon
terminators. How about UTF-8 encoding followed by simple two-digit
%-escapes (consistent, I think, with URI percent-encoding and RFC
3986)?
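
To make the difference concrete, here is a rough Python sketch (not PLT
code). It compares variable-length code-point escapes without a
terminator against UTF-8 bytes with fixed two-digit escapes. The function
names and the SAFE character set are made up for illustration; the real
set of characters allowed in a module-path element is whatever PLT Scheme
defines.

```python
import string

# Assumption: an illustrative set of characters left unescaped.
SAFE = set(string.ascii_letters + string.digits + "-_+")

def escape_codepoint_no_terminator(name):
    """Escape each unsafe character as % plus its code point in hex,
    using as many digits as needed and no terminating delimiter."""
    return "".join(c if c in SAFE else "%" + format(ord(c), "X") for c in name)

def escape_utf8_two_digit(name):
    """UTF-8 encode, then escape each unsafe byte as % plus exactly
    two lowercase hex digits."""
    out = []
    for b in name.encode("utf-8"):
        ch = chr(b)
        out.append(ch if b < 0x80 and ch in SAFE else "%{:02x}".format(b))
    return "".join(out)

lam = "\u03bb"   # the symbol written \x3BB; in R6RS syntax
semi_b = ";B"    # the symbol written \x3B;B in R6RS syntax

# Without a terminator, both names collapse to the same string.
print(escape_codepoint_no_terminator(lam), escape_codepoint_no_terminator(semi_b))
# With fixed-width escapes over UTF-8 bytes, they stay distinct.
print(escape_utf8_two_digit(lam), escape_utf8_two_digit(semi_b))
```

Because every escape is exactly two digits long, the decoder always knows
where an escape ends, so no terminator character is needed.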


More significantly, I'm worried about generating `file' module
references instead of `lib' module references. That shift will create
various little problems. For example, it won't work with shared
installations that are accessed through different filesystem paths. Also,
it won't be possible to compile R6RS libraries to bytecode and
distribute them without source (though compile-time version resolution
already interferes to some degree with distributing bytecode).

Ideally, R6RS references should all be encoded in `lib' references ---
but as things stand, you'd have to pick an encoding that would make
some `lib' paths impossible to express as an R6RS encoding (since you'd
have to pick some character sequence to act as an escape, but that
sequence might be used in an existing name).

To resolve this mismatch, we could extend the allowed syntax of `lib'
module elements to include %-escapes. That is, a `%' could be allowed
in a `lib' module-path element, as long as it's followed by two
lowercase hexadecimal digits. From the perspective of mapping `lib'
paths to file names, this `%' isn't an encoding; it's just part of the
file name. But if R6RS library names are mapped to `lib' paths by UTF-8
encoding followed by %-encoding of all "special" characters, then
there's a 1-to-1 mapping (ignoring versioning and suffixes) between
R6RS paths and `lib' paths.
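
As a rough illustration of the proposed rule, here is a Python sketch
(not the actual PLT implementation). It checks that every `%' in an
element is followed by two lowercase hexadecimal digits, and reads such
an element back as an R6RS symbol. The names `valid_lib_element' and
`lib_to_r6rs' are hypothetical, and the set of plain characters is an
assumption standing in for whatever `lib' elements really allow.

```python
import re

# Assumption: characters a `lib' element may contain unescaped.
PLAIN = "[A-Za-z0-9_+-]"

# An element is a non-empty run of plain characters and %xx escapes,
# where xx must be two lowercase hexadecimal digits.
ELEMENT_RE = re.compile("(?:" + PLAIN + "|%[0-9a-f]{2})+")

def valid_lib_element(element):
    """Return True if the element satisfies the extended syntax."""
    return ELEMENT_RE.fullmatch(element) is not None

def lib_to_r6rs(element):
    """Recover the R6RS symbol name: turn each %xx back into a byte,
    then decode the byte string as UTF-8."""
    if not valid_lib_element(element):
        raise ValueError("not a valid lib element: " + repr(element))
    raw = re.sub(rb"%([0-9a-f]{2})",
                 lambda m: bytes([int(m.group(1), 16)]),
                 element.encode("ascii"))
    return raw.decode("utf-8")  # raises if the bytes are not valid UTF-8

print(valid_lib_element("foo%2a"))   # lowercase escape
print(valid_lib_element("foo%2A"))   # uppercase digit in the escape
print(lib_to_r6rs("foo%2a"))
```

Restricting escapes to lowercase digits gives each byte exactly one
spelling, which is what keeps the mapping from producing two different
`lib' elements for the same R6RS name.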

If that sounds ok, I can make the needed changes.


Matthew



Posted on the users mailing list.