[plt-scheme] patch to escape R6RS library names [was: SXML for R6RS]
On Tue, 2008-07-08 at 07:34 -0600, Matthew Flatt wrote:
> At Fri, 04 Jul 2008 02:22:14 -0700, Derick Eddington wrote:
> > On Thu, 2008-07-03 at 07:47 -0600, Matthew Flatt wrote:
> > > At Thu, 03 Jul 2008 03:43:02 -0700, Derick Eddington wrote:
> > > > I've been meaning to bring this up and ask: would it be possible for PLT
> > > > to support any symbols for library names? Ikarus currently does it by
> > > > encoding filename-unfriendly characters like * to %2A. I'd be willing
> > > > to make a patch to do this if it's not a dead end.
> > >
> > > That would be great.
> >
> > OK, here's my patch, below and attached. Comments:
> >
> > I made it escape all characters that are not valid for a PLT Scheme
> > unquoted module path component because that seemed the most
> > conservative.
> >
> > I made the %-escape encoding have a terminating delimiter of the ;
> > character so that library names like (foo \x3BB;) and (foo \x3B;B) do
> > not resolve to the same filename.
> >
> > Because the (lib rel-string) require form does not allow the % or ;
> > characters in the rel-string, I changed `convert-library-reference' to
> > use the (file string) require form and changed `parse-library-reference'
> > to return a platform-specific absolute path string.
>
> Thanks!
>
> But I'm uneasy with mixing %-escape encoding with R6RS-style semicolon
> terminators. How about UTF-8 encoding followed by simple two-digit
> %-escapes (consistent, I think, with URI percent-encoding and RFC
> 3986)?
>
>
> More significantly, I'm worried about generating `file' module
> references instead of `lib' module references. That shift will create
> various little problems. For example, it won't work with shared
> installations that are accessed through different filesystem paths. Also,
> it won't be possible to compile R6RS libraries to bytecode and
> distribute them without source (though compile-time version resolution
> already interferes to some degree with distributing bytecode).
>
> Ideally, R6RS references should all be encoded in `lib' references ---
> but as things stand, you'd have to pick an encoding that would make
> some `lib' paths impossible to express as an R6RS encoding (since you'd
> have to pick some character sequence to act as an escape, but that
> sequence might be used in an existing name).
>
> To resolve this mismatch, we could extend the allowed syntax of `lib'
> module elements to include %-escapes. That is, a `%' could be allowed
> in a `lib' module-path element, as long as it's followed by two
> lowercase hexadecimal digits. From the perspective of mapping `lib'
> paths to file names, this `%' isn't an encoding; it's just part of the
> file name. But if R6RS library names are mapped to `lib' paths by UTF-8
> encoding followed by %-encoding of all "special" characters, then
> there's a 1-to-1 mapping (ignoring versioning and suffixes) between
> R6RS paths and `lib' paths.
>
> If that sounds ok, I can make the needed changes.
>
> One more detail: we'd have to constrain `%' escapes in `lib' path
> elements to rule out encodings of the characters that are currently
> allowed. For example, "%41" (= "A", when read as an encoding) would be
> disallowed.

All that sounds ok to me. Thanks!
--
: Derick
----------------------------------------------------------------
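The encoding agreed on in this thread (UTF-8 encode the library-name element, then write every byte that is not an already-allowed character as `%` plus two lowercase hex digits) can be sketched as follows. This is only an illustration in Python, not the code that went into PLT Scheme. The ALLOWED set (ASCII letters, digits, `-`, `_`, `+`) is an assumption standing in for the real `lib' path-element rules, and the function names are hypothetical.

```python
import re
import string

# ASSUMPTION: characters permitted unescaped in a `lib' path element.
# The real PLT Scheme rule may differ; this set is for illustration only.
ALLOWED = frozenset(string.ascii_letters + string.digits + "-_+")


def encode_element(name: str) -> str:
    """UTF-8 encode `name`, then %-escape every byte outside ALLOWED."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        # Bytes >= 0x80 never match: ALLOWED holds ASCII characters only.
        out.append(ch if ch in ALLOWED else "%{:02x}".format(byte))
    return "".join(out)


def decode_element(element: str) -> str:
    """Inverse of encode_element: undo the %-escapes, then decode UTF-8."""
    raw = re.sub(
        rb"%([0-9a-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        element.encode("ascii"),
    )
    return raw.decode("utf-8")


# Either a well-formed escape (two lowercase hex digits) or any one character.
_TOKEN = re.compile(r"%([0-9a-f]{2})|(.)", re.DOTALL)


def is_valid_element(element: str) -> bool:
    """Check the constraint from the thread: `%` must be followed by two
    lowercase hex digits and must not encode a character that is already
    allowed unescaped (so "%41", i.e. "A", is rejected)."""
    if not element:
        return False
    for m in _TOKEN.finditer(element):
        if m.group(1) is not None:
            if chr(int(m.group(1), 16)) in ALLOWED:
                return False  # redundant escape breaks the 1-to-1 mapping
        elif m.group(2) not in ALLOWED:
            return False  # stray `%`, uppercase hex, or other special char
    return True


print(encode_element("*"))        # %2a
print(encode_element("\u03bb"))   # %ce%bb   (foo \x3BB;)
print(encode_element(";B"))       # %3bB     (foo \x3B;B)
print(is_valid_element("%41"))    # False
```

Because every escape has a fixed width of two hex digits, the two library names from the earlier example, (foo \x3BB;) and (foo \x3B;B), map to different strings without needing a terminating delimiter. Rejecting redundant escapes such as "%41" is what keeps the mapping 1-to-1 in both directions.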