[plt-scheme] file: urls & their relation to path names

From: Robby Findler (robby at cs.uchicago.edu)
Date: Sun Jan 8 18:20:16 EST 2006

At Sun, 08 Jan 2006 14:19:36 -0500, Ray Racine wrote:
> On Sat, 07 Jan 2006 12:52:27 -0600, Robby Findler wrote:
> 
> > At Sat, 07 Jan 2006 11:34:28 -0500, Paul Schlie wrote:
> >> Although I may misunderstand, it would seem file://<some-path> is
> >> formally invalid regardless of the encoding of <some-path> in the
> >> absence of an explicitly specified authority; i.e.:
> > 
> > Yes, I think that's right. 2 slashes is not okay, but 3 and zero are
> > ok.
> > I meant to put three in each of my examples.
> > 
> > Robby wrote:
> >>  file://%3f/etc/hosts
> > 
> > but I should have written:
> > 
> >   file:///%3f/etc/hosts
> >
> I was also coding a URI parser/validator on a Sat. :) My interpretation
> of rfc3986 aligns with Paul's and his relative and absolute examples.
> 
> From section 3.3:
> If a URI contains an authority component, then the path component must
> either be empty or begin with a slash ("/") character.  If a URI does
> not contain an authority component, then the path cannot begin with two
> slash characters ("//").
> 
> I'm not sure this allows the 3 slash version:
>     file:///%3f/etc/hosts   --> constraint violation

That would seem to be okay to me: it would be an empty authority
component which, as far as I can tell, means the host name is the empty
string and there is no user and there is no port.
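
To make the two-slash versus three-slash difference concrete, here is a
minimal sketch in Python that splits a URI with the generic regular
expression from Appendix B of rfc3986. The helper name split_uri is my own
for illustration, and the sketch only separates components; it does not
validate them against the full grammar.

```python
import re

# Generic URI-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Return (scheme, authority, path).

    authority is None when the URI has no "//" part at all, and the
    empty string when "//" is present but nothing follows it.
    """
    m = URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5)

for uri in ("file://%3f/etc/hosts", "file:///%3f/etc/hosts", "file:/etc/hosts"):
    print(uri, "->", split_uri(uri))
```

With two slashes, "%3f" is taken as the authority and the path is
"/etc/hosts". With three slashes the authority is present but empty and the
path keeps the "%3f" segment. With a single slash there is no authority
component at all.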

> Of course my interpretation falls apart with the cited example in the
> introduction of the rfc where an online help system refers to the
> local system's host file as follows:
>     file:///etc/hosts
> 
> So either there is a fairly sloppy typo in the spec, or I'm missing
> something.  Since these things routinely end up (for me) in the missing
> category, this discussion sent me scurrying back to the spec where I found:
> 
> Section 6.2.3
> 
> ... Another case where normalization varies by scheme is in the handling
> of an empty authority component or empty host subcomponent. For many
> scheme specifications, an empty authority or host is considered an
> error; for others, it is considered equivalent to "localhost" or the
> end-user's host. When a scheme defines a default for authority and a URI
> reference to that default is desired, the reference should be normalized
> to an empty authority for the sake of uniformity, brevity, and
> internationalization. If, however, either the userinfo or port
> subcomponents are non-empty, then the host should be given explicitly
> even if it matches the default. ...
> 
> So I constructed a tentative "new" interpretation over morning coffee.
> Which goes something like this...
> 
> 1) file:///etc/hosts is (in general) an invalid URI by rfc3986.

I agree. This seems clear from the grammar given.

> 2) However, the file: scheme specifically allows for an empty authority
>     as equivalent to an implicit "localhost" authority.  So
>    file:///etc/hosts is a legal URI in the context of the file: scheme.

Does the rfc say somewhere that the empty authority should be
equivalent to the authority "localhost"?

> 3) This normalization is a pre-rfc normalization which must occur prior
>    to parsing by a "generic" rfc3986 parser/validator, i.e. an rfc3986
>    validator must be wrapped by a scheme-aware pre-/post-normalizer.
> 
> e.g.
> 
> file:///etc/hosts%3f -->[|file: scheme pre- normalizer|] -->
> 
> file://localhost/etc/hosts%3f --> [|rfc3986 parser/validator|] -->
> 
> scheme=file,auth=localhost,path=/etc/hosts%3f --> 
> 
> --> [|rfc3986 post normalizer(pct-decode)|] 
> 
> --> scheme=file,auth=localhost,path=/etc/hosts?
> 
> In this case [|file: scheme post- normalizer|] is not shown as it's a
> NOP. The [|mailto: scheme post- normalizer|] might re-introduce case
> sensitivities dropped by generic rfc3986 normalization.

I don't think it is, in general, possible to reintroduce
case-sensitivity, since case insensitivity is usually accomplished by
lowercasing everything. Regardless, I don't think that the mailto:
scheme needs to re-introduce case sensitivity.
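
For reference, the three-stage pipeline Ray describes above can be sketched
in Python roughly as follows. All function names here are hypothetical, and
the mapping of an empty authority to "localhost" is Ray's proposed
interpretation, not something this sketch claims the rfc requires. The
generic parser is again just the Appendix B regular expression, so it splits
components without validating them.

```python
import re
from urllib.parse import unquote

# Generic URI-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def file_pre_normalize(uri):
    """Hypothetical file: pre-normalizer.

    Rewrites an empty authority ("file:///...") to an explicit
    "localhost" authority; this equivalence is an assumption.
    """
    prefix = "file:///"
    if uri.lower().startswith(prefix):
        return "file://localhost/" + uri[len(prefix):]
    return uri

def generic_parse(uri):
    """Scheme-agnostic split into scheme, authority, and path."""
    m = URI_RE.match(uri)
    return {"scheme": m.group(2), "auth": m.group(4), "path": m.group(5)}

def post_normalize(parts):
    """Generic post-normalizer: percent-decode the path only."""
    out = dict(parts)
    out["path"] = unquote(parts["path"])
    return out

def pipeline(uri):
    return post_normalize(generic_parse(file_pre_normalize(uri)))

print(pipeline("file:///etc/hosts%3f"))
```

Run on file:///etc/hosts%3f, this yields scheme=file, auth=localhost,
path=/etc/hosts?, matching the stages in the quoted example. Decoding only
after the split matters: a "?" decoded before parsing would be read as the
start of a query component.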

Robby


Posted on the users mailing list.