[plt-scheme] file: urls & their relation to path names
At Sun, 08 Jan 2006 14:19:36 -0500, Ray Racine wrote:
> On Sat, 07 Jan 2006 12:52:27 -0600, Robby Findler wrote:
>
> > At Sat, 07 Jan 2006 11:34:28 -0500, Paul Schlie wrote:
> >> Although I may misunderstand, it would seem file://<some-path> is
> >> formally invalid regardless of the encoding of <some-path> in the
> >> absence of an explicitly specified authority; i.e.:
> >
> > Yes, I think that's right. 2 slashes is not okay, but 3 and zero are
> > ok. I meant to put three in each of my examples.
> >
> > Robby wrote:
> >> file://%3f/etc/hosts
> >
> > but I should have written:
> >
> > file:///%3f/etc/hosts
> >
> I was also coding a URI parser/validator on a Sat. :) My interpretation
> of rfc3986 aligns with Paul's and his relative and absolute examples.
>
> From Section 3.3:
> If a URI contains an authority component, then the path component must
> either be empty or begin with a slash ("/") character. If a URI does
> not contain an authority component, then the path cannot begin with two
> slash characters ("//").
>
> I'm not sure this allows the 3 slash version:
> file:///%3f/etc/hosts --> constraint violation
That would seem to be okay to me: it would be an empty authority
component which, as far as I can tell, means the host name is the empty
string and there is no user and no port.
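
To make the "empty authority" reading concrete, here is a minimal sketch
(in Python, only because it is short) that splits a URI with the generic
regular expression given in Appendix B of rfc3986. The helper name
split_uri is my own invention for illustration; it is not from the rfc
or from any library. It only splits components and does not validate
them.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def split_uri(uri):
    """Hypothetical helper: return (scheme, authority, path).

    authority is None when there is no "//" at all, and "" when the
    "//" is present but nothing follows it before the path.
    """
    m = URI_RE.match(uri)  # every group is optional, so this always matches
    return m.group(2), m.group(4), m.group(5)

for uri in ("file:///%3f/etc/hosts",   # three slashes: authority present, empty
            "file://%3f/etc/hosts",    # two slashes: "%3f" is read as the authority
            "file:/etc/hosts"):        # one slash: no authority component at all
    print(uri, "->", split_uri(uri))
```

With three slashes the authority comes back as the empty string and the
path as /%3f/etc/hosts; with two slashes the %3f is swallowed as the
authority, which is not what I meant in my original example.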
> Of course my interpretation falls apart with the cited example in the
> introduction of the rfc where an online help system refers to the
> local system's host file as follows:
> file:///etc/hosts
>
> So either there is a fairly sloppy typo in the spec, or I'm missing
> something. Since these things routinely end up (for me) in the missing
> category, this discussion sent me scurrying back to the spec, where I found:
>
> Section 6.2.3
>
> ... Another case where normalization varies by scheme is in the handling
> of an empty authority component or empty host subcomponent. For many
> scheme specifications, an empty authority or host is considered an
> error; for others, it is considered equivalent to "localhost" or the
> end-user's host. When a scheme defines a default for authority and a URI
> reference to that default is desired, the reference should be normalized
> to an empty authority for the sake of uniformity, brevity, and
> internationalization. If, however, either the userinfo or port
> subcomponents are non-empty, then the host should be given explicitly
> even if it matches the default. ...
>
> So I constructed a tentative "new" interpretation over morning coffee.
> Which goes something like this...
>
> 1) file:///etc/hosts is (in general) an invalid URI by rfc3986.
I agree. This seems clear from the grammar given.
> 2) However, the file: scheme specifically allows for an empty authority
> as equivalent to an implicit "localhost" authority. So
> file:///etc/hosts is a legal URI in the context of the file: scheme.
Does the rfc say somewhere that the empty authority should be
equivalent to the authority "localhost"?
> 3) This normalization is a pre- rfc normalization which must occur prior
> to parsing by a "generic" rfc3986 parser/validator. i.e. An rfc3986
> validator must be wrapped by a scheme: aware pre- post- normalizer.
>
> e.g.
>
> file:///etc/hosts%3f -->[|file: scheme pre- normalizer|] -->
>
> file://localhost/etc/hosts%3f --> [|rfc3986 parser/validator|] -->
>
> scheme=file,auth=localhost,path=/etc/hosts%3f -->
>
> --> [|rfc3986 post normalizer(pct-decode)|]
>
> --> scheme=file,auth=localhost,path=/etc/hosts?
>
> In this case [|file: scheme post- normalizer|] is not shown as it's a
> NOP. The [|mailto: scheme post- normalizer|] might re-introduce case
> sensitivities dropped by generic rfc3986 normalization.
I don't think it is, in general, possible to reintroduce
case-sensitivity, since case insensitivity is usually accomplished by
lowercasing everything. Regardless, I don't think that the mailto:
scheme needs to re-introduce case sensitivity.
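
For reference, the three-stage pipeline quoted above could be sketched
roughly as follows. This is only an illustration under assumptions: the
function names are hypothetical, the "empty authority means localhost"
rule is Ray's proposed interpretation rather than something I have
confirmed in the rfc, and the parser is just the Appendix B regular
expression, not a validator.

```python
import re
from urllib.parse import unquote

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def file_pre_normalize(uri):
    """Hypothetical file: pre-normalizer: make an empty authority explicit.

    Assumes (per the proposal quoted above) that an empty authority in a
    file: URI stands for "localhost".
    """
    if uri.lower().startswith("file:///"):
        return "file://localhost" + uri[len("file://"):]
    return uri

def parse(uri):
    """Generic split into scheme, authority and path (no validation)."""
    m = URI_RE.match(uri)
    return {"scheme": m.group(2), "auth": m.group(4), "path": m.group(5)}

def post_normalize(parts):
    """Generic post-normalizer: percent-decode the path after splitting."""
    return dict(parts, path=unquote(parts["path"]))

def pipeline(uri):
    return post_normalize(parse(file_pre_normalize(uri)))

print(pipeline("file:///etc/hosts%3f"))
```

The one ordering constraint worth noting is that the percent-decoding
has to come after the split: decoding %3f first would produce a literal
"?", which the generic parser would then treat as the start of a query.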
Robby