[plt-scheme] file: urls & their relation to path names

From: Ray Racine (rracine at adelphia.net)
Date: Sun Jan 8 14:19:36 EST 2006

On Sat, 07 Jan 2006 12:52:27 -0600, Robby Findler wrote:

> At Sat, 07 Jan 2006 11:34:28 -0500, Paul Schlie wrote:
>> Although I may misunderstand, it would seem file://<some-path> is
>> formally invalid regardless of the encoding of <some-path> in the
>> absence of an explicitly specified authority; i.e.:
> Yes, I think that's right. Two slashes are not okay, but three and zero are.
> I meant to put three in each of my examples.
> Robby wrote:
>>  file://%3f/etc/hosts
> but I should have written:
>   file:///%3f/etc/hosts
I was also coding a URI parser/validator on a Saturday. :) My interpretation
of rfc3986 aligns with Paul's, and with his relative and absolute examples.

From section 3.3:
If a URI contains an authority component, then the path component must
either be empty or begin with a slash ("/") character.  If a URI does
not contain an authority component, then the path cannot begin with two
slash characters ("//").
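To see what the section 3.3 rule actually tests, here is a small Python sketch. It splits a URI with the component regex given in rfc3986 Appendix B and then applies the two path constraints quoted above. The helper names `split_uri` and `path_ok` are my own, and treating a present-but-empty "//" as "contains an authority component" is an assumption, which is more or less the point under debate here.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def split_uri(uri):
    """Split a URI reference into its five generic components (hypothetical helper)."""
    m = URI_RE.match(uri)  # the regex matches any string, so m is never None
    return {
        "scheme": m.group(2),
        # None when there is no "//"; "" when "//" is present but empty.
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

def path_ok(parts):
    """Apply the two section 3.3 constraints on the path component."""
    path = parts["path"]
    if parts["authority"] is not None:
        # With an authority, the path must be empty or start with "/".
        return path == "" or path.startswith("/")
    # Without an authority, the path must not start with "//".
    return not path.startswith("//")

for uri in ["file:///%3f/etc/hosts", "file://%3f/etc/hosts", "file:/etc/hosts"]:
    parts = split_uri(uri)
    print(uri, repr(parts["authority"]), repr(parts["path"]), path_ok(parts))
```

Under this sketch's assumption, the three-slash form splits into an empty authority and the path "/%3f/etc/hosts", so the path constraint itself holds; whether an empty authority is acceptable is a separate question.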

I'm not sure this allows the three-slash version:
    file:///%3f/etc/hosts   --> constraint violation

Of course, my interpretation falls apart with the example cited in the
introduction of the rfc, where an online help system refers to the
local system's hosts file as follows:

So either there is a fairly sloppy typo in the spec, or I'm missing
something. Since these things routinely end up (for me) in the missing
category, this discussion sent me scurrying back to the spec, where I found:

Section 6.2.3

... Another case where normalization varies by scheme is in the handling
of an empty authority component or empty host subcomponent. For many
scheme specifications, an empty authority or host is considered an
error; for others, it is considered equivalent to "localhost" or the
end-user's host. When a scheme defines a default for authority and a URI
reference to that default is desired, the reference should be normalized
to an empty authority for the sake of uniformity, brevity, and
internationalization. If, however, either the userinfo or port
subcomponents are non-empty, then the host should be given explicitly
even if it matches the default. ...
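The guidance above can be sketched as a small normalization function. This is a minimal Python illustration, not part of any library: the name `normalize_authority` and the `DEFAULT_HOST` table are hypothetical, and giving file: a default host of "localhost" is an assumption drawn from this discussion rather than from the rfc3986 text quoted here.

```python
# Hypothetical table of scheme-defined default hosts. Treating "localhost"
# as the default for file: is an assumption taken from this discussion.
DEFAULT_HOST = {"file": "localhost"}

def normalize_authority(scheme, userinfo, host, port):
    """Build a normalized authority string following section 6.2.3.

    userinfo, host and port are strings ("" when absent).
    """
    default = DEFAULT_HOST.get(scheme.lower())
    if default is not None:
        if not userinfo and not port and host.lower() in ("", default):
            # A reference to the scheme's default authority: normalize to empty.
            return ""
        if host == "":
            # Only reached when userinfo or port is non-empty:
            # the host should then be given explicitly.
            host = default
    authority = host
    if userinfo:
        authority = userinfo + "@" + authority
    if port:
        authority = authority + ":" + port
    return authority
```

For example, `normalize_authority("file", "", "LOCALHOST", "")` returns the empty string, while `normalize_authority("file", "", "", "8080")` returns "localhost:8080". Note that this runs in the opposite direction from the pre-normalizer in point 3) below, which expands an empty authority into an explicit "localhost".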

So I constructed a tentative "new" interpretation over morning coffee,
which goes something like this:

1) file:///etc/hosts is (in general) an invalid URI under rfc3986.

2) However, the file: scheme specifically allows an empty authority,
   treating it as equivalent to an implicit "localhost" authority. So
   file:///etc/hosts is a legal URI in the context of the file: scheme.

3) This normalization is a pre-rfc normalization, which must occur prior
   to parsing by a "generic" rfc3986 parser/validator; i.e. an rfc3986
   validator must be wrapped by a scheme-aware pre- and post-normalizer.


file:///etc/hosts%3f --> [|file: scheme pre-normalizer|] -->

file://localhost/etc/hosts%3f --> [|rfc3986 parser/validator|] -->

scheme=file,auth=localhost,path=/etc/hosts%3f -->

--> [|rfc3986 post-normalizer (pct-decode)|]

--> scheme=file,auth=localhost,path=/etc/hosts?

In this case the [|file: scheme post-normalizer|] is not shown, as it's a
no-op. The [|mailto: scheme post-normalizer|] might reintroduce case
sensitivities dropped by generic rfc3986 normalization.

This is a logical model for the rfc and not a code design suggestion.
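Purely to make that logical model concrete (again, not as a design), here is a Python sketch of the pipeline diagrammed above. All function names are hypothetical. The generic step uses the component regex from rfc3986 Appendix B, and its rejection of an empty authority deliberately models interpretation 1) above rather than settled RFC text.

```python
import re
from urllib.parse import unquote

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def file_pre_normalize(uri):
    """file: scheme pre-normalizer: make an implicit "localhost" explicit."""
    prefix = "file://"
    if uri.lower().startswith(prefix + "/"):
        return uri[:len(prefix)] + "localhost" + uri[len(prefix):]
    return uri

def generic_parse(uri):
    """Split into scheme, authority and path (query/fragment ignored here)."""
    m = URI_RE.match(uri)
    return {"scheme": m.group(2), "auth": m.group(4), "path": m.group(5)}

def generic_validate(parts):
    """A deliberately strict generic check following points 1) and 3) above."""
    auth, path = parts["auth"], parts["path"]
    if not parts["scheme"]:
        raise ValueError("missing scheme")
    if auth == "":
        # Models the reading that an empty authority is a generic-syntax error.
        raise ValueError("empty authority")
    if auth is not None and path and not path.startswith("/"):
        raise ValueError("with an authority, path must be empty or start with '/'")
    if auth is None and path.startswith("//"):
        raise ValueError("without an authority, path must not start with '//'")
    return parts

def generic_post_normalize(parts):
    """Generic post-normalizer: percent-decode the path, as in the diagram.

    Note: unquote() decodes every escape, including %2F.
    """
    return dict(parts, path=unquote(parts["path"]))

def parse_file_uri(uri):
    parts = generic_validate(generic_parse(file_pre_normalize(uri)))
    # The file: scheme post-normalizer is a no-op, so it is omitted.
    return generic_post_normalize(parts)

print(parse_file_uri("file:///etc/hosts%3f"))
```

Run on file:///etc/hosts%3f, this yields scheme "file", authority "localhost" and path "/etc/hosts?", matching the last line of the diagram, whereas calling `generic_validate(generic_parse("file:///etc/hosts"))` directly, without the pre-normalizer, raises ValueError.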

Is there another way file:///etc/hosts is valid under rfc3986?


Posted on the users mailing list.