[racket-dev] URL escaping: question for web experts
Although I'm hardly a web "expert", I think net/uri-codec is currently
a little confusing.
I get the impression that it was originally written prior to 2005,
because the detailed introduction talks only about RFCs 1738 and
2396.[1]
It looks like perhaps functions such as uri-path-segment-encode were
added at a later date, to support RFC 3986. Although these functions'
docs tersely link to RFC 3986, the overall net/uri-codec introduction
wasn't revised accordingly, nor is there a simple explanation like
"these also encode #\( #\) ...". (As a result, I actually ended up
writing my own variation because I overlooked them.)
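
To make the difference concrete, here is a rough sketch of the two
unreserved sets. It is written in Python rather than Racket, purely to
show the rule, and it says nothing about how net/uri-codec is actually
implemented. The helper name encode_segment and the sample file name
are made up for illustration. RFC 2396 treats - _ . ! ~ * ' ( ) as
unreserved "mark" characters, while RFC 3986 keeps only - . _ ~ as
unreserved punctuation.

```python
from urllib.parse import quote

# RFC 2396: "mark" characters that are unreserved besides letters and digits.
RFC2396_MARKS = "-_.!~*'()"
# RFC 3986: unreserved punctuation (parens, !, * and ' are no longer included).
RFC3986_MARKS = "-._~"


def encode_segment(text, marks):
    """Percent-encode everything except ASCII letters, digits and `marks`.

    Hypothetical helper for illustration; not part of any Racket library.
    """
    return quote(text, safe=marks)


if __name__ == "__main__":
    name = "form (1).html"  # made-up path segment containing parens
    print(encode_segment(name, RFC2396_MARKS))  # form%20(1).html
    print(encode_segment(name, RFC3986_MARKS))  # form%20%281%29.html
```

With the RFC 2396 set the parens pass through untouched; with the
RFC 3986 set they come out as %28 and %29.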
Aside from the history of the documentation and organization, another
point is the treatment of +, which the docs say intentionally doesn't
follow RFC 2396, but don't really explain why. (One of my earliest
experiments with Racket was a simple web crawler, and this #\+ <->
#\space translation caused difficulties (although it's possible I was
confused in other ways).)
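
For anyone who hasn't been bitten by this: the + <-> space convention
comes from HTML form encoding (application/x-www-form-urlencoded), not
from the generic URI syntax. The sketch below, again in Python and only
as an illustration of the general pitfall rather than of net/uri-codec
itself, shows what happens when a form-style decoder is applied to a
path. The link is a made-up example.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical link a crawler might find; the "+" characters are literal.
href = "/docs/c++/intro%20page.html"

# Generic URI decoding: only %XX sequences are translated.
path_generic = unquote(href)

# Form-style decoding: "+" is additionally turned into a space.
path_form = unquote_plus(href)

if __name__ == "__main__":
    print(repr(path_generic))  # '/docs/c++/intro page.html'
    print(repr(path_form))     # '/docs/c  /intro page.html'
```

The second result no longer names the same resource, which is the kind
of difficulty a crawler runs into when paths are decoded with
form-encoding rules.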
Wikipedia (usual caveats apply) says RFC 3986 has been the current
standard since 2005.[2]
I almost wonder if there should be a brand-new module that implements
RFC 3986 strictly (either just that, or with any options/parameters
defaulting to 3986), with the current net/uri-codec deprecated but
preserved for backward compatibility.
I wonder if that would be best because the functions and documentation
may already be confusing. And this is a topic where it's easy for
people to get confused to begin with and choose the wrong function.
[1]: http://docs.racket-lang.org/net/uri-codec.html
[2]: http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_in_a_URI
On Mon, Dec 17, 2012 at 9:59 AM, Eli Barzilay <eli at barzilay.org> wrote:
> For many people there is a constant source of annoyance when you
> copy+paste doc URLs into a markdown context as with stackoverflow and
> others. The problem is that these URLs have parens in them and at
> least in Chrome, the copied URL still has them -- and because markdown
> texts use parens for URLs "[text](url)" they get confused which means
> that you have to manually replace parens with %28 and %29.
>
> Danny submitted a pull request that eventually got changed by Matthew
> into a new parameter that controls which characters get encoded by
> `net/uri-codec', so it can escape these too. The result on Chrome is
> that the copied URL has the escapes instead of parens, and clicking
> such a URL makes the copy-able address have the escapes too. The
> actual page that is displayed is still the same one, of course, it's
> just weird that Chrome has a certain context where the original URL
> string is preserved as is. (It even considered the escaped URL as one
> that I didn't visit, even though I visited the one with the unescaped
> parens.)
>
> In any case, given all of this I thought that maybe the default mode
> could do the extra escaping -- it seems to me that there is no damage
> with doing that, since in theory every character could be escaped
> anyway. There's a minor overhead of a few extra characters, but
> there's the above benefit of doing it (which might be a temporary
> thing for all I know).
>
> Neither Matthew nor I feel confident enough to have this encoding be
> the default without consulting some potential web standard gurus.
>
> So?
>
> --
> ((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
> http://barzilay.org/ Maze is Life!
> _________________________
> Racket Developers list:
> http://lists.racket-lang.org/dev