[racket-dev] URL escaping: question for web experts

From: Greg Hendershott (greghendershott at gmail.com)
Date: Mon Dec 17 11:55:46 EST 2012

Although I'm hardly a web "expert", I think net/uri-codec is currently
a little confusing.

I get the impression that it was originally written prior to 2005,
because the detailed introduction talks only about RFCs 1738 and
2396.[1]

It looks like perhaps functions such as uri-path-segment-encode were
added at a later date, to support RFC 3986. Although these functions'
docs tersely link to RFC 3986, the overall net/uri-codec introduction
wasn't revised accordingly, nor is there a simple explanation like
"these also encode #\( #\) ...".  (As a result, I actually ended up
writing my own variation because I overlooked them.)
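
A minimal sketch for checking which characters each encoder leaves alone. It assumes only that net/uri-codec exports uri-encode and uri-path-segment-encode; the sample string is made up for illustration. The results may differ between Racket versions, so no expected output is claimed.

```racket
#lang racket/base
(require net/uri-codec)

;; Hypothetical sample containing parens and spaces.
(define sample "a (b) c")

;; Print each encoder's result so the two can be compared by eye.
(for ([enc  (list uri-encode uri-path-segment-encode)]
      [name (list "uri-encode" "uri-path-segment-encode")])
  (printf "~a: ~a\n" name (enc sample)))
```

Running this against the installed version is probably the quickest way to see whether #\( and #\) come out as %28 and %29.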

Aside from the history of the documentation and organization, another
point is the treatment of +, which the docs say intentionally doesn't
follow RFC 2396, but don't really explain why.  (One of my earliest
experiments with Racket was a simple web crawler, and this #\+ <->
#\space translation caused difficulties (although it's possible I was
confused in other ways).)
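
The #\+ <-> #\space question can be probed with a sketch like the one below. It assumes that net/uri-codec exports both the uri-* and the form-urlencoded-* encoder/decoder pairs; the input strings are invented for illustration. Under the application/x-www-form-urlencoded convention a + stands for a space, which is why the same text can decode differently depending on which decoder is chosen. The output is deliberately not stated here, since it depends on the library version.

```racket
#lang racket/base
(require net/uri-codec)

;; Hypothetical value containing a literal plus sign and spaces.
(define raw "1+1 = 2")

;; Encode the same value under both conventions.
(printf "form-urlencoded-encode: ~a\n" (form-urlencoded-encode raw))
(printf "uri-encode:             ~a\n" (uri-encode raw))

;; Decode a string containing an unescaped plus with both decoders.
(define ambiguous "a+b")
(printf "form-urlencoded-decode: ~s\n" (form-urlencoded-decode ambiguous))
(printf "uri-decode:             ~s\n" (uri-decode ambiguous))
```

If the two decoders disagree on "a+b", then a crawler that picks the wrong one for a given part of the URL would silently corrupt any value containing a plus sign.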


Wikipedia (usual caveats apply) says RFC 3986 has been the current
standard since 2005.[2]

I almost wonder if there should be a brand-new module that implements
RFC 3986 strictly (either just that, or with any options/parameters
defaulting to 3986), with the current net/uri-codec deprecated but
preserved for backward compatibility.

I wonder if that would be best because the functions and documentation
may already be confusing. And this is a topic where it's easy for
people to get confused to begin with and choose the wrong function.


[1]: http://docs.racket-lang.org/net/uri-codec.html

[2]: http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_in_a_URI

On Mon, Dec 17, 2012 at 9:59 AM, Eli Barzilay <eli at barzilay.org> wrote:
> For many people there is a constant source of annoyance when you
> copy+paste doc URLs into a markdown context as with stackoverflow and
> others.  The problem is that these URLs have parens in them and at
> least in Chrome, the copied URL still has them -- and because markdown
> texts use parens for URLs "[text](url)" they get confused which means
> that you have to manually replace parens with %28 and %29.
>
> Danny submitted a pull request that eventually got changed by Matthew
> into a new parameter that controls which characters get encoded by
> `net/uri-codec', so it can escape these too.  The result on Chrome is
> that the copied URL has the escapes instead of parens, and clicking
> such a URL makes the copy-able address have the escapes too.  The
> actual page that is displayed is still the same one, of course, it's
> just weird that Chrome has a certain context where the original URL
> string is preserved as is.  (It even considered the escaped URL as one
> that I didn't visit, even though I visited the one with the unescaped
> parens.)
>
> In any case, given all of this I thought that maybe the default mode
> could do the extra escaping -- it seems to me that there is no damage
> with doing that, since in theory every character could be escaped
> anyway.  There's a minor overhead of a few extra characters, but
> there's the above benefit of doing it (which might be a temporary
> thing for all I know).
>
> Neither Matthew nor I feel confident enough to have this encoding be
> the default without consulting some potential web standard gurus.
>
> So?
>
> --
>           ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                     http://barzilay.org/                   Maze is Life!
> _________________________
>   Racket Developers list:
>   http://lists.racket-lang.org/dev

Posted on the dev mailing list.