[racket] net/url current-proxy-servers implementation for ssl?
On Wed, Feb 19, 2014 at 8:16 AM, Matthew Eric Bassett
<mebassett at gegn.net> wrote:
> Hello fellow racketeers,
>
> So we have a few web crawlers written in Racket that for the most part work
> quite well - however, we often have them running behind our corporate proxy.
> We were quite bemused today when one stopped working when it found an https
> link.
>
> A quick check of the docs revealed:
>
>>> (current-proxy-servers)
>>> → (listof (list/c string? string? (integer-in 0 65535)))
>>> (current-proxy-servers mapping) → void?
>>> mapping : (listof (list/c string? string? (integer-in 0 65535)))
>>>
>>> A parameter that determines a mapping of proxy servers used for
>>> connections. Each mapping is a list of three elements:
>>> * the URL scheme, such as "http";
>>> * the proxy server address; and
>>> * the proxy server port number.
>>>
>>> Currently, **the only proxiable scheme is "http"**. The default mapping
>>> is the empty list (i.e., no proxies).
>
>
> Are there any plans for a proxy implementation for https?
As far as I know, there are no plans. That code has not been touched
in a long time and the last time I touched it, a major thing that I
did was make the proxy implementation more decoupled from the HTTP
connection code.
> Or any major
> obstacles preventing such? I know very little about the dark inner workings
> of SSL.
I believe the main challenge is that proxying defeats the
no-eavesdropping promises of HTTPS by definition. This page about
Squid (a popular proxy) talks about some of the problems and how it
deals with them:
http://wiki.squid-cache.org/Features/SslBump
The key piece that would need to be implemented is "HTTP CONNECT":
http://en.wikipedia.org/wiki/HTTP_tunnel#HTTP_CONNECT_Tunneling
In our code, that would mean that these functions
https://github.com/plt/racket/blob/master/racket/collects/net/url.rkt#L117
https://github.com/plt/racket/blob/master/racket/collects/net/url.rkt#L130
would need to check that (a) a proxy is configured and (b) the URL's
scheme is HTTPS, and then send a CONNECT request and set things up
transparently to the rest of the Racket code.
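
To make that concrete, here is a rough sketch of what the CONNECT
handshake might look like. It is not code from net/url: the names
connect-request, connect-ok? and open-https-via-proxy are made up for
illustration, and it assumes a proxy that needs no authentication. It
also leaves certificate verification unconfigured, which real code
would need to address.

```racket
#lang racket/base
;; Sketch only. connect-request, connect-ok? and open-https-via-proxy
;; are hypothetical names, not part of net/url.
(require racket/tcp openssl)

;; Build the plain-text CONNECT request that is sent to the proxy.
(define (connect-request host port)
  (format "CONNECT ~a:~a HTTP/1.1\r\nHost: ~a:~a\r\n\r\n"
          host port host port))

;; A 2xx status line means the proxy has opened the tunnel.
(define (connect-ok? status-line)
  (regexp-match? #rx"^HTTP/1\\.[01] 2[0-9][0-9]" status-line))

;; Ask the proxy for a tunnel to host:port, then start TLS over it.
;; Returns two values: an input port and an output port.
(define (open-https-via-proxy proxy-host proxy-port host [port 443])
  (define-values (in out) (tcp-connect proxy-host proxy-port))
  (write-string (connect-request host port) out)
  (flush-output out)
  (define status (read-line in 'return-linefeed))
  (unless (and (string? status) (connect-ok? status))
    (close-input-port in)
    (close-output-port out)
    (error 'open-https-via-proxy "proxy refused CONNECT: ~a" status))
  ;; Skip the rest of the proxy's response headers, up to the blank line.
  (let loop ()
    (define line (read-line in 'return-linefeed))
    (unless (or (eof-object? line) (string=? line ""))
      (loop)))
  ;; From here on the proxy only relays bytes, so TLS runs end to end.
  ;; The default client context does not verify certificates.
  (ports->ssl-ports in out
                    #:mode 'connect
                    #:close-original? #t
                    #:hostname host))
```

The ports it returns could then be handed to the existing HTTP code in
place of a direct connection, which is roughly the "transparent" part.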
It is totally doable and my guess is that it would be fewer than 50
lines of code to change. My worry is that it would be a beast to test,
as I don't know how reliable the RFC is on this matter. I'm willing to
help, but would prefer if it were easy to get a high-level test of the
proxy and https site that you were working with.
Jay
> Thanks,
>
> Matthew Eric
>
>
>
> --
> Matthew Eric Bassett | http://mebassett.info
>
--
Jay McCarthy <jay at cs.byu.edu>
Assistant Professor / Brigham Young University
http://faculty.cs.byu.edu/~jay
"The glory of God is Intelligence" - D&C 93