[racket] net/url current-proxy-servers implementation for ssl?

From: Jay McCarthy (jay.mccarthy at gmail.com)
Date: Wed Feb 19 11:18:46 EST 2014

On Wed, Feb 19, 2014 at 8:16 AM, Matthew Eric Bassett
<mebassett at gegn.net> wrote:
> Hello fellow racketeers,
>
> So we have a few web crawlers written in Racket that for the most part work
> quite well - however, we often have them running behind our corporate proxy.
> We were quite bemused today when one stopped working when it found an https
> link.
>
> A quick check of the docs revealed:
>
>>> (current-proxy-servers)
>>>      → (listof (list/c string? string? (integer-in 0 65535)))
>>> (current-proxy-servers mapping) → void?
>>>       mapping : (listof (list/c string? string? (integer-in 0 65535)))
>>>
>>> A parameter that determines a mapping of proxy servers used for
>>> connections. Each mapping is a list of three elements:
>>>     * the URL scheme, such as "http";
>>>     * the proxy server address; and
>>>     * the proxy server port number.
>>>
>>> Currently, **the only proxiable scheme is "http"**. The default mapping
>>> is the empty list (i.e., no proxies).
>
>
> Are there any plans for a proxy implementation for https?

As far as I know, there are no plans. That code has not been touched
in a long time, and the last time I did touch it, my main change was
to decouple the proxy implementation further from the HTTP connection
code.
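For comparison, this is roughly how the documented parameter is used
for plain HTTP today. The proxy host and port are placeholders, and
`proxy-entry-for` is a hypothetical helper that only mirrors the
per-scheme lookup described in the docs; it is not part of net/url.

```racket
#lang racket
;; Placeholder proxy; `proxy-entry-for` is a hypothetical helper, not net/url API.
(require net/url)

;; Each entry of current-proxy-servers is (list scheme host port);
;; look up the entry whose scheme matches the URL's scheme.
(define (proxy-entry-for u)
  (assoc (url-scheme u) (current-proxy-servers)))

(define proxies '(("http" "proxy.example.com" 3128)))

(parameterize ([current-proxy-servers proxies])
  ;; prints (http proxy.example.com 3128)
  (printf "~a\n" (proxy-entry-for (string->url "http://racket-lang.org/")))
  ;; prints #f: no entry covers https, so there is nothing to proxy through
  (printf "~a\n" (proxy-entry-for (string->url "https://racket-lang.org/"))))
```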

> Or any major
> obstacles preventing such?  I know very little about the dark inner workings
> of ssl.

I believe the main challenge is that a proxy that needs to see the
traffic defeats, by definition, the no-eavesdropping promise of HTTPS.
This page about Squid (a popular proxy) describes some of the problems
and how Squid deals with them:

http://wiki.squid-cache.org/Features/SslBump

The key piece that would need to be implemented is "HTTP CONNECT" tunneling:

http://en.wikipedia.org/wiki/HTTP_tunnel#HTTP_CONNECT_Tunneling
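The two protocol-level pieces of CONNECT are small: the client asks
the proxy to open a raw TCP tunnel to host:port, and any 2xx status
in the reply means the tunnel is up (RFC 7231, section 4.3.6). A
minimal sketch of just those two pieces; the function names are
hypothetical.

```racket
#lang racket
;; Hypothetical helpers illustrating the CONNECT exchange; not net/url API.

;; Build the request a client sends to the proxy to open a tunnel.
(define (connect-request host port)
  (define authority (format "~a:~a" host port))
  (string-append "CONNECT " authority " HTTP/1.1\r\n"
                 "Host: " authority "\r\n"
                 "\r\n"))

;; Any 2xx status on the proxy's reply means the tunnel is established.
(define (connect-succeeded? status-line)
  (match (regexp-match #rx"^HTTP/1\\.[01] ([0-9][0-9][0-9])" status-line)
    [(list _ code) (char=? (string-ref code 0) #\2)]
    [_ #f]))

(display (connect-request "example.com" 443))
(connect-succeeded? "HTTP/1.1 200 Connection established")          ; #t
(connect-succeeded? "HTTP/1.1 407 Proxy Authentication Required")   ; #f
```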

In our code, that would mean that these functions

https://github.com/plt/racket/blob/master/racket/collects/net/url.rkt#L117
https://github.com/plt/racket/blob/master/racket/collects/net/url.rkt#L130

would need to see (a) a proxy is around and (b) the url's scheme is
HTTPS, and then send a CONNECT request and set things up transparently
to the rest of the Racket code.
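A rough sketch of that "transparent" setup, assuming the proxy needs
no authentication: connect to the proxy, send CONNECT, skip the
response headers, and then run TLS over the same ports so callers get
what looks like an ordinary HTTPS connection. `open-tunneled-ports` and
its `#:connect`/`#:wrap` arguments are hypothetical (they exist only so
the logic can be exercised without a network), and certificate
verification is left out.

```racket
#lang racket
;; Hypothetical sketch, not net/url's API. Assumes no proxy authentication
;; and performs no certificate verification.
(require openssl)

(define (open-tunneled-ports proxy-host proxy-port host port
                             #:connect [connect tcp-connect]
                             #:wrap [wrap (lambda (in out)
                                            (ports->ssl-ports in out
                                                              #:mode 'connect
                                                              #:close-original? #t
                                                              #:shutdown-on-close? #t))])
  (define-values (in out) (connect proxy-host proxy-port))
  (define authority (format "~a:~a" host port))
  ;; Ask the proxy to open a raw TCP tunnel to host:port.
  (fprintf out "CONNECT ~a HTTP/1.1\r\nHost: ~a\r\n\r\n" authority authority)
  (flush-output out)
  ;; A 2xx status line means the tunnel is established.
  (define status (read-line in 'return-linefeed))
  (unless (and (string? status)
               (regexp-match? #rx"^HTTP/1\\.[01] 2[0-9][0-9]" status))
    (close-input-port in)
    (close-output-port out)
    (error 'open-tunneled-ports "proxy refused CONNECT: ~s" status))
  ;; Skip any remaining response headers, up to the blank line.
  (let loop ()
    (define line (read-line in 'return-linefeed))
    (unless (or (eof-object? line) (string=? line ""))
      (loop)))
  ;; From here on, the bytes are end-to-end TLS with the origin server.
  (wrap in out))
```

Because the TLS handshake happens inside the tunnel, the proxy only
relays encrypted bytes, which is why this approach does not need
anything like SslBump.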

It is totally doable, and my guess is that it would take fewer than 50
lines of changes. My worry is that it would be a beast to test, as I
don't know how reliable the RFC is on this matter. I'm willing to
help, but it would be much easier if I could run a high-level test
against the proxy and HTTPS site you are working with.

Jay

> Thanks,
>
> Matthew Eric
>
>
>
> --
> Matthew Eric Bassett | http://mebassett.info
>
> ____________________
>  Racket Users list:
>  http://lists.racket-lang.org/users



-- 
Jay McCarthy <jay at cs.byu.edu>
Assistant Professor / Brigham Young University
http://faculty.cs.byu.edu/~jay

"The glory of God is Intelligence" - D&C 93


Posted on the users mailing list.