[racket-dev] PLaneT and proxies

From: Norman Gray (norman at astro.gla.ac.uk)
Date: Mon Jun 21 04:43:00 EDT 2010

Eli, hello.

On 2010 Jun 20, at 22:53, Eli Barzilay wrote:

> [Moving the thread to dev.]

I've subscribed for the moment.

> What about specifications of conditional proxies?  Ones that apply
> only to some domain patterns.  And I can't think of a good way to do
> this if instead of name patterns the specs are for a numeric range --
> doing an explicit DNS lookup in these cases seems like a bad idea.

[and]

> I think it's important to find where it's documented.  I looked around
> and didn't see anything official-looking.  It's especially important
> to find out the preference of this over other facilities (as in the
> OSX case).

My impression is that it's _not_ documented, but instead just part of unix lore.

Libcurl says <http://curl.haxx.se/libcurl/c/curl_easy_setopt.html> that "libcurl respects the environment variables http_proxy, ftp_proxy, all_proxy etc, if any of those are set", without saying what values it expects them to hold.  Lynx <http://lynx.isc.org/lynx2.8.5/lynx2-8-5/lynx_help/keystrokes/environments.html#proxy> gives the nearest thing to a specification, and since it's been around for a long time, it probably gets to win.

The description there is pretty simple-minded, but probably elaborate enough, in the sense that if any organisation sets up a network environment which has proxy requirements which can't be handled by this configuration framework, then it's clear it's their stupid fault for being fussy, and they can sort it out themselves (by getting out their router manuals and doing it transparently, at the correct layer).  Short version: proxies are supposed to be quick and dirty, so there's no need for complication here.

The following logic might be adequate:

#lang racket

(define getenv
  ;; trashy little getenv simulator
  (let ((table '()))
    (λ (k . rest)
      (cond ((pair? rest)
             (set! table (cons (cons k (car rest)) table)))
            ((assoc k table) => cdr)
            (else #f)))))

;; RETRIEVE-URL-WITH-PROXY? : string -> boolean
;; Given a host name as a string, determine whether HTTP URLs at this
;; host should be retrieved via a proxy, respecting the settings
;; in http_proxy and no_proxy.
(define (retrieve-url-with-proxy? host-string)
  (let ((proxy (getenv "http_proxy"))
        (no_proxy (cond ((getenv "no_proxy") => (λ (s) (regexp-split #rx", *" s)))
                        (else #f))))
    (cond ((not proxy) #f)
          (no_proxy
           (let loop ((vetos no_proxy))
             (cond ((null? vetos) #t)
                   ;; quote the entry so that "." matches only a literal dot
                   ((regexp-match? (regexp (string-append (regexp-quote (car vetos)) "$")) host-string) #f)
                   (else (loop (cdr vetos))))))
          (else #t))))


(printf "without anything: ~a~%" (retrieve-url-with-proxy? "www.racket-lang.org"))
(getenv "http_proxy" "cache.example.org")
(printf "with proxy: ~a~%" (retrieve-url-with-proxy? "www.racket-lang.org"))
(getenv "no_proxy" "localhost, racket-lang.org, example.com")
(printf "with no_proxy, too: ~a~%" (retrieve-url-with-proxy? "www.racket-lang.org"))

Possible improvements: 

1. Support ftp_proxy, etc, too (though note that no_proxy applies to all protocols) -- does this matter?

2. Numeric hosts in the no_proxy field.  Perhaps parse "129.10." as 129.10/16, and then, if and only if there's a numeric spec in the no_proxy list, do the hostname lookup within retrieve-url-with-proxy? and compare it.  The lookup is going to have to happen at some point anyway, and this way, it'll be in the DNS cache when the HTTP connection is subsequently opened.
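
A rough sketch of how those two improvements might look, under some assumptions: the helper names (proxy-variable, numeric-spec?, address-matches-spec?, vetoed-by-address?) are made up for illustration, the per-protocol variable is assumed to be simply the scheme name followed by "_proxy", and the DNS lookup itself is left out, so the already-resolved dotted-quad address is passed in as a string.

```racket
#lang racket

;; Hypothetical helper: the environment variable to consult for a URL
;; scheme, assuming the convention "<scheme>_proxy" ("ftp" -> "ftp_proxy").
;; no_proxy stays shared across all schemes.
(define (proxy-variable scheme)
  (string-append scheme "_proxy"))

;; Hypothetical helper: does a no_proxy entry look like a numeric
;; prefix such as "129.10." rather than a host or domain name?
(define (numeric-spec? spec)
  (regexp-match? #rx"^[0-9]+([.][0-9]+)*[.]?$" spec))

;; Hypothetical helper: does the dotted-quad ADDRESS fall under SPEC?
;; "129.10." is read as 129.10/16, so only whole octets are compared:
;; a dot is appended to both sides before the prefix test.
(define (address-matches-spec? spec address)
  (let ((prefix (if (regexp-match? #rx"[.]$" spec)
                    spec
                    (string-append spec "."))))
    (string-prefix? (string-append address ".") prefix)))

;; Hypothetical helper: is ADDRESS vetoed by any numeric entry in the
;; (already split) no_proxy list?  Name entries are ignored here.
(define (vetoed-by-address? no-proxy-list address)
  (for/or ((spec no-proxy-list))
    (and (numeric-spec? spec)
         (address-matches-spec? spec address))))

(vetoed-by-address? '("localhost" "129.10.") "129.10.3.4")   ; => #t
(vetoed-by-address? '("localhost" "129.10.") "129.100.3.4")  ; => #f
```

Since numeric-spec? is cheap, retrieve-url-with-proxy? could check the no_proxy list with it first and do the lookup only when at least one numeric entry is present.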

>>  * On OS X, I think that http_proxy will typically not be set
>>    (it'll only be set by people who use the command-line a lot).
>>    Instead, the system-wide proxy information can be obtained by
>>    using system APIs, or by parsing the output of /usr/sbin/scutil.
>>    That has to be reasonably dynamic, since someone might change
>>    the 'location' on a laptop, say, from 'at home' to 'at work',
>>    with consequent changes to the proxy.
>> [...]
>>  * ...plus whatever windows requires.
> 
> Yeah, the OS-specific defaults should be consulted too.  IIRC, I
> looked at Windows once, and ran into some mess of a "standard" way to
> get the proxy settings for a machine.

I think I remember hearing about that once.  I've managed to forget it.

Separately, Robby said:

> So, going forward, what is the right thing? Is there something simple
> we can do for now so that Norman can use planet on his school's
> network, or should we wait for the net/url rewrite?

I'll leave my hack in my local resolver.rkt -- don't worry about me.

...and:

> Oh-- Eli pointed out something offlist: did you try starting up
> drracket, setting its proxy preferences and then trying to install
> something from planet?

I didn't try that -- it didn't occur to me, I'm afraid.  I imagine that would work fine (if I need to do this again, and it doesn't work, I'll shout).

Good luck,

Norman


-- 
Norman Gray  :  http://nxg.me.uk



Posted on the dev mailing list.