[racket] Client-side cookies
In case no one offers a better library, enclosed is a small one that I
recently created for a web-scraping task.
Start a simulation of a browser with `make-connection`, use `goto!` to
follow a link to a relative URL (following redirects), and use `back!`
to go back. The `goto!` function returns two values: the headers as a
string and the page content as bytes.
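
As a rough illustration of the navigation model described above (not the attached connection.rkt itself), the following sketch keeps a history of URLs, resolves each relative link against the current page, and steps back on request. The names `session`, `start-session`, `visit!`, and `go-back!` are hypothetical. The sketch makes no network requests and does not follow redirects.

```racket
#lang racket/base
;; Sketch only: NOT the attached connection.rkt. All names are hypothetical,
;; and no network request is made.
(require net/url)

;; A browsing session is a stack of visited URLs, most recent first.
(struct session ([history #:mutable]))

(define (start-session base-url-string)
  (session (list (string->url base-url-string))))

(define (current-location s)
  (url->string (car (session-history s))))

;; Resolve `relative` against the current page and make it the current page.
(define (visit! s relative)
  (define next (combine-url/relative (car (session-history s)) relative))
  (set-session-history! s (cons next (session-history s)))
  (url->string next))

;; Drop the current page, unless it is the only one left.
(define (go-back! s)
  (define h (session-history s))
  (unless (null? (cdr h))
    (set-session-history! s (cdr h)))
  (current-location s))
```

Starting from `http://www.example.com/data/1`, `(visit! s "2")` moves to `http://www.example.com/data/2`, and `(go-back! s)` returns to the first URL.
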
Beware: My application accessed a single site, so this library doesn't
attempt to do the right thing with cookies across sites.
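
To make the single-site caveat concrete, here is a minimal sketch of a cookie jar that ignores `Domain`, `Path`, and expiry attributes. It is an assumption about what a single-site approach could look like, not the code in the attachment; `update-jar` and `jar->cookie-header` are hypothetical names. It takes response headers as one string, as `goto!` is described as returning them.

```racket
#lang racket/base
;; Sketch only: NOT the attached connection.rkt. All names are hypothetical.
(require racket/string)

;; Scan a raw response-header string for Set-Cookie lines and return an
;; updated jar (an immutable hash from cookie name to value). Attributes
;; such as Path, Domain, and Expires are dropped, which is only reasonable
;; when every request goes to the same site.
(define (update-jar jar headers)
  (for/fold ([jar jar])
            ([line (in-list (regexp-split #rx"\r?\n" headers))])
    (define m
      (regexp-match #rx"^(?i:set-cookie):[ \t]*([^=;]+)=([^;]*)" line))
    (if m
        (hash-set jar (string-trim (cadr m)) (string-trim (caddr m)))
        jar)))

;; Render the jar as the value of a Cookie request header,
;; with names sorted so the output is deterministic.
(define (jar->cookie-header jar)
  (string-join
   (for/list ([name (in-list (sort (hash-keys jar) string<?))])
     (string-append name "=" (hash-ref jar name)))
   "; "))
```

For example, `(jar->cookie-header (update-jar (hash) "Set-Cookie: sid=abc123; Path=/\r\n"))` yields `"sid=abc123"`. Because every stored cookie is sent on every request, a jar like this would leak cookies between hosts if used across sites.
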
At Wed, 08 Jan 2014 03:48:44 -0800, Duncan Bayne wrote:
> Hi All,
>
> I'm trying to re-write some Common Lisp web-scraping code in Racket.
>
> In Common Lisp, I'm POSTing a login request, and storing the cookie-jar
> for subsequent GETs:
>
> (defun login (username password)
>   "Logs in to www.example.com. Returns a cookie-jar containing
>    authentication details."
>   (let ((cookie-jar (make-instance 'drakma:cookie-jar)))
>     (drakma:http-request "http://www.example.com/login"
>                          :method :post
>                          :parameters `(("username" . ,username)
>                                        ("password" . ,password))
>                          :cookie-jar cookie-jar)
>     cookie-jar))
>
> ; snip
>
> (defun get-page (page-num cookie-jar)
>   "Downloads a potentially invalid HTML page containing data to scrape.
>    Returns a string containing the HTML."
>   (let ((url (concatenate 'string "http://www.example.com/data/"
>                           (write-to-string page-num))))
>     (let ((body (drakma:http-request url :cookie-jar cookie-jar)))
>       (if (search "No data found." body)
>           nil
>           body))))
>
> However, I can't find an equivalent in Racket. The latest HTTP
> library[1] makes no mention of cookies at all, and AFAICT the cookie
> library[2] seems more about correctly serializing and deserializing
> them.
>
> Can anyone suggest a way of re-writing the above CL in Racket without
> having to implement a bunch of header-parsing stuff?
>
> TIA for any help ...
>
> [1] https://github.com/plt/racket/blob/master/racket/collects/net/http-client.rkt
> [2] http://docs.racket-lang.org/net/cookie.html
>
> --
> Duncan Bayne
-------------- next part --------------
A non-text attachment was scrubbed...
Name: connection.rkt
Type: application/octet-stream
Size: 3659 bytes
Desc: not available
URL: <http://lists.racket-lang.org/users/archive/attachments/20140108/42ab87f8/attachment.obj>