[racket] Client-side cookies

From: Evan Donahue (emdonahu at gmail.com)
Date: Wed Jan 8 12:58:44 EST 2014

I don't know the specifics of your task, but I have been working on a sort
of soup-to-nuts web crawling system. It allows fairly concise specification
of most kinds of web crawling that I've encountered. Depending on what you
need, it includes a functional http-browser that manages cookies and the
like, which in turn contains a reimplementation of a chunk of net/ that can
parse and generate client-side cookies. It should be installable as a
Racket package.

https://github.com/emdonahu/boris
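
If all you need is the core idea rather than a whole crawler, here is a
minimal sketch of client-side cookie handling on top of `http-sendrecv`
from net/http-client. It is not taken from boris or from Matthew's library.
The names `set-cookie-pairs`, `update-jar`, `cookie-header`, `login`, and
`get-page` are made up for illustration, and the "/login" and "/data/"
paths just mirror the Common Lisp in the original question. It assumes a
single site, does not follow redirects, and ignores cookie attributes such
as Domain, Path, and Expires.

```racket
#lang racket/base
;; Sketch only: a tiny single-site cookie jar. All function names below are
;; hypothetical; only http-sendrecv and alist->form-urlencoded come from net/.
(require net/http-client
         net/uri-codec
         racket/port
         racket/string)

;; Extract (name . value) pairs from raw response header lines, e.g.
;; #"Set-Cookie: sid=abc123; Path=/". Everything after the first ";" is ignored.
(define (set-cookie-pairs headers)
  (for*/list ([h (in-list headers)]
              [m (in-value
                  (regexp-match #rx#"^(?i:set-cookie):[ \t]*([^=;]+)=([^;]*)" h))]
              #:when m)
    (cons (bytes->string/utf-8 (cadr m))
          (bytes->string/utf-8 (caddr m)))))

;; The jar is an immutable hash from cookie name to value; later cookies win.
(define (update-jar jar headers)
  (for/fold ([jar jar]) ([p (in-list (set-cookie-pairs headers))])
    (hash-set jar (car p) (cdr p))))

;; Render the jar as one request header line (names sorted for stable output).
(define (cookie-header jar)
  (string-append
   "Cookie: "
   (string-join (for/list ([k (in-list (sort (hash-keys jar) string<?))])
                  (string-append k "=" (hash-ref jar k)))
                "; ")))

;; POST the credentials and return a jar holding whatever cookies came back.
(define (login host username password)
  (define-values (status headers in)
    (http-sendrecv host "/login"
                   #:method "POST"
                   #:headers '("Content-Type: application/x-www-form-urlencoded")
                   #:data (alist->form-urlencoded
                           `((username . ,username) (password . ,password)))))
  (close-input-port in)
  (update-jar (hash) headers))

;; GET a data page, sending the stored cookies; #f when the site has no data.
(define (get-page host page-num jar)
  (define-values (status headers in)
    (http-sendrecv host (format "/data/~a" page-num)
                   #:headers (list (cookie-header jar))))
  (define body (port->string in))
  (close-input-port in)
  (if (regexp-match? #rx"No data found\\." body) #f body))
```

Usage would look something like
(get-page "www.example.com" 1 (login "www.example.com" "user" "secret")).
The jar is a plain immutable hash, so it can be inspected or threaded
through further requests by calling `update-jar` on each response's headers.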


On Wed, Jan 8, 2014 at 9:33 AM, Matthew Flatt <mflatt at cs.utah.edu> wrote:

> In case no one offers a better library, enclosed is a small one that I
> recently created for a web-scraping task.
>
> Start a simulation of a browser with `make-connection`, use `goto!` to
> follow a link to a relative URL (following redirects), and use `back!`
> to go back. The `goto!` function returns two values: the headers as a
> string and the page content as bytes.
>
> Beware: My application accessed a single site, so this library doesn't
> attempt to do the right thing with cookies across sites.
>
> At Wed, 08 Jan 2014 03:48:44 -0800, Duncan Bayne wrote:
> > Hi All,
> >
> > I'm trying to re-write some Common Lisp web-scraping code in Racket.
> >
> > In Common Lisp, I'm POSTing a login request, and storing the cookie-jar
> > for subsequent GETs:
> >
> > (defun login (username password)
> >   "Logs in to www.example.com.  Returns a cookie-jar containing
> >   authentication details."
> >   (let ((cookie-jar (make-instance 'drakma:cookie-jar)))
> >     (drakma:http-request "http://www.example.com/login"
> >              :method :post
> >              :parameters `(("username" . ,username)
> >                            ("password" . ,password))
> >              :cookie-jar cookie-jar)
> >     cookie-jar))
> >
> > ; snip
> >
> > (defun get-page (page-num cookie-jar)
> >   "Downloads a potentially invalid HTML page containing data to scrape.
> >   Returns a string containing the HTML."
> >   (let ((url (concatenate 'string "http://www.example.com/data/"
> >                           (write-to-string page-num))))
> >     (let ((body (drakma:http-request url :cookie-jar cookie-jar)))
> >       (if (search "No data found." body)
> >           nil
> >           body))))
> >
> > However, I can't find an equivalent in Racket. The latest HTTP
> > library[1] makes no mention of cookies at all, and AFAICT the cookie
> > library[2] seems more about correctly serializing and deserializing
> > them.
> >
> > Can anyone suggest a way of re-writing the above CL in Racket without
> > having to implement a bunch of header-parsing stuff?
> >
> > TIA for any help ...
> >
> > [1] https://github.com/plt/racket/blob/master/racket/collects/net/http-client.rkt
> > [2] http://docs.racket-lang.org/net/cookie.html
> >
> > --
> > Duncan Bayne
> > ph: +61 420817082 | web: http://duncan-bayne.github.com/ | skype:
> > duncan_bayne
> >
> > I usually check my mail every 24 - 48 hours.  If there's something
> > urgent going on, please send me an SMS or call me.
> > ____________________
> >   Racket Users list:
> >   http://lists.racket-lang.org/users

Posted on the users mailing list.