[racket] data mining business information on web sites w/Racket

From: Neil Van Dyke (neil at neilvandyke.org)
Date: Fri Mar 18 17:05:04 EDT 2011

Scraping information off Web pages was the first thing I ever did 
with Racket (then called PLT Scheme).  It worked great.

Racket is up to most tasks in this general category that I can imagine.

Also, Racket's support for minilanguages, and for rapid programming in 
general, makes it especially productive for tasks like individually 
specializing behavior for lots of different Web sites, if that's what 
you need.
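
As a minimal sketch of what per-site specialization could look like, 
the following defines a tiny rule form, one line per site.  The names 
`define-site-rule`, `site-rules`, and `scrape`, the host names, and the 
HTML patterns are all hypothetical, chosen only for illustration; real 
sites would usually call for an HTML parser rather than regexps.

```racket
#lang racket/base

;; Hypothetical registry: host name -> extractor (HTML string -> list of strings).
(define site-rules (make-hash))

;; A tiny rule form: each site is described by one regexp whose first
;; parenthesized group captures the field we want from that site's pages.
(define-syntax-rule (define-site-rule host pattern)
  (hash-set! site-rules host
             (lambda (html)
               (regexp-match* pattern html #:match-select cadr))))

;; Illustrative rules only; the hosts and markup are made up.
(define-site-rule "example.com" #rx"<h1>([^<]*)</h1>")
(define-site-rule "example.org" #rx"class=\"company\">([^<]*)<")

;; Apply the rule registered for `host`, or return '() if there is none.
(define (scrape host html)
  (define rule (hash-ref site-rules host #f))
  (if rule (rule html) '()))
```

With these definitions, `(scrape "example.com" "<h1>Acme Corp</h1>")` 
returns `'("Acme Corp")`, and a host with no rule returns `'()`.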

You mentioned AI.  Racket's Lisp heritage makes it convenient for 
symbolic AI.  It can also do numeric/statistical work.  There's a good 
chance you'll have to code up any particular "AI" algorithm yourself, or 
use the FFI or subprocesses to hook up an off-the-shelf library.
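
To illustrate the symbolic side, here is a small sketch that stores 
scraped facts as S-expressions and queries them with `match`.  The fact 
layout, the sample data, and the name `companies-in` are assumptions 
made up for this example, not anything from an existing library.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical facts, as they might be recorded after scraping.
(define facts
  '((company "Acme" industry widgets)
    (company "Globex" industry energy)
    (employs "Acme" 120)))

;; Return the names of companies recorded as being in `industry`.
(define (companies-in industry facts)
  (for/list ([f (in-list facts)]
             #:when (match f
                      [(list 'company _ 'industry i) (eq? i industry)]
                      [_ #f]))
    (cadr f)))
```

For example, `(companies-in 'widgets facts)` returns `'("Acme")`.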

If you find you need to evaluate JavaScript to do some of the scraping, 
you can find ways to do that (I would first try calling an off-the-shelf 
tool intended for this purpose, using subprocesses; they do exist), but 
it's going to be annoying-to-intractable no matter what language you use.
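
The subprocess approach might be sketched roughly as follows.  The 
helper name `run-tool` is hypothetical, and which external tool to call 
(and with what arguments) is left open; the sketch only shows capturing 
a program's standard output with `system*`.

```racket
#lang racket/base
(require racket/system racket/port)

;; Run an external program and return its standard output as a string.
;; Returns #f if the program cannot be found or exits with a nonzero status.
(define (run-tool program . args)
  (define exe (find-executable-path program))
  (and exe
       (let* ([ok? #f]
              [out (with-output-to-string
                     (lambda () (set! ok? (apply system* exe args))))])
         (and ok? out))))
```

On a Unix-like system, `(run-tool "echo" "hello")` returns `"hello\n"`; 
a JavaScript-capable page renderer would be called the same way, with 
the URL passed as an argument.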

Geoffrey S. Knauth wrote at 03/18/2011 03:29 PM:
> I'm evaluating whether to use Racket to data mine hundreds of websites pulling out business information within an industry.  I think Racket is up to it, but I'm wondering if anyone else has had experiences positive or negative.  I've used other tools to do rudimentary digging, but this project is likely to touch AI, which brings me back to the Lisp family.


Posted on the users mailing list.