[plt-scheme] Parsing html : match or regexp (beginner question)
I'm trying to port some TCL web-scraping code to PLT-Scheme as a way to gain
more understanding of Scheme.
The TCL code fetches a page on the web, looks for some tidbits in the HTML
code and then acts on the tidbits, a very standard behaviour for a
web-scraping script.
I know how to fetch a page in Scheme, and I know how to act on the tidbits
found. I was ready to use a regexp to parse the HTML page for the interesting
tidbits (which is what the TCL code does), but I read in the Cookbook:
"Lisps in general are sort of famous for looking down on Regular Expressions.
Other languages that lack Scheme's powerful pattern matching tend to fall
back on regular expressions to provide some of that capability"
http://schemecookbook.org/Cookbook/RegexChapter
But the pattern-matching chapter of the Cookbook is just a stub without any
recipe or example, and I haven't been able to understand what the Help Desk
says about the two pattern-matching libraries.
So, would anybody care to tell me whether I should use regexps or pattern
matching, and possibly point me to a good explanation (with examples, please?)
of pattern matching in Scheme?
Or am I doing this all wrong? Maybe I should read the HTML as an X-expression
(xexpr) and use the underlying structure instead of parsing a flat string.
(Some of the tidbits I look for are the external links in the HTML page.)
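
To make the two options concrete, here is a rough sketch of each applied to
link extraction. It is only an illustration under several assumptions: it is
written for a current Racket (the successor of PLT Scheme) rather than the
version discussed here, the sample page and the helper names links-by-regexp
and links-by-structure are made up, and the structural version assumes the
page is well-formed enough for string->xexpr from the xml library to parse,
which real-world HTML often is not.

```racket
#lang racket
(require xml)

;; Hypothetical sample page; a real one would come from the fetch step.
(define page
  (string-append
   "<html><body><a href=\"http://example.org/\">one</a>"
   "<p><a href=\"/local\">two</a></p></body></html>"))

;; Option 1: regexp over the flat string.
;; Collects the first capture group of every href="..." occurrence.
(define (links-by-regexp html)
  (regexp-match* #rx"href=\"([^\"]*)\"" html #:match-select cadr))

;; Option 2: walk the X-expression and use its structure.
;; An element is (tag attrs child ...) or (tag child ...), where attrs is
;; a list of (name value) pairs; anything else (strings etc.) has no links.
(define (links-by-structure x)
  (cond
    [(and (pair? x) (symbol? (car x)))
     (define body (cdr x))
     ;; The attribute list is either empty or starts with a (name value) pair;
     ;; a child element starts with a symbol instead.
     (define attrs?
       (and (pair? body)
            (or (null? (car body))
                (and (pair? (car body)) (pair? (car (car body)))))))
     (define attrs (if attrs? (car body) '()))
     (define kids (if attrs? (cdr body) body))
     (define href (and (eq? (car x) 'a) (assq 'href attrs)))
     (append (if href (list (cadr href)) '())
             (append-map links-by-structure kids))]
    [else '()]))

(links-by-regexp page)
(links-by-structure (string->xexpr page))
```

On this sample both calls collect the same two href values. The regexp
version is shorter but would also pick up an href inside a comment or a
script, while the structural version only looks at attributes of a elements.
Keeping only the external links would need an extra filtering step that is
left out here.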
Any pointers towards enlightenment would be greatly appreciated!
--
Sincerely,
Thomas-Xavier MARTIN
txm+plt-scheme at m4x.org