[plt-scheme] Recommendations for parsing HTML

From: John Clements (clements at brinckerhoff.org)
Date: Thu Dec 4 03:46:55 EST 2008

On Dec 3, 2008, at 9:57 PM, Patrick Lozzi wrote:

> It appears I'm really at a loss without the PLaneT htmlprag
> library. I tried to set the file to R5RS, but it wouldn't allow me
> to use the PLaneT htmlprag library; it came up with "reference
> to undefined identifier: require"... mustn't the file be set to
> module in order to use require?  That's the impression I'm
> getting.  In other words, whenever I received this error in the
> past, I realized the current file wasn't set to module, so simply
> setting it to module corrected this error... but if I set it back
> to module, I'm back at square one with the mutable cons cells
> problem that plagues v4 and later versions combined with htmlprag.

Because... well, because I had so many *other* things that I should
have been doing instead, I took a crack at updating htmlprag to work
with PLT Scheme 4.0.

Let me just say... BLECCH!  This is not a comment about this code,  
per se; it's just that trying to get a handle on the flow of values  
in a world with mutable pairs is a horrible awful nightmare.

I tried not to let the mutable pairs bleed into everything too much,  
but I wouldn't claim the result is anything but a hack job.  Perhaps  
it will convince Neil to take a look at it himself!

Anyhow, it works, in the sense that it passes all 146 of the built-in tests.

For what it's worth, this would have been a twenty-hour project  
rather than a two-hour project if it weren't for that test suite.

Attached please find an updated planet package.

If Neil likes it, he can update the PLaneT server. Until then, you
can inject it yourself by saving the attached file as
/tmp/htmlprag.plt, say, and running:

planet fileinject neil /tmp/htmlprag.plt 1 4

Finally, you'd then require it by saying something like:

(require (planet neil/htmlprag:1:4/htmlprag))

Okay, *now* someone can tell me that there's a perfectly good  
alternative to htmlprag.

All the best,

John Clements


-------------- next part --------------
A non-text attachment was scrubbed...
Name: htmlprag.plt
Type: application/octet-stream
Size: 69131 bytes
Desc: not available
URL: <http://lists.racket-lang.org/users/archive/attachments/20081204/745299cb/attachment.plt>

Posted on the users mailing list.