[plt-scheme] reading HTML as XML

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Fri Dec 12 10:39:46 EST 2008

At Fri, 12 Dec 2008 09:48:59 -0500, Prabhakar Ragde wrote:
> Am I misreading the documentation somehow?
> 
> (require html)
> (require net/url)
> (require xml/xml)
> 
> (xml->xexpr
>    (read-html-as-xml
>      (get-pure-port (string->url "http://www.nytimes.com"))))
> 
> gets me something where the elements are not (list symbol ...), but 
> element structures, all the way down. This doesn't seem to me to conform 
> to the description of xexprs given in the XML Parsing and Writing 
> library documentation. Sure, I can unwrap the element structures 
> recursively, but it seems to me that xml->xexpr is supposed to do that. 

I've run into this before. The `read-html-as-xml' function produces a
list of content values, and `xml->xexpr' wants a single content value.
So, you probably want to `map xml->xexpr' over the result. (I'm not
sure why `xml->xexpr' doesn't complain when it's given a list.)
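
A minimal sketch of that suggestion, assuming a current Racket
installation where the `html' and `xml' libraries are available. The
sample HTML string and the names `in', `contents', and `xexprs' are
made up for illustration; an in-memory string port stands in for the
network port so the example runs offline.

```racket
#lang racket/base
(require html xml)

;; Hypothetical sample input: a string port instead of a fetched page.
(define in
  (open-input-string
   "<html><body><p>Hello <b>world</b></p></body></html>"))

;; read-html-as-xml returns a LIST of content values, not a single one.
(define contents (read-html-as-xml in))

;; Convert each content value separately instead of passing the whole list.
(define xexprs (map xml->xexpr contents))
```

With the original example, the same pattern would apply to the port
returned by `get-pure-port' in place of `in'.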


Matthew



Posted on the users mailing list.