[plt-scheme] reading HTML as XML
At Fri, 12 Dec 2008 09:48:59 -0500, Prabhakar Ragde wrote:
> Am I misreading the documentation somehow?
>
> (require html)
> (require net/url)
> (require xml/xml)
>
> (xml->xexpr
> (read-html-as-xml
> (get-pure-port (string->url "http://www.nytimes.com"))))
>
> gets me something where the elements are not (list symbol ...), but
> element structures, all the way down. This doesn't seem to me to conform
> to the description of xexprs given in the XML Parsing and Writing
> library documentation. Sure, I can unwrap the element structures
> recursively, but it seems to me that xml->xexpr is supposed to do that.
I've run into this before. The `read-html-as-xml' function produces a
list of content values, and `xml->xexpr' wants a single content value.
So, you probably want to map `xml->xexpr' over the result. (I'm not
sure why `xml->xexpr' doesn't complain when it's given a list.)
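
A minimal sketch of the mapping, assuming a string port as a stand-in
for the URL port so that it runs without a network connection. The
sample HTML and the names `in' and `xexprs' are made up for
illustration.

```scheme
#lang scheme
(require html xml)

;; Hypothetical input: a string port standing in for the port that
;; `get-pure-port' would return.
(define in (open-input-string "<p>Hello, <b>world</b></p>"))

;; `read-html-as-xml' returns a list of content values, so convert
;; each one separately instead of passing the whole list to
;; `xml->xexpr'.
(define xexprs (map xml->xexpr (read-html-as-xml in)))

xexprs
```

With the original program, the same `map' would wrap the
`read-html-as-xml' call on the port from `get-pure-port'.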
Matthew