[plt-scheme] Using ssax with broken web pages...
On Jun 3, 2006, at 10:00 AM, geb a wrote:
> Hello all,
>
> I am trying to process web pages using SSAX. I adapted
> an example I found on the web and got it working
> ("somewhat"). The problem comes when processing real
> pages from the internet. For instance, processing
> Google's web page yields the error:
>
>
> Saturday, June 3rd, 2006 9:50:21am session 1:
> xml-server exception: [GIMatch] broken for (END .
> head) while expecting ENDMETA
>
> So apparently the parser expected an end tag but
> didn't find it. Does it make sense to use SSAX on web
> pages that you did not write yourself, or can
> permissive parsers be developed that ignore these
> problems? How would the parser be modified to ignore
> this problem?
>
> Thanks ahead of time for the help!
I think you're probably looking for Neil Van Dyke's "htmlprag"
package. It's available as a planet package, and here's the doc page:
http://planet.plt-scheme.org/300/docs/neil/htmlprag.plt/1/3/doc.txt
Here's the library home page:
http://www.neilvandyke.org/htmlprag/
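
As a rough illustration of how htmlprag might be used on markup that a strict XML parser rejects, here is a minimal sketch. It assumes the PLaneT require form used by PLT Scheme v300-era releases, with the version numbers (1 3) taken from the doc URL above, and it assumes html->shtml accepts a string as input. The sample page in broken-page is hypothetical; its <meta> tag is never closed, which is the kind of input that triggers the GIMatch error quoted above.

```scheme
;; Assumed require form for PLT Scheme v300-era PLaneT; the version
;; numbers (1 3) come from the doc URL, not from a tested installation.
(require (planet "htmlprag.ss" ("neil" "htmlprag.plt" 1 3)))

;; Hypothetical sample input: <meta> and the first <p> are never closed.
(define broken-page
  (string-append
   "<html><head><meta name=\"x\" content=\"y\">"
   "<title>Test</title></head>"
   "<body><p>Hello<p>World</body></html>"))

;; html->shtml parses permissively and returns an SXML-style list
;; instead of raising an error on the unbalanced tags.
(define page (html->shtml broken-page))

;; Print the resulting tree so its shape can be inspected.
(write page)
(newline)
```

The resulting list can then be handed to the usual SXML tools, so the rest of an SSAX-based pipeline should need little change.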
John Clements