[plt-scheme] Using ssax with broken web pages...

From: John Clements (clements at brinckerhoff.org)
Date: Sat Jun 3 23:25:41 EDT 2006

On Jun 3, 2006, at 10:00 AM, geb a wrote:

> Hello all,
>
> I am trying to process web pages using ssax and I have
> used someone's example on the web and gotten it
> working ("somewhat").  The problem comes when trying
> to process something on the internet. For instance,
> processing google's web page yields the error:
>
>
>  Saturday, June 3rd, 2006 9:50:21am session 1:
> xml-server exception:  [GIMatch] broken for (END .
> head) while expecting ENDMETA
>
> So apparently, the parser expected an ending tag but
> didn't find it.  Does it make sense to use ssax on web
> pages that are not developed by yourself or can
> permissive parsers be developed to ignore these
> problems?  How would the parser be modified to ignore
> this problem?
>
> Thanks ahead of time for the help!


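I think you're probably looking for Neil Van Dyke's "htmlprag"
package.  SSAX is an XML parser, so it rejects markup that is not
well-formed, such as the unclosed meta element behind your error;
htmlprag is a permissive HTML parser meant for pages like that.
It's available as a PLaneT package, and here's the doc page: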

http://planet.plt-scheme.org/300/docs/neil/htmlprag.plt/1/3/doc.txt

Here's the library home page:

http://www.neilvandyke.org/htmlprag/
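
To make the strict-versus-permissive distinction concrete, here is a
small sketch.  It is written in Python using only the standard library,
not in Scheme, and it does not use SSAX or htmlprag; the sample markup
and the helper names (BROKEN, strict_parse_ok, TagCollector,
permissive_tags) are made up for illustration.  It only shows the general
idea: a strict XML parser gives up on an unclosed <meta>, while a
permissive HTML parser keeps going.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample page: <meta> and <p> are never closed,
# which is common in real-world HTML but is not well-formed XML.
BROKEN = (
    '<html><head><meta name="a" content="b"><title>Hi</title></head>'
    '<body><p>text</body></html>'
)


def strict_parse_ok(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised here when </head> arrives while <meta> is still open.
        return False


class TagCollector(HTMLParser):
    """Permissive parser: records every start tag it sees and
    simply ignores end tags, matched or not."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def permissive_tags(markup):
    """Return the start tags found in the markup, in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(strict_parse_ok(BROKEN))
    print(permissive_tags(BROKEN))
```

The strict parser fails for the same kind of reason as the error quoted
above (an end tag for head arrives while meta is still open), whereas
the permissive one still recovers the tag sequence.  A permissive
parser for real pages also has to guess where the missing end tags
belong, which this sketch does not attempt.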

John Clements


Posted on the users mailing list.