[plt-scheme] Re: Some sort of documentation tool from the toplevel?

From: Neil W. Van Dyke (neil at neilvandyke.org)
Date: Thu Jul 7 17:47:34 EDT 2005

> to extract content from the HTML-ified reference documentation, but
> the parsers seem particularly unhappy about the non-well-formedness

For parsing non-well-formed HTML, I get kickbacks to recommend HtmlPrag.

HtmlPrag will emit Oleg Kiselyov's SXML format.

To extract and massage the content out of the SXML, I would try Jim
Bender's "sxml-match" and/or Kirill Lisovsky and Dmitri Lizorkin's
SXPath.  Perhaps SXPath to get desired subtree(s), and "sxml-match" to
reformat.  For developing the SXPath query, sometimes WebScraperHelper
is helpful.
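
A rough sketch of the shape of that pipeline, for anyone who wants to see it
end to end.  This is NOT HtmlPrag, SXPath, or sxml-match: it is a stand-in
written in Python using only the standard library's html.parser, so it can be
run without installing anything.  The names html_to_sxml, descendants, and
attr are made up for illustration, as are the anchor names in the sample
input.  The nested lists only imitate SXML's (tag (@ (name value)) child ...)
layout, and descendants() only approximates what an SXPath "//a" query would
return.

```python
from html.parser import HTMLParser

# Elements that never have content, so they are not pushed on the open stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class SxmlBuilder(HTMLParser):
    """Build an SXML-like nested list from possibly malformed HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ["*TOP*"]
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = [tag]
        if attrs:
            # Valueless attributes (e.g. "checked") get their own name as value.
            node.append(["@"] + [[k, v if v is not None else k] for k, v in attrs])
        self.stack[-1].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly closing
        # anything left open inside it.  Stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].append(data)


def html_to_sxml(text):
    """Parse HTML text into a nested list rooted at "*TOP*"."""
    builder = SxmlBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def descendants(node, tag):
    """Yield every element named `tag` below `node`, in document order."""
    for child in node[1:]:
        if isinstance(child, list) and child[0] != "@":
            if child[0] == tag:
                yield child
            yield from descendants(child, tag)


def attr(node, name):
    """Return the value of attribute `name` on `node`, or None if absent."""
    for child in node[1:]:
        if isinstance(child, list) and child[0] == "@":
            for key, value in child[1:]:
                if key == name:
                    return value
    return None
```

Usage, on a deliberately sloppy fragment (unclosed <p>, unquoted attribute,
missing </a> and </html>):

```python
page = ('<html><body><p>First<p><a name=node_idx_42>define</a> syntax<br>'
        '<a name="node_idx_44">lambda</body>')
doc = html_to_sxml(page)
print([attr(a, "name") for a in descendants(doc, "a")])
# prints ['node_idx_42', 'node_idx_44']
```

Unlike a real tag-soup parser, this sketch does not know that a new <p>
closes the previous one, so the tree nesting can be deeper than expected.
That does not matter for a descendant query like the one above.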


As Robby suggested, I'd use the "keywords" and "hdindex" files both for
the one-line syntax quick-reference, and for getting HTML anchor names
to help with the HTML scraping.

