[plt-scheme] Re: Some sort of documentation tool from the toplevel?
> to extract content from the HTML-ified reference documentation, but
> the parsers seem particularly unhappy about the non-well-formedness
For parsing non-well-formed HTML, I get kickbacks to recommend HtmlPrag:
http://www.neilvandyke.org/htmlprag/
HtmlPrag will emit Oleg Kiselyov's SXML format.
To extract and massage the content out of the SXML, I would try Jim
Bender's "sxml-match" and/or Kirill Lisovsky and Dmitri Lizorkin's
SXPath: perhaps SXPath to select the desired subtree(s), and
"sxml-match" to reformat them. For developing the SXPath query,
WebScraperHelper is sometimes helpful.
http://celtic.benderweb.net/sxml-match/
http://www196.pair.com/lisovsky/query/sxpath/
http://www.neilvandyke.org/webscraperhelper/
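
To make the shape of that pipeline concrete, here is a rough,
language-neutral sketch in Python (standard library only). It is not
HtmlPrag or SXPath and does not reproduce their APIs: it only
illustrates the idea of parsing tag soup leniently into an SXML-like
nested list and then selecting subtrees by element name. The names
TreeBuilder, parse_html, descendants, and text_of are hypothetical, and
the recovery rule (a closing tag closes back to the nearest matching
open element) is a simplification I am assuming for illustration.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not pushed on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Build an SXML-like tree where each node is [tag, attrs, child, ...]."""

    def __init__(self):
        super().__init__()
        self.root = ["*TOP*", {}]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, dict(attrs)]
        self.stack[-1].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element.
        # A stray closing tag with no matching opener is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Keep text nodes, dropping whitespace-only runs.
        if data.strip():
            self.stack[-1].append(data)


def parse_html(text):
    """Parse possibly non-well-formed HTML into the nested-list tree."""
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def descendants(node, tag):
    """Yield every element named `tag` below `node`, in document order."""
    for child in node[2:]:
        if isinstance(child, list):
            if child[0] == tag:
                yield child
            yield from descendants(child, tag)


def text_of(node):
    """Concatenate all text below `node`."""
    return "".join(text_of(c) if isinstance(c, list) else c for c in node[2:])


# Unclosed <p> and <i>, and a </b> that arrives while <i> is still open.
SOUP = "<div><p>Syntax: <b>(define <i>id</b> expr)</div><p>tail"
tree = parse_html(SOUP)
for p in descendants(tree, "p"):
    print(text_of(p))
```

The descendants function plays roughly the role of a "// p" style
query, and text_of stands in for the reformatting step; with the real
libraries, the query and the rewrite would each be a short declarative
expression rather than hand-written recursion.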
As Robby suggested, I'd use the "keywords" and "hdindex" files both for
the one-line syntax quick-reference, and for getting HTML anchor names
to help with the HTML scraping.
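
As a rough illustration of why the anchor names help, here is a small
Python sketch (standard library only) that pulls out the text between
one named anchor and the next. I am assuming the anchors appear in the
HTML as <a name="..."> elements; the sample markup and the anchor names
"node_1" through "node_3" are made up for the example, and
AnchorSection and section_text are hypothetical names, not part of any
tool mentioned above.

```python
from html.parser import HTMLParser


class AnchorSection(HTMLParser):
    """Collect the text between <a name=TARGET> and the next named anchor."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.capturing = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("name")
        if tag == "a" and name is not None:
            # A named anchor either starts the wanted section or ends it.
            self.capturing = (name == self.target)

    def handle_data(self, data):
        if self.capturing:
            self.chunks.append(data)


def section_text(html, anchor):
    """Return the whitespace-normalized text that follows `anchor`."""
    parser = AnchorSection(anchor)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Made-up page fragment with unclosed <p> elements.
SAMPLE = (
    '<a name="node_1"></a><p>First entry'
    '<a name="node_2"></a><p>Syntax: <tt>(define id expr)</tt>'
    '<a name="node_3"></a><p>Third entry'
)
print(section_text(SAMPLE, "node_2"))
```

Because this works on parser events rather than on a tree, it does not
care whether the markup between the anchors is well-formed, which is
the main attraction of starting from a known anchor name.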