[plt-scheme] Re: Some sort of documentation tool from the toplevel?

From: Neil W. Van Dyke (neil at neilvandyke.org)
Date: Thu Jul 7 17:47:34 EDT 2005

> to extract content from the HTML-ified reference documentation, but
> the parsers seem particularly unhappy about the non-well-formedness

For parsing non-well-formed HTML, I get kickbacks to recommend HtmlPrag:

    http://www.neilvandyke.org/htmlprag/

HtmlPrag will emit Oleg Kiselyov's SXML format.

To extract and massage the content out of the SXML, I would try Jim
Bender's "sxml-match" and/or Kirill Lisovsky and Dmitri Lizorkin's
SXPath.  Perhaps use SXPath to select the desired subtree(s), and
"sxml-match" to reformat them.  For developing the SXPath query,
WebScraperHelper is sometimes helpful.

    http://celtic.benderweb.net/sxml-match/
    http://www196.pair.com/lisovsky/query/sxpath/
    http://www.neilvandyke.org/webscraperhelper/

As Robby suggested, I'd use the "keywords" and "hdindex" files both for
the one-line syntax quick-reference, and for getting HTML anchor names
to help with the HTML scraping.
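
To make the shape of that pipeline concrete, here is a rough sketch in
Python rather than Scheme.  It is a stand-in, not HtmlPrag or SXPath:
it uses Python's standard html.parser to read non-well-formed HTML
leniently, builds an SXML-like nested list (a "*TOP*" root and "@"
attribute lists), and then collects the "name" attributes of <a>
elements.  The names SxmlBuilder, html_to_sxml, and anchor_names, and
the sample HTML, are all made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not pushed on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class SxmlBuilder(HTMLParser):
    """Hypothetical helper: build an SXML-like nested list from sloppy HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ["*TOP*"]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag]
        if attrs:
            # Attribute list in the SXML style: ["@", [name, value], ...]
            node.append(["@"] + [[k, v] for k, v in attrs])
        self.stack[-1].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly closing
        # anything left open inside it.  A stray close tag is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].append(data)


def html_to_sxml(text):
    builder = SxmlBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def anchor_names(node):
    """Collect the name attribute of every <a> element, in document order."""
    names = []
    if not isinstance(node, list) or node[0] == "@":
        return names
    if node[0] == "a":
        for child in node[1:]:
            if isinstance(child, list) and child[0] == "@":
                names += [v for k, v in child[1:] if k == "name"]
    for child in node[1:]:
        names += anchor_names(child)
    return names


# Made-up sample: unclosed <p> and <a>, an unquoted attribute, a stray </b>.
sample = ('<html><body><p><a name="node_sec_1">define'
          '<p><a name=node_sec_2>lambda</a><br></b></body>')

tree = html_to_sxml(sample)
print(anchor_names(tree))
```

In Scheme the same two steps would be HtmlPrag for the lenient parse
and an SXPath query for the selection; the sketch only shows the idea
that the anchor names fall out of a simple walk over the parsed tree.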



Posted on the users mailing list.