[plt-scheme] html parsing and maintaining structure...

From: geb a (geb_a at yahoo.com)
Date: Fri Oct 3 21:27:41 EDT 2008


I'm trying to preserve the structure of a table in XHTML.  Unfortunately, I'm having trouble understanding struct matching, and in particular the structures defined in the html library.

Specifically, I don't understand this part of the documentation:
(struct (body html-full) ())
A body is (make-body (listof attribute) (listof Contents-of-body))

How is the list of attributes accessed?  How about the contents?  There don't seem to be any accessor functions (at least, I'm unable to pull the information out of the structure).

The following (modified) example should help show what's going on.  How can I pull out just the table with its structure intact?

(module html-example scheme

    ; Some of the symbols in html and xml conflict with
    ; each other and with scheme/base language, so we prefix
    ; to avoid namespace conflict.

    (require (prefix-in h: html)
             (prefix-in x: xml))

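    ; Parse the sample document into the html library's structures.
    (define an-html
      (h:read-xhtml (open-input-string (string-append
         "<html><head><title>My title</title></head><body>"
         "<p>Hello world</p><p><b>Testing</b>!</p>"
         "<table><tr><td>first cell</td><td> second cell</td></tr></table>"
         "</body></html>"))))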

    ; extract-pcdata: html-content -> (listof string)
    ; Pulls out the pcdata strings from some-content.
    (define (extract-pcdata some-content)
      (cond [(x:pcdata? some-content)
             (list (x:pcdata-string some-content))]
            [(x:entity? some-content) (list)]
            [else (extract-pcdata-from-element some-content)]))

    ; extract-pcdata-from-element: html-element -> (listof string)
    ; Pulls out the pcdata strings from an-html-element.
    (define (extract-pcdata-from-element an-html-element)
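      (match an-html-element
        ; html-full has two fields: the inherited attributes, then the content.
        [(struct h:html-full (attributes content))
         (apply append (map extract-pcdata content))]
        [(struct h:html-element (attributes)) '()]
        ; Anything else (e.g. a comment) contributes no text.
        [_ '()]))
        ; Unfinished attempt at matching the body structure:
        ;(struct (html html-full) ())
        ;[(struct h:body  )))
) ; end of module html-example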


Posted on the users mailing list.