[plt-scheme] html parsing and maintaining structure...

From: Robby Findler (robby at cs.uchicago.edu)
Date: Sat Oct 4 08:05:52 EDT 2008

You probably want to use xexprs instead of the html/xml structures. You write something like this:

  `(table (tr (td "a" "b" "c") (td "e")) (tr (td ((colspan "2")) "pqr")))

and then use xexpr->xml and display-xml/content to render it. But while
you're building them you just stick with x-expressions (i.e.,
s-expressions that represent XML).
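
A minimal sketch of that workflow, assuming only the xml library that
ships with PLT Scheme. The names table-xexpr and table-string are made
up for illustration:

```scheme
#lang scheme
(require xml)

;; The table from above, kept as a plain x-expression while building it.
(define table-xexpr
  `(table (tr (td "a" "b" "c") (td "e"))
          (tr (td ((colspan "2")) "pqr"))))

;; Convert to the xml library's structures only when it is time to render.
(display-xml/content (xexpr->xml table-xexpr))
(newline)

;; xexpr->string is a shortcut when a string is all that is needed.
(define table-string (xexpr->string table-xexpr))
```

Because the x-expression is an ordinary list, the usual list tools
(quasiquote, map, cons, and so on) are enough to assemble rows and
cells before anything is rendered.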

Or just use SSAX.
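
For the original question (pulling the table out with its structure
intact), here is a rough sketch that stays with the xml library rather
than SSAX. It assumes the input is well-formed XML, so read-xml is used
in place of the html library's read-xhtml. The helpers child-nodes and
find-elements are hypothetical names, not part of any library:

```scheme
#lang scheme
(require xml)

;; Same markup as in the question, parsed with read-xml
;; (assumption: the XHTML is well-formed XML).
(define doc
  (read-xml
   (open-input-string
    (string-append
     "<html><head><title>My title</title></head><body>"
     "<p>Hello world</p><p><b>Testing</b>!</p>"
     "<table><tr><td>first cell</td><td> second cell</td></tr></table>"
     "</body></html>"))))

;; The whole page as one x-expression.
(define page (xml->xexpr (document-element doc)))

;; child-nodes (hypothetical helper): xexpr-element -> (listof xexpr)
;; Returns the children of an element, skipping the attribute list if present.
(define (child-nodes x)
  (let ([rest (cdr x)])
    (if (and (pair? rest)
             (or (null? (car rest))
                 (and (pair? (car rest)) (pair? (caar rest)))))
        (cdr rest)
        rest)))

;; find-elements (hypothetical helper): xexpr symbol -> (listof xexpr)
;; Collects every element named tag, each with its subtree intact.
(define (find-elements x tag)
  (cond [(not (pair? x)) '()]
        [(eq? (car x) tag) (list x)]
        [else (apply append
                     (map (lambda (child) (find-elements child tag))
                          (child-nodes x)))]))

(define tables (find-elements page 'table))
```

Each entry in tables is itself an x-expression, so it can go straight
back through xexpr->xml or xexpr->string. If you do want to stay with
the html structures: as far as I can tell html-full has two fields
(attributes, then content), so the accessors would be
h:html-element-attributes and h:html-full-content, and the match
pattern would need both fields, as in
(struct h:html-full (attributes content)).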

Robby

On Fri, Oct 3, 2008 at 8:27 PM, geb a <geb_a at yahoo.com> wrote:
> Hello,
>
> I'm trying to maintain the structure of a table in xhtml.  Unfortunately, I'm having trouble understanding matching structures and in particular the structures defined in the html library.
>
> In particular, I'm not understanding the documentation here.
> (struct (body html-full)())
> A body is (make-body (listof attribute) (listof Contents-of-body))
>
> How are the listof attributes accessed?  How about the contents?  There don't seem to be any accessor functions (at least, I'm unable to pull the information out of the structure).
>
> The following (modified) example would help to understand what's going on.  How can I pull out just the table with its structure intact?
>
> (module html-example scheme
>
>    ; Some of the symbols in html and xml conflict with
>    ; each other and with scheme/base language, so we prefix
>    ; to avoid namespace conflict.
>
>    (require (prefix-in h: html)
>             (prefix-in x: xml))
>
>
>    (define an-html
>      (h:read-xhtml
>       (open-input-string
>        (string-append
>         "<html><head><title>My title</title></head><body>"
>         "<p>Hello world</p><p><b>Testing</b>!</p>"
>         "<table><tr><td>first cell</td><td> second cell</td></tr></table>"
>         "</body></html>"))))
>
>    ; extract-pcdata: html-content -> (listof string)
>    ; Pulls out the pcdata strings from some-content.
>    (define (extract-pcdata some-content)
>      (cond [(x:pcdata? some-content)
>             (list (x:pcdata-string some-content))]
>            [(x:entity? some-content) (list)]
>            [else (extract-pcdata-from-element some-content)]))
>
>
>    ; extract-pcdata-from-element: html-element -> (listof string)
>    ; Pulls out the pcdata strings from an-html-element.
>    (define (extract-pcdata-from-element an-html-element)
>      (match an-html-element
>        [(struct h:html-full (content)) (map extract-pcdata content)]
>        [(struct h:html-element (attributes))'()]
>        [else an-html-element]))
>        ;(struct (html html-full) ())
>        ;[(struct h:body  )))
>
>
>
>
>


Posted on the users mailing list.