[plt-scheme] html parsing and maintaining structure...
You probably want to use xexprs instead of xml. You write something like this:
`(table (tr (td "a" "b" "c") (td "e")) (tr (td ((colspan "2")) "pqr")))
and then use xexpr->xml and display-xml/content to render it. While
you're building the document, though, you just stick with x-expressions
(i.e., s-expressions that represent xml).
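
A minimal sketch of that workflow, assuming a recent Racket (`#lang racket`) rather than the 2008 `scheme` language. The names `render`, `find-table`, and `page` are made up for illustration, and parsing with read-xml is an assumption that only holds when the input is well-formed xhtml:

```racket
#lang racket
(require xml)

;; The table from above, written as an x-expression.
(define table-xexpr
  '(table (tr (td "a" "b" "c") (td "e"))
          (tr (td ((colspan "2")) "pqr"))))

;; render: xexpr -> string
;; Converts the x-expression to xml structs only at the very end,
;; then captures what display-xml/content prints.
(define (render an-xexpr)
  (with-output-to-string
    (lambda () (display-xml/content (xexpr->xml an-xexpr)))))

;; find-table: xexpr -> (or/c xexpr #f)
;; Returns the first table element, nested structure intact, or #f.
;; Strings and attribute lists never match, because only a list
;; whose first item is the symbol table counts as a table element.
(define (find-table x)
  (cond [(not (pair? x)) #f]
        [(eq? (car x) 'table) x]
        [else (ormap find-table (cdr x))]))

;; Going the other way: parse well-formed xhtml straight into an xexpr.
(define page
  (xml->xexpr
   (document-element
    (read-xml
     (open-input-string
      (string-append
       "<html><head><title>My title</title></head><body>"
       "<p>Hello world</p>"
       "<table><tr><td>first cell</td><td> second cell</td></tr></table>"
       "</body></html>"))))))
```

With these definitions, `(display (render table-xexpr))` prints the markup, and `(find-table page)` gives back the table as one nested list that can be passed to `render` again.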
Or just use SSAX.
Robby
On Fri, Oct 3, 2008 at 8:27 PM, geb a <geb_a at yahoo.com> wrote:
> Hello,
>
> I'm trying to maintain the structure of a table in xhtml. Unfortunately, I'm having trouble understanding matching structures and in particular the structures defined in the html library.
>
> In particular, I'm not understanding the documentation here.
> (struct (body html-full)())
> A body is (make-body (listof attribute) (listof Contents-of-body))
>
> How are the listof attributes accessed? How about the contents? There don't seem to be any accessor functions (at least, I'm unable to pull the information out of the structure).
>
> The following (modified) example would help to understand what's going on. How can I pull out just the table with its structure intact?
>
> (module html-example scheme
>
>   ; Some of the symbols in html and xml conflict with
>   ; each other and with scheme/base language, so we prefix
>   ; to avoid namespace conflict.
>
>   (require (prefix-in h: html)
>            (prefix-in x: xml))
>
>   (define an-html
>     (h:read-xhtml
>      (open-input-string
>       (string-append
>        "<html><head><title>My title</title></head><body>"
>        "<p>Hello world</p><p><b>Testing</b>!</p>"
>        "<table><tr><td>first cell</td><td> second cell</td></tr></table>"
>        "</body></html>"))))
>
>   ; extract-pcdata: html-content -> (listof string)
>   ; Pulls out the pcdata strings from some-content.
>   (define (extract-pcdata some-content)
>     (cond [(x:pcdata? some-content)
>            (list (x:pcdata-string some-content))]
>           [(x:entity? some-content) (list)]
>           [else (extract-pcdata-from-element some-content)]))
>
>   ; extract-pcdata-from-element: html-element -> (listof string)
>   ; Pulls out the pcdata strings from an-html-element.
>   (define (extract-pcdata-from-element an-html-element)
>     (match an-html-element
>       [(struct h:html-full (content)) (map extract-pcdata content)]
>       [(struct h:html-element (attributes)) '()]
>       [else an-html-element]))
>   ;(struct (html html-full) ())
>   ;[(struct h:body )))