[plt-scheme] xml library: getting DTD information

From: Richard Cobbe (cobbe at ccs.neu.edu)
Date: Sun May 25 12:29:44 EDT 2008

I'm having a little difficulty with the XML library, and I'm not sure
whether this is a bug in the library or in my XML file.

DrScheme 3.99.0.25-svn19may2008, Mac OS X 10.5.2.

I have this saved as /Users/cobbe/test.xml:

    <?xml version="1.1" encoding="UTF-8"?>
    <!DOCTYPE keyboard SYSTEM "file://localhost/System/Library/DTDs/KeyboardLayout.dtd">
    <keyboard group="126" id="-2" name="US Extended" maxout="3">
    </keyboard>

I don't know if this is relevant, but
/System/Library/DTDs/KeyboardLayout.dtd exists on my machine.  (Also
probably irrelevant: the XML file above doesn't match the DTD; it's
missing several required elements inside the 'keyboard' tag.  But I get the
same results when I use the full file, of which the above example is a
small excerpt.)

When I read the XML document using the built-in xml library, I'm able to
see everything *except* the DOCTYPE:

    > (define p (open-input-file "/Users/cobbe/test.xml"))
    > (require xml)
    > (define doc (read-xml p))
    > (close-input-port p)
    > (document-misc doc)
    ()

This is slightly odd; according to the docs, `document-misc' is supposed to
return a comment or a pcdata, not a list.  Don't know if this is relevant,
though.

    > (define prolog (document-prolog doc))
    > (prolog-misc prolog)
    (#<pi>)
    > (p-i-target-name (car (prolog-misc prolog)))
    xml
    > (p-i-instruction (car (prolog-misc prolog)))
    "version=\"1.1\" encoding=\"UTF-8\""

This next bit is the surprising one; I'd expect to get some representation
of the DOCTYPE line here:

    > (prolog-dtd prolog)
    #f

I'm not familiar enough with XML to know what could appear in the misc2
slot, so I don't know if this is the Right Thing:

    > (prolog-misc2 prolog)
    ()

And the rest is what I'd expect:

    > (define elt (document-element doc))
    > (element-name elt)
    keyboard
    > (element-attributes elt)
    (#<attribute> #<attribute> #<attribute> #<attribute>)
    > (element-content elt)
    (#<pcdata>)
    > (pcdata-string (car (element-content elt)))
    "\n"
    > (map (lambda (a) (list (attribute-name a) (attribute-value a)))
           (element-attributes elt))
    ((group "126") (id "-2") (maxout "3") (name "US Extended"))

Now, I'm no XML expert, so I'm quite prepared to believe that the XML file
is ill-formed.  I did check the W3C's XML tutorial, though, and the DOCTYPE
declaration does appear to match the examples they give, so it looks good
to me.

I did try changing the DOCTYPE line to use just an absolute pathname rather
than a URI, and then just the filename "KeyboardLayout.dtd" (which I copied
into the same directory as the XML file), but neither change made any
difference: `prolog-dtd' still returned #f.
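
In case it helps anyone reproduce this without touching the filesystem,
here is a minimal sketch that feeds the same document to `read-xml' from a
string port, once per DOCTYPE variant.  It assumes `read-xml' behaves the
same on a string port as on a file port.  The helper names `make-sample',
`read-sample', and `system-ids' are made up for illustration, and the exact
strings for the second and third system identifiers are my best
reconstruction of what is described above, not copied from a saved file.

```scheme
#lang scheme
;; Sketch only: reproduce the report from in-memory strings.
(require xml)

;; Build the test document around a given system identifier.
;; (Hypothetical helper, not part of the xml library.)
(define (make-sample system-id)
  (string-append
   "<?xml version=\"1.1\" encoding=\"UTF-8\"?>\n"
   "<!DOCTYPE keyboard SYSTEM \"" system-id "\">\n"
   "<keyboard group=\"126\" id=\"-2\" name=\"US Extended\" maxout=\"3\">\n"
   "</keyboard>\n"))

;; Parse the sample document for one system identifier.
(define (read-sample system-id)
  (read-xml (open-input-string (make-sample system-id))))

;; The three forms tried: URI, absolute pathname, bare filename.
;; The last two strings are assumed spellings.
(define system-ids
  (list "file://localhost/System/Library/DTDs/KeyboardLayout.dtd"
        "/System/Library/DTDs/KeyboardLayout.dtd"
        "KeyboardLayout.dtd"))

;; Print whatever `prolog-dtd' yields for each variant.
(for-each
 (lambda (id)
   (let ([prolog (document-prolog (read-sample id))])
     (printf "~a => prolog-dtd: ~s\n" id (prolog-dtd prolog))))
 system-ids)
```

If the DOCTYPE were being recorded, I'd expect each line to print something
other than #f for `prolog-dtd'.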

Is this a bug in the XML library, or did I miss something?

Thanks,

Richard


Posted on the users mailing list.