[plt-scheme] xml.ss and DTDs

From: Jay McCarthy (jay.mccarthy at gmail.com)
Date: Thu Sep 18 12:02:45 EDT 2008

The xml library will print back the DTD if it is in the prolog struct
of the document struct. However, there is no parser for DTDs, so
read-xml will always have #f in the dtd field of the prolog struct.
This shows up on xml/private/reader.ss:30. I imagine that the skip-dtd
function therein might (roughly) know how to parse them.
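
To see this for yourself, a minimal sketch along these lines should
do. It assumes the struct accessors document-prolog and prolog-dtd
exported by the xml library; the names src, doc and dtd are just
placeholders for this illustration.

```scheme
#lang scheme
(require xml)

;; A small input that carries a DOCTYPE declaration (same one as in
;; the example quoted below).
(define src
  (string-append
   "<!DOCTYPE html PUBLIC\n"
   " \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n"
   " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
   "<html xmlns=\"http://www.w3.org/1999/xhtml\"></html>"))

(define doc (read-xml (open-input-string src)))

;; Pull the dtd field out of the prolog struct of the document struct.
(define dtd (prolog-dtd (document-prolog doc)))

;; As described above, the reader skips the DOCTYPE instead of
;; parsing it, so this field is expected to be #f.
(printf "dtd field: ~s\n" dtd)
```

If the field were ever filled in (by a future parser, or by building
the prolog by hand), write-xml would print the DTD back out.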

Now, as for "Is this intentional?": I'm responsible for the XML
library only because someone who was once responsible for it was also
responsible for the web server. I've fixed maybe one bug in it during
my tenure and have never touched it otherwise. If you submit a bug
report, I will either update the documentation or try to write the
parser.

Jay

On Thu, Sep 18, 2008 at 9:50 AM, Felix Klock's PLT scheme proxy
<pltscheme at pnkfx.org> wrote:
> PLTers-
>
> From the documentation for the xml.ss library:
>
>> "The xml library does not provides [sic] Document Type Declaration (DTD)
>> processing, validation, expanding user-defined entities, or reading
>> user-defined entities in attributes."
>
>
> Is the phrase "DTD processing" meant to include functionality such as
> "reading the DOCTYPE declaration given in the input file"?
>
> From what I can tell from the observable behavior and the source text of
> xml.ss, the XML parsing skips over DOCTYPE declarations in the input.
>
> If the silent dropping of the DOCTYPE declaration is intentional, I think the
> documentation should be clearer, and I will file a bug report against the
> docs.  If it is not intentional (or a feature waiting to be implemented),
> then I will file a bug report against the source code.  Right now I cannot
> tell where the bug report belongs.
>
> -Felix
>
> p.s. Here's an example illustrating what I'm talking about:
>
> #lang scheme
> (require (lib "xml.ss" "xml"))
>
> (define source-html-string #<<END
> <!DOCTYPE html PUBLIC
>  "-//W3C//DTD XHTML 1.0 Transitional//EN"
>  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">         </html>
> END
>  )
>
> (define source-document (read-xml (open-input-string source-html-string)))
>
> (write-xml source-document)
> (newline)
> ;; prints:
> ;; <html xmlns="http://www.w3.org/1999/xhtml">         </html>
> ;; and so we've lost information from the input
>

-- 
Jay McCarthy <jay at cs.byu.edu>
Assistant Professor / Brigham Young University
http://jay.teammccarthy.org

"The glory of God is Intelligence" - D&C 93


Posted on the users mailing list.