[racket] XML library: representing CDATA
Greetings.
In the XML module's cdata struct, "[t]he string field is assumed to be of the form <![CDATA[‹content›]]> with proper quoting of ‹content›." It's not clear that this is a very useful design of the interface.
Firstly, it makes it inconvenient to get at the ‹content›: extracting it from cdata-string requires calls to substring (or something like that).
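To illustrate the inconvenience, here is a minimal sketch of the sort of helper one ends up writing. The name cdata-content is hypothetical (it is not part of the xml library), and the sketch assumes the string field really does carry the full wrapper, as the documentation says.

```racket
#lang racket/base
(require xml)

;; Hypothetical helper, not provided by the xml library.
;; Assumes the string field has the form <![CDATA[...]]> and
;; strips the fixed prefix and suffix to recover the content.
(define (cdata-content c)
  (define s (cdata-string c))
  (define prefix "<![CDATA[")
  (define suffix "]]>")
  (substring s
             (string-length prefix)
             (- (string-length s) (string-length suffix))))

;; Recovers the text between the wrapper's prefix and suffix.
(cdata-content (cdata #f #f "<![CDATA[a < b]]>"))
```

A more defensive version would first check that the prefix and suffix are actually present, since nothing in the struct enforces them.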
Secondly, it represents low-level syntactical information which should not, I think, be present in the result of a parse of an XML document. The fact that the content string originated from within a CDATA section is, I think, useful to know, but only just. Note that the fact that a string or character originated within a CDATA section is not part of the XML information set (<http://www.w3.org/TR/xml-infoset/> Sect. 2.6, and Appx D point 19). Supposing (which would be sturdily defensible) that xexprs should represent no more than the content of the XML information set, then there would be no need for the cdata structure at all (though this obviously makes escaping characters on output somewhat more involved).
It's also completely counterintuitive: the documentation of this struct is only three sentences long, and when reading it I _still_ managed to elide the explanation that the CDATA line-noise actually had to be included in the string, presumably because it seemed so obvious that it wouldn't.
Side-issue regarding the wording of the documentation: it's not completely clear what "proper quoting of content" means. I presume it means purely Racket-quoting of the string contents, and doesn't refer to XML quoting at all. Thus (cdata #f #f "<![CDATA[\"&]]>") would be acceptable in principle (it is acceptable in fact).
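As a small sketch of that reading, the following constructs the value above and embeds it in an xexpr. The only escaping is the Racket string escape for the double quote; the ampersand is left bare. I would expect the serializer to emit the string field as it stands rather than XML-escape it, but that is an assumption about the writer and not something the documentation quoted here states.

```racket
#lang racket/base
(require xml)

;; The wrapper is part of the string field. The content is a double
;; quote followed by an ampersand, with no XML escaping applied.
(define c (cdata #f #f "<![CDATA[\"&]]>"))

;; The accessor returns the whole string, wrapper included.
(cdata-string c)

;; A cdata struct may appear directly inside an xexpr.
(display (xexpr->string `(p ,c)))
```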
Is there any chance of an (admittedly backward-incompatible) change to this part of the interface? I doubt that the cdata structure is very extensively used.
Best wishes,
Norman
--
Norman Gray : http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK