[racket] Help with sockets.

From: Danny Yoo (dyoo at hashcollision.org)
Date: Wed Apr 30 17:27:55 EDT 2014

>> The reason is because certain Strings can't be represented in the Text
>> node of XML documents.  We ran across this problem in practice when
>> students started writing programs and copying and pasting content from
>> the web, which introduced characters like vertical tabs and other
>> characters that can't be represented in XML text nodes.
>
> What about CDATA, or escaping xml sequences?

This did not work when I tried it.

See:

    https://sourceware.org/bugzilla/show_bug.cgi?id=4462

for an example of the kind of "fix" that gets applied to this problem.
It doesn't actually solve it: the "fix" in that report sanitizes the
string, so the original content is lost.
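
To make the CDATA and escaping question concrete, here is a small
sketch in Python 3 that feeds three spellings of the same form feed to
the expat-backed minidom parser.  The names `candidates` and `parses`
are mine, purely for illustration.  I'd expect all three to be rejected,
since the character restriction applies to CDATA sections and to numeric
character references as well as to plain text:

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Three attempts at carrying a form feed (U+000C) in an XML 1.0 document.
candidates = {
    "raw": "<message>\f</message>",
    "cdata": "<message><![CDATA[\f]]></message>",
    "char-ref": "<message>&#12;</message>",
}

def parses(doc):
    """Return True if minidom accepts doc as well-formed XML."""
    try:
        xml.dom.minidom.parseString(doc)
        return True
    except ExpatError:
        return False

# Map each attempt to whether the parser accepted it.
results = {name: parses(doc) for name, doc in candidates.items()}
print(results)
```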


Let's try the following in Racket using the xml library:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (define msg "\f")
> msg
"\f"
> (require xml)
> (xexpr->string `(message ,msg))
"<message>\f</message>"
> (define test-file (open-output-file "test-bad.xml"))
> (write-xexpr `(message ,msg) test-file)
> (close-output-port test-file)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Huh!  Unfortunately, that's a bug in Racket's xml library: it wrote the
form feed out verbatim.  You're not allowed to put form feed characters
in XML text nodes: it violates the XML 1.0 standard.  I'll file a bug
when I have time.
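
For reference, the XML 1.0 rule in question is the Char production,
which allows only tab, newline, carriage return, and the ranges
#x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF.  A minimal sketch of a
check against that production, in Python 3 (the function names are
hypothetical, not part of any library):

```python
def is_xml10_char(ch):
    """True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def invalid_xml10_chars(text):
    """List (index, character) pairs that XML 1.0 cannot carry."""
    return [(i, ch) for i, ch in enumerate(text) if not is_xml10_char(ch)]

print(invalid_xml10_chars("\f"))             # the form feed is flagged
print(invalid_xml10_chars("tab\tand\nnewline"))  # nothing is flagged
```

A serializer could run a check like this before writing and raise an
error instead of silently producing a file that other parsers reject.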


Let's try reading this "test-bad.xml" file from another client library
just to show what happens:

################################################################
dannyyoo@melchior:~$ python
Python 2.7.3 (default, Feb 27 2014, 19:58:35)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.dom.minidom
>>> dom1 = xml.dom.minidom.parse("test-bad.xml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1920, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 9
################################################################

Better.  Or worse, depending on your perspective.  Python's
xml.dom.minidom library correctly reports that the file is not
well-formed.  If we stick with XML 1.0, you can't represent this string
in a text node at all: you have to encode it with some scheme outside of
XML and decode it again on the other side.
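
One way to do that external encoding is base64.  This is only a sketch
in Python 3 under my own assumptions: the `encoding="base64"` attribute
and the helper names are made up for illustration, and both ends of the
connection would have to agree on the convention.

```python
import base64
import xml.dom.minidom

def encode_payload(text):
    """Encode arbitrary text as base64 so it is safe in a text node."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def decode_payload(encoded):
    """Reverse encode_payload."""
    return base64.b64decode(encoded).decode("utf-8")

msg = "\f"

# The attribute name is a hypothetical marker, not any standard.
doc = '<message encoding="base64">%s</message>' % encode_payload(msg)

# This document parses, because base64 output is plain ASCII.
dom = xml.dom.minidom.parseString(doc)
recovered = decode_payload(dom.documentElement.firstChild.data)
print(repr(recovered))
```

The cost is that the payload is no longer readable in the XML itself,
and every consumer has to know to decode it.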


So when folks say that XML is just like s-expressions or JSON, I pause.

Posted on the users mailing list.