[racket] se-path* returning multiple strings when tag contains XML entities

From: Jay McCarthy (jay.mccarthy at gmail.com)
Date: Mon Dec 2 15:27:22 EST 2013

Hi Giacomo,

First, the question is not really about se-path*/list, because if you
look at the xexpr you're giving it, the "name" node has three string
children:

'(bands ()
   (name () "Derek " "&" " the Dominos")
   (name () "Nick Cave " "&" " the Bad Seeds"))

And se-path*/list gives you these children all appended together. If
you got the name nodes themselves, then you could concatenate their
children.
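
As a rough sketch of that idea (this is not part of xml/path; the
helpers element-children and name-strings are made up for illustration,
and only direct children of the root element are searched), one could
walk the xexpr by hand:

```racket
#lang racket

;; The xexpr from above, written out literally.
(define xe
  '(bands ()
     (name () "Derek " "&" " the Dominos")
     (name () "Nick Cave " "&" " the Bad Seeds")))

;; element-children : xexpr -> (listof xexpr)
;; Hypothetical helper: the children of an element xexpr,
;; skipping the attribute list when there is one.
(define (element-children x)
  (match x
    [(list (? symbol?) (list (list (? symbol?) (? string?)) ...) kids ...) kids]
    [(list (? symbol?) kids ...) kids]))

;; name-strings : xexpr -> (listof string)
;; Hypothetical helper: one string per direct "name" child,
;; built by appending that child's string children.
(define (name-strings x)
  (for/list ([child (in-list (element-children x))]
             #:when (and (pair? child) (eq? (car child) 'name)))
    (string-append* (filter string? (element-children child)))))

(name-strings xe)
;; => '("Derek & the Dominos" "Nick Cave & the Bad Seeds")
```

Non-string children (nested elements, symbolic entities) are simply
dropped here, which may or may not be what you want.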

Second, the real question is about why parsing XML works like that.
If you look at this:

(require xml)
(define xs
  (string-append
   "<bands><name>Derek &amp; the Dominos</name>"
   "<name>Nick Cave &amp; the Bad Seeds</name></bands>"))
(define x
  (read-xml/document (open-input-string xs)))
x

Then you'll see that the core issue is that name doesn't have a single
piece of PCDATA. It has three, one of which is an entity.

I don't consider this an error in the XML parser, but a consequence of
XML entities that might not be obvious: they are their own nodes in
the list of children of the parent node.
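
If you want to check this yourself, a small sketch like the following
lists the children of each name element in the parsed document. It
relies only on struct accessors exported by the xml library; the helper
describe and the variable names are made up for this example.

```racket
#lang racket
(require xml)

(define xs
  (string-append
   "<bands><name>Derek &amp; the Dominos</name>"
   "<name>Nick Cave &amp; the Bad Seeds</name></bands>"))

;; The root <bands> element of the parsed document.
(define root
  (document-element (read-xml/document (open-input-string xs))))

;; describe : content -> any
;; Hypothetical helper: tag each child with the kind of node it is.
(define (describe c)
  (cond [(pcdata? c) (list 'pcdata (pcdata-string c))]
        [(entity? c) (list 'entity (entity-text c))]
        [else c]))

;; The <name> elements directly under the root.
(define name-elements (filter element? (element-content root)))

;; Print how many children each <name> has and what they are.
(for ([el (in-list name-elements)])
  (printf "~a has ~a children: ~s\n"
          (element-name el)
          (length (element-content el))
          (map describe (element-content el))))
```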

Jay


On Sun, Dec 1, 2013 at 8:36 AM, Giacomo Ritucci
<giacomo.ritucci at gmail.com> wrote:
> Hi Racket Users,
>
> I'm using se-path*/list to extract values from an XML collection but I found
> a strange behaviour when the extracted values contain entities.
>
> For example, given the following XML:
>
> <bands>
>     <name>Derek &amp; the Dominos</name>
>     <name>Nick Cave &amp; the Bad Seeds</name>
> </bands>
>
> when I extract a list of band names with (se-path*/list '(name) xe) I'd
> expect this result:
>
>     '("Derek & the Dominos" "Nick Cave & the Bad Seeds")
>
> but what I actually receive is:
>
>     '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad Seeds")
>
> Is this the intended behaviour? How can I overcome this and make
> se-path*/list return one string per tag?
>
> Here's my test code. I'm running Racket v5.3.6 on Linux x86_64, and maybe
> I'm overlooking something because I'm new to Racket.
>
> Thank you in advance!
>
> Best regards,
> Giacomo
>
> #lang racket
>
> (require xml
>          xml/path)
>
> (define xe
>   (string->xexpr
>    (string-append
>     "<bands><name>Derek &amp; the Dominos</name>"
>     "<name>Nick Cave &amp; the Bad Seeds</name></bands>")))
>
> (module+ test
>   (require rackunit)
>
>   ;; what I get
>   (check-equal? (se-path*/list '(name) xe)
>                 '("Derek " "&" " the Dominos"
>                   "Nick Cave " "&" " the Bad Seeds"))
>
>   ;; what I'd expect
>   (check-equal? (se-path*/list '(name) xe)
>                 '("Derek & the Dominos" "Nick Cave & the Bad Seeds")))
>



-- 
Jay McCarthy <jay at cs.byu.edu>
Assistant Professor / Brigham Young University
http://faculty.cs.byu.edu/~jay

"The glory of God is Intelligence" - D&C 93

Posted on the users mailing list.