[racket] se-path* returning multiple strings when tag contains XML entities

From: Giacomo Ritucci (giacomo.ritucci at gmail.com)
Date: Sat Dec 7 20:30:40 EST 2013

Hi Jay,

thanks for your reply.

Unfortunately, I can't find a way in my code to detect that, in the list
returned by se-path*/list

    '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad Seeds")

the first three elements should actually be treated as a single string, and
likewise the last three.

Is there a common idiom in Racket to extract a list of values from an XML
collection, in a way that works with &amp; and other entities?

Thanks in advance.
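
For reference, here is a minimal sketch of the approach suggested in the
quoted reply below: take the name nodes themselves and concatenate their
children. It walks the xexpr directly rather than using xml/path. The names
children, child->string, element-texts and bands-xe are hypothetical helpers
made up for this illustration, not part of the xml library, and the sketch
assumes the xexpr shape shown in the thread.

```racket
#lang racket

;; The xexpr from the thread, written out literally.
(define bands-xe
  '(bands ()
     (name () "Derek " "&" " the Dominos")
     (name () "Nick Cave " "&" " the Bad Seeds")))

;; Hypothetical helper: the children of an element xexpr,
;; skipping the attribute list when one is present.
(define (children xe)
  (match xe
    [(list (? symbol?) (list (list (? symbol?) (? string?)) ...) kids ...) kids]
    [(list (? symbol?) kids ...) kids]
    [_ '()]))

;; Hypothetical helper: turn one child into text.
;; Strings are kept; numbers (numeric entities in an xexpr) become
;; the corresponding character; anything else (nested elements,
;; symbolic entities, comments) is dropped in this sketch.
(define (child->string k)
  (cond
    [(string? k) k]
    [(exact-nonnegative-integer? k) (string (integer->char k))]
    [else ""]))

;; Hypothetical helper: one joined string per element named `tag`.
(define (element-texts tag xe)
  (cond
    [(and (pair? xe) (symbol? (car xe)))
     (define kids (children xe))
     (if (eq? (car xe) tag)
         (list (string-append* (map child->string kids)))
         (append-map (lambda (k) (element-texts tag k)) kids))]
    [else '()]))

(element-texts 'name bands-xe)
```

Because each name element is joined before the results are collected, the
boundary between the two bands is never lost, which is the information missing
from the flat list above.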


On Mon, Dec 2, 2013 at 9:27 PM, Jay McCarthy <jay.mccarthy at gmail.com> wrote:

> Hi Giacomo,
>
> First, the question is not really about se-path*/list, because if you
> look at the xexpr you're giving it, the "name" node has three string
> children:
>
> '(bands ()
>    (name () "Derek " "&" " the Dominos")
>    (name () "Nick Cave " "&" " the Bad Seeds"))
>
> And se-path*/list gives you these children all appended together. If you
> got the name nodes themselves, then you could concatenate their
> children.
>
> Second, the real question is why parsing XML works like that.
> If you look at this:
>
> (require xml)
>
> (define xs
>   (string-append
>    "<bands><name>Derek &amp; the Dominos</name>"
>    "<name>Nick Cave &amp; the Bad Seeds</name></bands>"))
> (define x
>   (read-xml/document (open-input-string xs)))
> x
>
> Then you'll see that the core is that name doesn't have a single piece
> of PCDATA. It has three, one of which is an entity.
>
> I don't consider this an error in the XML parser, but a consequence of
> XML entities that might not be obvious: they are their own nodes in
> the list of children of the parent node.
>
> Jay
>
>
> On Sun, Dec 1, 2013 at 8:36 AM, Giacomo Ritucci
> <giacomo.ritucci at gmail.com> wrote:
> > Hi Racket Users,
> >
> > I'm using se-path*/list to extract values from an XML collection but I
> > found strange behaviour when the extracted values contain entities.
> >
> > For example, given the following XML:
> >
> > <bands>
> >     <name>Derek &amp; the Dominos</name>
> >     <name>Nick Cave &amp; the Bad Seeds</name>
> > </bands>
> >
> > when I extract a list of band names with (se-path*/list '(name) xe) I'd
> > expect this result:
> >
> >     '("Derek & the Dominos" "Nick Cave & the Bad Seeds")
> >
> > but what I actually receive is:
> >
> >     '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad Seeds")
> >
> > Is this the intended behaviour? How can I overcome this and make
> > se-path*/list return one string per tag?
> >
> > Here's my test code. I'm running Racket v5.3.6 on Linux x86_64, and
> > maybe I'm overlooking something because I'm new to Racket.
> >
> > Thank you in advance!
> >
> > Best regards,
> > Giacomo
> >
> > #lang racket
> >
> > (require xml
> >          xml/path)
> >
> > (define xe
> >   (string->xexpr
> >    (string-append
> >     "<bands><name>Derek &amp; the Dominos</name>"
> >     "<name>Nick Cave &amp; the Bad Seeds</name></bands>")))
> >
> > (module+ test
> >   (require rackunit)
> >
> >   ;; what I get
> >   (check-equal? (se-path*/list '(name) xe)
> >                 '("Derek " "&" " the Dominos"
> >                   "Nick Cave " "&" " the Bad Seeds"))
> >
> >   ;; what I'd expect
> >   (check-equal? (se-path*/list '(name) xe)
> >                 '("Derek & the Dominos" "Nick Cave & the Bad Seeds")))
> >
> > ____________________
> >   Racket Users list:
> >   http://lists.racket-lang.org/users
> >
>
>
>
> --
> Jay McCarthy <jay at cs.byu.edu>
> Assistant Professor / Brigham Young University
> http://faculty.cs.byu.edu/~jay
>
> "The glory of God is Intelligence" - D&C 93
>

Posted on the users mailing list.