[racket] se-path* returning multiple strings when tag contains XML entities

From: Stephen Chang (stchang at ccs.neu.edu)
Date: Sun Dec 15 16:40:27 EST 2013

Here's another way, using CDATA to represent band names.

(Searching around, your question seems to be a common problem [1,2]
and this was one of the suggested solutions.)

*Disclaimer: I haven't worked with xml much, so maybe your/Jay's way
is preferred.


#lang racket
(require xml xml/path)

(define (cdata->string cd)
  (second (regexp-match #px"^<!\\[CDATA\\[(.*)\\]\\]>$" (cdata-string cd))))

(map cdata->string
  (se-path*/list '(name)
    (xml->xexpr
     (read-xml/element
      (open-input-string
       "<bands><name><![CDATA[Derek & the Dominos]]></name>
           <name><![CDATA[Nick Cave & the Bad Seeds]]></name></bands>")))))
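
For comparison, when the XML cannot be rewritten to use CDATA, the approach Jay describes further down the thread (fetch the name elements themselves, then join each one's string children) might be sketched as below. This is only a sketch under assumptions: the sample string spells the ampersands as &amp;, the names xe and band-names are illustrative, and the filter on string? is an addition not in the thread, so non-string children (nested elements, non-predefined entities) are silently dropped.

```racket
#lang racket
(require xml xml/path)

;; An element xexpr has the shape (tag (attr ...) child ...), so the
;; children start at the third position. Keep only the string children
;; and join them into one string.
(define (xe->string n)
  (string-append* (filter string? (rest (rest n)))))

;; Sample input with no whitespace between elements, so every child of
;; bands is a name element.
(define xe
  (string->xexpr
   "<bands><name>Derek &amp; the Dominos</name><name>Nick Cave &amp; the Bad Seeds</name></bands>"))

;; '(bands) selects the children of bands, i.e. the name elements,
;; and each one is collapsed to a single string.
(define band-names
  (map xe->string (se-path*/list '(bands) xe)))

band-names
```

Note that xe->string expects element xexprs only. If the document has whitespace between the name elements, strip it first (for example with eliminate-whitespace) or filter the children with pair? before mapping.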



[1]: http://forums.asp.net/t/1340605.aspx
[2]: http://stackoverflow.com/questions/1654674/url-and-the-ampersand





On Sun, Dec 15, 2013 at 12:52 PM, Giacomo Ritucci
<giacomo.ritucci at gmail.com> wrote:
> Thanks Jay, string-append* is really handy here.
>
> Another hint came from Matthew Butterick, who pointed me to a message from
> Matthias Felleisen that suggests using match
> (http://lists.racket-lang.org/users/archive/2013-June/058426.html)
>
> I experimented a bit with the following example that combines a simple but
> not trivial XML structure, whitespace and entities:
>
> https://gist.github.com/rjack/7968318
>
> (Any feedback is highly appreciated. For example, Jay mentioning
> string-append* allowed me to get rid of all (apply string-append ...))
>
> Honestly, my first thought was "That's an overly difficult approach to a
> simple query on XML data".
>
> Thoughts:
>
> 1. eliminate-whitespace was key to using match successfully; I wish I had
> found it earlier
> 2. match patterns and list operations are really difficult to read (and
> write) compared to the equivalent xpath expression
> 3. it would be great if the XML library could provide helper functions
> (something like xe->string and xe-string=?)
>
> Is there any interest in polishing this example so it can be turned into a
> tutorial or a guide for the Racket XML library documentation? From a newbie's
> point of view this way of querying XML is not obvious.
>
> Feedback, fixes and suggestions are highly appreciated.
>
> Thanks again,
> Giacomo
>
>
>
> On Tue, Dec 10, 2013 at 12:45 AM, Jay McCarthy <jay.mccarthy at gmail.com>
> wrote:
>>
>> Hi Giacomo,
>>
>> I think I would do this:
>>
>> (define (xe->string n)
>>     (string-append* (rest (rest n))))
>>
>>   (check-equal? (map xe->string (se-path*/list '(bands) xe))
>>                 '("Derek & the Dominos" "Nick Cave & the Bad Seeds"))
>>
>> Because you want the children of "bands" and you want to turn each one
>> into a string.
>>
>>
>> On Sat, Dec 7, 2013 at 6:30 PM, Giacomo Ritucci
>> <giacomo.ritucci at gmail.com> wrote:
>> > Hi Jay,
>> >
>> > thanks for your reply.
>> >
>> > Unfortunately I can't find a way in my code to detect that in the
>> > resulting list from se-path*/list
>> >
>> >
>> >     '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad Seeds")
>> >
>> > the first three elements should actually be treated as a single string,
>> > and so should the last three.
>> >
>> > Is there a common idiom in Racket to extract a list of values from an
>> > XML collection, in a way that works with &amp; and other entities?
>> >
>> > Thanks in advance.
>> >
>> >
>> > On Mon, Dec 2, 2013 at 9:27 PM, Jay McCarthy <jay.mccarthy at gmail.com>
>> > wrote:
>> >>
>> >> Hi Giacomo,
>> >>
>> >> First, the question is not really about se-path*/list, because if you
>> >> look at the xexpr you're giving it, the "name" node has three string
>> >> children:
>> >>
>> >> '(bands () (name () "Derek " "&" " the Dominos") (name () "Nick Cave "
>> >> "&" " the Bad Seeds"))
>> >>
>> >> And se-path*/list gives you these children all appended together. If
>> >> you got the name nodes themselves, then you could concatenate their
>> >> children.
>> >>
>> >> Second, the real question is about why parsing XML works like that.
>> >> If you look at this:
>> >>
>> >> (define xs
>> >>   "<bands><name>Derek &amp; the Dominos</name><name>Nick Cave &amp; the Bad Seeds</name></bands>")
>> >> (define x
>> >>   (read-xml/document (open-input-string xs)))
>> >> x
>> >>
>> >> Then you'll see that the core is that name doesn't have a single piece
>> >> of PCDATA. It has three, one of which is an entity.
>> >>
>> >> I don't consider this an error in the XML parser, but a consequence of
>> >> XML entities that might not be obvious: they are their own nodes in
>> >> the list of children of the parent node.
>> >>
>> >> Jay
>> >>
>> >>
>> >> On Sun, Dec 1, 2013 at 8:36 AM, Giacomo Ritucci
>> >> <giacomo.ritucci at gmail.com> wrote:
>> >> > Hi Racket Users,
>> >> >
>> >> > I'm using se-path*/list to extract values from an XML collection but
>> >> > I found a strange behaviour when the extracted values contain
>> >> > entities.
>> >> >
>> >> > For example, given the following XML:
>> >> >
>> >> > <bands>
>> >> >     <name>Derek &amp; the Dominos</name>
>> >> >     <name>Nick Cave &amp; the Bad Seeds</name>
>> >> > </bands>
>> >> >
>> >> > when I extract a list of band names with (se-path*/list '(name) xe)
>> >> > I'd expect this result:
>> >> >
>> >> >     '("Derek & the Dominos" "Nick Cave & the Bad Seeds")
>> >> >
>> >> > but what I actually receive is:
>> >> >
>> >> >     '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad Seeds")
>> >> >
>> >> > Is this the intended behaviour? How can I overcome this and make
>> >> > se-path*/list return one string per tag?
>> >> >
>> >> > Here's my test code. I'm running Racket v5.3.6 on Linux x86_64, and
>> >> > maybe I'm overlooking something because I'm new to Racket.
>> >> >
>> >> > Thank you in advance!
>> >> >
>> >> > Best regards,
>> >> > Giacomo
>> >> >
>> >> > #lang racket
>> >> >
>> >> > (require xml
>> >> >          xml/path)
>> >> >
>> >> > (define xe (string->xexpr "<bands><name>Derek &amp; the Dominos</name><name>Nick Cave &amp; the Bad Seeds</name></bands>"))
>> >> >
>> >> > (module+ test
>> >> >   (require rackunit)
>> >> >
>> >> >   ;; what I get
>> >> >   (check-equal? (se-path*/list '(name) xe)
>> >> >                 '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad Seeds"))
>> >> >
>> >> >   ;; what I'd expect
>> >> >   (check-equal? (se-path*/list '(name) xe)
>> >> >                 '("Derek & the Dominos" "Nick Cave & the Bad Seeds")))
>> >> >
>> >> > ____________________
>> >> >   Racket Users list:
>> >> >   http://lists.racket-lang.org/users
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jay McCarthy <jay at cs.byu.edu>
>> >> Assistant Professor / Brigham Young University
>> >> http://faculty.cs.byu.edu/~jay
>> >>
>> >> "The glory of God is Intelligence" - D&C 93
>> >
>> >
>>
>
>
>
>

Posted on the users mailing list.