<div dir="ltr"><div><div><div>Thanks Jay, string-append* is really handy here.<br><br>Another hint came from Matthew Butterick, who pointed me to a message from Matthias Felleisen that suggested using match (<a href="http://lists.racket-lang.org/users/archive/2013-June/058426.html" target="_blank">http://lists.racket-lang.org/users/archive/2013-June/058426.html</a>).<br>
<br>I experimented a bit with the following example, which combines a simple but non-trivial XML structure, whitespace and entities:<br><br><a href="https://gist.github.com/rjack/7968318" target="_blank">https://gist.github.com/rjack/7968318</a><br>
<br></div><div>(Any feedback is highly appreciated. For example, Jay's mention of string-append* allowed me to get rid of every (apply string-append ...).)<br></div><div><br></div>Honestly, my first thought was "That's an overly difficult approach to a simple query on XML data".<br>
<br></div><div>Thoughts:<br><br></div><div>1. eliminate-whitespace was key to using match successfully; I wish I had found it earlier<br></div><div>2. match patterns and list operations are really difficult to read (and write) compared to the equivalent XPath expression<br>
</div><div>
3. it would be great if the XML library provided helper functions (something like xe->string and xe-string=?)<br><br></div>Is there any interest in polishing this example so that it can be turned into a tutorial or guide for the Racket XML library documentation? From a newbie's point of view, this way of querying XML is not obvious.<br>
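<br>For readers who cannot open the gist, here is a minimal sketch of thoughts 1 and 3. It is not the code from the gist. It assumes the same two-band document, with indentation added so that whitespace matters, and the helper names xe->string and xe-string=? are hypothetical: they are the wished-for functions, not part of the xml library.<br>
<br>
```racket
#lang racket
(require xml)

;; Sample document with indentation, so <bands> contains whitespace-only text.
(define xml-text
  (string-append
   "<bands>\n"
   "  <name>Derek &amp; the Dominos</name>\n"
   "  <name>Nick Cave &amp; the Bad Seeds</name>\n"
   "</bands>"))

;; Parse to an element, drop whitespace-only PCDATA directly inside <bands>,
;; then convert the result to an X-expression.
(define xe
  (xml->xexpr
   ((eliminate-whitespace '(bands))
    (document-element (read-xml (open-input-string xml-text))))))

;; Hypothetical helper: join the string children of an element X-expression
;; that carries an explicit attribute list, e.g. (name () "a" "&" "b").
(define (xe->string n)
  (match n
    [(list (? symbol?) (list _ ...) (? string? parts) ...)
     (string-append* parts)]))

;; Hypothetical helper: compare an element's text with a plain string.
(define (xe-string=? n s)
  (string=? (xe->string n) s))

;; One pattern describes the expected shape: <bands> with no attributes
;; and nothing but <name> children.
(define band-names
  (match xe
    [(list 'bands (list) (and names (list 'name _ ...)) ...)
     (map xe->string names)]))
```
<br>Without the eliminate-whitespace step, the whitespace strings between the name elements would make the bands pattern fail. The sketch only handles string children: entities that the reader leaves as symbols or numbers in the X-expression would not match xe->string.<br>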
<br>Feedback, fixes and suggestions are highly appreciated.<br><br></div>Thanks again,<br>Giacomo<br><div><div>
<div><div><br></div></div></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Dec 10, 2013 at 12:45 AM, Jay McCarthy <span dir="ltr"><<a href="mailto:jay.mccarthy@gmail.com" target="_blank">jay.mccarthy@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Giacomo,<br>
<br>
I think I would do this:<br>
<br>
(define (xe->string n)<br>
(string-append* (rest (rest n))))<br>
<br>
(check-equal? (map xe->string (se-path*/list '(bands) xe))<br>
<div class="im"> '("Derek & the Dominos" "Nick Cave & the Bad Seeds"))<br>
<br>
</div>Because you want the children of "bands" and you want to turn each one<br>
into a string.<br>
<br>
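Combined with the test code from the original question, the suggestion above could be written as the following self-contained sketch. The name band-names is introduced here only for illustration.<br>
<br>
```racket
#lang racket
(require xml xml/path)

(define xe
  (string->xexpr
   "<bands><name>Derek &amp; the Dominos</name><name>Nick Cave &amp; the Bad Seeds</name></bands>"))

;; Drop the tag and the attribute list, then join the remaining string children.
;; Assumes the attribute list is present, as it is in string->xexpr output.
(define (xe->string n)
  (string-append* (rest (rest n))))

;; '(bands) selects the children of <bands>, i.e. the whole <name> elements,
;; so each band name can be rebuilt from its own pieces.
(define band-names
  (map xe->string (se-path*/list '(bands) xe)))
```
<br>Note that (rest (rest n)) relies on every element having an explicit attribute list. A hand-written X-expression such as '(name "x") would lose its first string.<br>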
<br>
On Sat, Dec 7, 2013 at 6:30 PM, Giacomo Ritucci<br>
<div class="HOEnZb"><div class="h5"><<a href="mailto:giacomo.ritucci@gmail.com">giacomo.ritucci@gmail.com</a>> wrote:<br>
> Hi Jay,<br>
><br>
> thanks for your reply.<br>
><br>
> Unfortunately I can't find a way in my code to detect that in the resulting<br>
> list from se-path*/list<br>
><br>
><br>
> '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad Seeds")<br>
><br>
> the first three elements should actually be treated as a single string, and<br>
> so should the last three.<br>
><br>
> Is there a common idiom in Racket to extract a list of values from an XML<br>
> collection, in a way that works with &amp; and other entities?<br>
><br>
> Thanks in advance.<br>
><br>
><br>
> On Mon, Dec 2, 2013 at 9:27 PM, Jay McCarthy <<a href="mailto:jay.mccarthy@gmail.com">jay.mccarthy@gmail.com</a>> wrote:<br>
>><br>
>> Hi Giacomo,<br>
>><br>
>> First, the question is not really about se-path*/list, because if you look<br>
>> at the xexpr you're giving it, the "name" node has three string<br>
>> children:<br>
>><br>
>> '(bands () (name () "Derek " "&" " the Dominos") (name () "Nick Cave "<br>
>> "&" " the Bad Seeds"))<br>
>><br>
>> And se-path*/list gives you these children all appended together. If you<br>
>> got the name nodes themselves, then you could concatenate their<br>
>> children.<br>
>><br>
>> Second, the real question is about why parsing XML works like that.<br>
>> If you look at this:<br>
>><br>
>> (define xs<br>
>> "<bands><name>Derek &amp; the Dominos</name><name>Nick Cave &amp;<br>
>> the Bad Seeds</name></bands>")<br>
>> (define x<br>
>> (read-xml/document (open-input-string xs)))<br>
>> x<br>
>><br>
>> Then you'll see that the core is that name doesn't have a single piece<br>
>> of PCDATA. It has three, one of which is an entity.<br>
>><br>
>> I don't consider this an error in the XML parser, but a consequence of<br>
>> XML entities that might not be obvious: they are their own nodes in<br>
>> the list of children of the parent node.<br>
>><br>
>> Jay<br>
>><br>
>><br>
>> On Sun, Dec 1, 2013 at 8:36 AM, Giacomo Ritucci<br>
>> <<a href="mailto:giacomo.ritucci@gmail.com">giacomo.ritucci@gmail.com</a>> wrote:<br>
>> > Hi Racket Users,<br>
>> ><br>
>> > I'm using se-path*/list to extract values from an XML collection but I<br>
>> > found<br>
>> > a strange behaviour when the extracted values contain entities.<br>
>> ><br>
>> > For example, given the following XML:<br>
>> ><br>
>> > <bands><br>
>> > <name>Derek &amp; the Dominos</name><br>
>> > <name>Nick Cave &amp; the Bad Seeds</name><br>
>> > </bands><br>
>> ><br>
>> > when I extract a list of band names with (se-path*/list '(name) xe) I'd<br>
>> > expect this result:<br>
>> ><br>
>> > '("Derek & the Dominos" "Nick Cave & the Bad Seeds")<br>
>> ><br>
>> > but what I actually receive is:<br>
>> ><br>
>> > '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad Seeds")<br>
>> ><br>
>> > Is this the intended behaviour? How can I overcome this and make<br>
>> > se-path*/list return one string per tag?<br>
>> ><br>
>> > Here's my test code, I'm running Racket v5.3.6 on Linux x86_64 and maybe<br>
>> > I'm<br>
>> > overlooking something because I'm new to Racket.<br>
>> ><br>
>> > Thank you in advance!<br>
>> ><br>
>> > Best regards,<br>
>> > Giacomo<br>
>> ><br>
>> > #lang racket<br>
>> ><br>
>> > (require xml<br>
>> > xml/path)<br>
>> ><br>
>> > (define xe (string->xexpr "<bands><name>Derek &amp; the<br>
>> > Dominos</name><name>Nick Cave &amp; the Bad Seeds</name></bands>"))<br>
>> ><br>
>> > (module+ test<br>
>> > (require rackunit)<br>
>> ><br>
>> > ;; what I get<br>
>> > (check-equal? (se-path*/list '(name) xe)<br>
>> > '("Derek " "&" " the Dominos" "Nick Cave " "&" " the Bad<br>
>> > Seeds"))<br>
>> ><br>
>> > ;; what I'd expect<br>
>> > (check-equal? (se-path*/list '(name) xe)<br>
>> > '("Derek & the Dominos" "Nick Cave & the Bad Seeds")))<br>
>> ><br>
>> > ____________________<br>
>> > Racket Users list:<br>
>> > <a href="http://lists.racket-lang.org/users" target="_blank">http://lists.racket-lang.org/users</a><br>
>> ><br>
>><br>
>><br>
>><br>
>> --<br>
>> Jay McCarthy <<a href="mailto:jay@cs.byu.edu">jay@cs.byu.edu</a>><br>
>> Assistant Professor / Brigham Young University<br>
>> <a href="http://faculty.cs.byu.edu/~jay" target="_blank">http://faculty.cs.byu.edu/~jay</a><br>
>><br>
>> "The glory of God is Intelligence" - D&C 93<br>
><br>
><br>
<br>
<br>
<br>
</div></div></blockquote></div><br></div>