[plt-scheme] Re: matching XML types

From: Benderjg2 at aol.com (Benderjg2 at aol.com)
Date: Mon Aug 5 21:00:47 EDT 2002

In a message dated 8/5/2002 6:58:12 AM Central Daylight Time, 
markj at cloaked.freeserve.co.uk writes:

> > but for XML, you would 
> > never want to have to write the pattern in terms of the underlying data
> > structure. 
> 
> Never?

Keep in mind that I am referring here to the underlying WebIt! data
structures. The structures are not opaque in WebIt!, so try a simple element
like (h4:p "a test para") in the REPL, and decide for yourself! I was perhaps
imprecise, since in SXML, writing a pattern in terms of the underlying data
structure is exactly what you do!

I think that exposing the concrete data structure loses big when handling
XML namespaces, but it has benefits too. With SXML, I'm pretty sure one
could use PLT's match off the shelf. Oleg has said recently on his list that
he uses the match-case built into Bigloo.
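
To make the SXML point concrete, here is a minimal sketch of matching directly
on the list structure. It assumes today's racket/match (the descendant of
PLT's match; the 2002 library was loaded differently), and the names doc and
paragraph-texts are made up for the illustration.

```scheme
#lang racket/base
(require racket/match)

;; An SXML element is just a list: (tag (@ attribute ...) child ...)
(define doc
  '(div (@ (class "note"))
        (p "a test para")
        (p "another para")))

;; Collect the text of every (p "...") child by matching on the raw list shape.
;; Anything that does not have exactly this shape yields the empty list.
(define (paragraph-texts elem)
  (match elem
    [(list 'div (list '@ _ ...) (list 'p (? string? texts)) ...) texts]
    [_ '()]))
```

The pattern spells out the concrete representation, attribute list included,
which is exactly the trade-off under discussion.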

> Handling namespaces is one of the problems, I hit upon.
> 
> > I had originally thought to extend either PLT's match or the Indiana match
> > to use WebIt!'s constructor's as pattern for matching XML, but in the end
> > I liked the syntax-rules style of patterns better.
> 
> Shame, I think.

Certainly it has been pointed out to me that supporting the "dots" can be
"expensive", in a way that may be unexpected to the naive user. They work
well in macros, since the pattern matching "costs" are at expansion time
anyway. But for matching XML, these costs are at run time. The problem is
basically that it is hard (or not possible?) to avoid either constructing
environment-like structures or making a second pass over some of the source
elements when the "..." are present in a pattern and template. Granting that,
I like the syntax-rules style of matching quite a bit.
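
A rough sketch of where that run-time cost comes from, written by hand for the
single pattern (ul (li x) ...) and template (ol (item x) ...). This is not how
WebIt! is implemented; match-ul, build-ol and ul->ol are hypothetical names
used only to show the two phases.

```scheme
#lang racket/base

;; Pass 1: match (ul (li x) ...) and collect every x into a binding list.
;; Returns an association list (the "environment"), or #f on a mismatch.
(define (match-ul elem)
  (and (pair? elem)
       (eq? (car elem) 'ul)
       (let loop ((kids (cdr elem)) (xs '()))
         (cond ((null? kids)
                (list (cons 'x (reverse xs))))
               ((and (pair? kids)
                     (pair? (car kids))
                     (eq? (caar kids) 'li)
                     (pair? (cdar kids))
                     (null? (cddar kids)))
                (loop (cdr kids) (cons (cadar kids) xs)))
               (else #f)))))

;; Pass 2: instantiate (ol (item x) ...) by walking the collected bindings.
(define (build-ol env)
  (cons 'ol (map (lambda (x) (list 'item x))
                 (cdr (assq 'x env)))))

;; The whole "rule": match first, then build from the environment.
(define (ul->ol elem)
  (let ((env (match-ul elem)))
    (and env (build-ol env))))
```

The intermediate list bound to x is the environment-like structure: it is
allocated while matching and then traversed again to fill in the template.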

At the same time, there will soon be an alternative available-- pattern
matching based on the regular expression pattern matching of XDuce.

> Just checking: "auxilliary nodes"?  I get a bit lost with jargon.

The SXML spec v2.1 supports an auxiliary list, intended (at least in part) to
support extensibility beyond the current XML infoset. The syntax is (@@ ...),
and it may be included in *top* nodes, elements and attributes. An (@@ ...)
node can include a *namespaces* node, which is where that node must now be
placed. But it can also include "auxiliary nodes", which are not currently
defined in the SXML spec. Oleg and/or Kirill have suggested a variety of
possible uses for these, such as a hash table for quick access to attributes.

> > I could probably have used SXML as the underlying data type beneath WebIt!
> > constructor's. But I really prefer working with structures instead.
> 
> There we differ.  I dislike them.

Indeed, tastes differ. (And there are a few, though I think surprisingly few,
technical trade-offs between the two representations.)

> > As a "surface API" one of the benefits of WebIt! is it's treatment of XML 
> > namespaces. [...]
> 
> Yes, it does seem so, but I need to know a bit more at how you match across
> namespaces during the transformations before I'm convinced.

One can define a variety of "similar" tags, where constructors are mapped to
expanded names as follows:
   a:link ==> link
   link   ==> {urn:place1}:link
   b:link ==> {urn:another-place}:link

The predicate a:link? will fail to match (link "some text") and
(b:link "another element"), because both the predicates and the pattern
matching system use the expanded names for all tag comparisons.

Note that these constructor names are Scheme identifier names. The use of
"a:" or "b:" (or indeed the absence of a prefix) is unrelated to whether this
element is locally named or is part of a namespace.
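
As an illustration of comparing by expanded name, here is a small sketch in
plain Racket. It is a stand-in, not WebIt!'s actual representation: the element
struct, make-element-type, and the namespace URIs are all assumptions made for
the example.

```scheme
#lang racket/base

;; Hypothetical element record: namespace URI (or #f for none),
;; local name, and a list of children.
(struct element (ns local children))

;; Return a constructor and a predicate for one expanded name.
;; The Scheme identifiers they get bound to play no part in the comparison.
(define (make-element-type ns local)
  (values (lambda children (element ns local children))
          (lambda (v)
            (and (element? v)
                 (equal? (element-ns v) ns)
                 (eq? (element-local v) local)))))

(define-values (a:link a:link?) (make-element-type #f 'link))
(define-values (link link?)     (make-element-type "urn:place" 'link))
(define-values (b:link b:link?) (make-element-type "urn:another-place" 'link))
```

With these definitions, (a:link? (link "some text")) and
(a:link? (b:link "another element")) are both false, even though all three
elements share the local name link.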

In WebIt! one can define these types using define-element:
  (define-element (a:link #f))
  (define-element (link urn:place))
  (define-element (b:link urn:another-place))

In constructing the unqualified name for the new element, any "prefix" on the
constructor name is stripped, and for each of these, the root name will be
"link".

Note the #f in the definition of a:link. This is because the simplest syntax
actually creates a "generative type".
  (define-type (c:link))
This actually generates a new element whose name is link, but which is part
of a unique namespace, at least until serialized. This allows the creation of
tags which are used only in intermediate values in a "stylesheet", while
ensuring that such tags cannot clash with the input or output of the "XML
macros". At the same time, if such tags are intended to be serialized, they
are still printed as local names (in this case, as "link").
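
One way to picture a generative type, as a sketch only: give each definition a
fresh namespace token that is equal to nothing else, and have the serializer
ignore it. The element struct, make-generative-type and start-tag below are
hypothetical names, and gensym stands in for whatever WebIt! really does.

```scheme
#lang racket/base

;; Hypothetical element record: namespace token, local name, children.
(struct element (ns local children))

;; Each call creates its own namespace token via gensym, so two generative
;; types never match each other even when their local names agree.
(define (make-generative-type local)
  (let ((ns (gensym 'generative)))
    (values (lambda children (element ns local children))
            (lambda (v)
              (and (element? v)
                   (eq? (element-ns v) ns)
                   (eq? (element-local v) local))))))

(define-values (c:link c:link?) (make-generative-type 'link))
(define-values (d:link d:link?) (make-generative-type 'link))

;; Serialization drops the generative namespace and prints the local name.
(define (start-tag e)
  (string-append "<" (symbol->string (element-local e)) ">"))
```

Here (d:link? (c:link "x")) is false, while (start-tag (c:link "x")) still
produces "<link>".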

Jim

Posted on the users mailing list.