[plt-scheme] Announce: WebIt! - An XML Collection (version 0.4)

From: Benderjg2 at aol.com (Benderjg2 at aol.com)
Date: Fri Jul 26 20:52:58 EDT 2002

Strangely, Noel's reply was the first I saw of MJR's message, though
I have now read his original post as well, via the list's archive.

In a message dated 7/26/2002 10:06:51 AM Central Daylight Time, 
noelwelsh at yahoo.com writes:

> --- MJ Ray <markj at cloaked.freeserve.co.uk> wrote:
> > > The WebIt! collection supports the creation and processing of
> > > XML, HTML, and CSS using Scheme. The core of WebIt! is RS-XML,
> > > an abstract datatype for XML.
> 
> I'm interested in WebIt! as a replacement for the
> hacky syntax-case transformation system I have put
> into SchemeDoc (yes MJ, I'm still working (slowly) on
> it!)

Great!

> Some docs on the schema definition part of the
> library would be nice.  I have no intention of writing
> an XML Schema for SchemeDoc just to translate it into
> Scheme!

That is one big hole right now in the library's documentation. I will
fix that this weekend.

> > I sent you some queries about this, especially
> > asking how it compares to
> > SSAX-SXML and the match.ss currently in PLT-Scheme. 

Noel is dead on with both comments. I'll add a few other points:

re: match--
A difference between my matcher and match.ss is that I specify the
pattern in terms of the data constructors, whereas match.ss specifies the
pattern in terms of the data. With s-exprs, you would write an xml-rules
pattern for a list as (cons ,v1 (cons ,v2 '())), rather than (,v1 ,v2).
Now obviously you would never want to write patterns for lists this way
(in terms of cons calls), but for XML, you would never want to have to
write the pattern in terms of the underlying data structure. First, it
would be less natural looking-- the underlying data is actually a tree of
structures. Second, you would expose really ugly things, like namespace
URLs. I had originally thought to extend either PLT's match or the Indiana
match to use WebIt!'s constructors as patterns for matching XML, but in
the end I liked the syntax-rules style of patterns better.
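
The two pattern styles can be sketched side by side. This is only an
illustration of the idea, written for modern Racket and using its built-in
match form; it is not WebIt!'s xml-rules, and the function names swap-data
and swap-ctor are made up for the example.

```racket
#lang racket
;; Illustration only: Racket's `match`, NOT WebIt!'s xml-rules.
;; swap-data and swap-ctor are hypothetical names.

;; Data-style pattern: the pattern is shaped like the list itself.
(define (swap-data lst)
  (match lst
    [`(,v1 ,v2) (list v2 v1)]
    [_ #f]))

;; Constructor-style pattern: the pattern is shaped like the calls
;; that would build the list. (Racket's match binds bare identifiers,
;; so no unquote is needed inside the cons pattern.)
(define (swap-ctor lst)
  (match lst
    [(cons v1 (cons v2 '())) (list v2 v1)]
    [_ #f]))

(swap-data '(1 2)) ; => '(2 1)
(swap-ctor '(1 2)) ; => '(2 1)
(swap-ctor '(1 2 3)) ; => #f, not a two-element list
```

For plain lists the constructor style is just noise, which is the point
made above; it pays off when the constructors hide a representation that
nobody would want to spell out by hand.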

Noel, I think, stated what is really the big philosophic difference between
SXML and WebIt!- the generated constructors versus the more dynamic
s-expr representation of SXML. But a few other comments:

SXML and WebIt! are very similar ADTs for XML- though with very
different concrete types. One thing I do not have is an equivalent of
SXML's auxiliary nodes, though I have not had a need for them yet. I could
probably have used SXML as the underlying data type beneath WebIt!'s
constructors. But I really prefer working with structures instead. It
bothers me that once (href "something") has been extracted from a (@ ...)
node, there is no way to tell whether it is an attribute or an element.
In WebIt!, there are separate structures: xml-attribute and xml-element.
On the other hand, SXML can be a much more compact representation- which
could matter a lot when you are dealing with megabytes (or gigabytes?)
of XML.
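
A small sketch of the ambiguity, for modern Racket. The SXML fragment is
ordinary list data; the structures demo-attribute and demo-element and
their fields are hypothetical stand-ins invented for this example, not the
actual definitions of WebIt!'s xml-attribute and xml-element.

```racket
#lang racket
;; An SXML-style element with one attribute.
(define link '(a (@ (href "something")) "click"))

;; Pull the first attribute out of the (@ ...) node.
(define attr-node (cadr (cadr link)))

;; An element named href with text content has exactly the same shape.
(define elem-node '(href "something"))

(equal? attr-node elem-node) ; => #t, the two cannot be told apart

;; Hypothetical structures (assumed fields, for illustration only).
(struct demo-attribute (name value) #:transparent)
(struct demo-element (name attributes children) #:transparent)

(define a1 (demo-attribute 'href "something"))
(define e1 (demo-element 'href '() '("something")))

(demo-attribute? a1) ; => #t
(demo-attribute? e1) ; => #f, the kind of node survives extraction
```

The price is the extra structure wrapper around every node, which is where
the compactness advantage of the list representation comes from.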

As a "surface API", one of the benefits of WebIt! is its treatment of XML
namespaces. Elements and attributes are always constructed to include
their expanded names (e.g. {http://w3c.org/schemas/html}:a), but these
inconvenient namespace URIs are hidden away behind a constructor like
h4:a. Working with expanded names in an s-expression-based SXML is nasty,
and yet so is using namespace prefixes: depending on the mapping of
prefixes to URIs (somewhere else in the SXML tree), a:tag and b:tag could
in fact be the same element type. In WebIt!, comparisons and matching are
always in terms of the expanded names. Namespace prefixes are significant
only in *output* of XML.
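
The prefix problem can be sketched as follows, for modern Racket. Nothing
here is WebIt!'s API: the expanded-name structure, the expand function,
the prefix table, and the example.org URIs are all assumptions made up for
the illustration.

```racket
#lang racket
;; Hypothetical representation of an expanded name: namespace URI
;; plus local name. Transparent so that equal? compares the fields.
(struct expanded-name (uri local) #:transparent)

;; Resolve a "prefix:local" string against an association list that
;; maps prefixes to namespace URIs.
(define (expand qname prefix-map)
  (define parts (string-split qname ":"))
  (define binding (assoc (first parts) prefix-map))
  (unless binding
    (error 'expand "unbound namespace prefix: ~a" (first parts)))
  (expanded-name (cdr binding) (second parts)))

;; Made-up bindings: a and b name the same namespace, c a different one.
(define prefix-map
  '(("a" . "http://example.org/ns")
    ("b" . "http://example.org/ns")
    ("c" . "http://example.org/other")))

(equal? "a:tag" "b:tag") ; => #f, prefixed names differ
(equal? (expand "a:tag" prefix-map)
        (expand "b:tag" prefix-map)) ; => #t, same element type
(equal? (expand "a:tag" prefix-map)
        (expand "c:tag" prefix-map)) ; => #f, different namespace
```

Comparing expanded names gives the right answer regardless of which
prefixes a particular document happens to use.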

SSAX is another matter. It is *the* XML parser. WebIt! currently does not
have a parser- though a parser was the first thing I wrote (a year and a
half ago). The plan is that I will create a WebIt! instantiation of SSAX,
along the lines of the SSAX->SXML parser, but producing WebIt! structures.
(By the way- if you are interested in using WebIt! but need an XML parser,
write to me. You might get this moved up my to-do list!)

Jim

Posted on the users mailing list.