[plt-scheme] FW: [ssax-sxml] SXSLT presentation, types, and XQuery

From: Paul Steckler (steck at ccs.neu.edu)
Date: Fri Jan 24 13:36:03 EST 2003
Previous message: [plt-scheme] Unexpected exception in read-xml
Next message: [plt-scheme] "identifier originates in a different module"
Messages sorted by: [date] [thread] [subject] [author]
Forwarded, by permission.

-- Paul

-----Original Message-----
From: ssax-sxml-admin at lists.sourceforge.net
[mailto:ssax-sxml-admin at lists.sourceforge.net] On Behalf Of
oleg at pobox.com
Sent: Thursday, January 23, 2003 10:15 PM
To: ssax-sxml at lists.sourceforge.net
Subject: [ssax-sxml] SXSLT presentation, types, and XQuery


I have just returned from PADL/POPL in New Orleans. A PADL
presentation 'SXSLT: Manipulation Language for XML' is available at

	http://pobox.com/~oleg/ftp/papers/SXSLT-talk.pdf

Quite unexpectedly, the presentation stirred a lively debate on the
eternal question of static vs. dynamic typing. Although Jerome Simeon
and Philip Wadler granted me that SXSLT is consistent with Scheme, they
kept asserting the importance of types in XML manipulation. On one
hand, they do have a point. It is indeed unfortunate that a typo in a
transformation template can result in

	<titl>text</titl>

when <title> was expected. Incidentally, XSLT is prone to the same
mistake. It should be noted however that typos of that kind are easy
to catch in SXSLT. The conversion of SXML to xML is performed by the
default rule of the universal conversion stylesheet (by the entag
function, to be precise). The function 'etgag' is in the position to
check the name of the SXML element against the list of "expected"
elements, and report an error. A SXML->LaTeX conversion uses that
trick: the conversion process warns about SXML elements that were not
handled by custom SXML rules. Granted, these checks are performed at
the time of running the transformer, rather than statically. This fact
is consistent with the dynamic typing discipline of Scheme. Some
Scheme systems however can statically check certain program properties
(e.g., that a program does not attempt to take 'car' of a
number). Generally speaking, SXSLT can be modified to facilitate flow
analyses of similar kind. I'm not quite sure that this is the matter
of high priority.

	We must stress that it doesn't appear possible to statically
verify that a transformation program will always produce valid
XML. For example, let us consider a transformation program that
combines two XML documents into one. Suppose that the content models
of both input XML documents include an attribute of a type ID. Suppose
further that the document type of the resulting document also contains
an attribute of a type ID. How to statically verify that the values of
the ID attribute in the merged document are all unique?  That doesn't
appear possible. Furthermore, XML Schema defines quite a few
constraints on character data. For example, you can require character
data to match a particular regular expression or impose a range
constraint on a numerical datum. Typing such constraints requires an
advanced dependent-type system. It is not obvious if typechecking is
decidable in such a system (the inference is almost certainly
undecidable).

	XML communication introduces a finer distinction than mere
static and dynamic typing. An error in an XML document may be detected
when:
	- compiling an XML producer
	- running an XML producer
	- parsing/validating the incoming document by a consumer
	- processing of parsed data by a consumer

The error handling in the default SXSLT rule qualifies then as a
"semi-static" typing. An error is detected when running the XML
producer -- but before the XML document is transmitted to the consumer
and before running of the consumer.

	The fact that an XML consumer may validate the incoming XML
document against its own stylesheet makes the issue even more
complicated. James Clark presented convincing arguments why a client
may need to validate an incoming document with various schemas. One of
the examples of such selective validation is processing of SOAP
requests. A SOAP processor is interested only in the envelope of the
SOAP message. Generally speaking, the SOAP processor can't even
rigorously validate the body of the SOAP request.

	It is interesting to note that the next talk after mine at
PADL was 'Type-based XML processing in Logic Programming' by Jorge
Coelho and Mario Florido (from Porto, Portugal). The authors attempted
to build a type system for XML manipulation, and to use logic
programming for type inference. The type system was designed for a DTD
content model only. Neither XML Schema nor other validity constraints
are supported. Unwittingly, the authors demonstrated why typeful XML
processing is hard. During the question and answer period, the
presenter admitted that their type system suffers from the exponential
blow-up in the presence of 'optional' XML elements.

	Philip Wadler was greatly upset that I never mentioned XQuery
in my talk. I replied that XQuery didn't appear relevant: the topic of
the talk was SXSLT rather than (S)XPath; furthermore, XQuery is still
a working draft. In hindsight, that wasn't a politically-correct
comment to make: it upset Philip Wadler even more. Philip Wadler came
to me after the talk and emphasized once again how upset he was that I
uttered XSLT every other sentence and not even once mentioned
XQuery. In atonement, I urge everyone to read the XQuery working draft
and send comments. Philip Wadler said that he and Jerome will read
them.


	At POPL, which followed PADL, Philip Wadler presented a talk
"The essence of XML". He readily admitted that XML is far from being
ideal, and S-expressions are superior. In fact, one of his slides said
that XML:
	  - Solves a problem that is not hard
	  - It does not solve it very well

Nevertheless, Philip Wadler thinks that XML is here, and programming
language researchers ought to actively participate in its
development, to bring order and rigor. Philip Wadler cited several
successes: XQuery working draft now includes formal semantics. So
does a RELAX-NG specification.

	The paper "The essence of XML" is available at
http://www.research.avayalabs.com/user/wadler/papers/xml-essence/xml-ess
ence.pdf

	The gist of the presentation was _named_ typing of XML
documents and a 'validation theorem'. In a named type system (as
opposite to the structural type system) two types are different if
they are named differently. It seems that named typing of XQuery makes
it possible to employ linear finite automata (rather than more complex
and less efficient tree automata).

	One of the interesting results of the paper is a so-called
Roundtripping: "We would like to know that if we convert an internal
value of a given type to an external value (using erasure) and then
convert the external value back to an internal value (using validation
against that type) that we again end up back where we started, so long
as the type is unambiguous. This follows immediately from the
validation theorem."

	It is important to note that in general XML Schema types do
not guarantee roundtripping. The latter property is guaranteed only
for unambiguous types. Alas, the latter are not strictly defined, which
makes the theorem a circular reasoning.

	We must again mention that the consumer and the producer of an
XML document are not obliged to use the same schema. A consumer of an
XML document might not even want to convert an external value (the
text of an XML element or character data) to an internal value (a
tree, an integer, etc). For example, if a consumer wants to check the
signature of an XML fragment, the consumer would parse the fragment
and do a 'shallow' validation to produce a canonicalized version of
the fragment.

	I wonder furthermore if an XML query language must be
typed. For example, suppose we have an XML fragment

   <div><p name="foo" style='something'>text</p></div>

of the type
      <!ELEMENT div (p)*>
      <!ATTLIST p name CDATA>

We write a (XPath-like) query 'div/p[name="foo"]'. Suppose that we
later decided to amend the document type to
      <!ELEMENT div (p)+>
      <!ELEMENT div (br?,p)+>

Incidentally, the XML fragment validates against these amended
types. We would like that our query would work without changes, and
give the same result. An XML query must be robust against the addition
of attributes to elements and against the changes that make an
optional element mandatory. All such modification change the document
type.  An XML query must therefore be polymorphic. I wonder how
tractable such polymorphism is, especially in the presence of
regular-expression types and dependent types (mandated by XML Schema).

	Obviously I need to read the XQuery working draft in detail.


-------------------------------------------------------
This SF.NET email is sponsored by:
SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
http://www.vasoftware.com
_______________________________________________
ssax-sxml mailing list
ssax-sxml at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ssax-sxml
Posted on the users mailing list.
Previous message: [plt-scheme] Unexpected exception in read-xml
Next message: [plt-scheme] "identifier originates in a different module"
Messages sorted by: [date] [thread] [subject] [author]