[plt-scheme] accessing SXML->HTML in the latest version of sxml-tools for plt-scheme

From: Joe Marshall (jrm at ccs.neu.edu)
Date: Tue May 25 09:52:23 EDT 2004

"Anton van Straaten" <anton at appsolutions.com> writes:

> No flames, but with case-sensitivity as the default, there'll be
> NoStoppingIt.  CulturalInhibitions WontBeEnough.  It WontBeLong before
> SchemeWillBeIndistinguishableFromWikis.  But IsThatSuchABadThingReally?
> OpinionsDiffer.  Some say StudlyCaps are ReadableMixedCase.  Casting my eye
> OverThisParagraph, I'm NotSureIAgree.

I'm not above saying ``I told you so'' (not to Anton).

But this message is not just noise.  It turns out Henry Baker has a
solution to the problem (as always).  In his paper 

"Strategies for the Lossless Encoding of Strings as Ada Identifiers". 
   ACM Ada Letters XIII, 5 (Sep/Oct 1993), 43-47. 
   Interesting ways to translate identifiers from one programming
   language into another without losing readability or
   distinguishability.

   http://home.pipeline.com/~hbaker1/Encode.html

He describes the following scheme:
  An identifier is composed of syllables that are concatenated
  together, possibly with a concatenation character.

  Most syllables map 1-1, but certain syllables are reserved, among
  them the quoting escape QQ, quoting uppercase QQU, quoting
  lowercase QQL, and quoting capitalization (CamelCase) QQC.  Reserved
  syllables or syllables that have non-standard case must be quoted.

  Transferring from case-sensitive to case-insensitive involves
  breaking the identifier on syllable boundaries, escaping the
  appropriate syllables, and concatenating the results.  Going the
  other direction involves breaking the identifier on syllable
  boundaries, undoing the escaping and concatenating the results.

Although the paper is for encoding Common Lisp symbols as Ada
identifiers, we can apply the principle to encode CamelCase
identifiers as case-insensitive Scheme symbols.  Here is a simple
illustration: 

Suppose that the foreign identifiers were, for the most part,
CamelCaseIdentifiers (you want to design the mapping for the most
frequent usage).  You would first break the identifier at all places
where a lower-case letter is immediately followed by an upper-case
one:  Camel Case Identifiers.  You then encode the syllables, the
default mapping being foreign capitalized directly to Scheme:  camel
case identifiers.  Then concatenate the syllables with hyphens:
camel-case-identifiers.  This would be the bulk of the mappings.
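
As a rough sketch of this default mapping (written in Python purely
for illustration; the names split_syllables and encode_default are
made up here and do not come from Baker's paper), the three steps
might look like this:

```python
import re

def split_syllables(identifier):
    """Break wherever a lower-case letter is immediately followed by
    an upper-case one."""
    # Insert a space at each lower->upper boundary, then split on it.
    marked = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", identifier)
    return marked.split()

def encode_default(identifier):
    """Default case only: every syllable is assumed to be Capitalized,
    so it maps directly to its lower-case form."""
    syllables = split_syllables(identifier)
    return "-".join(s.lower() for s in syllables)

print(split_syllables("CamelCaseIdentifiers"))  # ['Camel', 'Case', 'Identifiers']
print(encode_default("CamelCaseIdentifiers"))   # camel-case-identifiers
```

This version loses information for anything that is not plainly
capitalized, which is exactly what the escapes are for.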

Now for the exceptions.  BlahURL would break as "Blah" "URL".  Blah
would be encoded normally, but URL, being all uppercase, would need to
be escaped with "qqu".  The resulting Scheme identifier is 
blah-qqu-url

fooBar          <--->  qql-foo-bar
Qq              <--->  qq-qq
DIFFERENT_STYLE <--->  qqu-different-undr-qqu-style
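
The escapes can be sketched the same way.  What follows is only one
possible reading of the rules, chosen so that it reproduces the four
examples above; it is not Baker's actual algorithm.  The assumptions
are mine: a case escape applies to the next syllable only, "undr" is
treated as a reserved syllable alongside the qq family, and a syllable
with irregular mixed case (URLFoo, say) is rejected rather than
encoded.  All function names are hypothetical.

```python
import re

# Assumption: "undr" is reserved too, so a literal syllable "Undr"
# cannot be confused with an encoded underscore.
RESERVED = {"qq", "qqu", "qql", "qqc", "undr"}

def split_syllables(identifier):
    """Break at lower->upper boundaries; each underscore is a syllable."""
    marked = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", identifier)
    return marked.replace("_", " _ ").split()

def encode(identifier):
    out = []
    for s in split_syllables(identifier):
        if s == "_":
            out.append("undr")
            continue
        if s == s.capitalize():
            pass                      # default case, no escape needed
        elif s.islower():
            out.append("qql")
        elif s.isupper():
            out.append("qqu")
        else:
            raise ValueError("irregular case not handled: %r" % s)
        if s.lower() in RESERVED:
            out.append("qq")          # quote a reserved syllable
        out.append(s.lower())
    return "-".join(out)

def decode(symbol):
    out = []
    case = str.capitalize             # default: Capitalized syllable
    tokens = iter(symbol.split("-"))
    for tok in tokens:
        if tok == "qqu":
            case = str.upper
        elif tok == "qql":
            case = str.lower
        elif tok == "undr":
            out.append("_")
        else:
            if tok == "qq":
                tok = next(tokens)    # the quoted syllable follows
            out.append(case(tok))
            case = str.capitalize     # escapes cover one syllable only
    return "".join(out)

for name in ["BlahURL", "fooBar", "Qq", "DIFFERENT_STYLE"]:
    sym = encode(name)
    print(name, "<--->", sym)
    assert decode(sym) == name
```

The loop at the end prints the same pairs as the table above and
checks that each one decodes back to the original identifier.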

The paper goes into more detail, but the upshot is that it is possible
to define a readable, one-to-one mapping that is roughly homomorphic
in length (short identifiers remain short, long ones stay long) and
can be computed by inspection by the user.  I've used this for
pathname mapping in Common Lisp and it works quite well.  A good
design here would be valuable.



Posted on the users mailing list.