[plt-scheme] accessing SXML->HTML in the latest version of sxml-tools for plt-scheme
"Anton van Straaten" <anton at appsolutions.com> writes:
> No flames, but with case-sensitivity as the default, there'll be
> NoStoppingIt. CulturalInhibitions WontBeEnough. It WontBeLong before
> SchemeWillBeIndistinguishableFromWikis. But IsThatSuchABadThingReally?
> OpinionsDiffer. Some say StudlyCaps are ReadableMixedCase. Casting my eye
> OverThisParagraph, I'm NotSureIAgree.
I'm not above saying ``I told you so'' (though not to Anton).
But this message is not just noise. It turns out Henry Baker has a
solution to the problem (as always), in his paper
"Strategies for the Lossless Encoding of Strings as Ada Identifiers",
ACM Ada Letters XIII, 5 (Sep/Oct 1993), 43-47.
http://home.pipeline.com/~hbaker1/Encode.html
The paper presents interesting ways to translate identifiers from one
programming language into another without losing readability or
distinguishability.
He describes the following scheme:
An identifier is composed of syllables that are concatenated
together, possibly with a concatenation character.
Most syllables map 1-1, but certain syllables are reserved, among
them the quoting escape QQ, the uppercase quote QQU, the lowercase
quote QQL, and the capitalization (CamelCase) quote QQC. Reserved
syllables and syllables that have non-standard case must be quoted.
Translating from a case-sensitive language to a case-insensitive one
involves breaking the identifier on syllable boundaries, escaping the
appropriate syllables, and concatenating the results. Going the
other direction involves breaking the identifier on syllable
boundaries, undoing the escaping, and concatenating the results.
Although the paper is about encoding Common Lisp symbols as Ada
identifiers, we can apply the same principle to encode CamelCase
identifiers as case-insensitive Scheme symbols. Here is a simple
illustration:
Suppose that the foreign identifiers were, for the most part,
CamelCaseIdentifiers (you want to design the mapping for the most
frequent usage). You would first break the identifier at every place
where a lower-case letter is immediately followed by an upper-case
one: Camel Case Identifiers. You then encode the syllables; by
default, a capitalized foreign syllable maps directly to the same
syllable in Scheme: camel case identifiers. Then concatenate the
syllables with hyphens: camel-case-identifiers. This would cover the
bulk of the mappings.
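The default case above can be sketched in a few lines. This is only an
illustration in Python (3.7 or later, since it relies on re.split
accepting zero-width matches), not code from Baker's paper; the function
names split_syllables and encode_default are made up for the example, and
it deliberately ignores every exception and escape.

```python
import re

def split_syllables(identifier):
    # Break wherever a lower-case letter is immediately followed by an
    # upper-case one, e.g. "CamelCase" -> ["Camel", "Case"].
    return re.split(r"(?<=[a-z])(?=[A-Z])", identifier)

def encode_default(identifier):
    # Default mapping only: lower-case each syllable and join with hyphens.
    # No escaping is done, so this is lossless only for identifiers made
    # entirely of Capitalized syllables.
    return "-".join(s.lower() for s in split_syllables(identifier))

print(encode_default("CamelCaseIdentifiers"))
```

Under these assumptions the call prints camel-case-identifiers.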
Now for the exceptions. BlahURL would break as "Blah" "URL". Blah
would be encoded normally, but URL, being all uppercase, would need to
be escaped with "qqu". The resulting Scheme identifier is
blah-qqu-url. Some further examples:
BlahURL <---> blah-qqu-url
fooBar <---> qql-foo-bar
Qq <---> qq-qq
DIFFERENT_STYLE <---> qqu-different-undr-qqu-style
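Here is a rough sketch of how the exceptions might be handled, again in
Python (3.7 or later) purely for illustration. The rules are inferred
from the four examples above rather than taken from the paper: the names
RESERVED, encode and decode are hypothetical, "undr" is assumed to be a
reserved syllable standing for an underscore, foreign identifiers are
assumed to contain only letters, digits and underscores, and a syllable
of mixed case (say, "URLParser") is simply rejected.

```python
import re

# Reserved syllables: the escape "qq", the case markers, and "undr" for "_".
# "qqc" is reserved but unused here because Capitalized is the default case.
RESERVED = {"qq", "qqu", "qql", "qqc", "undr"}

def split_syllables(identifier):
    """Split on underscores, then at each lower-to-upper case boundary."""
    syllables = []
    for part in re.split(r"(_)", identifier):
        if part == "_":
            syllables.append(part)
        elif part:
            syllables.extend(re.split(r"(?<=[a-z])(?=[A-Z])", part))
    return syllables

def encode_syllable(syllable):
    if syllable == "_":
        return "undr"
    if syllable == syllable.capitalize():
        marker = ""                      # default: Capitalized maps directly
    elif syllable.isupper():
        marker = "qqu-"
    elif syllable.islower():
        marker = "qql-"
    else:
        raise ValueError("mixed-case syllable not handled: " + syllable)
    body = syllable.lower()
    if body in RESERVED:                 # a literal "qq", "undr", ... is quoted
        body = "qq-" + body
    return marker + body

def encode(identifier):
    return "-".join(encode_syllable(s) for s in split_syllables(identifier))

def decode(symbol):
    tokens = symbol.split("-")
    out, i = [], 0
    while i < len(tokens):
        recase = str.capitalize          # default case of a syllable
        if tokens[i] == "qqu":
            recase, i = str.upper, i + 1
        elif tokens[i] == "qql":
            recase, i = str.lower, i + 1
        if tokens[i] == "qq":            # next token is a quoted literal
            i += 1
            out.append(recase(tokens[i]))
        elif tokens[i] == "undr":
            out.append("_")
        else:
            out.append(recase(tokens[i]))
        i += 1
    return "".join(out)

for name in ["BlahURL", "fooBar", "Qq", "DIFFERENT_STYLE"]:
    print(name, "<--->", encode(name))
```

With these rules the loop reproduces the four mappings listed above, and
decode maps each result back to the original spelling. A real design would
also have to decide what to do with mixed-case syllables and with
characters other than the underscore.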
The paper goes into more detail, but the upshot is that it is possible
to define a readable, one-to-one mapping that is roughly homomorphic
in length (short identifiers remain short, long ones stay long) and
can be computed by inspection by the user. I've used this for
pathname mapping in Common Lisp and it works quite well. A good
design here would be valuable.