[plt-scheme] Identifying at-exp string literals [and 1 more messages]

From: Eli Barzilay (eli at barzilay.org)
Date: Mon Sep 21 10:36:04 EDT 2009

[I intended to write some quick thing, and it turned out to be much
longer...]


On Sep 21, Dave Gurnell wrote:
> 
> I'm looking for a way to write a macro that can distinguish string
> literals that do and don't come from at-expression boilerplate. For
> example, in the following code:
> 
>      (define b "b")
> 
>      @list{a @b c}
> 
> I'd like to be able to differentiate "a " and " c" from "b".

IIUC, you just need to walk over the subexpressions of `xml', and wrap
every non-literal-string expression with an encoder.  You can do this
with a simple macro:

  #lang at-exp scheme
  (require (for-syntax scheme/list))
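  ;; `wrap' keeps literal strings (the {...} text parts) as they are,
  ;; and wraps any other expression in an `encode' marker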
  (define-syntax (wrap stx)
    (syntax-case stx ()
      [(_ x) (if (string? (syntax-e #'x)) #'x #'(list 'encode x))]))
  (define-syntax-rule (xml x ...) (list 'xml (wrap x) ...))
  (define zz "to-be-encoded")
  @xml{1 2 @zz 3 4}
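  ;; evaluates to (xml "1 2 " (encode "to-be-encoded") " 3 4")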

But if you want to be picky and deal with @xml[datums]{text} and avoid
encoding literal strings in the datums, then you can use the scribble
syntax properties for that:

  #lang at-exp scheme
  (require (for-syntax scheme/list))
  (define-syntax (xml stx)
    (define p (syntax-property stx 'scribble))
    (unless (and (pair? p) (eq? 'form (car p)))
      (raise-syntax-error 'xml "must be used in @-form only" stx))
    ;; p is (list 'form <number-of-[]-datum-exprs> <number-of-{}-text-exprs>)
    (syntax-case stx ()
      [(_ x ...)
       (let-values ([(datums texts)
                     (split-at-right (syntax->list #'(x ...)) (caddr p))])
         #`(list 'xml
                 #,@datums
                 #,@(map (lambda (t)
                           (if (string? (syntax-e t)) t #`(list 'encode #,t)))
                         texts)))]))
  (define zz "to-be-encoded")
  @xml["foo"]{1 2 @zz 3 4}

And perhaps Ryan's new library makes this less painful.


However,

On Sep 21, Matthew Flatt wrote:
> 
> To me, distinguishing those cases goes against the spirit of
> @-notation. I think it's a great benefit that S-exprs and @-notation
> are interchangeable.

I think that Matthew's "spirit of @-notation" phrasing is a little too
weak...  The thing is that the advantage of the @-form notation is
that it's all just plain scheme in the end.  Using the scribble
properties as above is therefore somewhat similar to using the
'paren-shape property to make [...] be something other than function
application, or using the syntax wrapper capability to make
(foo bar . (blah)) expand to (apply foo bar (blah)).  In both cases
it's cute that you can do so -- and it might be appropriate for a DSL
that is expected to have some non-schemish syntax -- but it doesn't
really work well for a library to do the same.
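
To make the 'paren-shape analogy concrete, here's a toy version (the
`tuple' name is made up -- it's just to show the mechanism):

  (define-syntax (tuple stx)
    (syntax-case stx ()
      [(_ x ...)
       (if (eq? #\[ (syntax-property stx 'paren-shape))
           #'(vector x ...)    ; [tuple 1 2 3] => a vector
           #'(list x ...))]))  ; (tuple 1 2 3) => a list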

So I think that finding some other way to decide what to encode and
what not to encode would be much better.

I don't know what the typical use patterns for the mirrors library
are, but I don't think that I'd use it...  It looks to me like it
would suffer from the same problem that xexprs suffer from -- the
implicit quoting means that you can make errors that will not be
caught until you get to the client of the generated xml.  (My
standard example here is that for years the PLT pages had lots of
<quote>nbsp</quote> -- which were the result of
`(foo () ...stuff... 'nbsp ...stuff...).)  I think that it also goes
against the "spirit of xml" (if there is such a thing) -- xml gives
the impression that it wants to be a *real* language, at least in the
sense of having bindings and namespaces, without the rest of the xml
confusion (separation of code and data and similar junk).
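
(To spell that nbsp mistake out: the intent below is an &nbsp; entity,
but inside the quasiquote the 'nbsp turns into the two-element list
(quote nbsp), which is still a perfectly legal xexpr -- so nothing
complains until the bogus <quote> tag reaches a browser.)

  `(p () "some" 'nbsp "text")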

My "recent" solution for this (I've only been playing with it for
about 10 years...) is to define all of the (x)html tags as functions
that generate tags by the same name.  The first round of this attempt
used "self-quoting functions" that would translate the forms into
xexprs -- roughly like this (ignoring attributes for simplicity):

  (define (p . xs) (cons 'p xs))
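  ;; eg, (p "some " (p "nested") " text") evaluates to the xexpr
  ;; (p "some " (p "nested") " text") -- hence "self-quoting"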

I've gone through about 3 overhauls since then (with the sources for
the plt-scheme.org pages being my main test playground in the last ~4
years), but the idea is still the same.  I think that this works much
better than the xexpr approach (and from my reading of the mirrors
docs, it's somewhere on the way up from xexprs, but still too close to
them IMO) --

* Tags need to be defined functions -- so no typos.  Some people think
  that it's a hassle to define them as functions, but in addition to a
  simple macro that can be used as (deftag foo), I also expose the
  underlying tag constructor, so you can use (make-tag 'whatever ...)
  for some random tag.
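
  Roughly like this, say (with `make-tag' faked here as a plain list
  constructor, just to keep the sketch self-contained):

    (define (make-tag name . body) (cons name body))
    (define-syntax-rule (deftag name)
      (define (name . body) (apply make-tag 'name body)))
    (deftag p)
    (p "some text")   ; => (p "some text")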

* You can now do some namespace management using modules, like
  providing different names.  This can nicely be a mirror of some xml
  namespaces (where you can even use the `foo:' prefixes).

* Because they're functions, you can add more functions that behave as
  aliases for existing tags (so you can get your own layer of
  semantic-oriented functions on top of the limited set of html tags),
  or a function that evaluates to new combinations of tags:

    (define (mailto address)
      (a href: (list "mailto:" address) (tt address)))

* This can be very useful when you're switching formats.  For example,
  at some point I switched from some-html-version to some-newer-one,
  and it helped to define `center' as a function that generates a
  `div' tag with a center alignment.
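
  For example, something like this (assuming a `div' tag function and
  the same `foo:' attribute keyword convention as the `href:' above):

    (define (center . body)
      (apply div style: "text-align: center;" body))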

* You can obviously bake more xml knowledge in -- from contracts or
  types that will enforce dtd requirements all the way up to a macro
  that will read a dtd, parse it, and generate the tag definitions
  with their contracts/types.
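
  A tiny sketch of the contract end of that range, using the
  self-quoting list representation from above:

    (define (li . xs) (cons 'li xs))
    (define/contract (ul . items)
      (->* () () #:rest (listof (cons/c 'li list?)) list?)
      (cons 'ul items))
    (ul (li "one") (li "two"))   ; fine
    ;; (ul "oops") would blow up right here, not in the browser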

* Maybe more relevant to what you're talking about, you have control
  over each tag, so you can have different processing for different
  tags.  For example, in my current course web pages, I do the
  latex-like translations that scribble does with things like ``this''
  -- but I arrange for this to not happen inside functions that render
  code.  (I also used this in the past to encode the way some html
  tags should be rendered (eg, when adding whitespace doesn't matter),
  when I had the illusion that it's possible to render readable HTML
  sources...)
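
  A rough sketch of the idea (names and the exact translation are
  simplified):

    (define (smart-quotes s)
      (regexp-replace* #rx"''"
                       (regexp-replace* #rx"``" s "\u201C")
                       "\u201D"))
    (define (p . xs)      ; prose tags massage their text...
      (cons 'p (map (lambda (x) (if (string? x) (smart-quotes x) x))
                    xs)))
    (define (code . xs)   ; ...but code tags leave it alone
      (cons 'code xs))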

* A related point -- since I'm always defining some function that
  generates these bindings, I can easily add common customization
  keywords that can provide further control over things like encoding
  of the body etc.  A minor subtle point here is that I found it more
  convenient to use scheme keywords for these, but use a different set
  of keywords (eg, the above `href:') for attributes -- mostly because
  I don't trust browsers to treat them as real attributes (with only
  one value for each, and with no differences based on their order).

  I wouldn't even be surprised if the xml spec allows different
  attribute orders to have different meanings, or allows multiple
  values for a single attribute.  (If anyone knows something concrete
  about this, and if you survived this far, I'd be happy if you could
  tell me.)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

