[racket] JSON module: why symbols for object keys? lists for arrays?

From: Eli Barzilay (eli at barzilay.org)
Date: Thu Jun 6 10:55:52 EDT 2013

[Late reply, since the other thread reminded me of this.  Might be
irrelevant for your actual decisions by now...]

On April 22nd, Erik Pearson wrote:
> Hi Eli,
> Wow, thanks for the great feedback.
> On Mon, Apr 22, 2013 at 3:21 PM, Eli Barzilay <eli at barzilay.org> wrote:
> > General comment: unless you have an explicit goal of supporting
> > Mustache templates as-is, you should consider doing things in
> > plain Racket.  IMO it works far better than the pile of half-baked
> > hacks that is Mustache (at least AFAICS).  (I can go on for pages
> > on this point, so I'll avoid it unless anyone's interested...)
> I have some extensive web sites built on Mustache-style
> templates. I say style, because it is more based on the original
> ctemplate syntax from Google, and avoids some of the Mustache
> limitations or Handlebars extensions. It is very practical, in my
> experience, and easy enough to implement (since it sticks with the
> very simple syntax.)

There are two different possibilities here: (a) you have some existing
mustache material that needs to be supported as-is.  If this is the
case, then there is definitely a good point in implementing it.  But
then there's (b) you have some need for some "simple templating" and
mustache must be it if so many people use it -- and if that's the case
then I really don't buy it.

The reason for that should be obvious in the Racket world, where we
try to do as much work as possible in the form of a proper language
instead of some semi-DSL-hack like regexp-replacing "{{\\w*}}" with
strings.  This is something that I tell students over and over again.
Most of them don't remember it (and I tell them that most of them
won't), but some do, and I've even had a few come back to me some time
later and tell me horror stories of creating bad DSLs and the amount
of generated grief...  The thing is that it starts as simple as that
regexp-replacing, but soon enough you want to add more functionality,
so you do so, bit by bit, and you end up with a language.  Usually a
very bad one.  In contrast, if you *start* with an existing language
-- and the choice doesn't matter here, JS would do just as well --
then all you need to do is use the language.  For
example, it took me a few seconds to find this gem in the handlebars

    {{#list people}}{{firstName}} {{lastName}}{{/list}}

with some code that implements that "list helper" -- compare that with
the scribble/text way (which is the same thing that gets used in the
server) of just using plain racket:

    @(for/list ([x some-list])
        @list{@dict-ref[x 'first] @dict-ref[x 'last]})

or scribble/html which adds html-tag functions:

    @ul{@(for/list ([x some-list])
           @li{@dict-ref[x 'first] @dict-ref[x 'last]})}

On a shallow look, I painfully realize that many mustache users will
cry about how much more complicated this is -- conveniently forgetting
the actual implementation of that helper, but also missing the fact
that because it's a generic language, there's nothing that prevents me
from making this into a helper --

    (define (dict-list->ul l)
      @ul{@(for/list ([x l])
            @li{@dict-ref[x 'first] @dict-ref[x 'last]})})

and the template becomes

    @dict-list->ul[some-list]

Even though the difference seems small, it's really conceptually huge:
you always have a general language, so you can immediately do whatever
it is that you can do with Racket itself.  No reason to resort to
additional regexp hacks like {{{..}}} vs {{..}}, or {{foo.bar.baz}}
which is re-implementing JS-like syntax instead of treating it as JS,
eg, the additional hacks of {{../foo.bar}}, {{./foo.bar}},
{{this/foo.bar}}, and {{this.foo.bar}}.  (When you get to use these it
should be painfully clear that you're using a new language, only one
that is probably going to be different enough from JS that a
near-future bite seems inevitable.)

Here's another example, continuing down that page:

  * You can register a helper with the Handlebars.registerHelper

Q: what happens when two bits of code register a helper by the same
name?  That's a rhetorical question -- I'm sure that the answer is
pretty obvious.  But here's the thing: the same question looks very
different in the Racket context -- while it's possible to do a
similar registration thing (ie, mutate a dictionary), most sane code
won't do that, and this is a result of a design that was literally the
subject of a few phd works -- which means that instead of "Idono, if
it works for me then who cares?" you get something that people spent a
ton of time designing.
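
A minimal sketch of that contrast, assuming a hypothetical
`register-helper!' registry (a stand-in for a registerHelper-style
API, not a real library) and two made-up submodules `site-a' and
`site-b' that each define a helper with the same name:

```racket
#lang racket/base

;; Hypothetical global registry: registration is just dictionary mutation.
(define helpers (make-hash))
(define (register-helper! name proc)
  (hash-set! helpers name proc)) ; silently replaces any earlier entry

(register-helper! 'list (lambda (items) (format "<ul>~a</ul>" items)))
(register-helper! 'list (lambda (items) (format "<ol>~a</ol>" items)))
((hash-ref helpers 'list) "x") ; whichever registration ran last wins

;; With modules, each definition lives in its own scope.
(module site-a racket/base
  (provide render-list)
  (define (render-list items) (format "<ul>~a</ul>" items)))

(module site-b racket/base
  (provide render-list)
  (define (render-list items) (format "<ol>~a</ol>" items)))

;; The client chooses the names; requiring both without a prefix
;; would be rejected at compile time instead of one quietly winning.
(require (prefix-in a: 'site-a)
         (prefix-in b: 'site-b))
(a:render-list "x")
(b:render-list "x")
```

In the registry version the clash only shows up at runtime, as wrong
output; in the module version both helpers stay usable side by side.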

> >> For me there is also the increase in complexity when translating
> >> from JSON to jsexpr -- when components of JSON are translated
> >> into different objects in the host language it is yet another
> >> thing to remember, multiple forms of symbols, another layer of
> >> coding.  [...]
> >
> > I'm not following what it is that is more complicated here.  The
> > fact that there are different concrete syntaxes for the same
> > symbols is not different from strings which have the same issue.
> > But either way, this shouldn't be an issue since you shouldn't
> > care about the actual JSON source and just use the values that you
> > read it as.  (So I'm guessing that I'm missing something here.)
> On the face of it, json defines strings, so does Racket, strings are
> used as values, strings are used as keys, why mess that up? Using
> symbols to me makes it more complicated in a couple of ways.

It would be helpful if you can show some examples where it's making
things more difficult.  (That's a real question.  For example, you
might run into some need to use `symbol->string' because some JSON
source gives you names that actually encode some substructure in
them.  But I imagine that these would be extremely rare cases.)
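
As a rough illustration of both the common case and the rare one,
here is a sketch using the `json' library.  The JSON text and the
dotted "address.city" key are made up for the example:

```racket
#lang racket/base
(require json racket/string)

;; Made-up input; the second key encodes substructure in its name.
(define js
  (string->jsexpr "{\"name\": \"x\", \"address.city\": \"y\"}"))

;; Common case: keys are read as symbols and used as identifiers.
(hash-ref js 'name)

;; Rare case: the key itself has to be taken apart, so it goes
;; through `symbol->string' first.
(define parts
  (for/list ([k (in-list (hash-keys js))]
             #:when (regexp-match? #rx"[.]" (symbol->string k)))
    (string-split (symbol->string k) ".")))
parts
```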

> What is the impact of putting arbitrary user data into the symbol
> space?  Performance? Symbol table exhaustion? Some other
> interference with program logic due to symbol corruption? I don't
> know.

The choice of symbols is the same as its use for identifier names in
source code.  It's just a string-like type that is more convenient
when it's used to identify things -- and this seems to be the
intention in JS dictionaries.  It makes sense to use strings in the
JSON *representation*, since there it's a simplification, but it
doesn't make sense in its use.  It's similar to how you can refer to
these things in JS as x.y instead of being forced to use x["y"].

(Note related to the above parenthetical comment: I imagine those cases
to be rare in exactly the same way that having to use x["y"] is rare.)

> Unless there is a good reason, why bother with introducing these
> unknowns? Maybe this is a CL bias on my part?

Not at all -- this design decision should apply to all Lisps --
probably even a bit *more* for ones that are not in the Scheme

> For another, the representation of symbols in the reader can be
> either single quote for simple symbols, or bar-delimited for more
> complicated ones. This makes creation of json literals in racket a
> bit of a pain. In general, I don't see a good reason to conflate
> strings with symbols in this case. (I do recognise the familiarity
> of symbols for hash keys in Scheme, tho.)

That's something that I didn't understand back then too.  You get the
same thing with strings:

    -> "foo bar"
    "foo bar"
    -> "foo\40bar"
    "foo bar"
    -> "foo\x20bar"
    "foo bar"
    -> "foo\u0020bar"
    "foo bar"
    -> "foo\U00000020bar"
    "foo bar"
    -> "\146\157\x6f\u0020\
    "foo bar"
    -> #<<MEH
    foo bar
    MEH
    "foo bar"

but that shouldn't be a problem for any code, I think.
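
The symbol side of the same point can be sketched like this: several
spellings of one symbol all read as the same interned value, so code
that consumes the data never sees the difference.  The names
`spellings' and `all-same?' are just for the example:

```racket
#lang racket/base
(require json)

;; Different concrete syntaxes, one symbol.
(define spellings
  (list '|foo bar|
        'foo\ bar
        (string->symbol "foo bar")
        (string->symbol (string-append "foo" " " "bar"))))

;; Interned symbols with the same name are `eq?'.
(define all-same?
  (for/and ([s (in-list spellings)])
    (eq? s (car spellings))))
all-same?

;; Keys that need the bar syntax in source still round-trip as JSON.
(define round-trip
  (string->jsexpr (jsexpr->string (hasheq '|foo bar| 1))))
(hash-ref round-trip '|foo bar|)
```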

> >> There is a similar issue with lists being used to represent JSON
> >> arrays, over the more obvious choice of Racket vector. Maybe this is
> >> because there are more core functions for dealing with lists
> >> compared to the limited number for vectors (for data processing type
> >> things like foldl). I suppose it is YMMV depending on how you use
> >> the data.  Random element access, simple iteration, or more complex
> >> folding, etc.
> >
> > Here too, I think that the vague intention is "some ordered list of
> > values", so it makes sense to use the datatype that is most common in
> > your language.  In JS this is obviously arrays, and in all Lisps I
> > think that lists make more sense.  For most cases I think that the
> > performance consideration is irrelevant anyway, since the lists would
> > be very short, and since you usually view them as a list rather than as a
> > random-access ordered mapping.  If you get to a point where the cost
> > of lists makes a difference, then my guess is that using JSON can
> > become questionable -- and in rare cases that it does make sense
> > (perhaps because some upstream you don't control), some streaming API
> > as Neil mentioned can make more sense.
> Yeah, I can see that, and it doesn't really make much of a difference.
> In the CL implementation, I sometimes pine for an array to be a list.
> But it works through an api, mostly, so that detail is not normally
> important. To implement bidirectional translation between native types
> to and json types, though, it is something to consider. For instance,
> if you use alists for objects, then it is sensible to pick vectors for
> arrays, so that you can do simple type matching. With objects
> represented as hash tables, list is available. Of course, types
> specifically designed for json make this moot. (But a native
> representation is always useful to have.)

I'm not following the "native" point here...
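
For the list-vs-vector question above, a small sketch with the `json'
library, under the assumption that the arrays are short: the array
arrives as a plain list, the usual list functions apply directly, and
converting to a vector is a one-liner in the cases where random
access matters.

```racket
#lang racket/base
(require json)

(define xs (string->jsexpr "[1, 2, 3, 4]"))

(list? xs)       ; JSON arrays are read as lists
(foldl + 0 xs)   ; list functions work without any conversion

;; If indexed access is really needed, convert at the edge...
(define v (list->vector xs))
(vector-ref v 2)

;; ...and convert back before writing, since a vector is not a jsexpr.
(define out (jsexpr->string (vector->list v)))
out
```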

> > The reason for the parameterized null value was that the original
> > code used the #\null character to represent nulls, which is
> > something that I viewed as a bad type pun...  So I left in a
> > parameter and an argument to make it easy to use it for easy
> > porting if needed.
> In my CL implementation, I opted for representing null, true, and
> false as :NULL, :TRUE, and :FALSE, to avoid any conflation between
> json and lisp. It is annoyingly easy to introduce nil (in CL) or null,
> #f, #t from functions which naturally return these values, most often
> from functions which return null or #f upon failure. Using explicit
> symbols is rarely annoying, and never confusing.

Yeah, it's a common tradeoff -- using Racket types makes it easier to
write code, but risks getting bugs when you get such a value by
mistake.  At the other extreme, define some `js' constructor, and now
you deal with (js foo) throughout the whole tree of values (wrapping
booleans, lists, strings, and dicts) -- but you're also practically
eliminating all chances of such pun-related-errors.
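
A middle ground can be sketched with the parameterized null mentioned
above: the `json-null' parameter and the `#:null' argument let a
program pick its own marker for JSON null.  The 'NULL symbol here is
an arbitrary choice for illustration, loosely echoing the :NULL
keyword from the CL version:

```racket
#lang racket/base
(require json)

;; Default: JSON null is read as the value of (json-null).
(define default-read (string->jsexpr "[1, null]"))

;; Per call, via the keyword argument.
(define char-read (string->jsexpr "[1, null]" #:null #\nul))

;; For a whole dynamic extent, via the parameter.
(define sym-read
  (parameterize ([json-null 'NULL])
    (string->jsexpr "[1, null]")))

;; Writing has to use the same marker to get "null" back out.
(define written (jsexpr->string (list 1 'NULL) #:null 'NULL))

(list default-read char-read sym-read written)
```

This only covers null; #t, #f, lists and strings are still plain
Racket values, so the pun risk described above remains for those.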

          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the users mailing list.