[racket] JSON module: why symbols for object keys? lists for arrays?

From: Eli Barzilay (eli at barzilay.org)
Date: Mon Apr 22 18:21:56 EDT 2013

Two hours ago, Erik Pearson wrote:
> Hi,
> 
> I've just started playing with JSON in Racket. The goal is to use
> JSON + Mustache-like templates to generate content.

General comment: unless you have an explicit goal of supporting
Mustache templates as-is, you should consider doing things in plain
Racket.  IMO it works far better than the pile of half-baked hacks
that is Mustache (at least AFAICS).  (I can go on for pages on this
point, so I'll avoid it unless anyone's interested...)


> I've done this lots with CL and Javascript, now trying the Racket
> approach. I'm wondering about a couple of design decisions for the
> JSON module. For one, I was a bit surprised but not shocked to see
> json object property keys implemented as symbols. I assume it is for
> performance reasons? I know there is a trade-off between the cost
> of converting strings to symbols, the efficiency of symbol-keyed eq
> hash tables vs. string-keyed equal hash tables, etc.

I'm not responsible for the original representation decisions -- I
just borrowed the ones made by Dave except for the `#\null' thing he
used -- so I can't speak for him.  However, it seems to me that these
decisions make sense given that it's almost obvious that the intention
is to use hash tables with some symbol-like domain for keys.  So I
don't think that there were any efficiency considerations, just an
attempt to use the most natural representation given the vague spec.
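
For what it's worth, this is easy to see at the REPL: object keys come
back as symbols in an immutable hasheq table, so a quick check with
`string->jsexpr' shows the representation directly --

  #lang racket
  (require json)

  ;; Object keys are read as symbols; values keep their own types.
  (define j (string->jsexpr "{\"name\": \"Erik\", \"n\": 42}"))
  (hash-ref j 'name)  ; => "Erik"  (the key is the symbol 'name)
  (hash-ref j 'n)     ; => 42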


> For me there is also the increase in complexity when translating
> from JSON to jsexpr -- when components of JSON are translated into
> different objects in the host language it is yet another thing to
> remember, multiple forms of symbols, another layer of coding.  [...]

I'm not following what is more complicated here.  The fact that there
are different concrete syntaxes for the same symbols is no different
from strings, which have the same issue.  Either way, this shouldn't
matter, since you shouldn't care about the actual JSON source text --
just use the values as they are read.  (So I'm guessing that I'm
missing something here.)
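
To make that concrete: two different spellings in the JSON source read
back as the same key, exactly as they would for strings --

  (require json)
  ;; \u0061 is just an escaped "a", so both sources read equal:
  (equal? (string->jsexpr "{\"a\": 1}")
          (string->jsexpr "{\"\\u0061\": 1}"))  ; => #t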


> There is a similar issue with lists being used to represent JSON
> arrays, over the more obvious choice of Racket vector. Maybe this is
> because there are more core functions for dealing with lists
> compared to the limited number for vectors (for data processing type
> things like foldl). I suppose it is YMMV depending on how you use
> the data.  Random element access, simple iteration, or more complex
> folding, etc.

Here too, I think that the vague intention is "some ordered list of
values", so it makes sense to use the datatype that is most common in
your language.  In JS this is obviously arrays, and in all Lisps I
think that lists make more sense.  For most cases I think that the
performance consideration is irrelevant anyway, since the lists would
be very short, and since you usually view them as a list rather than
as a random-access ordered mapping.  If you get to a point where the
cost of lists makes a difference, then my guess is that using JSON
becomes questionable -- and in the rare cases where it does make sense
(perhaps because of some upstream source you don't control), a
streaming API like the one Neil mentioned can make more sense.
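
Concretely, since arrays read back as plain lists, the usual list
tools apply directly, and a one-time conversion covers the rare
random-access case --

  (require json)
  (define xs (string->jsexpr "[1, 2, 3, 4]"))
  (foldl + 0 xs)                    ; => 10
  (vector-ref (list->vector xs) 2)  ; => 3  (one-time conversion)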

And speaking about this streaming interface: I had considered it too,
but I think that for most uses it is overkill.  The idea is basically
a fully parameterized parser with user-defined functions for the
constructors -- and taking it further, the functions become more like
continuations that can dictate how parsing proceeds, or even allow
different constructors at different levels, etc.  It's tempting in its
generality, but I chose the simpler code, since if the super-general
thing ever becomes needed, it's easy to implement it and keep the
current functionality as the default case.
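
For the curious, here is a rough sketch of the parameterized-
constructors idea -- note that `jsexpr-transform' is a hypothetical
helper applied after parsing, not anything the json module provides,
and a real streaming version would do this inside the reader instead:

  (require json)
  ;; Walk a parsed jsexpr, rebuilding objects/arrays with
  ;; user-supplied constructors (the defaults leave them as-is).
  (define (jsexpr-transform js
                            #:object [make-object values]
                            #:array  [make-array  values])
    (let loop ([js js])
      (cond [(hash? js)
             (make-object
              (for/hasheq ([(k v) (in-hash js)])
                (values k (loop v))))]
            [(list? js) (make-array (map loop js))]
            [else js])))

  ;; Eg, vectors for arrays and mutable hashes for objects:
  (jsexpr-transform (string->jsexpr "{\"a\": [1, 2]}")
                    #:array  list->vector
                    #:object hash-copy)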


> Anyway, I was hoping the authors or associates could comment on
> these design decisions. A related topic is whether the approach of
> the JSON module to allow specification of implementation for NULL,
> for instance, could be extended to Objects and Arrays. On the other
> hand, maybe it is better to fork a new JSON module with different
> and specific implementation details, either for personal use or as
> part of the standard library (it takes about 5 minutes to make the
> necessary changes).

The reason for the parameterized null value was that the original code
used the #\null character to represent nulls, which I viewed as a bad
type pun...  So I left in a parameter and an argument to make porting
easy if needed.
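
For reference, that interface is the `json-null' parameter and the
matching `#:null' keyword argument:

  (require json)
  (string->jsexpr "[null]")               ; => '(null)   (the default)
  (string->jsexpr "[null]" #:null 'nil)   ; => '(nil)
  (parameterize ([json-null #\null])      ; the old #\null behavior
    (string->jsexpr "[null]"))            ; => '(#\null)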

When I did that, I obviously thought about doing the same for lists
and hashes (eg, it creates immutable hashes, and in some cases you'd
want a mutable one) -- but parameterizing these makes the efficiency
question more important, since the code would be collecting lists of
things just to send them to user-provided constructors.  So the
obvious next thought is to use cons-like constructors: instead of
getting a head + rest, they get a head + a callback to get the rest --
and that leads to the same streaming thing.  At that point I stopped
and went with the simpler-yet-practical-enough thing.
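
To illustrate that last step with a toy (all names here are made up;
nothing like this is in the json module): each constructor gets the
head plus a thunk for the rest, so the consumer decides whether the
rest is ever parsed --

  ;; `elems' stands in for the parser's not-yet-consumed input.
  (define (parse-array-with acons anil elems)
    (if (null? elems)
        anil
        (acons (car elems)
               (lambda () (parse-array-with acons anil (cdr elems))))))

  ;; A strict consumer rebuilds the list; a counting one never does:
  (parse-array-with (lambda (h rest) (cons h (rest))) '() '(1 2 3))
  ;; => '(1 2 3)
  (parse-array-with (lambda (h rest) (add1 (rest))) 0 '(1 2 3))
  ;; => 3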

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!
