[racket] JSON module: why symbols for object keys? lists for arrays?

From: Erik Pearson (erik at adaptations.com)
Date: Mon Apr 22 19:54:47 EDT 2013

Hi Eli,

Wow, thanks for the great feedback.

On Mon, Apr 22, 2013 at 3:21 PM, Eli Barzilay <eli at barzilay.org> wrote:
> Two hours ago, Erik Pearson wrote:
>> Hi,
>>
>> I've just starting playing with JSON in racket. The goal is to use
>> JSON + Mustache-like templates to generate content.
>
> General comment: unless you have an explicit goal of supporting
> Mustache templates as-is, you should consider doing things in plain
> Racket.  IMO it works far better than the pile of half-baked hacks
> that is Mustache (at least AFAICS).  (I can go on for pages on this
> point, so I'll avoid it unless anyone's interested...)

I have some extensive web sites built on Mustache-style templates. I
say "style" because it is based more on the original ctemplate syntax
from Google, and avoids some of Mustache's limitations and Handlebars'
extensions. It is very practical, in my experience, and easy enough to
implement (since it sticks to the very simple syntax).

It is also very usable when combined with JSON: it is easy to "walk"
the template and the data in parallel, and to reason about how the
template rendering should behave in the presence of some particular
valid JSON data.
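
To illustrate what I mean, here is a rough sketch in Racket (the
`render-template' helper is just something I made up for this mail,
not anything from a library):

  (require json)

  ;; Substitute {{key}} markers in a template string with values
  ;; looked up in a jsexpr object (a hash with symbol keys).
  (define (render-template template data)
    (regexp-replace* #px"\\{\\{([a-zA-Z0-9_]+)\\}\\}"
                     template
                     (lambda (whole key)
                       (define v (hash-ref data (string->symbol key) ""))
                       (if (string? v) v (format "~a" v)))))

  ;; (render-template "Hello, {{name}}!"
  ;;                  (string->jsexpr "{\"name\": \"world\"}"))
  ;; => "Hello, world!"

The point is that the lookup side is just the jsexpr as read, so the
template and the data can be walked together.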

>
>
>> I've done this lots with CL and Javascript, now trying the Racket
>> approach. I'm wondering about a couple of design decisions for the
>> JSON module. For one, I was a bit surprised but not shocked to see
>> json object property keys implemented as symbols. I assume it is for
> performance reasons? I know there is a trade-off between the cost
>> of converting strings to symbols, the efficiency of symbol-based eq
>> hash tables vs string equal hash tables, etc.
>
> I'm not responsible for the original representation decisions -- I
> just borrowed the ones made by Dave except for the `#\null' thing he
> used -- so I can't speak for him.  However, it seems to me that these
> decisions make sense given that it's almost obvious that the intention
> is to use hash tables with some symbol-like domain for keys.  So I
> don't think that there were any efficiency considerations, just an
> attempt to use the most natural representation given the vague spec.
>
>
>> For me there is also the increase in complexity when translating
>> from JSON to jsexpr -- when components of JSON are translated into
>> different objects in the host language it is yet another thing to
>> remember, multiple forms of symbols, another layer of coding.  [...]
>
> I'm not following what it is that is more complicated here.  The fact
> that there are different concrete syntaxes for the same symbols is not
> different from strings which have the same issue.  But either way,
> this shouldn't be an issue since you shouldn't care about the actual
> JSON source and just use the values that you read it as.  (So I'm
> guessing that I'm missing something here.)

On the face of it, JSON defines strings and so does Racket; strings
are used as values and strings are used as keys, so why mess that up?
Using symbols makes it more complicated, to me, in a couple of ways.
For one, what is the impact of putting arbitrary user data into the
symbol space? Performance? Symbol table exhaustion? Some other
interference with program logic due to symbol corruption? I don't
know. Unless there is a good reason, why bother introducing these
unknowns? Maybe this is a CL bias on my part. For another, the
representation of symbols in the reader can be either a single quote
for simple symbols, or bar-delimited for more complicated ones. This
makes creating JSON literals in Racket a bit of a pain. In general, I
don't see a good reason to conflate strings with symbols in this
case. (I do recognise the familiarity of symbols as hash keys in
Scheme, though.)
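
To make that concrete, this is roughly what the module does today
(REPL transcript from memory, so the exact printing may differ):

  > (require json)
  > (string->jsexpr "{\"name\": \"Erik\", \"id\": 42}")
  '#hasheq((id . 42) (name . "Erik"))
  > (string->jsexpr "{\"first name\": \"Erik\"}")
  '#hasheq((|first name| . "Erik"))

The keys come back as symbols, and anything with a space or other
special character needs the bar-delimited form when you write the
jsexpr by hand.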

>
>
>> There is a similar issue with lists being used to represent JSON
>> arrays, over the more obvious choice of Racket vector. Maybe this is
>> because there are more core functions for dealing with lists
>> compared to the limited number for vectors (for data processing type
>> things like foldl). I suppose it is YMMV depending on how you use
>> the data.  Random element access, simple iteration, or more complex
>> folding, etc.
>
> Here too, I think that the vague intention is "some ordered list of
> values", so it makes sense to use he datatype that is most common in
> your language.  In JS this is obviously arrays, and in all Lisps I
> think that lists make more sense.  For most cases I think that the
> performance consideration is irrelevant anyway, since the lists would
> be very short, and since you usually view them as a list rather than as a
> random-access ordered mapping.  If you get to a point where the cost
> of lists makes a difference, then my guess is that using JSON can
> become questionable -- and in the rare cases where it does make sense
> (perhaps because of some upstream you don't control), some streaming API
> as Neil mentioned can make more sense.

Yeah, I can see that, and it doesn't really make much of a difference.
In the CL implementation, I sometimes pine for an array to be a list.
But it works through an API, mostly, so that detail is not normally
important. To implement bidirectional translation between native types
and JSON types, though, it is something to consider. For instance, if
you use alists for objects, then it is sensible to pick vectors for
arrays, so that you can do simple type matching. With objects
represented as hash tables, lists are available for arrays. Of course,
types specifically designed for JSON make this moot. (But a native
representation is always useful to have.)
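
For example, something along these lines (a helper of my own, not
anything in the json module) is the kind of type matching I mean:

  (require json)

  ;; Walk a jsexpr and rebuild it with alists for objects and vectors
  ;; for arrays, so the two can be told apart by a simple type test.
  (define (jsexpr->native j)
    (cond
      [(hash? j) (for/list ([(k v) (in-hash j)])     ; object -> alist
                   (cons k (jsexpr->native v)))]
      [(list? j) (for/vector ([x (in-list j)])       ; array -> vector
                   (jsexpr->native x))]
      [else j]))                                     ; atoms unchanged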

>
> And speaking about this streaming interface: I had considered it too,
> but I think that for most uses it is an overkill.  The idea is
> basically a fully parameterized parser with user-defined functions for
> the constructors -- and taking it further you get the functions to be
> more like continuations that can dictate how parsing proceeds, or even
> the ability to use different constructors at different levels etc.
> It's tempting in its generality, but I chose the simpler code since if
> the super-general thing becomes needed enough, it's easy to implement
> it and maintain the current functionality as the default case.
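
That makes sense. An eager approximation of the idea, as I understand
it, would be something like this (my own sketch, not your code):

  (require json)

  ;; Rebuild an already-parsed jsexpr with user-provided constructors
  ;; for objects and arrays.  A real streaming version would call the
  ;; constructors during parsing instead of after it.
  (define (rebuild-jsexpr j make-object make-array)
    (cond
      [(hash? j) (make-object
                  (for/list ([(k v) (in-hash j)])
                    (cons k (rebuild-jsexpr v make-object make-array))))]
      [(list? j) (make-array
                  (for/list ([x (in-list j)])
                    (rebuild-jsexpr x make-object make-array)))]
      [else j]))

  ;; e.g. mutable hashes for objects and vectors for arrays:
  ;; (rebuild-jsexpr (string->jsexpr "{\"xs\": [1, 2, 3]}")
  ;;                 make-hash list->vector)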

>
>
>> Anyway, I was hoping the authors or associates could comment on
>> these design decisions. A related topic is whether the approach of
>> the JSON module to allow specification of implementation for NULL,
>> for instance, could be extended to Objects and Arrays. On the other
>> hand, maybe it is better to fork a new JSON module with different
>> and specific implementation details, either for personal use or as
>> part of the standard library (it takes about 5 minutes to make the
>> necessary changes.)
>
> The reason for the parameterized null value was that the original code
> used the #\null character to represent nulls, which is something that
> I viewed as a bad type pun...  So I left in a parameter and an
> argument to make it easy to use for porting if needed.

In my CL implementation, I opted for representing null, true, and
false as :NULL, :TRUE, and :FALSE, to avoid any conflation between
JSON and Lisp. It is annoyingly easy to introduce nil (in CL) or null,
#f, or #t from functions which naturally return these values, most
often from functions which return null or #f upon failure. Using
explicit symbols is rarely annoying, and never confusing.
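
For reference, the hook in the Racket module looks like this, if I am
reading it right (the default null is the symbol 'null):

  > (require json)
  > (string->jsexpr "[1, null, 3]")
  '(1 null 3)
  > (string->jsexpr "[1, null, 3]" #:null 'NULL)
  '(1 NULL 3)
  > (parameterize ([json-null 'NULL])
      (string->jsexpr "[1, null, 3]"))
  '(1 NULL 3)

true and false, on the other hand, always come back as #t and #f.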

>
> When I did that, I obviously thought about doing the same for lists
> and hashes (eg, it creates immutable hashes, and in some cases you'd
> want a mutable one) -- but parameterizing these makes the efficiency
> question be more important, since the code would be collecting lists
> of things just to send them to user-provided constructors.  So
> obviously my next thought is to use cons-like constructors, and
> instead of getting a head + rest thing make them get a head +
> callback-to-get-the-rest, and that leads to the same streaming thing.
> And at this point I stopped and went with the simpler-yet-practical-
> enough thing.
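
If I follow, the "head + callback-to-get-the-rest" shape would be
something like this (my rough sketch of the interface, not your
code):

  ;; A lazy pair: the head is a parsed value, the rest is a thunk
  ;; that parses more only when the consumer asks for it.
  (define (lazy-cons head get-rest) (cons head get-rest))
  (define (lazy-first p) (car p))
  (define (lazy-rest p) ((cdr p)))   ; forcing the callback parses more

  ;; A parser parameterized this way could hand back
  ;;   (lazy-cons first-element (lambda () ... parse the next ...))
  ;; so the consumer decides whether parsing continues at all.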

Thanks for that. It keeps this module clean, small, and coherent, so
that even I can comprehend it!

Erik.

>
> --
>           ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                     http://barzilay.org/                   Maze is Life!

Posted on the users mailing list.