[plt-scheme] question about symbol data type implementation in mzscheme

From: Joe Marshall (jrm at ccs.neu.edu)
Date: Mon Nov 22 10:23:09 EST 2004

"Matt Dawson" <mdawson at mninter.net> writes:

> I have a very broad question related to the implementation of the symbol
> type in the inards of scheme interpreters and compilers. 

> Here is the problem:
> Everything was going great until I started really thinking about the
> symbol data type. My original idea was just to have a pre-defined
> code-word for every symbol need in the interface. For example the symbol
> "red" might get assigned the code-word 105. This encoding would be
> hard-coded into both the sensor firmware and the host software that
> talks to the sensor. After a while I realized that this approach did not
> scale well. There ends up being a large number of symbols. There has to
> be as symbol for every class-name, method-name, and enumeration value.
> Managing the encodings for a single type of device would be tedious. If
> I wanted to add a second type of device it would be a nightmare. The
> symbol encoding would have to be managed globally across the entire set
> of device classes. 
> I'm looking for a solution to this problem. One approach I've been
> thinking about is to have dynamicly assigned code words. There would
> have to be an initial negotiation between the host computer and the
> device to come up with a common set of symbol encodings. For example the
> host computer could ask the decice the provide all of the ascii
> representation of all of the symbols needed to communicate and also the
> code-word corresponding to each one. Suppose the host was talking to two
> different devices simultaneously. The first device might use the
> code-word 105 for the symbol "red" while the second would use the
> code-word 432. I'm not very excited about this solution. If the hsot
> computer were talking to 23 different devices simultaneously it would
> have to maintain 23 different symbol mappings.
> This same problem must come up in the implementation of scheme
> interpreters and compilers. 

Yes, it does.

> I don't really know but my guess is that
> inside mzscheme at runtime there is a single instance of a symbol like
> "red" and that a symbol value in a list like (1 red 17) is just a
> pointer back to this instance. To compare two symbols it is just
> necessary to compare the pointer values. Now according to the
> documentation "MzScheme's units are ... separately compilable and
> reusable components". How is a symbol like "red" represented in a
> separately compiled unit. How is this encoding mapped back to a run-time
> symbol value when the unit is loaded.

In most Scheme or Lisp systems, there is a table that holds the unique
instance of each symbol (in Common Lisp, there are multiple tables).
In a separate compilation unit (an as-yet-to-be-loaded file), a symbol
will be referred to by its literal name.  When the file is loaded, the
literal name is looked up and the reference is patched up to refer to
the unique symbol in the table.  This may be done by the Scheme or
Lisp reader if you are loading source code, or it may be done via some
other mechanism if you have some sort of separately compiled unit.

> I would appreciate advice or even just an article reference that would
> help me solve this problem in an elegant manner. 

I can think of a number of approaches, but they all come down to the
same thing you mentioned:  some negotiation between the two devices.

But do you really need to do this?  By creating a canonical symbol in
the host machine, you are really assigning a unique address to the
symbol.  In other words, every (non-numeric) object in the Scheme
world has a 4-byte address that distinguishes it from all the others.
When you transfer a symbol by `code-word' you need to send a unique
value *in addition to* something that indicates that you are sending a
code word rather than, say, an integer.  At the very least, you're
talking 3 or more bytes.  You won't save much over sending the literal
ascii if your symbols are things like "red".  Firewire is pretty fast,
so you may spend more time encoding and decoding than you would gain
from sending fewer bytes --- your throughput might actually decrease.
In addition, if you spy on the packets as they go by, you'll be able
to read them.  This can help with debugging.

If it does make sense to encode the symbols, then either the device
will need to remember the agreed-upon code (it will need RAM), or the
host will have to remember the device's encoding.  In either case,
you'd want to have a special packet type that says `What is the code
for "red"?' or `What does code 2392 mean?'  If the sensor has only
firmware, then there is little choice but for the host to keep
multiple tables.

Posted on the users mailing list.