[plt-scheme] question about symbol data type implementation in mzscheme

From: Matt Dawson (mdawson at mninter.net)
Date: Sat Nov 20 18:31:57 EST 2004

I have a very broad question related to the implementation of the symbol
type in the innards of Scheme interpreters and compilers. This may seem
way off topic, and I apologize for that. It seemed to me that the
participants in this mailing list would have the answers I'm looking
for.

Here is the problem background.

I'm developing an industrial automation protocol for IEEE 1394
(FireWire). My company makes "sensors" for the electronics assembly
industry. We have a new project where we want to use 1394. Unfortunately
we couldn't find a suitable high-level automation protocol for 1394, so
we are rolling our own. I had been playing around with mzscheme for about
2 years. I started thinking about how nice it would be to just send a
Scheme expression to a device and have the device evaluate it. A
"command" to the device would be a Scheme expression. The "response"
would be the result produced by evaluating it. I have succeeded in
putting together a workable protocol based on this idea. The language has
a very small number of forms stolen from the mzscheme OOP model. Objects
can be created using the (new ...) form. Methods can be invoked using the
(send ...) form. Variables are created using a (define ...) form. The
language does not support functions of any kind. The protocol is not
tightly coupled to mzscheme or any other functional language. I'm
planning on using C++ to implement both the sensor firmware and the host
protocol software.

Here is a simple example to give you the flavor of the language.

(send sensor serial-number) 

There is a predefined variable "sensor" whose value refers to the root
"sensor" object. The "serial-number" method is used to retrieve the
device's serial number. There are other objects associated with the
sensor object, and there are methods that allow you to navigate from the
sensor object to these other objects.
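To make "a command is a Scheme expression" concrete, here is a minimal
sketch in C++ (the language planned for the firmware and host) of how a
device might read a command like (send sensor serial-number) into a
nested list before dispatching it. The names tokenize, read_expr and
Expr are hypothetical, and the sketch only handles parentheses and
whitespace-separated atoms: no strings, numbers, quoting or comments.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Split a command such as "(send sensor serial-number)" into tokens.
// Parentheses are separate tokens; an atom is any run of other
// non-whitespace characters.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string atom;
    auto flush = [&] {
        if (!atom.empty()) {
            tokens.push_back(atom);
            atom.clear();
        }
    };
    for (char c : text) {
        if (c == '(' || c == ')') {
            flush();
            tokens.push_back(std::string(1, c));
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            flush();
        } else {
            atom += c;
        }
    }
    flush();
    return tokens;
}

// An expression is either an atom (a symbol name) or a list of expressions.
struct Expr {
    bool is_list = false;
    std::string atom;         // meaningful when !is_list
    std::vector<Expr> items;  // meaningful when is_list
};

// Recursive-descent reader: reads one expression starting at tokens[pos]
// and advances pos past it. Throws std::runtime_error on malformed input.
Expr read_expr(const std::vector<std::string>& tokens, std::size_t& pos) {
    if (pos >= tokens.size()) throw std::runtime_error("unexpected end of input");
    const std::string& tok = tokens[pos++];
    if (tok == ")") throw std::runtime_error("unexpected ')'");
    Expr e;
    if (tok != "(") {
        e.atom = tok;
        return e;
    }
    e.is_list = true;
    for (;;) {
        if (pos >= tokens.size()) throw std::runtime_error("missing ')'");
        if (tokens[pos] == ")") {
            ++pos;
            return e;
        }
        e.items.push_back(read_expr(tokens, pos));
    }
}
```

For (send sensor serial-number) the reader yields a three-element list
whose first item is the atom "send", which a device could use to pick the
method-invocation handler.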

Here is the problem:

Everything was going great until I started really thinking about the
symbol data type. My original idea was just to have a predefined
code-word for every symbol needed in the interface. For example, the
symbol "red" might get assigned the code-word 105. This encoding would be
hard-coded into both the sensor firmware and the host software that
talks to the sensor. After a while I realized that this approach did not
scale well. There ends up being a large number of symbols: there has to
be a symbol for every class name, method name, and enumeration value.
Managing the encodings for a single type of device would be tedious. If
I wanted to add a second type of device, it would be a nightmare. The
symbol encoding would have to be managed globally across the entire set
of device classes.
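A sketch of the fixed code-word scheme described above, assuming C++17.
Only the pairing of "red" with 105 comes from this post; the other
names, codes and the encode/decode helpers are invented for
illustration. Because the same table is compiled into both the firmware
and the host, every new class, method or enumeration value means editing
and redeploying it on every side.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// One entry of a hypothetical, globally agreed symbol table.
struct CodeWord {
    std::string_view name;
    std::uint16_t code;
};

// Hard-coded on both ends of the link; "red" = 105 is the post's example,
// the other entries are made up.
constexpr CodeWord kSymbolTable[] = {
    {"sensor", 100},
    {"serial-number", 101},
    {"red", 105},
};

// Linear search is adequate for a small table.
std::optional<std::uint16_t> encode(std::string_view name) {
    for (const auto& e : kSymbolTable)
        if (e.name == name) return e.code;
    return std::nullopt;  // symbol unknown to this build
}

std::optional<std::string_view> decode(std::uint16_t code) {
    for (const auto& e : kSymbolTable)
        if (e.code == code) return e.name;
    return std::nullopt;  // code-word unknown to this build
}
```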

I'm looking for a solution to this problem. One approach I've been
thinking about is to have dynamically assigned code-words. There would
have to be an initial negotiation between the host computer and the
device to come up with a common set of symbol encodings. For example, the
host computer could ask the device to provide the ASCII representation
of every symbol needed to communicate, along with the code-word
corresponding to each one. Suppose the host was talking to two
different devices simultaneously. The first device might use the
code-word 105 for the symbol "red" while the second would use the
code-word 432. I'm not very excited about this solution. If the host
computer were talking to 23 different devices simultaneously, it would
have to maintain 23 different symbol mappings.
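One way the host might hold the per-device mappings from the negotiation
approach, sketched in C++17. SymbolDirectory, DeviceId and WireCode are
hypothetical names, and the negotiation itself is assumed to have
already produced each device's list of (name, code-word) pairs. The
values 105 and 432 for "red" are the ones from the example above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using DeviceId = int;            // hypothetical: however the host names a node
using WireCode = std::uint16_t;  // code-word chosen by a device

// Host-side record of what each device announced during negotiation.
class SymbolDirectory {
public:
    // Replace any previous table for 'dev' with the announced pairs.
    void register_device(DeviceId dev,
                         const std::vector<std::pair<std::string, WireCode>>& announced) {
        Table& t = tables_[dev];
        t.to_code.clear();
        t.to_name.clear();
        for (const auto& [name, code] : announced) {
            t.to_code[name] = code;
            t.to_name[code] = name;
        }
    }

    // Code-word to send to 'dev' for 'name'; std::out_of_range if unknown.
    WireCode encode(DeviceId dev, const std::string& name) const {
        return tables_.at(dev).to_code.at(name);
    }

    // Symbol text for a code-word received from 'dev'; std::out_of_range if unknown.
    const std::string& decode(DeviceId dev, WireCode code) const {
        return tables_.at(dev).to_name.at(code);
    }

private:
    struct Table {
        std::unordered_map<std::string, WireCode> to_code;
        std::unordered_map<WireCode, std::string> to_name;
    };
    std::unordered_map<DeviceId, Table> tables_;
};
```

With 23 devices this is 23 entries in one container, so the bookkeeping
is mechanical, although every outgoing and incoming message still has to
be translated through the table of the device it belongs to.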

This same problem must come up in the implementation of Scheme
interpreters and compilers. I don't really know, but my guess is that
inside mzscheme at runtime there is a single instance of a symbol like
"red" and that a symbol value in a list like (1 red 17) is just a
pointer back to this instance. To compare two symbols it is just
necessary to compare the pointer values. Now, according to the
documentation, "MzScheme's units are ... separately compilable and
reusable components". How is a symbol like "red" represented in a
separately compiled unit? How is this encoding mapped back to a run-time
symbol value when the unit is loaded?
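The guess about pointer comparison describes the usual technique,
interning: a table maps each name to one unique object, and symbols are
compared by address. For separately compiled code, a common approach
(sketched here in general terms, not as MzScheme's actual compiled-file
format) is to store each symbol's name in the compiled unit, have the
code refer to a unit-local index, and let the loader intern each name
once and patch the references. SymbolTable, CompiledUnit and load are
hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// A symbol is a pointer to its unique interned name, so two symbols are
// equal exactly when their pointers are equal.
using Symbol = const std::string*;

class SymbolTable {
public:
    Symbol intern(const std::string& name) {
        // insert() returns the existing element if the name is already
        // present. Rehashing never moves unordered_set elements, so the
        // address stays valid for the table's lifetime.
        return &*names_.insert(name).first;
    }

private:
    std::unordered_set<std::string> names_;
};

// Hypothetical layout of a separately compiled unit: code refers to
// symbols by index into the unit's own name table, not by address.
struct CompiledUnit {
    std::vector<std::string> symbol_names;  // e.g. {"red", "blue"}
    std::vector<std::size_t> code_refs;     // indices into symbol_names
};

// At load time, resolve each local index to the process-wide symbol.
std::vector<Symbol> load(const CompiledUnit& unit, SymbolTable& table) {
    std::vector<Symbol> local;
    local.reserve(unit.symbol_names.size());
    for (const auto& n : unit.symbol_names) local.push_back(table.intern(n));

    std::vector<Symbol> resolved;
    resolved.reserve(unit.code_refs.size());
    for (std::size_t i : unit.code_refs) resolved.push_back(local.at(i));
    return resolved;
}
```

Two units that both mention red end up holding the same pointer after
loading, even though each was compiled without knowing the other's
numbering.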

I would appreciate advice or even just an article reference that would
help me solve this problem in an elegant manner. 


Posted on the users mailing list.