[plt-scheme] Re: creating new readers

From: Jon Rafkind (workmin at ccs.neu.edu)
Date: Tue Oct 28 00:13:55 EDT 2008

>> Any chance you could enlighten the rest of us?
>>
>>   
> Heh, I was pretty sure someone would want to see the solution. I won't 
> get to it tonight, but I'll post an example tomorrow morning or so.
I attempted to create a new language in mzscheme through the #lang 
interface and this is a summary of my experience.

New languages seem to consist of two things: a reader and a base language. 
That is not strictly true: you can combine mzscheme's reader with a new 
base language, or your own reader with mzscheme's base language, but I 
was trying not to reuse mzscheme.

The directory structure should be something like this:
  newlang/
  * lang/
  * * reader.ss
  * main.ss

Where reader.ss is your reader and main.ss is the base language. The 
reader should export two functions: `read' and `read-syntax'. `read' 
returns a datum while `read-syntax' returns a syntax object. I'm not 
sure what `read' is supposed to do other than convert the result of 
`read-syntax' to a datum, so that's all I had it doing.

(define (read port)
  (syntax->datum (read-syntax #f port)))
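To make the datum versus syntax object distinction concrete, here is a small sketch that uses the stock `read-syntax' from scheme/base rather than a custom reader. The source name 'demo and the input string are made up for illustration.

```scheme
#lang scheme/base

;; A string port standing in for a source file (illustrative input).
(define in (open-input-string "(+ 1 2)"))

;; read-syntax returns a syntax object that carries source locations.
(define stx (read-syntax 'demo in))

;; syntax->datum strips the syntax wrapper and leaves the plain list.
(define datum (syntax->datum stx))

(syntax? stx)   ; #t
(syntax? datum) ; #f
(pair? datum)   ; #t
```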

So then I defined `read-syntax' to invoke my custom parser and wrap the 
resulting syntax object in a module expression.

(define (read-syntax name port)
   (let* ((p-name (object-name port))
          (name (if (path? p-name)
                  (let-values (((base name dir?) (split-path p-name)))
                    (string->symbol
                      (path->string (path-replace-suffix name #""))))
                  'page)))
     #`(module #,name newlang #,(custom-parser))))

`name' will be the name of the file that has the #lang in it, minus its 
suffix. If the port has no path, such as when editing a file in drscheme 
without having saved it, `name' will be 'page instead.
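The name computation can be pulled out and tried on its own. This is only a sketch: `port->module-name' is a hypothetical helper name, and the path /tmp/x.ss is invented. It also assumes that `open-input-string' accepts an optional second argument that becomes the port's object name.

```scheme
#lang scheme/base

;; Hypothetical helper: the same name logic as in read-syntax above.
(define (port->module-name port)
  (let ((p-name (object-name port)))
    (if (path? p-name)
        (let-values (((base name dir?) (split-path p-name)))
          (string->symbol
            (path->string (path-replace-suffix name #""))))
        'page)))

;; A port whose object name is a path (made-up file name).
(define named (open-input-string "(+ 1 2)" (string->path "/tmp/x.ss")))

;; A port with no path behind it, like an unsaved buffer.
(define unnamed (open-input-string "(+ 1 2)"))

(port->module-name named)   ; the symbol x
(port->module-name unnamed) ; the symbol page
```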

The full reader so far is:

--
#lang scheme/base

(provide (rename-out (my-read read)
                     (my-read-syntax read-syntax)))

(define (my-read port)
  (syntax->datum (my-read-syntax #f port)))

(define (my-read-syntax name port)
  (let* ((p-name (object-name port))
         (name (if (path? p-name)
                 (let-values (((base name dir?) (split-path p-name)))
                   (string->symbol
                     (path->string (path-replace-suffix name #""))))
                 'page)))
    #`(module #,name scheme #,(foo))))

(define (foo)
  (let ((x '(begin
              (define q 1)
              (+ q 2))))
    (datum->syntax #f x #f)))
--

As was mentioned on the mailing list, I was originally using 
#'(begin (define q 1) (+ q 2)) in (foo). That would not produce the right 
source locations: the syntax object comes from mzscheme's reader reading 
reader.ss, so its source locations name reader.ss as the source module 
instead of the file being read.
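A custom parser can attach its own source location through the third argument of `datum->syntax', which accepts a list of source, line, column, position and span. The sketch below is not the reader from this post; the file name "x.ss" and the numbers are invented to show where the information ends up.

```scheme
#lang scheme/base

;; Illustrative location: source "x.ss", line 3, column 0,
;; position 20, span 7. A real parser would track these itself.
(define loc (list "x.ss" 3 0 20 7))

;; #f as the first argument gives the result no lexical context,
;; the same as in (foo) above.
(define stx (datum->syntax #f '(+ q 2) loc))

(syntax-source stx) ; "x.ss"
(syntax-line stx)   ; 3
(syntax-column stx) ; 0
```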

There is already an extensible reader module I can use so that I don't 
have to wrap the code in a module myself. The library does some other 
fancy things which I don't fully understand; suffice it to say that it's 
"good" to use it. That module is syntax/module-reader. I think it must 
be used with the old (module ...) code instead of #lang.

--
(module reader syntax/module-reader
        mk

#:read my-read
#:read-syntax my-read-syntax
#:whole-body-readers? #t

(define (my-read port)
  ;; my-read-syntax returns a list of syntax objects here
  (map syntax->datum (my-read-syntax #f port)))

(define (my-read-syntax name port)
  (list (foo)))

(define (foo)
  (let ((x '(begin
              (foobar q 1)
              (+ q 2))))
    (datum->syntax #f x #f)))
)
--

This is much simpler because syntax/module-reader will figure out the 
module name for me. I think the reason that #:read works is that 
syntax/module-reader's #%module-begin passes those keyword arguments to 
an internal function, but I could be wrong. Anyway, if #:read is given 
then so must #:read-syntax. Those just name the functions to use as the 
reader. #:whole-body-readers? tells the reader that read-syntax will 
return a list of expressions. If #:whole-body-readers? is #f (the 
default) then the reader will keep calling read-syntax until eof is 
returned. If you have a custom parser that consumes the whole input in 
one call and does not return eof then #:whole-body-readers? is good to 
use.
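The difference between the two modes can be sketched without syntax/module-reader at all. The function below is a hypothetical whole-body reader built on the stock `read-syntax': it runs the "call until eof" loop itself and hands back the list of forms in one go, which is the shape of result that #:whole-body-readers? #t expects. The name `read-whole-body' and the input string are assumptions for illustration.

```scheme
#lang scheme/base

;; Hypothetical whole-body reader: collect every form in the port
;; into one list instead of returning a single form per call.
(define (read-whole-body name port)
  (let loop ((forms '()))
    (let ((stx (read-syntax name port)))
      (if (eof-object? stx)
          (reverse forms)                ; end of input: return all forms
          (loop (cons stx forms))))))    ; otherwise keep reading

(define body
  (read-whole-body 'demo (open-input-string "(define q 1) (+ q 2)")))

(length body)             ; 2
(map syntax->datum body)  ; the two forms as plain data
```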

Now you can write a file that has #lang newlang in it. Well, almost: 
first the new language directory has to be in the collects path. This 
defaults to ~/.plt-scheme/<version>/collects and <plt-dir>/collects. If 
you are just developing then it is useful to add the directory that 
contains your language directory to the plt collects path.

newlang $ export PLTCOLLECTS=$PLTCOLLECTS:`pwd`/..
newlang $ mzscheme x.ss
OR
newlang $ ln -s . newlang
newlang $ export PLTCOLLECTS=$PLTCOLLECTS:.
newlang $ mzscheme x.ss

It is important to include $PLTCOLLECTS in the new value because of the 
way that PLTCOLLECTS is parsed. If you just do
$ export PLTCOLLECTS=`pwd`/..

then the default places to look (mentioned above) *will not* be searched.
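One way to see the parsing rule is `path-list-string->path-list', which, as far as I know, is what mzscheme uses to combine PLTCOLLECTS with the default list: an empty entry in the string is replaced by the defaults. When PLTCOLLECTS is unset, $PLTCOLLECTS:`pwd`/.. expands to a string with a leading colon, so it begins with such an empty entry. The paths below are invented, and the sketch assumes a Unix-style ":" separator (Windows uses ";").

```scheme
#lang scheme/base

;; Stand-in for the default collects directories (made-up path).
(define defaults (list (string->path "/default/collects")))

;; Leading ":" means an empty first entry, so the defaults are
;; spliced in ahead of the extra directory.
(define with-defaults
  (path-list-string->path-list ":/home/me/dev" defaults))

;; No empty entry: only the listed directory is kept.
(define without-defaults
  (path-list-string->path-list "/home/me/dev" defaults))

(map path->string with-defaults)    ; defaults first, then /home/me/dev
(map path->string without-defaults) ; /home/me/dev only
```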



Posted on the users mailing list.