[plt-scheme] parser-tools basics

From: Richard Cobbe (cobbe at ccs.neu.edu)
Date: Fri Oct 21 09:54:22 EDT 2005

Reposting to the list rather than just to Dave this time.

On Thu, Oct 20, 2005 at 09:05:02PM -0400, Dave Herman wrote:
> I've started writing a tiny little parser and it's resulting in an 
> inexplicable error. Can someone smack me upside the head and tell me 
> what I'm doing wrong?
> I've included the source below... the lexer seems to correctly lex the 
> token "foo", the parser correctly recognizes it as a TOKEN token, and 
> then it raises an error. I must be missing something obvious.

No, actually, you're missing something fairly subtle.  :-)

Your scanner definition is fine, so I'll snip it.

>   (define cookie-parser
>     (parser
>      (start Token)
>      (end NEWLINE EOF)
>      (tokens Operators ValueTokens)
>      (error (lambda (token-ok? token-name token-value start-pos end-pos)
>               (error (string->immutable-string
>                       (format "error: (token-~a ~v) [~a, ~a)"
>                               token-name
>                               token-value
>                               (position-offset start-pos)
>                               (position-offset end-pos))))))
>      (src-pos)
>      (grammar
>        (Token
>         [(TOKEN) $1]))))

This line is the problem:

>   (cookie-parser (lambda () (cookie-lexer (open-input-string "foo"))))

Here's the correct line:

    (cookie-parser (let ([p (open-input-string "foo")])
                     (lambda () (cookie-lexer p))))

Your parser expects the start symbol Token to be followed by either
NEWLINE or EOF, and it actually checks this by calling the scanner
thunk (the thunk you pass to cookie-parser) once more after it has
parsed a Token.
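For anyone who wants to see this end to end, here is a minimal sketch in current Racket (PLT Scheme's successor, where the library lives at parser-tools/lex and parser-tools/yacc). Dave's scanner was snipped above, so the lexer here is a hypothetical stand-in, and parse-cookie is a hypothetical helper. The parser mirrors his definition, with a shorter error message. Because the port is created once, the parser reads TOKEN "foo", calls the thunk again, gets EOF, and returns "foo".

```racket
#lang racket/base
;; Sketch for current Racket. cookie-lexer is a HYPOTHETICAL stand-in
;; for the snipped scanner; parse-cookie is a HYPOTHETICAL helper.
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre)
         parser-tools/yacc)

(define-tokens ValueTokens (TOKEN))
(define-empty-tokens Operators (NEWLINE EOF))

;; Runs of letters become TOKEN, "\n" is NEWLINE, end of input is EOF.
;; lexer-src-pos attaches source positions, matching (src-pos) below.
(define cookie-lexer
  (lexer-src-pos
   [(:+ alphabetic) (token-TOKEN lexeme)]
   ["\n" (token-NEWLINE)]
   [(eof) (token-EOF)]))

(define cookie-parser
  (parser
   (start Token)
   ;; After a Token, the parser pulls one more token and requires
   ;; it to be one of these.
   (end NEWLINE EOF)
   (tokens Operators ValueTokens)
   (error (lambda (tok-ok? tok-name tok-value start-pos end-pos)
            (error 'cookie-parser "unexpected (token-~a ~v) at offset ~a"
                   tok-name tok-value (position-offset start-pos))))
   (src-pos)
   (grammar
    (Token [(TOKEN) $1]))))

;; The port is created once; the thunk closes over it, so each call
;; to the thunk advances through the same input.
(define (parse-cookie str)
  (define p (open-input-string str))
  (cookie-parser (lambda () (cookie-lexer p))))

(parse-cookie "foo") ; returns the string "foo"
```

Passing the original thunk, (lambda () (cookie-lexer (open-input-string "foo"))), to this parser instead lands in the error handler, because the second call hands back another TOKEN where NEWLINE or EOF was required.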

The problem is that the parser library relies on the input port's
internal state to keep track of the next character to be read.  Your
scanner thunk, though, creates a new port each time you call it, so the
parser sees what appears to be an infinite stream of "foo"s.  If you
create the port just once, the scanner and parser see EOF where you
expect it.
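The port behavior itself can be seen without the parser at all. This sketch, again assuming current Racket, uses the built-in read as a stand-in for the scanner; make-buggy-next and make-fixed-next are hypothetical names chosen for illustration.

```racket
#lang racket/base
;; `read` stands in for the snipped cookie-lexer; the only point here
;; is where the port gets created.

;; Buggy pattern: a fresh port is opened on every call, so every call
;; starts reading from the beginning of the string again.
(define (make-buggy-next str)
  (lambda () (read (open-input-string str))))

;; Fixed pattern: the port is opened once and captured by the closure,
;; so successive calls advance through the same input.
(define (make-fixed-next str)
  (let ([p (open-input-string str)])
    (lambda () (read p))))

(define buggy (make-buggy-next "foo"))
(define fixed (make-fixed-next "foo"))

;; buggy returns the symbol foo on every call, forever.
(displayln (list (buggy) (buggy) (buggy)))
;; fixed returns foo once, then the eof object.
(displayln (list (fixed) (fixed)))
```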

This would have been a whole lot easier to track down if the parser
produced a decent error message.  In this case it knows exactly which
tokens it expects, so that's a reasonable request.


Posted on the users mailing list.