[racket] lexer priority

From: Jon Zeppieri (zeppieri at gmail.com)
Date: Thu Jul 24 01:02:03 EDT 2014

Your example string is "\n; BB#0:\n".
So, I'd expect the lexer to match:
- whitespace
- line-comment

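Yes, `block-comment` matches, but `line-comment` matches a longer string,
because it also consumes the trailing newline, and the lexer always
prefers the longest match. Rule order only breaks ties between matches of
equal length.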
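
To make the longest-match rule concrete, here is a minimal sketch of one
possible workaround. It assumes it is acceptable for `line-comment` to stop
before the newline and to let the `whitespace` rule consume it. The names
`line-comment/nl`, `original-lexer`, `fixed-lexer` and `first-token` are
made up for this illustration, and plain lists stand in for the real tokens.

```racket
#lang racket
(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-lex-abbrevs
  (digit10 (char-range "0" "9"))
  (block-comment (re-: "; BB#" (re-+ digit10) ":"))
  ;; As in the original program: the pattern includes the trailing newline,
  ;; so its match is one character longer than block-comment's.
  (line-comment/nl (re-: ";" (re-* (char-complement #\newline)) #\newline))
  ;; Hypothetical variant: stop before the newline.
  (line-comment (re-: ";" (re-* (char-complement #\newline)))))

;; Longest match wins, so "; BB#0:\n" is lexed as a COMMENT.
(define original-lexer
  (lexer
   (block-comment (list 'BLOCK lexeme))
   (line-comment/nl (list 'COMMENT lexeme))
   (whitespace (original-lexer input-port))
   ((eof) 'EOF)))

;; Both rules now match "; BB#0:" (same length), so the tie is broken
;; in favor of the rule that appears first, which is block-comment.
(define fixed-lexer
  (lexer
   (block-comment (list 'BLOCK lexeme))
   (line-comment (list 'COMMENT lexeme))
   (whitespace (fixed-lexer input-port))
   ((eof) 'EOF)))

;; Helper for this sketch: return the first token lexed from a string.
(define (first-token lex str)
  (lex (open-input-string str)))

(first-token original-lexer "\n; BB#0:\n") ; => '(COMMENT "; BB#0:\n")
(first-token fixed-lexer "\n; BB#0:\n")    ; => '(BLOCK "; BB#0:")
```

Note that with this change a line such as "; BB#0: more text" would still
lex as a COMMENT, since `line-comment` then matches the longer string.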

On Thu, Jul 24, 2014 at 12:46 AM, Mangpo Phitchaya Phothilimthana
<mangpo at eecs.berkeley.edu> wrote:
> Hi,
>
> I am trying to write a lexer and parser, but I cannot figure out how to set
> priorities for the lexer's tokens. My simplified lexer (shown below) has only 2
> tokens, BLOCK and COMMENT. BLOCK is in fact a subset of COMMENT. BLOCK
> appears first in the lexer, but when I parse something that matches BLOCK,
> it always matches COMMENT instead. Below is my program. In this
> particular example, I expect to get a BLOCK token, but I get a COMMENT token
> instead. If I comment out (line-comment (token-COMMENT lexeme)) in the
> lexer, I then get the BLOCK token.
>
> Can anyone tell me how to work around this issue? The only thing I can find
> in the documentation is this:
> "When multiple patterns match, a lexer will choose the longest match,
> breaking ties in favor of the rule appearing first."
>
> #lang racket
>
> (require parser-tools/lex
>          (prefix-in re- parser-tools/lex-sre)
>          parser-tools/yacc)
>
> (define-tokens a (BLOCK COMMENT))
> (define-empty-tokens b (EOF))
>
> (define-lex-trans number
>   (syntax-rules ()
>     ((_ digit)
>      (re-: (uinteger digit)
>            (re-? (re-: "." (re-? (uinteger digit))))))))
>
> (define-lex-trans uinteger
>   (syntax-rules ()
>     ((_ digit) (re-+ digit))))
>
> (define-lex-abbrevs
>   (block-comment (re-: "; BB#" number10 ":"))
>   (line-comment (re-: ";" (re-* (char-complement #\newline)) #\newline))
>   (digit10 (char-range "0" "9"))
>   (number10 (number digit10)))
>
> (define my-lexer
>   (lexer-src-pos
>    (block-comment (token-BLOCK lexeme))
>    (line-comment (token-COMMENT lexeme))
>    (whitespace   (position-token-token (my-lexer input-port)))
>    ((eof) (token-EOF))))
>
> (define my-parser
>   (parser
>    (start code)
>    (end EOF)
>    (error
>     (lambda (tok-ok? tok-name tok-value start-pos end-pos)
>       (raise-syntax-error 'parser
>                           (format "syntax error at '~a' in src l:~a c:~a"
>                                   tok-name
>                                   (position-line start-pos)
>                                   (position-col start-pos)))))
>    (tokens a b)
>    (src-pos)
>    (grammar
>     (unit ((BLOCK) $1)
>           ((COMMENT) $1))
>     (code ((unit) (list $1))
>           ((unit code) (cons $1 $2))))))
>
> (define (lex-this lexer input)
>   (lambda ()
>     (let ([token (lexer input)])
>       (pretty-display token)
>       token)))
>
> (define (ast-from-string s)
>   (let ((input (open-input-string s)))
>     (ast input)))
>
> (define (ast input)
>   (my-parser (lex-this my-lexer input)))
>
> (ast-from-string "
> ; BB#0:
> ")
>
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users
>

Posted on the users mailing list.