[racket] lexer priority
Your example string is "\n; BB#0:\n"
So, I'd expect the lexer to match:
- whitespace
- line-comment
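Yes, `block-comment` matches, but `line-comment` matches a longer string (it also consumes the trailing newline), and the lexer always prefers the longest match. The rule order only breaks ties between matches of equal length.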
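
One possible workaround, as a minimal sketch rather than a definitive fix: drop the trailing `#\newline` from `line-comment`, so that both patterns match the same seven characters of "; BB#0:" and the tie goes to the rule listed first. This assumes you are happy to let the `whitespace` rule consume the newline instead. To keep the sketch short I used plain `lexer` instead of `lexer-src-pos` and replaced your `number10` with a bare `(re-+ digit10)`; both are simplifications of mine, not part of your program.

```racket
#lang racket
(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-tokens a (BLOCK COMMENT))
(define-empty-tokens b (EOF))

(define-lex-abbrevs
  (digit10 (char-range "0" "9"))
  ;; Simplified: integer block numbers only (assumption for this sketch).
  (block-comment (re-: "; BB#" (re-+ digit10) ":"))
  ;; No trailing #\newline, so this can no longer out-match
  ;; block-comment by one character on a line like "; BB#0:".
  (line-comment (re-: ";" (re-* (char-complement #\newline)))))

(define my-lexer
  (lexer
   ;; block-comment is listed first, so it wins equal-length ties.
   (block-comment (token-BLOCK lexeme))
   (line-comment (token-COMMENT lexeme))
   ;; Skip whitespace (including the newline) and lex the next token.
   (whitespace (my-lexer input-port))
   ((eof) (token-EOF))))

;; Lex the first token of the example string from the original post.
(define in (open-input-string "\n; BB#0:\n"))
(define tok (my-lexer in))
(token-name tok)
(token-value tok)
```

With this change I would expect the first token to be BLOCK. Note that a line such as "; BB#0: more text" would still lex as COMMENT, because `line-comment` matches more characters there.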
On Thu, Jul 24, 2014 at 12:46 AM, Mangpo Phitchaya Phothilimthana
<mangpo at eecs.berkeley.edu> wrote:
> Hi,
>
> I am trying to write a lexer and parser, but I cannot figure out how to set
> the priority of the lexer's tokens. My simplified lexer (shown below) has only
> two tokens, BLOCK and COMMENT. BLOCK is in fact a subset of COMMENT. BLOCK
> appears first in the lexer, but when I parse something that matches BLOCK,
> it always matches COMMENT instead. Below is my program. In this
> particular example, I expect to get a BLOCK token, but I get a COMMENT token
> instead. If I comment out (line-comment (token-COMMENT lexeme)) in the
> lexer, I then get the BLOCK token.
>
> Can anyone tell me how to work around this issue? The only relevant statement
> I can find in the documentation is:
> "When multiple patterns match, a lexer will choose the longest match,
> breaking ties in favor of the rule appearing first."
>
> #lang racket
>
> (require parser-tools/lex
>          (prefix-in re- parser-tools/lex-sre)
>          parser-tools/yacc)
>
> (define-tokens a (BLOCK COMMENT))
> (define-empty-tokens b (EOF))
>
> (define-lex-trans number
>   (syntax-rules ()
>     ((_ digit)
>      (re-: (uinteger digit)
>            (re-? (re-: "." (re-? (uinteger digit))))))))
>
> (define-lex-trans uinteger
>   (syntax-rules ()
>     ((_ digit) (re-+ digit))))
>
> (define-lex-abbrevs
>   (block-comment (re-: "; BB#" number10 ":"))
>   (line-comment (re-: ";" (re-* (char-complement #\newline)) #\newline))
>   (digit10 (char-range "0" "9"))
>   (number10 (number digit10)))
>
> (define my-lexer
>   (lexer-src-pos
>    (block-comment (token-BLOCK lexeme))
>    (line-comment (token-COMMENT lexeme))
>    (whitespace (position-token-token (my-lexer input-port)))
>    ((eof) (token-EOF))))
>
> (define my-parser
>   (parser
>    (start code)
>    (end EOF)
>    (error
>     (lambda (tok-ok? tok-name tok-value start-pos end-pos)
>       (raise-syntax-error 'parser
>                           (format "syntax error at '~a' in src l:~a c:~a"
>                                   tok-name
>                                   (position-line start-pos)
>                                   (position-col start-pos)))))
>    (tokens a b)
>    (src-pos)
>    (grammar
>     (unit ((BLOCK) $1)
>           ((COMMENT) $1))
>     (code ((unit) (list $1))
>           ((unit code) (cons $1 $2))))))
>
> (define (lex-this lexer input)
>   (lambda ()
>     (let ([token (lexer input)])
>       (pretty-display token)
>       token)))
>
> (define (ast-from-string s)
>   (let ((input (open-input-string s)))
>     (ast input)))
>
> (define (ast input)
>   (my-parser (lex-this my-lexer input)))
>
> (ast-from-string "
> ; BB#0:
> ")