[racket] lexer priority

From: Mangpo Phitchaya Phothilimthana (mangpo at eecs.berkeley.edu)
Date: Thu Jul 24 00:46:56 EDT 2014

Hi,

I am trying to write a lexer and parser, but I cannot figure out how to
set the priority of the lexer's tokens. My simplified lexer (shown below)
has only two tokens, BLOCK and COMMENT. The BLOCK pattern is in fact a
subset of the COMMENT pattern. BLOCK appears first in the lexer, but when
I lex input that matches BLOCK, it is always matched as COMMENT instead.
My program is below. In this particular example, I expect to get a BLOCK
token, but I get a COMMENT token instead. If I comment out
(line-comment (token-COMMENT lexeme)) in the lexer, I then get the BLOCK
token.

Can anyone tell me how to work around this issue? The only relevant
statement I can find in the documentation is:
"When multiple patterns match, a lexer will choose the longest match,
breaking ties in favor of the rule appearing first."
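
One reading of the quoted rule that would explain the result: the
line-comment pattern also consumes the trailing newline, so on
"; BB#0:\n" its match is one character longer than block-comment's, and
the longest match wins before rule order is considered. A minimal sketch
under that assumption follows. It is separate from the program at the
end of this message, it simplifies the number pattern to plain digits,
and the names lexer-as-posted, lexer-tied and first-token are
hypothetical.

```racket
#lang racket
;; Sketch only: the lexer names and `first-token` are hypothetical helpers.
(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-lex-abbrevs
  ;; Simplified stand-in for the number pattern: one or more digits.
  (digits (re-+ (char-range "0" "9")))
  (block-comment (re-: "; BB#" digits ":"))
  ;; Same shape as the posted pattern: also consumes the trailing newline.
  (comment-with-newline
   (re-: ";" (re-* (char-complement #\newline)) #\newline))
  ;; Variant that stops before the newline.
  (comment-no-newline
   (re-: ";" (re-* (char-complement #\newline)))))

;; COMMENT can match one character more than BLOCK here (the newline).
(define lexer-as-posted
  (lexer
   [block-comment (list 'BLOCK lexeme)]
   [comment-with-newline (list 'COMMENT lexeme)]
   [whitespace (lexer-as-posted input-port)]
   [(eof) 'EOF]))

;; Both patterns match exactly "; BB#0:", so the earlier rule wins the tie.
(define lexer-tied
  (lexer
   [block-comment (list 'BLOCK lexeme)]
   [comment-no-newline (list 'COMMENT lexeme)]
   [whitespace (lexer-tied input-port)]
   [(eof) 'EOF]))

;; Return the first token the given lexer produces for a string.
(define (first-token lex str)
  (lex (open-input-string str)))

(first-token lexer-as-posted "\n; BB#0:\n") ; a COMMENT token, lexeme "; BB#0:\n"
(first-token lexer-tied "\n; BB#0:\n")      ; a BLOCK token, lexeme "; BB#0:"
```

In the second lexer the newline is left in the input and is skipped by
the whitespace rule on the next call. A line such as
"; BB#0: trailing text" would still lex as COMMENT in this sketch,
because that match is longer than the BLOCK match.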

#lang racket

(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre)
         parser-tools/yacc)

(define-tokens a (BLOCK COMMENT))
(define-empty-tokens b (EOF))

(define-lex-trans number
  (syntax-rules ()
    ((_ digit)
     (re-: (uinteger digit)
           (re-? (re-: "." (re-? (uinteger digit))))))))

(define-lex-trans uinteger
  (syntax-rules ()
    ((_ digit) (re-+ digit))))

(define-lex-abbrevs
  (block-comment (re-: "; BB#" number10 ":"))
  (line-comment (re-: ";" (re-* (char-complement #\newline)) #\newline))
  (digit10 (char-range "0" "9"))
  (number10 (number digit10)))

(define my-lexer
  (lexer-src-pos
   (block-comment (token-BLOCK lexeme))
   (line-comment (token-COMMENT lexeme))
   (whitespace   (position-token-token (my-lexer input-port)))
   ((eof) (token-EOF))))

(define my-parser
  (parser
   (start code)
   (end EOF)
   (error
    (lambda (tok-ok? tok-name tok-value start-pos end-pos)
      (raise-syntax-error
       'parser
       (format "syntax error at '~a' in src l:~a c:~a"
               tok-name
               (position-line start-pos)
               (position-col start-pos)))))
   (tokens a b)
   (src-pos)
   (grammar
    (unit ((BLOCK) $1)
          ((COMMENT) $1))
    (code ((unit) (list $1))
          ((unit code) (cons $1 $2))))))

(define (lex-this lexer input)
  (lambda ()
    (let ([token (lexer input)])
      (pretty-display token)
      token)))

(define (ast-from-string s)
  (let ((input (open-input-string s)))
    (ast input)))

(define (ast input)
  (my-parser (lex-this my-lexer input)))

(ast-from-string "
; BB#0:
")

Posted on the users mailing list.