[racket] lexer priority

From: Jon Zeppieri (zeppieri at gmail.com)
Date: Thu Jul 24 01:05:10 EDT 2014

Sorry, I sent that early by mistake. More below:

On Thu, Jul 24, 2014 at 1:02 AM, Jon Zeppieri <zeppieri at gmail.com> wrote:
> Your example string is "\n; BB#0:\n"
> So, I'd expect the lexer to match:
> - whitespace
> - line-comment
>
> Yes, `block-comment` matches, but `line-comment`

... gives the longer match, because it includes the newline at the
end, whereas `block-comment` will not match that newline. Since the
ending newline will be taken care of by the whitespace rule, perhaps
you could simply remove the final newline from the `line-comment`
definition? It will still match everything up to (but not including)
the newline.
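
A minimal sketch of that change, under a few assumptions: it uses plain
`lexer` rather than `lexer-src-pos` to keep it short, it inlines the digits
as `(re-+ digit10)` instead of using the `number` transformer (so decimals
are not accepted here), and `token-names` is a hypothetical helper added
only for illustration. With the trailing newline dropped, both rules match
the same seven characters of "; BB#0:", so the tie goes to the rule listed
first.

```racket
#lang racket

(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-tokens a (BLOCK COMMENT))
(define-empty-tokens b (EOF))

(define-lex-abbrevs
  (digit10 (char-range "0" "9"))
  ;; Simplified: integer block numbers only (assumption for this sketch).
  (block-comment (re-: "; BB#" (re-+ digit10) ":"))
  ;; No trailing #\newline: the whitespace rule consumes it instead.
  (line-comment (re-: ";" (re-* (char-complement #\newline)))))

(define my-lexer
  (lexer
   (block-comment (token-BLOCK lexeme))
   (line-comment (token-COMMENT lexeme))
   ;; Skip whitespace by lexing the next token from the same port.
   (whitespace (my-lexer input-port))
   ((eof) (token-EOF))))

;; Hypothetical helper: collect the token names produced for a string.
(define (token-names s)
  (define in (open-input-string s))
  (let loop ()
    (define name (token-name (my-lexer in)))
    (if (eq? name 'EOF)
        (list name)
        (cons name (loop)))))

(token-names "\n; BB#0:\n") ; => '(BLOCK EOF)
```

Any other text after the colon on the same line makes `line-comment` the
longer match again, so such a line would still come back as COMMENT.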

-Jon




>
> On Thu, Jul 24, 2014 at 12:46 AM, Mangpo Phitchaya Phothilimthana
> <mangpo at eecs.berkeley.edu> wrote:
>> Hi,
>>
>> I am trying to write a lexer and parser, but I cannot figure out how to set
>> the priority of the lexer's tokens. My simplified lexer (shown below) has
>> only two tokens, BLOCK and COMMENT. BLOCK is in fact a subset of COMMENT.
>> BLOCK appears first in the lexer, but when I parse something that matches
>> BLOCK, it always matches COMMENT instead. My program is below. In this
>> particular example, I expect to get a BLOCK token, but I get a COMMENT token
>> instead. If I comment out (line-comment (token-COMMENT lexeme)) in the
>> lexer, I then get the BLOCK token.
>>
>> Can anyone tell me how to work around this issue? The only relevant
>> statement I can find in the documentation is:
>> "When multiple patterns match, a lexer will choose the longest match,
>> breaking ties in favor of the rule appearing first."
>>
>> #lang racket
>>
>> (require parser-tools/lex
>>          (prefix-in re- parser-tools/lex-sre)
>>          parser-tools/yacc)
>>
>> (define-tokens a (BLOCK COMMENT))
>> (define-empty-tokens b (EOF))
>>
>> (define-lex-trans number
>>   (syntax-rules ()
>>     ((_ digit)
>>      (re-: (uinteger digit)
>>            (re-? (re-: "." (re-? (uinteger digit))))))))
>>
>> (define-lex-trans uinteger
>>   (syntax-rules ()
>>     ((_ digit) (re-+ digit))))
>>
>> (define-lex-abbrevs
>>   (block-comment (re-: "; BB#" number10 ":"))
>>   (line-comment (re-: ";" (re-* (char-complement #\newline)) #\newline))
>>   (digit10 (char-range "0" "9"))
>>   (number10 (number digit10)))
>>
>> (define my-lexer
>>   (lexer-src-pos
>>    (block-comment (token-BLOCK lexeme))
>>    (line-comment (token-COMMENT lexeme))
>>    (whitespace   (position-token-token (my-lexer input-port)))
>>    ((eof) (token-EOF))))
>>
>> (define my-parser
>>   (parser
>>    (start code)
>>    (end EOF)
>>    (error
>>     (lambda (tok-ok? tok-name tok-value start-pos end-pos)
>>       (raise-syntax-error 'parser
>>                           (format "syntax error at '~a' in src l:~a c:~a"
>>                                   tok-name
>>                                   (position-line start-pos)
>>                                   (position-col start-pos)))))
>>    (tokens a b)
>>    (src-pos)
>>    (grammar
>>     (unit ((BLOCK) $1)
>>           ((COMMENT) $1))
>>     (code ((unit) (list $1))
>>           ((unit code) (cons $1 $2))))))
>>
>> (define (lex-this lexer input)
>>   (lambda ()
>>     (let ([token (lexer input)])
>>       (pretty-display token)
>>       token)))
>>
>> (define (ast-from-string s)
>>   (let ((input (open-input-string s)))
>>     (ast input)))
>>
>> (define (ast input)
>>   (my-parser (lex-this my-lexer input)))
>>
>> (ast-from-string "
>> ; BB#0:
>> ")
>>
>> ____________________
>>   Racket Users list:
>>   http://lists.racket-lang.org/users
>>

Posted on the users mailing list.