<div dir="ltr">Hi,<div><br></div><div>I am trying to write a lexer and parser, but I cannot figure out how to give priority to one of the lexer's tokens. My simplified lexer (shown below) has only two tokens, BLOCK and COMMENT. BLOCK is in effect a special case of COMMENT. The BLOCK rule appears first in the lexer, but when I lex input that should match BLOCK, it is always tokenized as COMMENT instead. My program is below. In this particular example I expect to get a BLOCK token, but I get a COMMENT token. If I comment out the (line-comment (token-COMMENT lexeme)) rule in the lexer, I do get the BLOCK token.</div>
<div><br></div><div>Can anyone tell me how to work around this issue? The only relevant statement I can find in the documentation is:</div><div><span style="color:rgb(0,0,0);font-family:Charter,serif;font-size:18px;line-height:24.780000686645508px">"When multiple patterns match, a lexer will choose the longest match, breaking ties in favor of the rule appearing first."</span><div>
<br></div><div><div>#lang racket</div><div><br></div><div>(require parser-tools/lex</div><div> (prefix-in re- parser-tools/lex-sre)</div><div> parser-tools/yacc)</div><div><br></div><div>(define-tokens a (BLOCK COMMENT))</div>
<div>(define-empty-tokens b (EOF))</div><div><br></div><div>(define-lex-trans number</div><div> (syntax-rules ()</div><div> ((_ digit)</div><div> (re-: (uinteger digit)</div><div> (re-? (re-: "." (re-? (uinteger digit))))))))</div>
<div><br></div><div>(define-lex-trans uinteger</div><div> (syntax-rules ()</div><div> ((_ digit) (re-+ digit))))</div><div><br></div><div>(define-lex-abbrevs</div><div> (block-comment (re-: "; BB#" number10 ":"))</div>
<div> (line-comment (re-: ";" (re-* (char-complement #\newline)) #\newline))</div><div> (digit10 (char-range "0" "9"))</div><div> (number10 (number digit10)))</div><div><br></div><div>(define my-lexer</div>
<div> (lexer-src-pos</div><div> (block-comment (token-BLOCK lexeme))</div><div> (line-comment (token-COMMENT lexeme))</div><div> (whitespace (return-without-pos (my-lexer input-port)))</div><div> ((eof) (token-EOF))))</div>
<div><br></div><div>(define my-parser</div><div> (parser</div><div> (start code)</div><div> (end EOF)</div><div> (error</div><div> (lambda (tok-ok? tok-name tok-value start-pos end-pos)</div><div> (raise-syntax-error 'parser</div>
<div> (format "syntax error at '~a' in src l:~a c:~a"</div><div> tok-name</div><div> (position-line start-pos)</div>
<div> (position-col start-pos)))))</div><div> (tokens a b)</div><div> (src-pos)</div><div> (grammar</div><div> (unit ((BLOCK) $1)</div><div> ((COMMENT) $1))</div>
<div> (code ((unit) (list $1))</div><div> ((unit code) (cons $1 $2))))))</div><div><br></div><div>(define (lex-this lexer input)</div><div> (lambda ()</div><div> (let ([token (lexer input)])</div><div> (pretty-display token)</div>
<div> token)))</div><div><br></div><div>(define (ast-from-string s)</div><div> (let ((input (open-input-string s)))</div><div> (ast input)))</div><div><br></div><div>(define (ast input)</div><div> (my-parser (lex-this my-lexer input)))</div>
<div><br></div><div>(ast-from-string "</div><div>; BB#0:</div><div>")</div></div></div></div>