[racket] lexer priority
Sorry, I sent that early by mistake. More below:
On Thu, Jul 24, 2014 at 1:02 AM, Jon Zeppieri <zeppieri at gmail.com> wrote:
> Your example string is "\n; BB#0:\n"
> So, I'd expect the lexer to match:
> - whitespace
> - line-comment
>
> Yes, `block-comment` matches, but `line-comment`
... gives the longer match, because it includes the newline at the
end, whereas `block-comment` does not match that newline. Since the
trailing newline will be handled by the whitespace rule anyway, perhaps
you could simply remove the final newline from the `line-comment`
definition? It will still match everything up to (but not including)
the newline. Both rules will then match the same seven characters,
"; BB#0:", and the tie goes to `block-comment` because it appears
first in the lexer.
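Here is a minimal sketch of that change, showing only the lexer, since the parser isn't needed to see which token comes out. To keep it short, it assumes a few simplifications: plain `lexer` instead of `lexer-src-pos`, the `number`/`uinteger` transformers inlined into `number10` (intended to match the same strings), and a hypothetical `lex-all` helper that just collects tokens.

```racket
#lang racket
(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-tokens a (BLOCK COMMENT))
(define-empty-tokens b (EOF))

(define-lex-abbrevs
  (digit10 (char-range "0" "9"))
  ;; digits, optionally followed by "." and more digits
  (number10 (re-: (re-+ digit10) (re-? (re-: "." (re-* digit10)))))
  (block-comment (re-: "; BB#" number10 ":"))
  ;; No trailing #\newline here: the whitespace rule consumes it.
  (line-comment (re-: ";" (re-* (char-complement #\newline)))))

(define my-lexer
  (lexer
   (block-comment (token-BLOCK lexeme))   ; listed first, so it wins ties
   (line-comment (token-COMMENT lexeme))
   (whitespace (my-lexer input-port))     ; skip whitespace, lex again
   ((eof) (token-EOF))))

;; Hypothetical helper: lex a whole string into (name value) pairs.
(define (lex-all s)
  (define in (open-input-string s))
  (let loop ([acc '()])
    (define tok (my-lexer in))
    (if (eq? tok 'EOF)
        (reverse acc)
        (loop (cons (list (token-name tok) (token-value tok)) acc)))))

(lex-all "\n; BB#0:\n")
;; => '((BLOCK "; BB#0:"))
```

One consequence to keep in mind: a line such as "; BB#0: extra" still lexes as COMMENT, because `line-comment` then matches more characters than `block-comment` does.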
-Jon
>
> On Thu, Jul 24, 2014 at 12:46 AM, Mangpo Phitchaya Phothilimthana
> <mangpo at eecs.berkeley.edu> wrote:
>> Hi,
>>
>> I'm trying to write a lexer and parser, but I can't figure out how to set
>> priorities for the lexer's tokens. My simplified lexer (shown below) has
>> only two tokens, BLOCK and COMMENT. BLOCK is in fact a subset of COMMENT.
>> BLOCK appears first in the lexer, but when I lex something that matches
>> BLOCK, it always comes out as COMMENT instead. In the particular example
>> below, I expect to get a BLOCK token, but I get a COMMENT token instead.
>> If I comment out (line-comment (token-COMMENT lexeme)) in the lexer, I
>> then get the BLOCK token.
>>
>> Can anyone tell me how to work around this issue? The only relevant
>> passage I can find in the documentation is:
>> "When multiple patterns match, a lexer will choose the longest match,
>> breaking ties in favor of the rule appearing first."
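The documented rule can be seen in isolation with a toy lexer. Its two rules (a keyword `if` and lowercase identifiers) and the `lex-all` helper are hypothetical, chosen only for illustration. Rule order decides only between matches of equal length; a longer match wins no matter where its rule appears. That is what happens to BLOCK in the program below: `line-comment` also consumes the trailing newline, so its match is one character longer.

```racket
#lang racket
(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define toy-lexer
  (lexer
   ("if" 'KEYWORD)                        ; first rule: wins equal-length ties
   ((re-+ (char-range "a" "z")) 'IDENT)   ; wins whenever its match is longer
   (whitespace (toy-lexer input-port))    ; skip whitespace, lex again
   ((eof) 'EOF)))

;; Hypothetical helper: lex a whole string into a list of symbols.
(define (lex-all s)
  (define in (open-input-string s))
  (let loop ([acc '()])
    (define tok (toy-lexer in))
    (if (eq? tok 'EOF)
        (reverse acc)
        (loop (cons tok acc)))))

(lex-all "if iffy")
;; => '(KEYWORD IDENT)
```

On "if", both rules match two characters and the earlier rule wins; on "iffy", the identifier rule matches four characters against two, so it wins despite coming second.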
>>
>> #lang racket
>>
>> (require parser-tools/lex
>>          (prefix-in re- parser-tools/lex-sre)
>>          parser-tools/yacc)
>>
>> (define-tokens a (BLOCK COMMENT))
>> (define-empty-tokens b (EOF))
>>
>> (define-lex-trans number
>>   (syntax-rules ()
>>     ((_ digit)
>>      (re-: (uinteger digit)
>>            (re-? (re-: "." (re-? (uinteger digit))))))))
>>
>> (define-lex-trans uinteger
>>   (syntax-rules ()
>>     ((_ digit) (re-+ digit))))
>>
>> (define-lex-abbrevs
>>   (block-comment (re-: "; BB#" number10 ":"))
>>   (line-comment (re-: ";" (re-* (char-complement #\newline)) #\newline))
>>   (digit10 (char-range "0" "9"))
>>   (number10 (number digit10)))
>>
>> (define my-lexer
>>   (lexer-src-pos
>>    (block-comment (token-BLOCK lexeme))
>>    (line-comment (token-COMMENT lexeme))
>>    (whitespace (position-token-token (my-lexer input-port)))
>>    ((eof) (token-EOF))))
>>
>> (define my-parser
>>   (parser
>>    (start code)
>>    (end EOF)
>>    (error
>>     (lambda (tok-ok? tok-name tok-value start-pos end-pos)
>>       (raise-syntax-error 'parser
>>                           (format "syntax error at '~a' in src l:~a c:~a"
>>                                   tok-name
>>                                   (position-line start-pos)
>>                                   (position-col start-pos)))))
>>    (tokens a b)
>>    (src-pos)
>>    (grammar
>>     (unit ((BLOCK) $1)
>>           ((COMMENT) $1))
>>     (code ((unit) (list $1))
>>           ((unit code) (cons $1 $2))))))
>>
>> (define (lex-this lexer input)
>>   (lambda ()
>>     (let ([token (lexer input)])
>>       (pretty-display token)
>>       token)))
>>
>> (define (ast-from-string s)
>>   (let ((input (open-input-string s)))
>>     (ast input)))
>>
>> (define (ast input)
>>   (my-parser (lex-this my-lexer input)))
>>
>> (ast-from-string "\n; BB#0:\n")
>>
>> ____________________
>> Racket Users list:
>> http://lists.racket-lang.org/users
>>