[racket] lexer priority

From: Mangpo Phitchaya Phothilimthana (mangpo at eecs.berkeley.edu)
Date: Thu Jul 24 01:11:43 EDT 2014

That's not ideal: if there is any whitespace after BB#0:, the line will match
COMMENT again. Is there a better way to do this?
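
To make the concern concrete, here is a minimal sketch, not taken from the thread, that compares three rule sets on the example input. It assumes number10 can be simplified to a plain digit sequence, uses a bare `lexer` that returns pairs instead of the token-BLOCK/token-COMMENT constructors, and introduces hypothetical names (digits, blank, line-comment/nl, block-comment/ws, lex-original, lex-suggested, lex-variant, first-token). The block-comment/ws rule is only one possible workaround: it lets BLOCK absorb trailing spaces and tabs so that it ties with the line-comment rule again.

```racket
#lang racket
(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-lex-abbrevs
  ;; Assumption: number10 simplified to one or more decimal digits.
  (digits           (re-+ (char-range "0" "9")))
  (blank            (re-or #\space #\tab))
  (block-comment    (re-: "; BB#" digits ":"))
  ;; Definition from the original program: consumes the trailing newline.
  (line-comment/nl  (re-: ";" (re-* (char-complement #\newline)) #\newline))
  ;; Suggested change: stop just before the newline.
  (line-comment     (re-: ";" (re-* (char-complement #\newline))))
  ;; Hypothetical variant: BLOCK also consumes trailing spaces and tabs.
  (block-comment/ws (re-: block-comment (re-* blank))))

;; Rules as in the original program.
(define lex-original
  (lexer
   (block-comment   (cons 'BLOCK lexeme))
   (line-comment/nl (cons 'COMMENT lexeme))
   (whitespace      (lex-original input-port))
   ((eof)           'EOF)))

;; Rules with the newline removed from line-comment.
(define lex-suggested
  (lexer
   (block-comment (cons 'BLOCK lexeme))
   (line-comment  (cons 'COMMENT lexeme))
   (whitespace    (lex-suggested input-port))
   ((eof)         'EOF)))

;; Rules with the hypothetical whitespace-absorbing BLOCK pattern.
(define lex-variant
  (lexer
   (block-comment/ws (cons 'BLOCK lexeme))
   (line-comment     (cons 'COMMENT lexeme))
   (whitespace       (lex-variant input-port))
   ((eof)            'EOF)))

;; Return the first non-whitespace token of the string s.
(define (first-token lex s)
  (lex (open-input-string s)))

(first-token lex-original  "\n; BB#0:\n")   ; COMMENT: the newline makes its match longer
(first-token lex-suggested "\n; BB#0:\n")   ; BLOCK: equal lengths, first rule wins
(first-token lex-suggested "\n; BB#0: \n")  ; COMMENT: the trailing space makes it longer
(first-token lex-variant   "\n; BB#0: \n")  ; BLOCK: the tie is restored
```

With the variant, the BLOCK lexeme includes the trailing blanks, so the action would probably need to trim it. A line such as "; BB#0: foo" would still lex as COMMENT, because the comment rule gives the longer match there.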


On Wed, Jul 23, 2014 at 10:05 PM, Jon Zeppieri <zeppieri at gmail.com> wrote:

> Sorry, I sent that early by mistake. More below:
>
> On Thu, Jul 24, 2014 at 1:02 AM, Jon Zeppieri <zeppieri at gmail.com> wrote:
> > Your example string is "\n; BB#0:\n"
> > So, I'd expect the lexer to match:
> > - whitespace
> > - line-comment
> >
> > Yes, `block-comment` matches, but `line-comment`
>
> ... gives the longer match, because it includes the newline at the
> end, whereas `block-comment` will not match that newline. Since the
> ending newline will be taken care of by the whitespace rule, perhaps
> you could simply remove the final newline from the `line-comment`
> definition? It will still match everything up to (but not including)
> the newline.
>
> -Jon
>
> > On Thu, Jul 24, 2014 at 12:46 AM, Mangpo Phitchaya Phothilimthana
> > <mangpo at eecs.berkeley.edu> wrote:
> >> Hi,
> >>
> >> I am trying to write a lexer and parser, but I cannot figure out how to
> >> set priorities for the lexer's tokens. My simplified lexer (shown below)
> >> has only two tokens, BLOCK and COMMENT. BLOCK is in fact a subset of
> >> COMMENT. BLOCK appears first in the lexer, but when I parse something
> >> that matches BLOCK, it always matches COMMENT instead. Below is my
> >> program. In this particular example, I expect to get a BLOCK token, but
> >> I get a COMMENT token instead. If I comment out
> >> (line-comment (token-COMMENT lexeme)) in the lexer, I then get the BLOCK
> >> token.
> >>
> >> Can anyone tell me how to work around this issue? I can only find this
> >> in the documentation:
> >> "When multiple patterns match, a lexer will choose the longest match,
> >> breaking ties in favor of the rule appearing first."
> >>
> >> #lang racket
> >>
> >> (require parser-tools/lex
> >>          (prefix-in re- parser-tools/lex-sre)
> >>          parser-tools/yacc)
> >>
> >> (define-tokens a (BLOCK COMMENT))
> >> (define-empty-tokens b (EOF))
> >>
> >> (define-lex-trans number
> >>   (syntax-rules ()
> >>     ((_ digit)
> >>      (re-: (uinteger digit)
> >>            (re-? (re-: "." (re-? (uinteger digit))))))))
> >>
> >> (define-lex-trans uinteger
> >>   (syntax-rules ()
> >>     ((_ digit) (re-+ digit))))
> >>
> >> (define-lex-abbrevs
> >>   (block-comment (re-: "; BB#" number10 ":"))
> >>   (line-comment (re-: ";" (re-* (char-complement #\newline)) #\newline))
> >>   (digit10 (char-range "0" "9"))
> >>   (number10 (number digit10)))
> >>
> >> (define my-lexer
> >>   (lexer-src-pos
> >>    (block-comment (token-BLOCK lexeme))
> >>    (line-comment (token-COMMENT lexeme))
> >>    (whitespace   (position-token-token (my-lexer input-port)))
> >>    ((eof) (token-EOF))))
> >>
> >> (define my-parser
> >>   (parser
> >>    (start code)
> >>    (end EOF)
> >>    (error
> >>     (lambda (tok-ok? tok-name tok-value start-pos end-pos)
> >>       (raise-syntax-error 'parser
> >>                           (format "syntax error at '~a' in src l:~a c:~a"
> >>                                   tok-name
> >>                                   (position-line start-pos)
> >>                                   (position-col start-pos)))))
> >>    (tokens a b)
> >>    (src-pos)
> >>    (grammar
> >>     (unit ((BLOCK) $1)
> >>           ((COMMENT) $1))
> >>     (code ((unit) (list $1))
> >>           ((unit code) (cons $1 $2))))))
> >>
> >> (define (lex-this lexer input)
> >>   (lambda ()
> >>     (let ([token (lexer input)])
> >>       (pretty-display token)
> >>       token)))
> >>
> >> (define (ast-from-string s)
> >>   (let ((input (open-input-string s)))
> >>     (ast input)))
> >>
> >> (define (ast input)
> >>   (my-parser (lex-this my-lexer input)))
> >>
> >> (ast-from-string "
> >> ; BB#0:
> >> ")
> >>
> >> ____________________
> >>   Racket Users list:
> >>   http://lists.racket-lang.org/users
> >>
>

Posted on the users mailing list.