[racket] lexer priority

From: Mangpo Phitchaya Phothilimthana (mangpo at eecs.berkeley.edu)
Date: Thu Jul 24 12:55:16 EDT 2014

That works. Thank you!
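
For anyone finding this thread later, here is a minimal sketch of the suggestion quoted below dropped into a cut-down version of the original lexer. Some parts are assumptions made for brevity and are not from the original program: it uses plain `lexer` instead of `lexer-src-pos`, it inlines the `number` and `uinteger` transformers into `number10`, and `token-names` is a hypothetical helper that exists only to show which tokens come out.

```racket
#lang racket

(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-tokens a (BLOCK COMMENT))
(define-empty-tokens b (EOF))

(define-lex-abbrevs
  (digit10 (char-range "0" "9"))
  ;; Inlined form of the thread's `number`/`uinteger` transformers.
  (number10 (re-: (re-+ digit10)
                  (re-? (re-: "." (re-? (re-+ digit10))))))
  (block-comment (re-: "; BB#" number10 ":"))
  ;; A line comment is a ";..." line whose text does NOT begin with a
  ;; block-comment header. The trailing newline stays in the pattern.
  (line-comment (re-: (re-& (re-: ";" (re-* (char-complement #\newline)))
                            (complement (re-: block-comment any-string)))
                      #\newline)))

(define my-lexer
  (lexer
   (block-comment (token-BLOCK lexeme))
   (line-comment  (token-COMMENT lexeme))
   (whitespace    (my-lexer input-port)) ; skip whitespace, lex again
   ((eof)         (token-EOF))))

;; Hypothetical helper: collect token names until EOF.
(define (token-names s)
  (define in (open-input-string s))
  (let loop ()
    (define name (token-name (my-lexer in)))
    (if (eq? name 'EOF)
        '()
        (cons name (loop)))))

(token-names "\n; BB#0:\n")     ; => '(BLOCK)
(token-names "\n; BB#0:   \n")  ; => '(BLOCK), even with trailing spaces
(token-names "; plain\n")       ; => '(COMMENT)
```

Because `line-comment` can no longer match any line that starts with a block header, the longest-match rule has nothing to choose between on those lines, and rule order stops mattering.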


On Wed, Jul 23, 2014 at 10:37 PM, Jon Zeppieri <zeppieri at gmail.com> wrote:

> On Thu, Jul 24, 2014 at 1:11 AM, Mangpo Phitchaya Phothilimthana
> <mangpo at eecs.berkeley.edu> wrote:
> > That's not ideal because if there is white space after BB#0:, it will match
> > COMMENT again. Is there a better way to do this?
>
> Factor out the difference?
>
> (line-comment (re-: (re-& (re-: ";" (re-* (char-complement #\newline)))
>                             (complement (re-: block-comment any-string)))
>                        #\newline))
>
>
> >
> >
> > On Wed, Jul 23, 2014 at 10:05 PM, Jon Zeppieri <zeppieri at gmail.com> wrote:
> >>
> >> Sorry, I sent that early by mistake. More below:
> >>
> >> On Thu, Jul 24, 2014 at 1:02 AM, Jon Zeppieri <zeppieri at gmail.com> wrote:
> >> > Your example string is "\n; BB#0:\n"
> >> > So, I'd expect the lexer to match:
> >> > - whitespace
> >> > - line-comment
> >> >
> >> > Yes, `block-comment` matches, but `line-comment`
> >>
> >> ... gives the longer match, because it includes the newline at the
> >> end, whereas `block-comment` will not match that newline. Since the
> >> ending newline will be taken care of by the whitespace rule, perhaps
> >> you could simply remove the final newline from the `line-comment`
> >> definition? It will still match everything up to (but not including)
> >> the newline.
> >>
> >> -Jon
> >>
> >>
> >>
> >>
> >> >
> >> > On Thu, Jul 24, 2014 at 12:46 AM, Mangpo Phitchaya Phothilimthana
> >> > <mangpo at eecs.berkeley.edu> wrote:
> >> >> Hi,
> >> >>
> >> >> I am trying to write a lexer and parser, but I cannot figure out how to
> >> >> set priority among the lexer's tokens. My simplified lexer (shown below)
> >> >> has only 2 tokens, BLOCK and COMMENT. BLOCK is in fact a subset of
> >> >> COMMENT. BLOCK appears first in the lexer, but when I parse something
> >> >> that matches BLOCK, it always matches COMMENT instead. Below is my
> >> >> program. In this particular example, I expect to get a BLOCK token, but
> >> >> I get a COMMENT token instead. If I comment out
> >> >> (line-comment (token-COMMENT lexeme)) in the lexer, I then get the BLOCK
> >> >> token.
> >> >>
> >> >> Can anyone tell me how to work around this issue? I can only find this
> >> >> in the documentation:
> >> >> "When multiple patterns match, a lexer will choose the longest match,
> >> >> breaking ties in favor of the rule appearing first."
> >> >>
> >> >> #lang racket
> >> >>
> >> >> (require parser-tools/lex
> >> >>          (prefix-in re- parser-tools/lex-sre)
> >> >>          parser-tools/yacc)
> >> >>
> >> >> (define-tokens a (BLOCK COMMENT))
> >> >> (define-empty-tokens b (EOF))
> >> >>
> >> >> (define-lex-trans number
> >> >>   (syntax-rules ()
> >> >>     ((_ digit)
> >> >>      (re-: (uinteger digit)
> >> >>            (re-? (re-: "." (re-? (uinteger digit))))))))
> >> >>
> >> >> (define-lex-trans uinteger
> >> >>   (syntax-rules ()
> >> >>     ((_ digit) (re-+ digit))))
> >> >>
> >> >> (define-lex-abbrevs
> >> >>   (block-comment (re-: "; BB#" number10 ":"))
> >> >>   (line-comment (re-: ";" (re-* (char-complement #\newline)) #\newline))
> >> >>   (digit10 (char-range "0" "9"))
> >> >>   (number10 (number digit10)))
> >> >>
> >> >> (define my-lexer
> >> >>   (lexer-src-pos
> >> >>    (block-comment (token-BLOCK lexeme))
> >> >>    (line-comment (token-COMMENT lexeme))
> >> >>    (whitespace   (position-token-token (my-lexer input-port)))
> >> >>    ((eof) (token-EOF))))
> >> >>
> >> >> (define my-parser
> >> >>   (parser
> >> >>    (start code)
> >> >>    (end EOF)
> >> >>    (error
> >> >>     (lambda (tok-ok? tok-name tok-value start-pos end-pos)
> >> >>       (raise-syntax-error 'parser
> >> >>                           (format "syntax error at '~a' in src l:~a c:~a"
> >> >>                                   tok-name
> >> >>                                   (position-line start-pos)
> >> >>                                   (position-col start-pos)))))
> >> >>    (tokens a b)
> >> >>    (src-pos)
> >> >>    (grammar
> >> >>     (unit ((BLOCK) $1)
> >> >>           ((COMMENT) $1))
> >> >>     (code ((unit) (list $1))
> >> >>           ((unit code) (cons $1 $2))))))
> >> >>
> >> >> (define (lex-this lexer input)
> >> >>   (lambda ()
> >> >>     (let ([token (lexer input)])
> >> >>       (pretty-display token)
> >> >>       token)))
> >> >>
> >> >> (define (ast-from-string s)
> >> >>   (let ((input (open-input-string s)))
> >> >>     (ast input)))
> >> >>
> >> >> (define (ast input)
> >> >>   (my-parser (lex-this my-lexer input)))
> >> >>
> >> >> (ast-from-string "
> >> >> ; BB#0:
> >> >> ")
> >> >>
> >> >> ____________________
> >> >>   Racket Users list:
> >> >>   http://lists.racket-lang.org/users
> >> >>
> >
> >
>

Posted on the users mailing list.