[racket] lexer priority

From: Jon Zeppieri (zeppieri at gmail.com)
Date: Thu Jul 24 01:37:06 EDT 2014

On Thu, Jul 24, 2014 at 1:11 AM, Mangpo Phitchaya Phothilimthana
<mangpo at eecs.berkeley.edu> wrote:
> That's not ideal because if there is white space after BB#0:, it will match
> COMMENT again. Is there a better way to do this?
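
To make the objection concrete, here is a minimal sketch of the variant
under discussion, where `line-comment` no longer consumes the newline. It
is a cut-down of the program quoted further down, not the original: there
is no parser, `digits` is an assumed stand-in for `number10`, and
`token-names` is a hypothetical helper added only for illustration.

```racket
#lang racket
;; Sketch only: lexer without the parser, line-comment without "\n".
(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-tokens a (BLOCK COMMENT))
(define-empty-tokens b (EOF))

(define-lex-abbrevs
  ;; Assumed simplification of the original `number10`.
  (digits (re-+ (char-range "0" "9")))
  (block-comment (re-: "; BB#" digits ":"))
  ;; Matches up to, but not including, the newline.
  (line-comment (re-: ";" (re-* (char-complement #\newline)))))

(define my-lexer
  (lexer
   (block-comment (token-BLOCK lexeme))
   (line-comment (token-COMMENT lexeme))
   (whitespace (my-lexer input-port))
   ((eof) (token-EOF))))

;; Hypothetical helper: list the token names produced for a string.
(define (token-names s)
  (define in (open-input-string s))
  (let loop ([acc '()])
    (define name (token-name (my-lexer in)))
    (if (eq? name 'EOF)
        (reverse acc)
        (loop (cons name acc)))))
```

With these definitions, (token-names "\n; BB#0:\n") should give '(BLOCK),
since both rules match the same seven characters and the tie goes to the
rule listed first. (token-names "; BB#0: \n") should give '(COMMENT),
because the trailing space makes `line-comment` the longer match.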

Factor out the difference?

(line-comment (re-: (re-& (re-: ";" (re-* (char-complement #\newline)))
                            (complement (re-: block-comment any-string)))
                       #\newline))
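
A self-contained sketch of how that definition might slot into a lexer.
It makes some simplifying assumptions: the parser is left out, `digits` is
an assumed stand-in for `number10`, and `token-names` is a hypothetical
helper used only to inspect the result.

```racket
#lang racket
;; Sketch only: line-comment excludes anything starting with block-comment.
(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-tokens a (BLOCK COMMENT))
(define-empty-tokens b (EOF))

(define-lex-abbrevs
  ;; Assumed simplification of the original `number10`.
  (digits (re-+ (char-range "0" "9")))
  (block-comment (re-: "; BB#" digits ":"))
  ;; A ";" line, intersected with the complement of
  ;; "block-comment followed by anything", then the newline.
  (line-comment (re-: (re-& (re-: ";" (re-* (char-complement #\newline)))
                            (complement (re-: block-comment any-string)))
                      #\newline)))

(define my-lexer
  (lexer
   (block-comment (token-BLOCK lexeme))
   (line-comment (token-COMMENT lexeme))
   (whitespace (my-lexer input-port))
   ((eof) (token-EOF))))

;; Hypothetical helper: list the token names produced for a string.
(define (token-names s)
  (define in (open-input-string s))
  (let loop ([acc '()])
    (define name (token-name (my-lexer in)))
    (if (eq? name 'EOF)
        (reverse acc)
        (loop (cons name acc)))))
```

Here (token-names "; BB#0: \n") should give '(BLOCK) even with the trailing
space, since `line-comment` can no longer match a line that begins with a
block-comment. An ordinary line such as "; plain\n" should still give
'(COMMENT).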


>
>
> On Wed, Jul 23, 2014 at 10:05 PM, Jon Zeppieri <zeppieri at gmail.com> wrote:
>>
>> Sorry, I sent that early by mistake. More below:
>>
>> On Thu, Jul 24, 2014 at 1:02 AM, Jon Zeppieri <zeppieri at gmail.com> wrote:
>> > Your example string is "\n; BB#0:\n"
>> > So, I'd expect the lexer to match:
>> > - whitespace
>> > - line-comment
>> >
>> > Yes, `block-comment` matches, but `line-comment`
>>
>> ... gives the longer match, because it includes the newline at the
>> end, whereas `block-comment` will not match that newline. Since the
>> ending newline will be taken care of by the whitespace rule, perhaps
>> you could simply remove the final newline from the `line-comment`
>> definition? It will still match everything up to (but not including)
>> the newline.
>>
>> -Jon
>>
>>
>>
>>
>> >
>> > On Thu, Jul 24, 2014 at 12:46 AM, Mangpo Phitchaya Phothilimthana
>> > <mangpo at eecs.berkeley.edu> wrote:
>> >> Hi,
>> >>
>> >> I am trying to write a lexer and parser, but I cannot figure out how
>> >> to set priorities for the lexer's tokens. My simplified lexer (shown
>> >> below) has only two tokens, BLOCK and COMMENT. BLOCK is in fact a
>> >> subset of COMMENT. BLOCK appears first in the lexer, but when I parse
>> >> something that matches BLOCK, it always matches COMMENT instead.
>> >> Below is my program. In this particular example, I expect to get a
>> >> BLOCK token, but I get a COMMENT token instead. If I comment out
>> >> (line-comment (token-COMMENT lexeme)) in the lexer, I then get the
>> >> BLOCK token.
>> >>
>> >> Can anyone tell me how to work around this issue? The only relevant
>> >> statement I can find in the documentation is:
>> >> "When multiple patterns match, a lexer will choose the longest match,
>> >> breaking ties in favor of the rule appearing first."
>> >>
>> >> #lang racket
>> >>
>> >> (require parser-tools/lex
>> >>          (prefix-in re- parser-tools/lex-sre)
>> >>          parser-tools/yacc)
>> >>
>> >> (define-tokens a (BLOCK COMMENT))
>> >> (define-empty-tokens b (EOF))
>> >>
>> >> (define-lex-trans number
>> >>   (syntax-rules ()
>> >>     ((_ digit)
>> >>      (re-: (uinteger digit)
>> >>            (re-? (re-: "." (re-? (uinteger digit))))))))
>> >>
>> >> (define-lex-trans uinteger
>> >>   (syntax-rules ()
>> >>     ((_ digit) (re-+ digit))))
>> >>
>> >> (define-lex-abbrevs
>> >>   (block-comment (re-: "; BB#" number10 ":"))
>> >>   (line-comment (re-: ";" (re-* (char-complement #\newline))
>> >>                       #\newline))
>> >>   (digit10 (char-range "0" "9"))
>> >>   (number10 (number digit10)))
>> >>
>> >> (define my-lexer
>> >>   (lexer-src-pos
>> >>    (block-comment (token-BLOCK lexeme))
>> >>    (line-comment (token-COMMENT lexeme))
>> >>    (whitespace   (position-token-token (my-lexer input-port)))
>> >>    ((eof) (token-EOF))))
>> >>
>> >> (define my-parser
>> >>   (parser
>> >>    (start code)
>> >>    (end EOF)
>> >>    (error
>> >>     (lambda (tok-ok? tok-name tok-value start-pos end-pos)
>> >>       (raise-syntax-error 'parser
>> >>                           (format "syntax error at '~a' in src l:~a c:~a"
>> >>                                   tok-name
>> >>                                   (position-line start-pos)
>> >>                                   (position-col start-pos)))))
>> >>    (tokens a b)
>> >>    (src-pos)
>> >>    (grammar
>> >>     (unit ((BLOCK) $1)
>> >>           ((COMMENT) $1))
>> >>     (code ((unit) (list $1))
>> >>           ((unit code) (cons $1 $2))))))
>> >>
>> >> (define (lex-this lexer input)
>> >>   (lambda ()
>> >>     (let ([token (lexer input)])
>> >>       (pretty-display token)
>> >>       token)))
>> >>
>> >> (define (ast-from-string s)
>> >>   (let ((input (open-input-string s)))
>> >>     (ast input)))
>> >>
>> >> (define (ast input)
>> >>   (my-parser (lex-this my-lexer input)))
>> >>
>> >> (ast-from-string "
>> >> ; BB#0:
>> >> ")
>> >>
>
>

Posted on the users mailing list.