<div dir="ltr">That works. Thank you!</div><div class="gmail_extra"> <div class="gmail_quote">On Wed, Jul 23, 2014 at 10:37 PM, Jon Zeppieri &lt;<a href="mailto:zeppieri@gmail.com" target="_blank">zeppieri@gmail.com</a>&gt; wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On Thu, Jul 24, 2014 at 1:11 AM, Mangpo Phitchaya Phothilimthana &lt;<a href="mailto:mangpo@eecs.berkeley.edu">mangpo@eecs.berkeley.edu</a>&gt; wrote:<br>
&gt; That's not ideal because if there is white space after BB#0:, it will match<br>
&gt; COMMENT again. Is there a better way to do this?<br>
<br>
</div>Factor out the difference?<br>
<br>
(line-comment<br>
 (re-: (re-&amp; (re-: ";" (re-* (char-complement #\newline)))<br>
             (complement (re-: block-comment any-string)))<br>
       #\newline))<br>
<div class="HOEnZb"><div class="h5"><br>
&gt;<br>
&gt;<br>
&gt; On Wed, Jul 23, 2014 at 10:05 PM, Jon Zeppieri &lt;<a href="mailto:zeppieri@gmail.com">zeppieri@gmail.com</a>&gt; wrote:<br>
&gt;&gt;<br>
&gt;&gt; Sorry, I sent that early by mistake. More below:<br>
&gt;&gt;<br>
&gt;&gt; On Thu, Jul 24, 2014 at 1:02 AM, Jon Zeppieri &lt;<a href="mailto:zeppieri@gmail.com">zeppieri@gmail.com</a>&gt; wrote:<br>
&gt;&gt; &gt; Your example string is "\n; BB#0:\n"<br>
&gt;&gt; &gt; So, I'd expect the lexer to match:<br>
&gt;&gt; &gt; - whitespace<br>
&gt;&gt; &gt; - line-comment<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; Yes, `block-comment` matches, but `line-comment`<br>
&gt;&gt;<br>
&gt;&gt; ... gives the longer match, because it includes the newline at the<br>
&gt;&gt; end, whereas `block-comment` will not match that newline. Since the<br>
&gt;&gt; ending newline will be taken care of by the whitespace rule, perhaps<br>
&gt;&gt; you could simply remove the final newline from the `line-comment`<br>
&gt;&gt; definition? It will still match everything up to (but not including)<br>
&gt;&gt; the newline.<br>
&gt;&gt;<br>
&gt;&gt; -Jon<br>
&gt;&gt;<br>
&gt;&gt; &gt;<br>
&gt;&gt; &gt; On Thu, Jul 24, 2014 at 12:46 AM, Mangpo Phitchaya Phothilimthana<br>
&gt;&gt; &gt; &lt;<a href="mailto:mangpo@eecs.berkeley.edu">mangpo@eecs.berkeley.edu</a>&gt; wrote:<br>
&gt;&gt; &gt;&gt; Hi,<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; I am trying to write a lexer and parser, but I cannot figure out how to<br>
&gt;&gt; &gt;&gt; set priorities for the lexer's tokens.
My simplified lexer (shown below) has only<br>
&gt;&gt; &gt;&gt; two tokens, BLOCK and COMMENT. BLOCK is in fact a subset of COMMENT. BLOCK<br>
&gt;&gt; &gt;&gt; appears first in the lexer, but when I parse something that matches BLOCK,<br>
&gt;&gt; &gt;&gt; it always matches COMMENT instead. Below is my program. In this<br>
&gt;&gt; &gt;&gt; particular example, I expect to get a BLOCK token, but I get a COMMENT<br>
&gt;&gt; &gt;&gt; token instead. If I comment out (line-comment (token-COMMENT lexeme)) in the<br>
&gt;&gt; &gt;&gt; lexer, I then get the BLOCK token.<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; Can anyone tell me how to work around this issue? The only relevant<br>
&gt;&gt; &gt;&gt; passage I can find in the documentation is:<br>
&gt;&gt; &gt;&gt; "When multiple patterns match, a lexer will choose the longest match,<br>
&gt;&gt; &gt;&gt; breaking ties in favor of the rule appearing first."<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; #lang racket<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; (require parser-tools/lex<br>
&gt;&gt; &gt;&gt;          (prefix-in re- parser-tools/lex-sre)<br>
&gt;&gt; &gt;&gt;          parser-tools/yacc)<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; (define-tokens a (BLOCK COMMENT))<br>
&gt;&gt; &gt;&gt; (define-empty-tokens b (EOF))<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; (define-lex-trans number<br>
&gt;&gt; &gt;&gt;   (syntax-rules ()<br>
&gt;&gt; &gt;&gt;     ((_ digit)<br>
&gt;&gt; &gt;&gt;      (re-: (uinteger digit)<br>
&gt;&gt; &gt;&gt;            (re-? (re-: "." (re-? (uinteger digit))))))))<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; (define-lex-trans uinteger<br>
&gt;&gt; &gt;&gt;   (syntax-rules ()<br>
&gt;&gt; &gt;&gt;     ((_ digit) (re-+ digit))))<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; (define-lex-abbrevs<br>
&gt;&gt; &gt;&gt;   (block-comment (re-: "; BB#" number10 ":"))<br>
&gt;&gt; &gt;&gt;   (line-comment (re-: ";" (re-* (char-complement #\newline))<br>
&gt;&gt; &gt;&gt;                       #\newline))<br>
&gt;&gt; &gt;&gt;   (digit10 (char-range "0" "9"))<br>
&gt;&gt; &gt;&gt;   (number10 (number digit10)))<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; (define my-lexer<br>
&gt;&gt; &gt;&gt;   (lexer-src-pos<br>
&gt;&gt; &gt;&gt;    (block-comment (token-BLOCK lexeme))<br>
&gt;&gt; &gt;&gt;    (line-comment (token-COMMENT lexeme))<br>
&gt;&gt; &gt;&gt;    (whitespace (position-token-token (my-lexer input-port)))<br>
&gt;&gt; &gt;&gt;    ((eof) (token-EOF))))<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; (define my-parser<br>
&gt;&gt; &gt;&gt;   (parser<br>
&gt;&gt; &gt;&gt;    (start code)<br>
&gt;&gt; &gt;&gt;    (end EOF)<br>
&gt;&gt; &gt;&gt;    (error<br>
&gt;&gt; &gt;&gt;     (lambda (tok-ok?
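tok-name tok-value start-pos end-pos)<br>
&gt;&gt; &gt;&gt;       (raise-syntax-error 'parser<br>
&gt;&gt; &gt;&gt;                           (format "syntax error at '~a' in src l:~a c:~a"<br>
&gt;&gt; &gt;&gt;                                   tok-name<br>
&gt;&gt; &gt;&gt;                                   (position-line start-pos)<br>
&gt;&gt; &gt;&gt;                                   (position-col start-pos)))))<br>
&gt;&gt; &gt;&gt;    (tokens a b)<br>
&gt;&gt; &gt;&gt;    (src-pos)<br>
&gt;&gt; &gt;&gt;    (grammar<br>
&gt;&gt; &gt;&gt;     (unit ((BLOCK) $1)<br>
&gt;&gt; &gt;&gt;           ((COMMENT) $1))<br>
&gt;&gt; &gt;&gt;     (code ((unit) (list $1))<br>
&gt;&gt; &gt;&gt;           ((unit code) (cons $1 $2))))))<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; (define (lex-this lexer input)<br>
&gt;&gt; &gt;&gt;   (lambda ()<br>
&gt;&gt; &gt;&gt;     (let ([token (lexer input)])<br>
&gt;&gt; &gt;&gt;       (pretty-display token)<br>
&gt;&gt; &gt;&gt;       token)))<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; (define (ast-from-string s)<br>
&gt;&gt; &gt;&gt;   (let ((input (open-input-string s)))<br>
&gt;&gt; &gt;&gt;     (ast input)))<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; (define (ast input)<br>
&gt;&gt; &gt;&gt;   (my-parser (lex-this my-lexer input)))<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; (ast-from-string "<br>
&gt;&gt; &gt;&gt; ; BB#0:<br>
&gt;&gt; &gt;&gt; ")<br>
&gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; ____________________<br>
&gt;&gt; &gt;&gt;   Racket Users list:<br>
&gt;&gt; &gt;&gt;   <a href="http://lists.racket-lang.org/users" target="_blank">http://lists.racket-lang.org/users</a><br>
&gt;&gt; &gt;&gt;<br>
&gt;<br>
&gt;<br>
</div></div></blockquote></div> </div>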
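The factored-out `line-comment` from this thread can be checked in isolation. Below is a minimal, self-contained sketch, assuming the same parser-tools/lex setup as the original program, but simplified: a plain `lexer` instead of `lexer-src-pos`, no parser, and `block-comment` written as "one or more digits" instead of going through the `number10` transformer. The helper `token-names` is hypothetical and only lists which tokens come out. Because `line-comment` intersects the old pattern with the complement of "a `block-comment` followed by anything", it never matches a line that starts with `; BB#<digits>:`. The longest-match rule therefore has only BLOCK to choose from on such lines, even when spaces follow the colon.

```racket
#lang racket
;; Sketch (simplified from the program in this thread, not the original code):
;; a `line-comment` that refuses lines beginning with a `block-comment`.
(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-tokens a (BLOCK COMMENT))
(define-empty-tokens b (EOF))

(define-lex-abbrevs
  (digit10 (char-range "0" "9"))
  ;; Simplification (assumption): one or more digits instead of `number10`.
  (block-comment (re-: "; BB#" (re-+ digit10) ":"))
  ;; ";" plus non-newlines, minus anything that starts with a block-comment,
  ;; then the terminating newline.
  (line-comment
   (re-: (re-& (re-: ";" (re-* (char-complement #\newline)))
               (complement (re-: block-comment any-string)))
         #\newline)))

(define my-lexer
  (lexer
   (block-comment (token-BLOCK lexeme))
   (line-comment (token-COMMENT lexeme))
   ;; Skip whitespace by lexing again from the same port.
   (whitespace (my-lexer input-port))
   ((eof) (token-EOF))))

;; Hypothetical helper: lex a whole string and return the token names,
;; leaving out the final EOF.
(define (token-names s)
  (define in (open-input-string s))
  (let loop ([acc '()])
    (define t (my-lexer in))
    (if (eq? (token-name t) 'EOF)
        (reverse acc)
        (loop (cons (token-name t) acc)))))
```

With this definition, `(token-names "\n; BB#0:  \n")` gives `'(BLOCK)`, while an ordinary comment such as `"; hello\n"` still gives `'(COMMENT)`. The trailing spaces and newline after `BB#0:` are consumed by the whitespace rule.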