<div dir="ltr">That works. Thank you!</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Jul 23, 2014 at 10:37 PM, Jon Zeppieri <span dir="ltr"><<a href="mailto:zeppieri@gmail.com" target="_blank">zeppieri@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On Thu, Jul 24, 2014 at 1:11 AM, Mangpo Phitchaya Phothilimthana<br>
<<a href="mailto:mangpo@eecs.berkeley.edu">mangpo@eecs.berkeley.edu</a>> wrote:<br>
> That's not ideal because if there is white space after BB#0:, it will match<br>
> COMMENT again. Is there a better way to do this?<br>
<br>
</div>Factor out the difference?<br>
<br>
(line-comment (re-: (re-& (re-: ";" (re-* (char-complement #\newline)))<br>
(complement (re-: block-comment any-string)))<br>
#\newline))<br>
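<br>
A minimal sketch of how that definition might slot into a complete lexer, assuming the same token names as the original program. The names `demo-lexer` and `token-names` are hypothetical and used only for illustration, and the number pattern is simplified here to one or more digits:<br>
<br>
```racket
#lang racket

(require parser-tools/lex
         (prefix-in re- parser-tools/lex-sre))

(define-tokens a (BLOCK COMMENT))
(define-empty-tokens b (EOF))

(define-lex-abbrevs
  (digit10 (char-range #\0 #\9))
  ;; Simplified for this sketch: "; BB#", one or more digits, then ":".
  (block-comment (re-: "; BB#" (re-+ digit10) ":"))
  ;; A ";" comment whose text does not begin with a block-comment,
  ;; followed by the terminating newline.
  (line-comment (re-: (re-& (re-: ";" (re-* (char-complement #\newline)))
                            (complement (re-: block-comment any-string)))
                      #\newline)))

;; Hypothetical lexer without source positions, to keep the sketch short.
(define demo-lexer
  (lexer
   (block-comment (token-BLOCK lexeme))
   (line-comment (token-COMMENT lexeme))
   (whitespace (demo-lexer input-port)) ; skip whitespace and keep lexing
   ((eof) (token-EOF))))

;; Hypothetical helper: list the token names produced for a string,
;; stopping at (and omitting) EOF.
(define (token-names s)
  (define in (open-input-string s))
  (let loop ()
    (define name (token-name (demo-lexer in)))
    (if (eq? name 'EOF)
        '()
        (cons name (loop)))))

(token-names "\n; BB#0:\n")  ; block label on its own line
(token-names "; BB#0: \n")   ; block label followed by a space
(token-names "; hello\n")    ; ordinary comment
```
<br>
Under these assumptions, the first two calls should yield only BLOCK, because `line-comment` can no longer match text that starts with a `block-comment`, and the third should yield only COMMENT.<br>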
<div class="HOEnZb"><div class="h5"><br>
<br>
><br>
><br>
> On Wed, Jul 23, 2014 at 10:05 PM, Jon Zeppieri <<a href="mailto:zeppieri@gmail.com">zeppieri@gmail.com</a>> wrote:<br>
>><br>
>> Sorry, I sent that early by mistake. More below:<br>
>><br>
>> On Thu, Jul 24, 2014 at 1:02 AM, Jon Zeppieri <<a href="mailto:zeppieri@gmail.com">zeppieri@gmail.com</a>> wrote:<br>
>> > Your example string is "\n; BB#0:\n"<br>
>> > So, I'd expect the lexer to match:<br>
>> > - whitespace<br>
>> > - line-comment<br>
>> ><br>
>> > Yes, `block-comment` matches, but `line-comment`<br>
>><br>
>> ... gives the longer match, because it includes the newline at the<br>
>> end, whereas `block-comment` will not match that newline. Since the<br>
>> ending newline will be taken care of by the whitespace rule, perhaps<br>
>> you could simply remove the final newline from the `line-comment`<br>
>> definition? It will still match everything up to (but not including)<br>
>> the newline.<br>
>><br>
>> -Jon<br>
>><br>
>><br>
>><br>
>><br>
>> ><br>
>> > On Thu, Jul 24, 2014 at 12:46 AM, Mangpo Phitchaya Phothilimthana<br>
>> > <<a href="mailto:mangpo@eecs.berkeley.edu">mangpo@eecs.berkeley.edu</a>> wrote:<br>
>> >> Hi,<br>
>> >><br>
>> >> I am trying to write a lexer and parser, but I cannot figure out how to set<br>
>> >> priorities for the lexer's tokens. My simplified lexer (shown below) has only<br>
>> >> two tokens, BLOCK and COMMENT. BLOCK is in fact a subset of COMMENT. BLOCK<br>
>> >> appears first in the lexer, but when I lex input that matches BLOCK, it is<br>
>> >> always tokenized as COMMENT instead. My program is below. In this particular<br>
>> >> example, I expect to get a BLOCK token, but I get a COMMENT token instead.<br>
>> >> If I comment out (line-comment (token-COMMENT lexeme)) in the lexer, I then<br>
>> >> get the BLOCK token.<br>
>> >><br>
>> >> Can anyone tell me how to work around this issue? I can only find this in<br>
>> >> the documentation:<br>
>> >> "When multiple patterns match, a lexer will choose the longest match,<br>
>> >> breaking ties in favor of the rule appearing first."<br>
>> >><br>
>> >> #lang racket<br>
>> >><br>
>> >> (require parser-tools/lex<br>
>> >> (prefix-in re- parser-tools/lex-sre)<br>
>> >> parser-tools/yacc)<br>
>> >><br>
>> >> (define-tokens a (BLOCK COMMENT))<br>
>> >> (define-empty-tokens b (EOF))<br>
>> >><br>
>> >> (define-lex-trans number<br>
>> >> (syntax-rules ()<br>
>> >> ((_ digit)<br>
>> >> (re-: (uinteger digit)<br>
>> >> (re-? (re-: "." (re-? (uinteger digit))))))))<br>
>> >><br>
>> >> (define-lex-trans uinteger<br>
>> >> (syntax-rules ()<br>
>> >> ((_ digit) (re-+ digit))))<br>
>> >><br>
>> >> (define-lex-abbrevs<br>
>> >> (block-comment (re-: "; BB#" number10 ":"))<br>
>> >> (line-comment (re-: ";" (re-* (char-complement #\newline))<br>
>> >> #\newline))<br>
>> >> (digit10 (char-range "0" "9"))<br>
>> >> (number10 (number digit10)))<br>
>> >><br>
>> >> (define my-lexer<br>
>> >> (lexer-src-pos<br>
>> >> (block-comment (token-BLOCK lexeme))<br>
>> >> (line-comment (token-COMMENT lexeme))<br>
>> >> (whitespace (return-without-pos (my-lexer input-port)))<br>
>> >> ((eof) (token-EOF))))<br>
>> >><br>
>> >> (define my-parser<br>
>> >> (parser<br>
>> >> (start code)<br>
>> >> (end EOF)<br>
>> >> (error<br>
>> >> (lambda (tok-ok? tok-name tok-value start-pos end-pos)<br>
>> >> (raise-syntax-error 'parser<br>
>> >> (format "syntax error at '~a' in src l:~a c:~a"<br>
>> >> tok-name<br>
>> >> (position-line start-pos)<br>
>> >> (position-col start-pos)))))<br>
>> >> (tokens a b)<br>
>> >> (src-pos)<br>
>> >> (grammar<br>
>> >> (unit ((BLOCK) $1)<br>
>> >> ((COMMENT) $1))<br>
>> >> (code ((unit) (list $1))<br>
>> >> ((unit code) (cons $1 $2))))))<br>
>> >><br>
>> >> (define (lex-this lexer input)<br>
>> >> (lambda ()<br>
>> >> (let ([token (lexer input)])<br>
>> >> (pretty-display token)<br>
>> >> token)))<br>
>> >><br>
>> >> (define (ast-from-string s)<br>
>> >> (let ((input (open-input-string s)))<br>
>> >> (ast input)))<br>
>> >><br>
>> >> (define (ast input)<br>
>> >> (my-parser (lex-this my-lexer input)))<br>
>> >><br>
>> >> (ast-from-string "<br>
>> >> ; BB#0:<br>
>> >> ")<br>
>> >><br>
>> >><br>
><br>
><br>
</div></div></blockquote></div><br></div>