I've been playing around with parser-tools and am having difficulty expressing the following language:<div><br></div><div>"remember <alias> is <email>"</div><div>"remember <fact>"</div>
<div><br></div><div>where <alias> is any string that does not contain the word 'is', <email> is a well-formed email address and <fact> is any string that does not match the previous constraints.</div>
<div><br></div><div>Here's a (stripped-down) version of what I have so far:</div><div><div><div><font class="Apple-style-span" face="'courier new', monospace">#lang racket</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br>
</font></div><div><font class="Apple-style-span" face="'courier new', monospace">(require parser-tools/lex</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> parser-tools/yacc</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> (prefix-in : parser-tools/lex-sre))</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">(define-lex-abbrevs</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> (atext (:+ (:or alphabetic (:/ #\0 #\9) (char-set "!#$%&'*+-/=?^_`{|}~"))))</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> (dot-atom (:: atext (:* #\. atext))))</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">(define-tokens toy-tokens (addr-spec alias fact))</font></div><div><font class="Apple-style-span" face="'courier new', monospace">(define-empty-tokens empty-toy-tokens (eof REMEMBER IS))</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace">(define toy-lexer</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> (lexer-src-pos</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> ; Consume whitespace</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> ((:or #\tab #\space) (return-without-pos (toy-lexer input-port)))</font></div>
<div><span class="Apple-style-span" style="font-family:'courier new',monospace"> </span></div><div><font class="Apple-style-span" face="'courier new', monospace"> ; Email addresses</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> ((:: dot-atom #\@ dot-atom) (token-addr-spec lexeme))</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace"> ; Commands</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> ("remember" 'REMEMBER)</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> ("is" 'IS)</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> </font></div><div><font class="Apple-style-span" face="'courier new', monospace"> ; ??? what to lex here ???</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> ((complement (:: any-string "is" any-string)) (token-alias lexeme))</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> (any-string (token-fact lexeme))))</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace">(define toy-parser</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> (parser</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> (tokens toy-tokens empty-toy-tokens)</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> (start start)</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> (end eof)</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> (error (lambda (a b c d e) (display (format "~a ~a ~a ~a ~a" a b c</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> (position-offset d)</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> (position-offset e)))))</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> (src-pos)</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> </font></div><div><font class="Apple-style-span" face="'courier new', monospace"> (grammar</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> (start (() #f)</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> ((REMEMBER alias IS addr-spec) `(alias ,$2 ,$4))</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> ((REMEMBER fact) `(fact ,$2))))))</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">; test</font></div><div><font class="Apple-style-span" face="'courier new', monospace">(define (test str)</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> (let ((p (open-input-string str)))</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> (port-count-lines! p)</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> (toy-parser (lambda () (toy-lexer p)))))</font></div>
</div></div><div><br></div><div>The problem I'm having is that the 'fact' lexer rule always matches, without giving the other rules a chance to attempt a match. Perhaps this is down to my ignorance of BNF. Can this language be expressed in this way? One alternative I've thought of is to create a lexer rule that matches just "remember", then pass the port to a second lexer that looks for "is" or (eof) and munges the result into a token. Alternatively, I could use regexes to pull out the <alias>, <email> or <fact> clauses and parse them separately, but I'd like to compose this toy parser into a larger one if possible. I suspect there is a simple technique here that I've missed. Any ideas?</div>
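<div><br></div><div>To make the regex alternative concrete, here is a rough, untested sketch of what I mean. The name parse-remember is a hypothetical helper that is not part of the code above, and the email pattern is a deliberately crude stand-in for the dot-atom rules, so treat it as an illustration under those assumptions, not a finished solution:</div>
<div><br></div>
<div><font class="Apple-style-span" face="'courier new', monospace">#lang racket</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">; Hypothetical helper: classify a whole line with regexps instead of lexer rules.</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">; Assumption: \S+@\S+ only approximates a well-formed email address.</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">(define (parse-remember str)</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> (match str</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> ; "remember ALIAS is EMAIL", rejecting aliases that contain the word "is"</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> [(pregexp #px"^remember\\s+(.+?)\\s+is\\s+(\\S+@\\S+)\\s*$" (list _ alias email))</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> #:when (not (regexp-match? #px"\\bis\\b" alias))</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> `(alias ,alias ,email)]</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> ; anything else following "remember" is treated as a fact</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> [(pregexp #px"^remember\\s+(.+)$" (list _ fact))</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> `(fact ,fact)]</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> ; not a "remember" command at all</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> [_ #f]))</font></div>
<div><br></div>
<div>With this sketch, (parse-remember "remember bob is bob@example.com") should give '(alias "bob" "bob@example.com"), while (parse-remember "remember the sky is blue") should fall through to '(fact "the sky is blue"). It sidesteps the lexer entirely, though, which is exactly why it does not compose well into a larger parser.</div>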
<div>Many thanks, Simon.</div>