[racket] Question about parser-tools/lex

From: Danny Yoo (dyoo at hashcollision.org)
Date: Thu Oct 18 15:56:20 EDT 2012

> ;; Test 3
> (check-exn exn:fail? (lambda () (collect-tokens "4a")))
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> I thought that, given the way the NUM and ID tokens are defined (resp. only digits, only letters), the third test should pass...it does not.

Ok, good.  So we want the lexer to reject "4a" (which is what test 3
expects), because there should be some kind of delimiter between the
number token and the rest.  One direct way we can do this is to peek
into the port and check that a delimiter immediately follows the
tokenized content.  We can amend your tokenizer to:

;; Add syntax/readerr to the list of requires:
(require syntax/readerr)
;; ...

(define sample-lexer
  (lexer
   [(eof) 'EOF]
   [whitespace (sample-lexer input-port)]
   [(:+ alphabetic)
    (begin
      (assert-delimiter-follows! lexeme input-port)
      (token-ID (string->symbol lexeme)))]
   [(:+ numeric)
    (begin
      (assert-delimiter-follows! lexeme input-port)
      (token-NUM (string->number lexeme)))]))

;; check that there's a whitespace or eof coming up in the input-port.
(define (assert-delimiter-follows! lexeme ip)
  (define next-char (peek-char ip))
  (unless (or (eof-object? next-char)
              (char-whitespace? next-char))
    (define-values (line column position) (port-next-location ip))
    (raise-read-error (format "expected delimiter after ~e, but I see ~e"
                              lexeme (string next-char))
                      (object-name ip)
                      line column position 1)))
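For reference, here's a complete program that puts the pieces together.
The token definitions and the collect-tokens driver are my guesses at
your setup, since your original code wasn't shown in full:

```racket
#lang racket
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre)
         syntax/readerr
         rackunit)

(define-tokens value-tokens (ID NUM))

;; check that there's a whitespace or eof coming up in the input-port.
(define (assert-delimiter-follows! lexeme ip)
  (define next-char (peek-char ip))
  (unless (or (eof-object? next-char)
              (char-whitespace? next-char))
    (define-values (line column position) (port-next-location ip))
    (raise-read-error (format "expected delimiter after ~e, but I see ~e"
                              lexeme (string next-char))
                      (object-name ip)
                      line column position 1)))

(define sample-lexer
  (lexer
   [(eof) 'EOF]
   [whitespace (sample-lexer input-port)]
   [(:+ alphabetic)
    (begin
      (assert-delimiter-follows! lexeme input-port)
      (token-ID (string->symbol lexeme)))]
   [(:+ numeric)
    (begin
      (assert-delimiter-follows! lexeme input-port)
      (token-NUM (string->number lexeme)))]))

;; Pull tokens out of a string until EOF.
(define (collect-tokens s)
  (define ip (open-input-string s))
  (port-count-lines! ip)   ; so port-next-location reports line/column
  (let loop ()
    (define tok (sample-lexer ip))
    (if (eq? tok 'EOF)
        '()
        (cons tok (loop)))))

;; "4 a" tokenizes cleanly into a NUM and an ID:
(check-equal? (map token-value (collect-tokens "4 a")) '(4 a))
;; Test 3 from your message now passes: "4a" raises a read error.
(check-exn exn:fail? (lambda () (collect-tokens "4a")))
```

Note the `port-count-lines!` call: without it, `port-next-location`
reports #f for the line and column, which raise-read-error accepts but
which makes the error message less helpful.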

There may be a more direct way to express this within the
parser-tools/lex library.  But since we have general power in each of
the lexer actions, we can do this too.

Hope this helps!

Posted on the users mailing list.