[racket] Question about parser-tools/lex

From: Danny Yoo (dyoo at hashcollision.org)
Date: Thu Oct 18 14:05:19 EDT 2012

> The first two tests behave as expected but I would have expected the third
> to fail. I understand that it does not, given the way the lexer is
> implemented, but I cannot figure out how to change this behavior.

Can you make those expectations explicit?  I do not know what you want
the value or behavior to be from the test expressions below.


You can use the rackunit test suite library
(http://docs.racket-lang.org/rackunit/) to make these expectations
more explicit.


For example, let's say that we add a require of rackunit as well as a
small helper function to pull all the tokens out of a string:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(require rackunit)

;; collect-tokens: string -> (listof token)
;; Grabs all the tokens we can out of the tokenization of instr.
(define (collect-tokens instr)
  (call-with-input-string
   instr
   (lambda (ip)
     (define producer (lambda () (sample-lexer ip)))
     (for/list ([token (in-producer producer 'EOF)])
       token))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


With this setup, we can express tests such as:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; TEST 1
(check-exn exn:fail? (lambda () (collect-tokens "*")))

;; TEST 2
(check-equal? (collect-tokens "4 a") (list (token-NUM 4) (token-ID 'a)))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


These assert what we want the behavior to be in a way that allows for
automatic testing.  If we break the behavior of the tokenizer, these
tests will yell at us.  Otherwise, they'll stay silent.


What do you want the behavior of the lexer to be when it hits
"4a"?  I'm being serious when I say I do not know!  It could either be
an error, or maybe you want an ID with '4a as its content.  Or maybe
you want two separate tokens.  Which do you want?
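
If the answer turns out to be "an error", here is one rough sketch of
how that expectation could be written down and satisfied.  Your lexer is
not shown in this message, so sketch-lexer and collect-tokens* below
are hypothetical stand-ins, not your code.  The sketch assumes NUM and ID
tokens shaped like the ones in the tests above, and it leans on the
lexer preferring the longest match, so a rule covering digits followed
immediately by letters gets "4a" as a whole before the NUM rule can
take just the "4".

```racket
#lang racket
;; Sketch only: sketch-lexer is a hypothetical stand-in for the
;; lexer discussed in this thread, which is not shown here.
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre)
         rackunit)

;; Assumed token shapes, mirroring token-NUM and token-ID in the tests.
(define-tokens value-tokens (NUM ID))

(define sketch-lexer
  (lexer
   [(eof) 'EOF]
   ;; Skip whitespace by lexing again from the same port.
   [whitespace (sketch-lexer input-port)]
   [(:+ numeric) (token-NUM (string->number lexeme))]
   [(:+ alphabetic) (token-ID (string->symbol lexeme))]
   ;; Digits followed directly by letters: reject the whole lexeme.
   [(:: (:+ numeric) (:+ alphabetic))
    (error 'sketch-lexer "malformed token: ~a" lexeme)]))

;; collect-tokens*: string -> (listof token)
;; Same idea as collect-tokens above, but wired to sketch-lexer.
(define (collect-tokens* instr)
  (call-with-input-string
   instr
   (lambda (ip)
     (define producer (lambda () (sketch-lexer ip)))
     (for/list ([token (in-producer producer 'EOF)])
       token))))

;; Separated by a space: still two tokens.
(check-equal? (collect-tokens* "4 a") (list (token-NUM 4) (token-ID 'a)))

;; TEST 3, under the "it should be an error" reading.
(check-exn exn:fail? (lambda () (collect-tokens* "4a")))
```

If you would rather get a single ID or two separate tokens, the last
check-exn would become a check-equal? against the token list you
expect, and the extra rule would change or go away accordingly.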

Posted on the users mailing list.