[racket] Question about parser-tools/lex

From: Philippe Mechaï (philippe.mechai at gmail.com)
Date: Thu Oct 18 14:57:06 EDT 2012

On Thu, Oct 18, 2012 at 12:05:19PM -0600, Danny Yoo wrote:
> > The first two tests behave as expected but I would have expected the third
> > to fail. I understand that it does not, given the way the lexer is
> > implemented, but I cannot figure out how to change this behavior.
> 
> Can you make those expectations explicit?  I do not know what you want
> the value or behavior to be from the test expressions below.
> 
> 
> You can use the rackunit test suite library
> (http://docs.racket-lang.org/rackunit/) to make these expectations
> more explicit.
> 
> 
> For example, let's say that we add a require to rackunit as well as a
> small helper function to pull all the tokens out of a string:
> 
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (require rackunit)
> 
> ;; collect-tokens: string -> (listof token)
> ;; Grabs all the tokens we can out of the tokenization of instr.
> (define (collect-tokens instr)
>   (call-with-input-string
>    instr
>    (lambda (ip)
>      (define producer (lambda () (sample-lexer ip)))
>      (for/list ([token (in-producer producer 'EOF)])
>        token))))
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> 
> 
> With this setup, we can express tests such as:
> 
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> ;; TEST 1
> (check-exn exn:fail? (lambda () (collect-tokens "*")))
> 
> ;; TEST 2
> (check-equal? (collect-tokens "4 a") (list (token-NUM 4) (token-ID 'a)))
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> 
> 
> These assert what we want the behavior to be in a way that allows for
> automatic testing.  If we break the behavior of the tokenizer, these
> tests will yell at us.  Otherwise, they'll stay silent.
> 
> 
> What do you want the behavior of the lexer to be when it hits
> "4a"?  I'm being serious when I say I do not know!  It could either be
> an error, or maybe you want an ID with '4a as its content.  Or maybe
> you want two separate tokens.  Which do you want?

Hi Danny,

First of all, thank you for your very complete answer.

I am sorry I was not clear enough; things seemed obvious to me given the attached sample lexer.
The lexer I actually wrote is more complicated. I found that it didn't behave properly when I started writing unit tests, so I wrote a minimal example to exhibit the behavior that seems strange to me.

Anyway, back to your questions and using your testing code, this is the behavior I expect:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Test 1
(check-exn exn:fail? (lambda () (collect-tokens "*")))

;; Test 2
(check-equal? (collect-tokens "4 a") (list (token-NUM 4) (token-ID 'a)))

;; Test 3
(check-exn exn:fail? (lambda () (collect-tokens "4a")))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

I thought that, given the way the NUM and ID tokens are defined (digits only and letters only, respectively), lexing "4a" would raise an error and the third test would pass. It does not.
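
For reference, here is a minimal sketch of one way to get the behavior asserted in Test 3. The attached sample lexer is not reproduced in this message, so the token names, the `digit` and `letter` abbreviations, and the rules below are assumptions reconstructed from the tests, not the actual attachment. The idea relies on the documented matching rule of parser-tools/lex: the longest match wins, and ties go to the rule that appears first. A catch-all rule for mixed runs of digits and letters therefore wins on "4a" (two characters against one for NUM) and can raise an error, while plain "4" and "a" still go to NUM and ID because those rules come first.

```racket
#lang racket
;; Hypothetical reconstruction: the real sample lexer was an attachment and
;; is not shown in the thread, so every definition here is an assumption.
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre)
         rackunit)

(define-tokens value-tokens (NUM ID))
(define-empty-tokens marker-tokens (EOF))

(define-lex-abbrevs
  [digit  (char-range #\0 #\9)]
  [letter (:or (char-range #\a #\z) (char-range #\A #\Z))])

(define sample-lexer
  (lexer
   [(eof) (token-EOF)]
   ;; Skip whitespace and continue with the next token.
   [whitespace (sample-lexer input-port)]
   [(:+ digit)  (token-NUM (string->number lexeme))]
   [(:+ letter) (token-ID (string->symbol lexeme))]
   ;; Catch-all for runs that mix digits and letters. On "4a" this rule
   ;; matches two characters, so it beats NUM (one character). On "4" or
   ;; "a" it only ties, and the earlier NUM/ID rule is chosen.
   [(:+ (:or digit letter))
    (error 'sample-lexer "malformed token: ~a" lexeme)]))

;; collect-tokens: string -> (listof token)
;; Same helper as in the quoted message above.
(define (collect-tokens instr)
  (call-with-input-string
   instr
   (lambda (ip)
     (define producer (lambda () (sample-lexer ip)))
     (for/list ([token (in-producer producer 'EOF)])
       token))))

(check-exn exn:fail? (lambda () (collect-tokens "*")))
(check-equal? (collect-tokens "4 a") (list (token-NUM 4) (token-ID 'a)))
(check-exn exn:fail? (lambda () (collect-tokens "4a")))
```

Without the catch-all rule, "4a" lexes as NUM 4 followed by ID a, because nothing in the NUM rule requires a delimiter after the digits. An alternative would be to check for adjacency in the parser instead of the lexer; the catch-all rule is simply the smallest change to this sketch.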

Thanks again for your time.

Regards,
Philippe Mechaï

Posted on the users mailing list.