[racket] testing student programs

From: Eli Barzilay (eli at barzilay.org)
Date: Sat Oct 16 15:18:47 EDT 2010

40 minutes ago, Todd O'Bryan wrote:
> I know this has come up on the list before, and I've reread those
> threads but am little confused.
> [...]

> It seems like if I use make-module-evaluator,

For student languages it might be better to use `make-evaluator'.
(With `make-module-evaluator' you should use something like "#lang
htdp/bsl", but IIRC it's not completely the same.)

Get your sandbox up with:

    (require racket/sandbox)
    (define e (make-evaluator '(special beginner) "
      (define (volume-of-solid length width height)
        (* length width height))
      (check-expect (volume-of-solid 2 3 4) 24)
      (check-expect (volume-of-solid 3 5 7) 105)"))

But also note that the `check-expect' expressions are not being
executed.  (They're done in a way that makes it very hard to run them,
and I don't think that there's a known easy way for that.)

You can also use a path that points to a file

    (define e (make-evaluator '(special beginner)
                              (string->path "/some/path")))

or just plain s-expressions.
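
For instance, here is a minimal sketch of the s-expression form,
assuming the same `volume-of-solid' definition as above (each
top-level form is passed as its own quoted argument):

    (require racket/sandbox)
    (define e
      (make-evaluator '(special beginner)
                      ;; the student program, as a quoted s-expression
                      '(define (volume-of-solid length width height)
                         (* length width height))))
    ;; expressions are then evaluated inside the sandbox
    (e '(volume-of-solid 2 3 4))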

> I'm stuck in the context of the original student program--that is,
> Beginning Student Language, without the ability to accumulate a
> score or use constructs that aren't defined in BSL.

You shouldn't do your accounting inside the sandbox -- think about it
as a way to restrict a piece of code: if you run your own code in it,
then your code isn't safe from the student's code.  For example, what
if the student code redefines `+'?  (I know that in the student
languages you can't redefine things, but it's generally not a good
idea.)

So just do the evaluation in the student's context, and the rest
outside of it:

    > (define score 0)
    > (when (= (e '(volume-of-solid 3 4 5)) 60) (set! score (add1 score)))
    > (when (= (e '(volume-of-solid 10 5 4)) 200) (set! score (add1 score)))
    > score

You can also do something like this:

    > (define volume-of-solid (e 'volume-of-solid))
    > (when (= (volume-of-solid 3 4 5) 60) ...)

but that means that you're running the body of the student function
outside of the sandbox -- so it can now eat up your memory etc.
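
To make the difference concrete, here is a rough sketch of what
calling through the evaluator buys you.  The `spin' function, the
name `e2', and the one-second limit are made up for illustration:

    (require racket/sandbox)
    (define e2
      ;; limit each evaluation to 1 second and 20 MB
      (parameterize ([sandbox-eval-limits '(1 20)])
        (make-evaluator '(special beginner)
                        ;; a student function that never returns
                        '(define (spin n) (spin (add1 n))))))
    ;; inside the sandbox, the runaway call is cut off with an
    ;; exception instead of hanging the grading code
    (define result
      (with-handlers ([exn:fail:resource?
                       (lambda (x) 'out-of-resources)])
        (e2 '(spin 0))))

If you pulled `spin' out with (e2 'spin) and called it directly,
there would be no such limit around the call.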

One more note: this works because the results of these expressions
are numbers, and numbers are the same inside and outside the sandbox.
This becomes an issue when you're dealing with structs -- the
sandboxed environment will have its own idea of these structs,
separate from the outside one, so you need to specify sharing of the
relevant modules if you're dealing with these cases (for example, if
you want to check posn results).  Yet another alternative is to do
the comparison inside the sandbox:

    (when (e '(= (volume-of-solid 3 4 5) 60)) (set! score (add1 score)))

which will work for structs too -- but that again depends on `=' doing
the expected thing in the sandbox.
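
As for sharing modules, here is a sketch for posns.  It assumes that
the teaching languages get their posn struct from `lang/posn'; the
`corner' function and the name `e3' are made up for illustration:

    (require racket/sandbox lang/posn)
    (define e3
      (parameterize ([sandbox-namespace-specs
                      ;; keep the default specs, and share lang/posn
                      ;; between the sandbox and the outside
                      (append (sandbox-namespace-specs)
                              '(lang/posn))])
        (make-evaluator '(special beginner)
                        '(define (corner w h) (make-posn w h)))))
    ;; if the module is shared, the outside `posn?' and accessors
    ;; should work on values that come out of the sandbox
    (define ok?
      (let ([p (e3 '(corner 3 4))])
        (and (posn? p) (= (posn-x p) 3) (= (posn-y p) 4))))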

> Obviously, there's a place here for a really nice macro-based
> testing harness that checks for errors in each student function
> call, lets you assign points for each test, etc., but I have to
> figure out how to get the definitions I want to test safely into a
> context that lets me write the code to evaluate them.

The sandbox tests do something similar to that:


and note how the testing needs to switch into evaluation in the
sandbox and outside of it.

But for just the kind of test counting that you do above, I suspect
that this is more in the direction of what you want:

    > (define tests '([(volume-of-solid 3 4 5)   60]
                      [(volume-of-solid 10 5 4) 200]))
    > (for/fold ([score 0]) ([test (in-list tests)])
        (+ score (if (equal? (e (car test)) (cadr test)) 1 0)))
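
If a student function can raise an error, a variation on the above
treats a failing call as a failed test instead of aborting the whole
run.  This is only a sketch, and `score-tests' is a made-up helper
name:

    ;; ev is a sandbox evaluator, tests is a list of
    ;; (expression expected-value) pairs as above
    (define (score-tests ev tests)
      (for/fold ([score 0]) ([test (in-list tests)])
        (+ score
           ;; an exception from the sandbox counts as a wrong answer
           (if (with-handlers ([exn:fail? (lambda (x) #f)])
                 (equal? (ev (car test)) (cadr test)))
               1
               0))))

With the definitions above, (score-tests e tests) gives the same
count as the `for/fold' loop when no test raises an error.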

          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the users mailing list.