[plt-scheme] Getting started with Scheme: How can I improve my code for generating prime numbers?

From: Eli Barzilay (eli at barzilay.org)
Date: Sun Aug 2 15:18:58 EDT 2009

On Aug  1, SiWi wrote:
> Hello PLT Scheme community,
> this is my first post here, so please excuse any mistakes.  I'm 16
> years old and so far my programming experience has mainly been in
> Python.
> To learn about functional programming, I just started with Scheme.
> For training I tried to convert some prime number generator code
> from Python to Scheme.
> It's using a sieve algorithm.
> The problem is that my Scheme code is about 1000-10000 times slower
> than my Python code.
> I would greatly appreciate it if you could give me some hints on how
> to improve my code.

You have several problems in your code, which are a result of
translating the Python code into a more Scheme-ish version.

A few things that I've noticed can all be seen in this expression that
you wrote:

  (append final (list (+ (* 2 i) 3)))

* "Python lists" are really not the same as "Scheme lists".  In
  Scheme, this refers to a linked list, where each item is a "cons
  cell" which holds a value and a pointer to the next item in the
  list.  This makes certain operations much faster.  For example,
  adding a value at the front of an existing list is a constant time
  operation (since it has a pointer to the input list).  It is also
  very natural for a whole bunch of problems that programmers run
  into, and it is easy to iterate over directly.

  In Python, the lists are really more like arrays.  They are faster
  for things like random-access of some value (you just look at the
  given offset in the array), but if you want to add an item at the
  front of the "list", you need to move everything down.  It is also
  inconvenient to iterate on (you can't do it directly, you need to
  use an index as a counter), and it is natural in cases where you
  know that you need a chunk of values, usually of some fixed size
  (some languages allow these arrays to vary dynamically, or even be
  able to use any value for an index).

* In your python code, you just create one such vector, then change,
  and finally collect the results into the output.  In Scheme, you
  keep creating new lists -- which is very expensive.  Think about it
  this way: your `list-bool' function takes the current sieve as an
  input, *copies* it to a new one while changing some bits to true.
  This is obviously much slower.

* Finally, I didn't talk about adding an item at the end of a list.
  With lists, this (append some-list (list new-value)) is very
  expensive because you need to copy the whole list to have a
  different last value.  If you do this 10000 times, then each of
  these steps creates a new copy of the list so far, which is a lot of
  work for the GC.  But this was not really necessary -- looking at
  your python code, you only add an item at the front of the list.

Just to measure the speed difference, I added this line to your code:

  print(len(primes(100000000)))

And then I did a direct translation of your code to PLT Scheme:

  #lang scheme/base
  (define (primes n)
    (define root  (expt n 0.5))
    (define len   (/ (- (+ n 1) 3) 2))
    (define sieve (make-vector len #t))
    (for ([i (in-range len)])
      (define x (+ (* i 2) 3))
      (unless (> x root)
        (when (vector-ref sieve i)
          (for ([j (in-range (+ (/ (- (* x x) x) 2) i) len x)])
            (vector-set! sieve j #f)))))
    (cons 2 (for/list ([x (in-range len)] #:when (vector-ref sieve x))
              (+ (* x 2) 3))))
  (length (primes 100000000))

This might be fitting on some cases, but it's mostly not an example of
typical Scheme code.  Anyway, the timings I'm getting on my machine
are

  python:   28.64s user 2.94s system 99% cpu 31.891 total
  mzscheme: 10.03s user 0.53s system 99% cpu 10.562 total

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the users mailing list.