[plt-scheme] Re: Poacher turned gamekeeper
Hi Philippos,
On Tue, Nov 10, 2009 at 07:12:44PM -0800, Philippos Apolinarius wrote:
> I think that I cannot answer your question, because I never
> understood how Lisp can be so fast.
The important thing to remember about the performance of
computer programs (and therefore benchmarks) is that the
computer itself generally continues to work at exactly the same
rate, in terms of instructions per second. To be sure, there's
some variability these days, due to the complexity of memory
hierarchies and instruction-level parallelism, and temperature
regulation, but it is so, by and large. So the thing that
makes one program slower than another is that it is making the
processor execute a bunch of instructions that are extraneous
to the process of actually computing the result. If you find
that Lisp is faster than some other language for a particular
class of problem, then that will almost certainly be because it
is doing less unnecessary work. Finding out how and why can be
interesting.
> SBCL looks like Matlab
> or Python, but it is much faster. In fact, it beats C, if one
> uses a lot of data structures, pass functions around and build
> and eval expressions. In "When Lisp is fasten than C", Genetic
> and Evolutionary Computations, 2006, Svingen claims that Lisp
> is so fast because it compiles expressions on the fly. My
> question is: Why Python or Matlab cannot do the same?
The simple answer is that they're not the same kind of language.
Python is (mostly) a simple interpreter. Matlab is a simple
interpreter wrapped around highly optimized BLAS and LAPAC
Fortran Matrix libraries (or at least it was at one stage: I
suspect that there's more Java involved, these days.) Python
has an eval, but the eval'd code is still just interpreted.
Matlab also has an eval (I think), but that's also working on
the surface, interpreted language. Matlab is the wrong language
to use unless your problem has a lot of heavy lifting in the
matrix/vector functions.
That's not to say that this is necessarily the case: both might
(and will probably) change, over time. Python has iron Python
that operates by compiling to MSIL (.NET), and so runs at native
code speeds, perhaps. There is also PyPy, which can compile
some classes of program. Matlab seems to be moving towards
integration of its language with Java, so I would expect that
to make use of the Java hot-spot compiler at some stage. (Of
course the Matlab language has other issues that might make it
slower than you would like in any given instance...)
You might want to try some of your genetic programming work in
Clojure, too: I believe that it dynamically compiles functions
to JVM bytecode, which will then be compiled to native code by
hotspot.
> As I told you, I really don't know why Lisp, having a feel
> of an interpreted language, can be so fast, almost as fast
> as batch compiled languages.
The lisps that you've been using, CMUCL and SBCL *are*
native-code compilers. Everything that you eval is first
compiled and then executed as a binary program. What's
more, SBCL has a lot of type-inference (ML/Hascall style) in
its compilation function, so it doesn't have to do as much
dynamic dispatch as you might expect. Both interpreter loops
and dynamic dispatch are examples of instruction execution
superfluous to the task at hand (although usually there for good
reasons none the less).
> I also don't understand why
> Python cannot have a compiler that generates code as fast as
> SBCL. Why Python compiler writers don't examine the SBCL code,
> and do something similar?
That's easy: they don't want to. Python's primary creators
aren't interested in the performance of their language, yet.
They are only interested in the language itself. It is still
young, and the community (and Guido) is still working out how
they want to express their idioms. There is a lot of syntax in
Python... It is an exercise in style, amongst other things.
> 2 --- It seems that Lisp is around a long time. Lisp compiler
> writers had time to write very good optimizers and incremental
> compilers.
This is certainly true.
> 3 --- Lisp may be very good for certain kinds of problems,
> like number crunching, computer algebra, and optimizing
> incremental compilers.
That is likely true too.
> My problem in beating Lisp is that most programs I am working
> with build expressions and eval them, and have a lot of
> occasion for parallelism. Clean and JHC don't have an eval
> function, i.e., I cannot build and function, compile it on the
> fly, and eval it.
Not many languages do. It is possible that genetic programming
is in fact a perfect fit to Lisp.
Oh: another language that now has dynamic compilation of
generated functions is (I think) Javascript. You might want to
keep your eye on that, too.
> ; to be fair, Clean has dynamic link, which
> helps, but it is not as simple to use as SBCL eval. Besides
> this, parallel algorithms are much easier in Lisp than in
> Clean or JHC. Well, it is easy to program parallel algorithms
> in GHC, but GHC is much slower than JHC. You must understand
> that while Will Roger can distribute his Bigloo program in one
> hundred machines, all I have is a machine with two quadricore
> processors. Besides this, Bigloo is much faster than GHC. I
> believe that JHC is faster than Bigloo, but it does not have
> parallel processing.
For that kind of thing, you might want to look at erlang and
termite (a library for Gambit scheme that provides erlang-like
threading and communication facilities.)
> Clean and Haskell (JHC) beats Lisp in one aspect. A genetic
> programming system that compiled into 52m of SBCL generated
> code, compiles in less than 2m of JHC generated code. Besides
> this, JHC code is only 4 times slower than the equivalent SBCL
> code.
That's a very common trade-off: compilation time verses
execution time. That's why most compilers have control knobs
to tell them how hard to work on optimizing the compiled code,
compared to just running it. After all, the act of compilation
is a whole pile of instructions that aren't directly computing
the desired result, but they may well be reducing the number of
instructions that are executed once it eventually gets to the
task of computing the desired result. You have to have a sense
of how much work is involved in the result computation verses
the amount involved in the compilation (and, indeed, the amount
of human-time involved in writing the program in the first
place: it is the latter factor that has the Python programmers
all excited: they are mostly concerned with programs that take
much longer to write than they do to run.)
Good luck with your studies! You seem to have access to some
very cool and interesting problems. That is a very good
starting point.
Cheers,
--
Andrew