[racket-dev] read-line performance problem

From: Neil Van Dyke (neil at neilvandyke.org)
Date: Wed Nov 2 20:54:25 EDT 2011

Racket can do this somewhat faster, but I suggest any effort be focused 
on improvements that are also relevant to substantial programs, and not 
on trying to compete on Perl one-liners and poor benchmarks.

Details follow...

Trying this 'benchmark' on a 700MB log file (just Linux "dmesg" output, 
duplicated many times), I saw numbers with Racket 5.x somewhat 
comparable to those on StackOverflow.  (This was on Linux on an old 2GHz 
laptop with no swap space, and the kernel had cached the whole 700MB in 
its RAM buffers, so it was just Racket pegging a CPU core at 100%.)

Using "regexp-match" was significantly faster than "read-bytes-line", 
but I'm sure it is still slower than the other languages mentioned.
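For concreteness, here is a minimal Racket sketch of the two line-walking styles being compared. It assumes the per-line "work" is just counting lines, which is a hypothetical stand-in since the benchmark does no real processing. The regexp-match variant is one plausible way to use a port-consuming regexp, not necessarily the exact code that was timed, and the script name in the usage comment is a placeholder.

```racket
#lang racket/base
;; Sketch: two ways to walk a port line by line. Counting lines is a
;; hypothetical stand-in for the (empty) per-line work in the benchmark.

;; read-bytes-line allocates a fresh byte string for every line.
(define (count-lines/read-bytes-line in)
  (let loop ([n 0])
    (define line (read-bytes-line in 'linefeed))
    (if (eof-object? line)
        n
        (loop (add1 n)))))

;; regexp-match on a port consumes the port through the end of each
;; successful match; on failure it consumes the rest of the port.
;; Note: an unterminated final line is not counted by this version.
(define (count-lines/regexp-match in)
  (let loop ([n 0])
    (if (regexp-match #rx#"[^\n]*\n" in)
        (loop (add1 n))
        n)))

(module+ main
  ;; Usage (hypothetical file names): racket count-lines.rkt some-big.log
  (define args (current-command-line-arguments))
  (when (= (vector-length args) 1)
    (let ([path (vector-ref args 0)])
      (time (call-with-input-file path count-lines/read-bytes-line))
      (time (call-with-input-file path count-lines/regexp-match)))))
```

Both versions agree on newline-terminated input. The "gc time" figure that `time` prints for each pass is a quick way to check how much of the run goes to collection.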

The process size stayed at 40MB total (shared libraries and 
everything).  It looked like there were near-constant quick GC cycles.  
GC tuning might help?

This would be a more useful benchmark if it required actually doing 
something plausible with the allocations, rather than immediately 
throwing them away and doing no actual processing.  I suspect Racket 
would perform relatively better on something closer to a real-world task.

Were I writing high-performance I/O code, I might use 
"read-bytes-avail!" to try to reduce allocations.  Of course, 
sys-admins would not be doing this for quick Perl-like scripting tasks.  
(If we ever maxed out what GC tuning and other optimizations can do, we 
could always try making a minilanguage for this traditional Perl-like 
task that optimized away some allocations, for example by allocating 
only the text that we actually use.)
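A minimal sketch of the "read-bytes-avail!" approach, assuming the task is only counting newline bytes (a hypothetical stand-in for real work). One mutable buffer is reused for every read, so the loop allocates no per-line byte strings. The 64KB default buffer size is an arbitrary choice, and the sketch assumes a file or byte-string port, which never returns special (non-byte) values from read-bytes-avail!.

```racket
#lang racket/base
;; Sketch: scan a port in fixed-size chunks with read-bytes-avail!,
;; reusing one buffer so no per-line byte strings are allocated.
;; Counting newline bytes is a hypothetical stand-in for real work.
;; Assumes a port that never returns specials (e.g. a file port).

(define (count-newlines/chunked in [buffer-size 65536])
  (define buf (make-bytes buffer-size))
  (let loop ([total 0])
    ;; got is the number of bytes placed in buf, or eof.
    (define got (read-bytes-avail! buf in))
    (cond
      [(eof-object? got) total]
      [else
       ;; Scan only the bytes filled by this read; 10 is the newline byte.
       (let scan ([i 0] [total total])
         (cond
           [(= i got) (loop total)]
           [(= (bytes-ref buf i) 10) (scan (add1 i) (add1 total))]
           [else (scan (add1 i) total)]))])))
```

Because the buffer size is optional, `(call-with-input-file "big.log" count-newlines/chunked)` works directly ("big.log" is a placeholder). Counting newline bytes sidesteps the harder part of real line processing with this technique: a line can straddle two chunks, so code that needs the line text must carry the partial tail of one chunk into the next.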

Matching Perl I/O performance would be nice, but I won't be disappointed 
if nobody manages it.  Perl was originally developed for pretty much this 
exact task (i.e., going through a line-oriented, text-ish data file and 
applying a regexp to each line), and to be fast even on a 16MHz, 4MB 
Sun 3/50 of over 20 years ago.

Also, I think we discussed this a while ago (perhaps when making the 
few-liner examples for the new Web site), but nobody will win over any 
Perl programmers by trying to get our language to do 20-year-old Perl 
one-liners.  This program is a handful of characters in 
Perl, and telling people that they could be typing "lambda" and 
parentheses and such instead, and wouldn't that be so much better, makes 
one look like a crazy person.  Focus on things that are *not* Perl 
one-liners, but are substantial programs -- especially ones that benefit 
from syntactic extension, functional-ish programming, and 
maintainability -- since that's where Racket becomes a smart tool of 
smart people, and where Perl becomes a burden of crazy people.

With that in mind, from a PR perspective, if a Perl-type person asks 
you, "What does this Perl one-liner look like in Racket?", the preferred 
responses are: (1) "That task looks like what Perl is good at"; (2) do 
as politicians do, and answer the question that you wish you had been 
asked; (3) pretend to speak only Swahili and to not understand the question.

Sam Tobin-Hochstadt wrote at 11/02/2011 07:14 PM:
> On StackOverflow [1], someone reported that Racket's I/O performance
> on large files was substantially worse than other languages for a
> simple task.  I haven't yet tried it on a similarly large volume of
> data, but I did see a performance difference relative to Chicken for
> large but not huge files, and Ryan seems to have gotten similar
> results.
> [1] http://stackoverflow.com/questions/7946745/i-o-performance-in-mzscheme


Posted on the dev mailing list.