[racket-dev] read-line performance problem
Racket can do this somewhat faster, but I suggest any effort be focused
on improvements that are also relevant to substantial programs, and not
on trying to compete on Perl one-liners and poor benchmarks.
Details follow...
Trying this 'benchmark' on a 700MB log file (just Linux "dmesg" output,
duplicated many times), I saw numbers with Racket 5.x that were roughly
comparable to those reported on StackOverflow. (This was on Linux on an
old 2GHz laptop with no swap space, and the kernel had cached the 700MB
file in RAM, so the run was just Racket pegging one CPU core at 100%.)
Using "regexp-match" was significantly faster than "read-bytes-line",
but I'm sure it is still slower than the other languages mentioned.
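
For concreteness, here is a minimal sketch of the two approaches. It
assumes, purely for illustration, that the task is counting lines; it is
not the code from the StackOverflow question, and the function names are
made up for this example.

```racket
#lang racket/base

;; Count lines by reading each one with read-bytes-line.
;; Every iteration allocates a fresh byte string for the line.
(define (count-lines/read-bytes-line in)
  (let loop ([n 0])
    (if (eof-object? (read-bytes-line in))
        n
        (loop (add1 n)))))

;; Count newline bytes by matching a byte regexp directly against the
;; port. A successful match consumes input through the end of the match.
;; Note: a final line with no trailing newline is not counted here.
(define (count-lines/regexp in)
  (let loop ([n 0])
    (if (regexp-match #rx#"\n" in)
        (loop (add1 n))
        n)))
```

Either function can be tried on a real file with something like
(call-with-input-file "some.log" count-lines/regexp), where "some.log"
is a placeholder path.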
The process size stayed at 40MB total (shared libraries and
everything). It looked like there were near-constant quick GC cycles.
GC tuning might help?
This would be a more useful benchmark if it required actually doing
something plausible with the allocations, rather than immediately
throwing them away and doing no actual processing. I suspect Racket
would perform relatively better on something closer to a real-world task.
Were I writing high-performance I/O code, I might use
"read-bytes-avail!", to try to reduce allocations. Of course,
sys-admins would not be doing this for quick, Perl-like scripting tasks.
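
A rough sketch of that idea, again assuming a line-counting task for
illustration: one buffer is allocated up front and reused for every
read, so no per-line byte strings are created. The function name and
the default buffer size are arbitrary choices for this example, not
anything I have measured or tuned.

```racket
#lang racket/base

;; Count newline bytes using a single reusable buffer.
;; buf-size is an arbitrary default chosen for illustration.
(define (count-newlines in [buf-size 65536])
  (define buf (make-bytes buf-size))
  (let loop ([count 0])
    ;; read-bytes-avail! fills buf with whatever is available and
    ;; returns the number of bytes read, eof, or a procedure for a
    ;; "special" (non-byte) value from a custom port.
    (define got (read-bytes-avail! buf in))
    (cond
      [(eof-object? got) count]
      [(exact-nonnegative-integer? got)
       ;; Scan only the bytes just read; 10 is the newline byte.
       (loop (let scan ([i 0] [c count])
               (cond
                 [(= i got) c]
                 [(eqv? (bytes-ref buf i) 10) (scan (add1 i) (add1 c))]
                 [else (scan (add1 i) c)])))]
      ;; A special value was consumed; skip it and keep reading.
      [else (loop count)])))
```

Doing real per-line work this way means handling lines that straddle
two buffer fills, which is part of why nobody would write this for a
quick script.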
(Were we to max out what we can do with GC tuning and optimizations, we
could always try making a minilanguage for this traditional Perl-like
task, one that optimizes away some allocations, such as by allocating
only the text that we actually use.)
Matching Perl I/O performance would be nice, but I'm not disappointed if
nobody does. Perl was originally developed for pretty much this exact
task (i.e., going through a line-oriented text-ish data file, applying a
regexp to each line) and to be fast even on a 16MHz 4MB Sun 3/50 of over
20 years ago.
Also, I think we discussed this a while ago (perhaps when making the
few-liner examples for the new Web site), but I think that nobody will
win over any Perl programmers by trying to get their language to do
20-year-old Perl one-liners. This program is a handful of characters in
Perl, and telling people that they could be typing "lambda" and
parentheses and such instead, and wouldn't that be so much better, makes
one look like a crazy person. Focus on things that are *not* Perl
one-liners, but are substantial programs -- especially ones that benefit
from syntactic extension, functional-ish programming, and
maintainability -- since that's where Racket becomes a smart tool of
smart people, and where Perl becomes a burden of crazy people.
With that in mind, from a PR perspective, if a Perl-type person asks
you, "What does this Perl one-liner look like in Racket?", the preferred
responses are: (1) "That task looks like what Perl is good at"; (2) do
as politicians do, and answer the question that you wish you had been
asked; (3) pretend to speak only Swahili and to not understand the question.
Sam Tobin-Hochstadt wrote at 11/02/2011 07:14 PM:
> On StackOverflow [1], someone reported that Racket's I/O performance
> on large files was substantially worse than other languages for a
> simple task. I haven't yet tried it on a similarly large volume of
> data, but I did see a performance difference relative to Chicken for
> large but not huge files, and Ryan seems to have gotten similar
> results.
>
> [1] http://stackoverflow.com/questions/7946745/i-o-performance-in-mzscheme
--
http://www.neilvandyke.org/