[racket] James Swaine, now a PhD

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Wed Apr 23 13:52:00 EDT 2014

James successfully defended his dissertation today. Below is a taste.

Congrats, James!


Many modern high-level or scripting languages are implemented around
an interpretive run-time system, often with a JIT compiler. Examples
include the Racket runtime system, the Parrot virtual machine, and the
virtual machines underlying Perl, Python, Ruby, and other
productivity-oriented languages. These runtime systems are often the
result of many man-years of effort, and they have been carefully tuned
for capability, functionality, correctness, and performance.

For the most part, such runtime systems have not been designed to
support parallelism on multiple processors. Even when a language
supports constructs for concurrency, they are typically implemented
through co-routines or OS-level threads that are constrained to
execute one at a time.

This limitation has become a serious issue, as it is clear that
exploiting parallelism is essential to harnessing performance in
future processor generations. Whether computer architects envision the
future as involving homogeneous or heterogeneous multicores, and with
whatever form of memory coherence or consistency model, the common
theme is that the future is parallel and that language implementations
must adapt. The essential problem is making the language
implementation safe for low-level parallelism, i.e., ensuring that
even when two threads are modifying internal data structures at the
same time, the runtime system behaves correctly.

One approach to enabling parallelism would be to allow existing
concurrency constructs to run in parallel, and to rewrite or revise
the runtime system to carefully employ locking or explicit
communication. Experience with that approach, as well as the
persistence of the global interpreter lock in implementations for
Python and Ruby, suggests that such a conversion is extremely
difficult to perform correctly. Based on the even longer history of
experience in parallel systems, one would also expect the result to
scale poorly as more and more processors become available. The
alternative of simply throwing out the current runtime and
re-designing and implementing it around a carefully designed
concurrency model is no better, as it would require discarding years
or decades of effort in building an effective system, and this
approach also risks losing much of the language’s momentum as the
developers are engaged in tasks with little visible improvement for a
long period.

This dissertation investigates a new technique for parallelizing
runtime systems, called slow-path barricading. The technique is based
on the observation that the core of many programs – and particularly
the part that runs fast sequentially and could benefit most from
parallelism – involves relatively few side effects with respect to the
language implementation’s internal state. Thus, instead of wholesale
conversion of the runtime system to support arbitrary concurrency, we
add language constructs that focus and restrict concurrency where the
implementation can easily support it.

Specifically, the set of primitives in a language implementation is
partitioned into safe (for parallelism) and unsafe categories. The
programmer is then given a mechanism to start a parallel task; as long
as the task sticks to safe operations, it stays in the so-called fast
path of the implementation and thus is safe for parallelism. As soon
as the computation hits a barricade, the runtime system suspends the
computation until the operation can be handled in the more general,
purely sequential part of the runtime system.

Although the programming model allows only a subset of language
operations to be executed in parallel, this subset roughly corresponds
to the set of operations that the programmer already knows (or should
know) to be fast in sequential code. Thus, a programmer who is
reasonably capable of writing fast programs in the language already
possesses the knowledge to write a program that avoids unsafe
operations—and one that therefore exhibits good scaling for
parallelism. Furthermore, this approach enables clear feedback to the
programmer about when and how a program uses unsafe operations.

Thesis: We can incrementally add effective parallel programming
primitives and tool support to legacy sequential runtime systems with
a modest investment of effort.

Posted on the users mailing list.