[plt-scheme] to define, or to let

From: Anton van Straaten (anton at appsolutions.com)
Date: Thu Mar 25 13:47:07 EST 2004

Paul Graunke wrote:
> but it is undecidable which "programs" are correct
> (or which syntactic thingies are actually programs depending
> on how you look at it).  This is really bad IMHO.

I agree with this in principle, but there are limits to what I'm willing to
accept to eliminate it.  And I think the situation with R5RS is particularly
tricky, because it's not defining a single language.

> > C and C++ programmers have done this for decades, and it's
> > really easy to avoid writing incorrect code in this respect.
>
> and they still drag their knuckles on the ground when they walk.
> I don't want to emulate them.

That might be called an ad-gorillanem argument.  I don't think it has much
bearing in this case.
The handling of this issue in C is not one of the areas that has led to
problematic software in practice, afaict, regardless of its apparent
semantic ugliness.  If anything, it leads to greater reliability in
practice.

> > because side effect ordering in any expression becomes acceptable,
> > so you can't tell when they belong and when they don't.
>
> They always belong.

They're always legal, but they may be erroneous with respect to the
program's specification.  A fixed and useful evaluation order gives
erroneous statements an unambiguous semantics, makes them indistinguishable
from correct statements, and makes them harder to detect.
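To make that concrete, here's a minimal sketch (not from the original post) of the kind of expression at issue.  Both operands mutate x, so the result depends on which one the implementation evaluates first:

```scheme
;; R5RS leaves the evaluation order of operands unspecified, so this
;; may yield 4 (left to right) or 5 (right to left) -- a visible
;; symptom of the order dependence.  Under a mandated left-to-right
;; order it would always yield 4, and the latent dependence on
;; ordering would be invisible to the reader.
(define x 0)
(+ (begin (set! x (+ x 1)) x)
   (begin (set! x (+ x 2)) x))
```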

> Adding an extra annotation for a static analysis tool that will tell
> you which expressions are effect free (err, order independent) would be
> nice.  FX, or Leroy's POPL 1999 paper, or some (unpublished?) work by
> Matthias and Andrew Wright are all ways of (trying to in some cases)
> sort this out.  Of course, whether the code will actually perform
> a side effect is not the same as whether a static approximation thinks
> it might perform a side effect.

In Scheme, I don't see any problem with doing some checks dynamically.
Having an annotation would take care of many cases statically, and the
remainder could be caught dynamically, just like type errors.

> > 1.  This language now has a well-defined meaning for every expression,
> Yes.
> > even incorrect ones (since you may still incorrectly try to use the
> > weird-order constructs to try to sequence effects, which is almost
> > certainly incorrect).
>
> I would not call ordering effects incorrect.

I'm saying that if constructs intended to express order-independent code are
used to implement code that's actually order-dependent, that would be
incorrect.

> I want to minimize the time I think about trivial crap.  Hence design
> patterns, making the code structure follow the structure of the data,
> or any other routine knee-jerk decision making process I can use to
> write code with less thinking time is a win.  I would pick one order
> and stick with it.  I would pick the simplest, clearest order.

How much of your code depends on effects ordering, relatively speaking?  An
assumption I'm making is that it's relatively little - particularly because
things like I/O should be layered, and are not usually an issue in core
code.

Fixing a useful order for the entire program, to deal with something that's
only an issue in a relatively small proportion of cases, is unfortunate.
This is especially so because in those few cases, we want to pay close
attention to what's happening - see the material Bradd posted from Code
Complete for a description of why.

> No.  I agree with your desire to express that you expect your code
> to be evaluation order independent.  I argue, though, that neither
> fixing some weird order, nor letting the compiler change the behavior
> at whim provide this.

Removing the ability to express that expectation isn't good enough, either.

I think it's fine for a specific Scheme implementation, though, only because
the standard still specifies that order-dependent code should not be written
in function applications, LET, etc.  If you want to ignore that, you're
doing something implementation dependent, and I think that's a good barrier
to have.  If you want that feature in every implementation, you don't really
want Scheme, you want Python, perhaps.

I wouldn't be making this case if we were talking about programming in
Python, which is inherently imperative - you can't blink without mutating
something to communicate to the next sequentially executed line of code.
Scheme is a different sort of language, though.

> I've seen another graduate student struggle for hours tracking down
> a bug due to switching from mzscheme to larceny.  The right hand sides
> of a letrec are evaluated in a different order, which caused the bug.
> You may argue the code is incorrect, but if there is no reasonable way
> to detect correct (I think you mean valid or well-formed or something.)
> code, then I claim the system is unreasonable.

I think this lays the blame for that problem in the wrong place.  If letrec
and letrec* (or letrec and letrec-) were clearly distinguished, then the
grad student should have used the one which gave the desired ordering.  But
because of the widespread practice of being able to rely on letrec being
letrec*, an incompatibility has arisen between implementations which do &
don't follow the standard.
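A hypothetical reconstruction of that kind of bug (my example, not the grad student's actual code): the right-hand sides of a letrec carry side effects, so the observable result depends on the order in which the implementation chooses to evaluate them - and both orders conform to R5RS:

```scheme
;; Both right-hand sides record when they run.  An implementation
;; that evaluates the letrec RHSs left to right produces (a b);
;; one that goes right to left produces (b a).  Code that assumes
;; one order works on one system and breaks on the other.
(define trace '())
(letrec ((a (begin (set! trace (cons 'a trace)) 1))
         (b (begin (set! trace (cons 'b trace)) 2)))
  (reverse trace))
```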

Similar confusion is likely to arise if people are encouraged to ignore the
issue of order-independence in LET and function application, contrary to the
standard.

> > I'm talking about using sequencing constructs when that's what
> > you need, and not using them when you don't need them.  If you
> > depend on sequence in an expression which does not guarantee a
> > useful sequence, that's an error.
>
> So we can eliminate all such errors by guaranteeing a useful sequence.

You don't eliminate all such errors.  You eliminate some of them - the ones
where order dependence was intentional - and you hide some of them - the
ones where order dependence was unintentional.  If this alleged solution
really eliminated all such errors, there'd be nothing to discuss.  However,
what the solution actually does is remove the information used to (a) detect
such errors, (b) discourage programmers from making such errors, and (c)
communicate important information about side-effect management.

> > If you use side effects in such expressions, it should be obvious by
> > simple inspection that they don't have any complicated consequences.
>
> I don't know how to look at a call to a higher order function passed in
> as an argument and tell if it has side effects or not.

That seems to be a shortcoming in the language.  Even if it can't be
determined statically, with appropriate design, violations could certainly
be detected dynamically.  A runtime failure for something like this would be
no worse than a runtime failure due to a type mismatch.
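One way such a dynamic check might look - a minimal sketch, with all names hypothetical and no claim that any implementation works this way: code declared order-independent runs with a flag set, and any mutation routed through a checked assignment form signals an error, much like a runtime type error.

```scheme
;; *pure-context* marks dynamic extents that are supposed to be
;; order independent.  call/pure runs a thunk with the flag set;
;; checked-set! refuses to mutate while the flag is on.
(define *pure-context* #f)

(define (call/pure thunk)
  (let ((saved *pure-context*))
    (set! *pure-context* #t)
    (let ((result (thunk)))
      (set! *pure-context* saved)
      result)))

(define-syntax checked-set!
  (syntax-rules ()
    ((_ var val)
     (if *pure-context*
         (error "side effect in order-independent context:" 'var)
         (set! var val)))))
```

This catches the higher-order case too: a procedure passed in as an argument and called inside call/pure trips the check the moment it tries to mutate, even though nothing could be determined statically.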

> I'm skeptical of C/C++ programmers doing their laundry correctly.

I learned C++ and began developing commercial systems with it sometime in
the mid-80s.  I developed a couple of commercial shrinkwrap products with
it, among other things.  But you're right, I'm not very good at doing my
laundry - it tends to pile up until everything's dirty and I have nothing to
wear.  Luckily, I work from home, so I can mostly work around that.

Will switching to a semantics with an unambiguous order of evaluation help
me with this?

> I would like to be able to tell if the syntax I write is a well-defined
> program or whether the compiler will arbitrarily do different random
> things when a customer runs the code as opposed to when I test it.

I agree, let's get rid of the random ambiguous stuff, just not at the cost
of lost information about code that has dangerous side-effect dependencies.

> If pinning down an order is the only way I can pin down the meaning
> of a program, then that is what must be done.  Otherwise the behavior
> is arbitrary, unspecified non-sense.  Sorry.

I don't think it is the only way.  It's an expedient way.  I think
expedience is fine for a language implementation.  I don't think it's so
great for a language standard.

> If the issue is compilation optimizations, I'm not sympathetic.
> I'm glad to hear this does not appear to be your main concern.
> If you want a way to express an invariant that you think should
> hold, that's great.  Using either a static approximation to the
> check or a dynamic safety check would be fine.  I don't see any
> computationally feasible way to check that a program behaves the
> same way under all sequential interleavings of function argument
> evaluation.  That is a really messy problem.

I agree, the focus should be on a different problem: how to give the
language better knowledge of when side effects are being used.

> I see nothing gained by using an ill defined language.  I do not buy
> your argument that anything is "checked" this way. I do not see any
> tradeoff.

The reason you don't buy my argument is you're thinking of it purely from
the semantic and machine perspective.  The checking in this case is done on
the human side (see the table in my earlier reply to Robby).  Your solution
reduces the ability of humans to perform these checks, but offers no other
way to detect the real errors in question.

Put another way, you're focusing on a large class of errors which I'm saying
have proven unimportant in practice, and you want to eliminate them and
replace them with a smaller class of errors which are harder to detect in
practice.  They're harder to detect because they've been disguised by making
code containing that kind of error indistinguishable from correct code, by
removing the information that would help to highlight them.

> P.S. Sorry if I come off sounding annoyed.  I just really want to
> be able to look at code and figure out what it means/does and I'm
> not able to do that without a semantics (and a decision procedure
> for whether or not what I'm staring at is a well-formed program).

No apology needed.  You want semantic certainty, I'm pointing out that the
restricted language you're using to get it has punted on an important issue.
I can see how that could be annoying.  :)

Anton



Posted on the users mailing list.