[racket] in praise of if's mandatory else clause
On Mon, May 30, 2011 at 08:01:46PM -0400, Neil Van Dyke wrote:
> Hendrik Boom wrote at 05/30/2011 06:58 PM:
>> On Mon, May 30, 2011 at 04:58:00PM -0400, Neil Van Dyke wrote:
>>
>>> * Do a very expensive analysis of a system to detect places where
>>> programmers did copy&paste reuse, when for maintainability (and
>>> perhaps code footprint) we'd prefer that the code be generalized.
>>> I'm pretty sure that there is a programming practice that involves
>>> the train of thought "this problem A is similar to problem B that I
>>> have seen before, so I will copy the code for A and modify it to do
>>> B", and some programmers do this a lot more than others do. The
>>> funniest I've seen was a construction, "(if BOOLEAN-VARIABLE
>>> HUGE-BLOCK-OF-CODE-1 HUGE-BLOCK-OF-CODE-2)", where ediff eventually
>>> showed that the two huge blocks of code differed only in a single
>>> Boolean constant, equal to "BOOLEAN-VARIABLE". More commonly, this
>>> takes the form of a copy&pasted procedure within the same module,
>>> multiple definitions from one module pasted into another (which may
>>> not even be modified), or an entire module cloned as a starting point. A
>>> checking tool for this would also be useful for identifying
>>> generalization opportunities throughout code that wasn't
>>> copy&pasted, such as two procedures that coincidentally turned out
>>> almost the same, or a code pattern that is used widely and could be
>>> a macro. I think there's a PhD in there, unless it's already been
>>> mostly done.
>>>
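To make Neil's (if BOOLEAN-VARIABLE ...) example concrete, here is a
toy instance in Racket (all names invented for illustration): the two
branches differ only in the #t/#f literal, so the whole conditional
collapses once the tested variable itself is passed through.

    ;; Before: two near-identical branches; the only difference is the
    ;; Boolean literal handed to process-item (a hypothetical helper).
    (if verbose?
        (for-each (lambda (item) (process-item item #t)) items)
        (for-each (lambda (item) (process-item item #f)) items))

    ;; After: pass the tested variable and drop the duplication.
    (for-each (lambda (item) (process-item item verbose?)) items)
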
>>
>> Have a look at Dick Grune's (www.dickgrune.com) similarity tester
>> (http://www.dickgrune.com/Programs/similarity_tester/).
>>
>
> Thanks. If I read it correctly, this paper describes a heuristic
> similarity metric, crafted to detect copying of small introductory
> student programming assignments.
>
> I imagine that a rough similarity metric like this might be used to
> speed up more expensive, precise partial structural matching of chunks
> of code in large systems, by first finding promising-looking general
> areas to target for the more expensive matching. I think the expensive
> structural matching is necessary both to generate complete suggested
> code improvements programmatically and to weed out some false
> positives found by your heuristic.
Dick told me once that it had been used to detect candidates for code
factoring. I don't know to what extent this was automated.
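Just to make that two-stage idea concrete, here is a toy version of
the cheap first pass in Racket: score two token lists by the Jaccard
overlap of their 4-token windows. Everything here (the names, the
window size) is invented for illustration; it is not what Grune's
tester actually does.

    #lang racket

    ;; All n-token windows ("shingles") of a flat token list.
    (define (shingles toks n)
      (for/set ([i (in-range (max 0 (+ (- (length toks) n) 1)))])
        (take (drop toks i) n)))

    ;; Jaccard similarity of two sets, an exact rational in [0,1].
    (define (jaccard a b)
      (/ (set-count (set-intersect a b))
         (max 1 (set-count (set-union a b)))))

    ;; Rough similarity of two token lists; a score near 1 means
    ;; "worth a closer, more expensive structural comparison".
    (define (rough-similarity toks-a toks-b #:n [n 4])
      (jaccard (shingles toks-a n) (shingles toks-b n)))

    ;; e.g. (rough-similarity '(define f x + x 1 * x 2)
    ;;                        '(define g y + y 1 * y 2))
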
I may recall incorrectly, but I seem to remember that Bill Wulf's BLISS
compiler, way back in the 1970s, did automatic subroutine detection. It
optimised for code size, so in addition to inlining small subroutines,
it also recognised duplicate code and outlined it into shared
subroutines.
>
> One exercise that I would find interesting is to look at examples of
> ``duplicate'' code in corpora of real-world software systems, and try to
> characterize those examples in a way useful for crafting this fast
> metric.
But doing this to generated code is qualitatively different from doing
it to source code: source code has to remain comprehensible after the
transformation.
-- hendrik