[racket] in praise of if's mandatory else clause

From: Hendrik Boom (hendrik at topoi.pooq.com)
Date: Tue May 31 09:40:28 EDT 2011

On Mon, May 30, 2011 at 08:01:46PM -0400, Neil Van Dyke wrote:
> Hendrik Boom wrote at 05/30/2011 06:58 PM:
>> On Mon, May 30, 2011 at 04:58:00PM -0400, Neil Van Dyke wrote:
>>   
>>> * Do very expensive farming of the system to detect places where
>>> programmers did copy&paste reuse, when for maintainability (and
>>> perhaps code footprint) we'd prefer that the code be generalized.
>>> I'm pretty sure that there is a programming practice that involves
>>> the train of thought "this problem A is similar to problem B that I
>>> have seen before, so I will copy the code for B and modify it to do
>>> A", and some programmers do this a lot more than others do.  The
>>> funniest I've seen was a construction, "(if BOOLEAN-VARIABLE
>>> HUGE-BLOCK-OF-CODE-1 HUGE-BLOCK-OF-CODE-2)", where ediff eventually
>>> showed that the two huge blocks of code differed only in a single
>>> Boolean constant, equal to "BOOLEAN-VARIABLE".  More commonly, this
>>> takes the form of a copy&pasted procedure within the same module,
>>> multiple definitions from one module pasted into another (which may
>>> not be modified), or an entire module cloned as a starting point.  A
>>> checking tool for this would also be useful for identifying
>>> generalization opportunities throughout code that wasn't
>>> copy&paste'd, such as two procedures that coincidentally turned out
>>> almost the same, or a code pattern that is used widely and could be
>>> a macro.  I think there's a PhD in there, unless it's already been
>>> mostly done.
>>>
>>
>> Have a look at Dick Grune's (www.dickgrune.com) similarity tester  
>> (http://www.dickgrune.com/Programs/similarity_tester/).
>>   
>
> Thanks.  If I read correctly, I think this paper describes a heuristic  
> similarity metric, crafted to detect copying of small introductory  
> student programming assignments.
>
> I imagine that a rough similarity metric like this might be used to
> speed up the more expensive, precise partial structural matching of
> chunks of code in large systems, by first finding promising-looking
> general areas to target for the expensive matching.  I think that the
> expensive structural matching is necessary, so that you could generate
> complete suggested code improvements programmatically, and also to
> weed out some of the false positives found by the heuristic.

Dick told me once that it had been used to detect candidates for code 
factoring.  I don't know to what extent this was automated.

I may recall incorrectly, but I seem to remember that Bill Wulf's BLISS
compiler, way back in the '70s, did automatic subroutine detection.  It
optimised for code size, so as well as inlining small subroutines,
recognising duplicate code and outlining it into a shared subroutine
was useful.
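A rough sketch of the cheap first pass discussed above (a fast, heuristic
similarity metric that narrows the search before any expensive structural
matching) might look like the following.  It uses generic token k-gram
shingling with Jaccard similarity; this is an assumed stand-in technique,
not the algorithm of Dick Grune's similarity tester, and the tokenizer,
the helper names, `k`, and the threshold are all hypothetical choices for
illustration.

```python
import re
from itertools import combinations

# Hypothetical tokenizer for s-expression source such as Racket:
# each bracket is its own token; any other run of non-space chars is a token.
TOKEN_RE = re.compile(r"[()\[\]]|[^\s()\[\]]+")


def tokens(source: str) -> list[str]:
    return TOKEN_RE.findall(source)


def shingles(toks: list[str], k: int = 4) -> set[tuple[str, ...]]:
    # All contiguous k-token windows, as a set (duplicates collapse).
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_pairs(chunks: dict[str, str], k: int = 4,
                    threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return chunk pairs whose shingle sets look similar enough to be
    handed to a (not shown) expensive structural matcher, best first."""
    sigs = {name: shingles(tokens(src), k) for name, src in chunks.items()}
    result = []
    for (n1, s1), (n2, s2) in combinations(sigs.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            result.append((n1, n2, score))
    return sorted(result, key=lambda t: -t[2])


if __name__ == "__main__":
    # Two branches that differ only in one Boolean constant, plus noise.
    chunks = {
        "branch-true": "(define (report xs) (for-each (lambda (x) (display x) (newline)) xs) (log-it #t))",
        "branch-false": "(define (report xs) (for-each (lambda (x) (display x) (newline)) xs) (log-it #f))",
        "unrelated": "(define (sum xs) (apply + xs))",
    }
    for pair in candidate_pairs(chunks):
        print(pair)
```

Only the two near-identical branches pass the threshold here; a real tool
would then run the precise structural comparison on just those pairs, which
is where most of the cost (and the ability to suggest a rewrite) would lie.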

>
> One exercise that I would find interesting is to look at examples of
> "duplicate" code in corpora of real-world software systems, and try to
> characterize those examples in a way useful for crafting this fast
> metric.

But doing this to generated code is qualitatively different from doing
it to source code: source code has to remain comprehensible after the
transformation.

-- hendrik


Posted on the users mailing list.