[racket] in praise of if's mandatory else clause

From: Neil Van Dyke (neil at neilvandyke.org)
Date: Mon May 30 20:01:46 EDT 2011

Hendrik Boom wrote at 05/30/2011 06:58 PM:
> On Mon, May 30, 2011 at 04:58:00PM -0400, Neil Van Dyke wrote:
>   
>> * Do very expensive farming of the system to detect places where programmers  
>> did copy&paste reuse, when for maintainability (and perhaps code  
>> footprint) we'd prefer that the code be generalized.  I'm pretty sure  
>> that there is a programming practice that involves the train of thought  
>> "this problem A is similar to problem B that I have seen before, so I  
>> will copy the code for B and modify it to do A", and some programmers do  
>> this a lot more than others do. The funniest I've seen was a  
>> construction, "(if BOOLEAN-VARIABLE HUGE-BLOCK-OF-CODE-1  
>> HUGE-BLOCK-OF-CODE-2)", where ediff eventually showed that the two huge  
>> blocks of code differed only in a single Boolean constant, equal to  
>> "BOOLEAN-VARIABLE".  More commonly, this takes the form of a copy&pasted  
>> procedure within the same module, multiple definitions from one module  
>> pasted into another (which may not be modified), or an entire module  
>> cloned as a starting point.  A checking tool for this would also be  
>> useful for identifying generalization opportunities throughout code that  
>> wasn't copy&paste'd, such as two procedures that coincidentally turned  
>> out almost the same, or a code pattern that is used widely and could be  
>> a macro.   I think there's a PhD in there, unless it's already been  
>> mostly done.
>>     
>
> Have a look at Dick Grune's (www.dickgrune.com) similarity tester 
> (http://www.dickgrune.com/Programs/similarity_tester/).
>   

Thanks.  If I read it correctly, this paper describes a heuristic
similarity metric crafted to detect copying among small introductory
student programming assignments.

I imagine that a rough similarity metric like this could be used to
speed up the more expensive, precise, partial structural matching of
chunks of code in large systems, by first finding promising-looking
areas to target with the expensive matching.  I think the expensive
structural matching is necessary, both so that complete suggested code
improvements could be generated programmatically, and to weed out
false positives found by the heuristic.
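A minimal sketch of that two-stage idea, written in Python for illustration. It assumes the code chunks are S-expressions, uses Jaccard similarity over token k-grams as the cheap filter (a generic stand-in, not Grune's metric), and uses a naive comparison of equal-shaped S-expression trees as the "expensive" structural stage. All function names, the k-gram size, and the thresholds are hypothetical choices, not an existing tool.

```python
import itertools
import re

# Atoms are any run of non-space, non-paren characters. This is a
# simplification: strings with spaces, comments, and quote shorthand
# are not handled specially.
TOKEN_RE = re.compile(r"[()]|[^\s()]+")


def tokenize(src):
    """Split S-expression source text into parens and atoms."""
    return TOKEN_RE.findall(src)


def parse(tokens):
    """Parse tokens into nested lists, one entry per top-level form."""
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unbalanced parentheses")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume the closing ")"
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok

    forms = []
    while pos < len(tokens):
        forms.append(read())
    return forms


def shingles(tokens, k):
    """Cheap fingerprint: the set of all length-k token runs."""
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Size of intersection over size of union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def structural_diff(a, b, path=()):
    """List (path, left, right) for each subtree where a and b differ.

    Lists of equal length are compared element by element; any other
    mismatch is reported as one difference at the current path.
    """
    if a == b:
        return []
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        diffs = []
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(structural_diff(x, y, path + (i,)))
        return diffs
    return [(path, a, b)]


def find_near_duplicates(chunks, k=3, threshold=0.5, max_diffs=2):
    """Filter pairs cheaply by shingle similarity, then diff survivors."""
    prepared = {}
    for name, src in chunks.items():
        toks = tokenize(src)
        prepared[name] = (shingles(toks, k), parse(toks))
    results = []
    for n1, n2 in itertools.combinations(sorted(prepared), 2):
        s1, t1 = prepared[n1]
        s2, t2 = prepared[n2]
        if jaccard(s1, s2) < threshold:
            continue  # cheap stage rejects this pair
        diffs = structural_diff(t1, t2)  # expensive stage
        if len(diffs) <= max_diffs:
            results.append((n1, n2, diffs))
    return results


if __name__ == "__main__":
    chunks = {
        "a": "(define (f x) (g x #t) (h x))",
        "b": "(define (f x) (g x #f) (h x))",
        "c": "(define (sum lst) (foldl + 0 lst))",
    }
    print(find_near_duplicates(chunks))
    # prints [('a', 'b', [((0, 2, 2), '#t', '#f')])]
```

For the `(if BOOLEAN-VARIABLE ...)` case in the quoted message, a single reported difference of `#t` versus `#f` is the signal that the two branches could be collapsed into one block that uses the variable itself. A real tool would also need to match sub-trees of unequal shape, which this naive comparison does not attempt.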

One exercise that I would find interesting is to look at examples of 
``duplicate'' code in corpora of real-world software systems, and try to 
characterize those examples in a way useful for crafting this fast metric.

-- 
http://www.neilvandyke.org/

Posted on the users mailing list.