[racket] Size matters

From: Sean Kanaley (skanaley at gmail.com)
Date: Sat Jun 8 16:47:19 EDT 2013

I was coincidentally just thinking about this.  I'm experimenting with 
terse names myself right now.

I believe the general rule for consistent pain/gain is that the favoring 
of the short name is proportional to both how frequently it's used and 
how obvious it is.  For-loops in C had better default to using "i", 
because saying "index" isn't helpful.  If one doesn't understand how a 
for loop works so that "i" is somehow unclear in its purpose, one has no 
chance of understanding the code anyway. Similarly, Racket 
comprehensions can use "x" seeing as the type of the list should already 
be known or the whole thing is incomprehensible in the first place!  
Expanding from this, there is a sort of scope at work one might call 
"meaning or type scope".  The "x" bound to elements of a list is 
necessarily an element typed to whatever that list is made up of (except 
for heterogeneous lists, see below).  If that in turn is a structure, 
something bound to some component of that structure is whatever /that/ 
type is.  If the list is "things-to-print" then any name is already 
redundant because we know x is a "thing-to-print".  Similarly, one can 
bind "p" to the result of a struct access inside another when accessing 
a point in space from the other.  For the code to work /at all/, p must 
be of type "point", so it's easy to remember.  The only reason it's even 
named is because the programmer decided he/she needed to re-use that 
name, so rather than type the whole accessing code to get to "p", he/she 
provided a name.  The very fact that it was named in this manner 
suggests the need for immediate reuse and thus ease of remembrance.  
This would fail to be true if it /wasn't/ reused immediately, in the 
static/lexical sense, but then it's for all intents and purposes a 
global variable!  It's either local and not confusing or /not/ local and 
/confusing/.  The latter case implies either poor code or the need for 
an actual descriptive name.  That is, do not use "i" for something that 
is used at 100-line intervals in your code!  You might forget that "i" is 
bound and end up shadowing it inadvertently, or in my experience, the 
far worse case that happens when you /un/shadow after changing the local 
"i", but perhaps not everywhere needed, and weird results ensue 
depending on which one is scoped.  Either avoid sharing a variable 
between what is hopefully multiple functions if the references are that 
far apart (making it global-ish), shorten the functions, or, if really 
necessary to have this sharing, give it a real name, like 
super-important-reference-that-i-can't-figure-out-how-to-keep-local-cause-i'm-dumb 
(Racket provides define-values, let, and an object system, all of which 
allow sharing locally among multiple functions).  In summary, if short 
names are confusing, the code was already bad!
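
A rough sketch of the "type scope" idea above.  The structs `point` and 
`ship` and both function names are made up for illustration; they are not 
from any real code.

```racket
#lang racket

;; Hypothetical structs, named only for this illustration.
(struct point (x y))
(struct ship (name pos))   ; pos holds a point

;; "x" is enough here: the list's name already says what each element is.
(define (print-all things-to-print)
  (for ([x things-to-print])
    (displayln x)))

;; "p" is bound once and reused immediately, so for the code to work
;; at all it must be a point.
(define (distance-from-origin s)
  (define p (ship-pos s))
  (sqrt (+ (sqr (point-x p)) (sqr (point-y p)))))

(distance-from-origin (ship "example" (point 3 4)))  ; => 5
```

Had `p` been referenced again a hundred lines later, a longer name would 
start to pay for itself.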

*About heterogeneous lists.  Sometimes one wants the equivalent of 
multiple return values without the inconvenience: a list of lists of 
3 associated things instead of 3 "unzipped" lists that have to be 
bound with (let-values ([( ..., which is rather heavy, and then 
recombined by zipping them.  One might not care to define a struct 
for this list of 3 things, just to get a nice typed homogeneous list, 
for simple utility functions that do 2 or 3 things in parallel.  But, 
as mentioned, one can't simply return 2 or 3 things because a list cannot 
contain multiple values directly.  It's not so bad to just remember that 
car = some type and cadr = some other type and caddr = last type.  Or 
maybe I should try harder and come up with a better solution.  Accessing 
by position is dangerous anyway, so the concision gained by (match-let 
([( or car/cadr/caddr is perhaps roughly cancelled by the cost of ever 
modifying the underlying representation, which requires a rewrite of 
every single by-position access.  SICP always recommended procedures for 
accessing.  
So yes, there's a balance between concision, readability, 
comprehensibility, modifiability, etc.  But I believe the balance is in 
favor of helping the programmer read and understand the code, which to 
me means concision, killing three birds with one stone.
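
To make the trade-off concrete, here is a small sketch.  Everything in 
it (`summarize`, the `row-*` accessors, the sample data) is hypothetical 
and only meant to contrast by-position access with SICP-style accessor 
procedures.

```racket
#lang racket

;; Hypothetical utility: returns a list of 3-element lists
;; (name, count, total) instead of three "unzipped" lists.
(define (summarize groups)
  (for/list ([g groups])
    (list (first g) (length (rest g)) (apply + (rest g)))))

;; By-position access: terse, but tied to the representation.
(define (describe-by-position row)
  (match-let ([(list name n total) row])
    (format "~a: ~a items, total ~a" name n total)))

;; Accessor procedures: one place to change if the layout changes.
(define row-name  car)
(define row-count cadr)
(define row-total caddr)

(define (describe-by-accessor row)
  (format "~a: ~a items, total ~a"
          (row-name row) (row-count row) (row-total row)))

(define rows (summarize '((a 1 2 3) (b 10 20))))
(describe-by-position (car rows))   ; => "a: 3 items, total 6"
(describe-by-accessor (cadr rows))  ; => "b: 2 items, total 30"
```

If the row later grows a fourth field or becomes a struct, only the three 
`row-*` definitions change in the second style, while every `match-let` 
pattern has to be revisited in the first.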

On 06/08/2013 03:49 PM, Eli Barzilay wrote:
> (Possible irrelevant rambling, but I generally am very aware of my
> code formatting, and this looked like a good point to highlight.  It's
> still probably picking on something that not many people care about as
> much as I do...)
>
> The style guide has a section labeled "size matters" -- it also has
> another section on names, encouraging using readable names.  Both of
> these are perfectly reasonable goals, but they can pull you in
> different directions, and I ran into an example that demonstrates it
> in a very nice way.  There is also Yaron's quote about favoring
> readers, which is IMO more important than both of these -- since they
> are just sub-points aiming towards that goal.
>
> I have recently experimented more with taking code compactness more
> seriously.  I still keep my own code under 80 characters, which I know
> many people don't like, but on the other hand, I always try to fill
> the lines as much as possible while maintaining readability.  The idea
> is that the principle of favoring readers means that it should be easy
> to read code -- and lines that are too long, or suffer from rightward
> drift, or names that are too long are all things that delay reading.
> I can also see the recent tendency to use `define' where possible as
> something that goes to this end too: prefer using the same kind of
> binding construct to make it easier to read code.
>
> So to get to my point, there's the decision of what name to use for
> your identifiers: some people (*ahem*) stick to always-readable
> names, to the point of not using names like `i', `n', and `x'.  I'm on
> the complete other side of this -- I strongly prefer these names
> because they make code shorter and therefore easier to read.  The same
> goes for the style guide's recommendation to name intermediate values
> instead of nesting function calls -- it's obviously a line that the
> author should decide on (exaggerated examples on both sides are easy,
> and contribute nothing), and I tend to go further with nested calls
> than defining intermediates.  (This same point applies to naming
> functions vs using lambda expressions.)
>
> My point here is that using longer names and binding intermediates can
> explain the code better, but it's easy to carry this over to hurting
> overall readability.  The example that made me post this is below (no
> need to look up who wrote the original code, it's irrelevant since it
> *is* following many of the style's guidelines).  The first chunk is
> the original code, the second is my rewrite.  There are some obvious
> things like using `match' to make the code more concise and more
> readable, but note that the `cond' expression in the loop is very hard
> to read in the first version -- the descriptive names are long enough
> that the overall structure of the loop is not obvious.
>
> In my revision, the much shorter names are less readable, but on the
> flip side they allow laying out the code in a way that makes it much
> more obvious, and since I started this by just doing mechanical
> transformations, it was surprising to see that what the shrunk code is
> doing becomes very clear.  This clarity has the obvious advantages,
> which is why it's so important to "favor readers".
>
> As a sidenote -- there are two named intermediates in this code that I
> didn't get rid of: on one hand losing them won't save space (and
> therefore I consider removing them as something that would only
> obfuscate things), and on the other hand the result would be nesting
> the verbose computation into a place where it is not relevant.  (And
> yes, there'd be an advantage for having an infix syntax here, IMO...)
>
> (In any case, sorry for the noise.  I'll avoid further replies...)
>
> -------------------------------------------------------------------------------
>
> (define (fill-sack (items items) (volume-left 0.25) (weight-left 25) (sack null) (sack-value 0))
>    (if (null? items)
>        (values (list sack) sack-value)
>        (let* ((item (first items))
>               (item-wgt (item-weight item))
>               (max-q-wgt (floor (/ weight-left item-wgt)))
>               (item-vol (item-volume item))
>               (max-q-vol (floor (/ volume-left item-vol))))
>          (for/fold
>              ((best-sacks (list sack))
>               (best-sack-value sack-value))
>            ((qty (in-range 0 (add1 (min max-q-vol max-q-wgt)))))
>            (let-values (((inner-best-sacks inner-best-sack-value)
>                          (fill-sack (cdr items)
>                                     (- volume-left (* qty item-vol))
>                                     (- weight-left (* qty item-wgt))
>                                     (cons (cons qty item) sack)
>                                     (+ sack-value (* qty (item-value item))))))
>              (cond
>                [(> inner-best-sack-value best-sack-value)
>                 (values inner-best-sacks inner-best-sack-value)]
>                [(= inner-best-sack-value best-sack-value)
>                 (values (append best-sacks inner-best-sacks) inner-best-sack-value)]
>                [else (values best-sacks best-sack-value)]))))))
>
> -------------------------------------------------------------------------------
>
> (define (fill-sack items volume-left weight-left sack sack-value)
>    (match items
>      ['() (values (list sack) sack-value)]
>      [(cons (and (item _ _ item-val weight volume) item) items)
>       (define max-q-wgt (floor (/ weight-left weight)))
>       (define max-q-vol (floor (/ volume-left volume)))
>       (for/fold ([best (list sack)] [best-val sack-value])
>                 ([n (exact-round (add1 (min max-q-vol max-q-wgt)))])
>         (define-values [best* best-val*]
>           (fill-sack items
>                      (- volume-left (* n volume))
>                      (- weight-left (* n weight))
>                      (cons (cons n item) sack)
>                      (+ sack-value (* n item-val))))
>         (cond [(> best-val* best-val) (values best* best-val*)]
>               [(= best-val* best-val) (values (append best best*) best-val*)]
>               [else                   (values best best-val)]))]))
>
> -------------------------------------------------------------------------------
>

Posted on the users mailing list.