[racket] Size matters

From: Tim Brown (mightyfoo at gmail.com)
Date: Mon Jun 10 08:18:56 EDT 2013

Eli, Matthias, thanks to both of you for your comments on this code (for
whose creation I am responsible / to blame). I hope I'm not too out of
order addressing your responses together in this one email.

I thought I was doing pretty well when I changed the original "style" to
struct item from a whole load of c[ad]+r calls :-)

On 09/06/13 18:54, Matthias Felleisen wrote:
> I have taken the liberty to rewrite the code according to the Style
> gide, since I am the author of this document:

I've found this (and I have seen it before):
http://www.ccs.neu.edu/home/matthias/Style/style/index.html
but is it included in the Racket documentation?
If so, where?

> (define (fill-sack (items items) (volume-left 0.25) (weight-left 25) (sack null) (sack-value 0))
> I'd probably lift the initialization constants out to the top-level.

Style seems to demand:

> Similarly, if your function or method consumers two (or more) optional
> parameters, keyword arguments are a must.

So 5 optional parameters must have keywords. Becomes:

(define (fill-sack #:items (items items)
                    #:volume-left (volume-left 0.25)
                    #:weight-left (weight-left 25)
                    #:sack (sack null)
                    #:sack-value(sack-value 0))
...)

However, none of the parameters are actually optional (the specific
problem is expressed in the defaults); and it's clearer expressing those
with:
   (fill-sack items 0.25 25 null 0) ; stated problem

So, thanks for your "honesty" Matthias, but you're being too kind :-)

>    (cond
>      [(null? items) (values (list sack) sack-value)]
>      [else
>       (define item (first items))
>       ...
>       (define max-q-vol (floor (/ volume-left volume)))
          ^^^^^^
Ooh, I didn't realise I could put define in at this level.
I thought it was just top-level and the top-level inside a define.

>       (for/fold ((best-sacks (list sack)) (best-sack-value sack-value))
>         ((qty (in-range 0 (add1 (min max-q-vol max-q-wgt)))))
>         (define-values (best-sack* best-sack-value*)

The <identifier>* notation... is that Style?

To me, '*' always suggests "multiply" (pronounced "multiplee") or
"extended"; e.g. regexp-match and regexp-match*. (I remember a discussion
about the prime character (′) as a "value_n+1 of" suffix. That's a
massively impractical character to use, but I don't think anyone had
landed on a standard suffix by then.

> Now I understand that the three lines of extracting the content of the
> first struct are painful. While I understand and agree with the match here,
> I have also experimented with a struct-open construct for my latest

I have a habit of forgetting match (even though it is the single most
out-of-hole-digging construct I have found in a PL so far).

struct-open seems not to be generally available in racket ATM, so I'll
leave it to mature a little.

I assume that it is a "binding" construct.

I wonder whether I'd rather see it enclosing its lexical scope (like a
let). This is a reason I sometimes prefer let over define. Although it's
the very act of let enclosing its scope that causes its rightward drift.

> (define (fill-sack.v2 (items items) (volume-left 0.25) (weight-left 25) (sack null) (sack-value 0))
>     xxxxx
>       xxxxxxx xxxxxx xxxxxxx xxxxx xxxxx xxxxxxxxxxxx
>       xxxxx
>        xxxxxxx xxxxx xxxxxx xxxxxxx
>        xxxxxxxxxxxx xxxx xxxxx xxxxxx xxxxx xxxxxxx
>        xxxxxxxxx xxxxxxxxxxxx xxxxx xxxxxx xxxxxxxxxxxxxxxx xxxxxxxxxxxx
>          ((qty (in-range 0 (add1 (min (floor (/ weight-left weight)) (floor (/ volume-left volume)))))))
>          xxxxxxxxxxxxxx xxxxxxxxxxx xxxxxxxxxxxxxxxxx
>            xxxxxxxxxx xxxxx xxxxxx
>                       xx xxxxxxxxxxx xx xxx xxxxxxxx
>                       xx xxxxxxxxxxx xx xxx xxxxxxxx
>                       xxxxx xxxxx xxx xxxxxx xxxxx
>                       xx xxxxxxxxxx xx xxx xxxxxxxxx
>          xxxxx
>            xxx xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx
>             xxxxxxx xxxxxxxxxx xxxxxxxxxxxxxxxxxx
>            xxx xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx
>             xxxxxxx xxxxxxx xxxxxxxxxx xxxxxxxxxxx xxxxxxxxxxxxxxxxxx
>            xxxxx xxxxxxx xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxx

> I am also less hesitant than you to go to 102 columns and, while this
> is a judgement call, I would probably not name the two floor expressions
> ifthey fit. So there, that's the code according to the Style.

I just experimented with the "shape" of the code above. The lines > 80
characters are shown literally, the others are x'ed out.

1. The two lines that exceed 80 characters are the over-long function
    declaration (which we have addressed before and reduces to 68
    characters). And the in-range loop.

    Because you have inlined the (floor (/ ..)) calls, and allowed the line
    to run towards 102 characters. It seems to me that you've disturbed the
    readability of the code. My eye has to stop and pull itself right to
    read that line.

    If the whole function (or even more lines of it) were 102 characters
    wide, that might not have given me such a problem. But with the new
    shape you may have added some significance to what I don't consider to
    be such a significant expression.

    This is not to do with size, but to do with shape.

2. The longest x'ed out line is 72 characters long (coincidentally one of
    the "conservative" line lengths quoted in the Style guide).

Is there a 102nd (or in Eli's case, 80th) "margin bar" (red line) in
DrRacket to warn coders of the Style width of their code? Or is the only
hint a manually drawn ;;----... ruler (which might be useful in the
"Insert" menu)?

> On Jun 8, 2013, at 3:49 PM, Eli Barzilay wrote:

>> (Possible irrelevant rambling, but I generally am very aware of my
>> code formatting, and this looked like a good point to highlight.  It's
>> still probably picking on something that not many people care about as
>> much as I do...)

Not at all irrelevant. I'm not so well practised in Racket Style, but I
have a much more developed style in C, which maximises (as best as I can)
readability, and is absolutely necessary for maintainability. TBH,
maintainable software is usually my primary objective; readability is a
necessary means to that end.

>> The style guide has a section labeled "size matters" -- it also has
>> another section on names, encouraging using readable names.  Both of
>> these are perfectly reasonable goals, but they can pull you in
>> different directions, and I ran into an example that demonstrates it
>> in a very nice way.  There is also Yaron's quote about favoring
>> readers, which is IMO more important than both of these -- since they
>> are just sub-points aiming towards that goal.

For maintenance purposes, I would rather put a little eye-strain (with
over-long identifiers) and boredom (with over-many identifiers) in the way
of my reader than have them need to refactor the code to introduce longer
identifier names and new bindings because they simply cannot digest it.

And of course I'd prefer perfect comprehension without eye-strain, boredom
or headaches than any of that!

I keep thinking back in this respect to what Matthias has done folding the
(floor (/ ...)) expressions directly into the in-range. How complex would
this expression have to be before it was broken out into new variable?

[I say this even though the max-q-wgt line is neither commented, nor is
the variable particularly well named]

>> I have recently experimented more with taking code compactness more
>> seriously.  I still keep my own code under 80 characters, which I know
>> many people don't like, but on the other hand, I always try to fill
>> the lines as much as possible while maintaining readability.  The idea

Maybe it's the way my eye is trained, but I agree that 60/70/80 characters
is more easily read than 102 -- but that might be taste.

>> I can also see the recent tendency to use `define' where possible as
>> something that goes to this end too: prefer using the same kind of
>> binding construct to make it easier to read code.

Is this driven by consistency of binding or stopping rightward drift?
The Style guide seems to state that rightward drift is the driver.

>> So to get to my point, there's the decision of what name to use for
>> your identifiers some people (*ahem*) stick to an always-readable
>> names, to the point of not using names like `i', `n', and `x'.  I'm on
>> the complete other side of this -- I strongly prefer these names
>> because they make code shorted and therefore easier to read.

The question is... how many vowels can you take from a word while keeping
it rcgnzbl? :-)

In the example, you rename qty to n. In part, I agree with you that short
names need not be meaningless... but one reason I had qty in the first
place was that I thought I would need another loop variable somewhere in
the function. One i, one n and one x are about the limit for short variable
names (and quite often all I ever need). In C, I find that if I have a
nested loop with a:

for(i = ...) {
   for(j = ...) {
     ...
   }
}

I run a very high risk of confusing one of my i's for a j, an m for an n
or a p for a q. Generally, since these things come in pairs of similar
looking and sounding letters, it's possible to impair readability with
these short variable names.

>> The example that made me post this is below (no need to look up who
>> wrote the original code, it's irrelevant since it

[ No need cos he reads this list and he'll pop his head above the parapet
   anyway :-) ]

>> *is* following many of the style's guidelines)

>> In my revision, the much shorter names are less readable, but on the
>> flip side they allow laying out the code in a way that makes it much
>> more obvious, and since I started this by just doing mechanical
>> transformations, it was surprising to see that what the shrunk code is
>> doing becomes very clear.  This clarity has the obvious advantages,
>> which is why it's so important to "favor readers".

The structure of the loop code is still, strangely, not what I'd expect. I
don't know what I would have expected, and the code is definitely shaped
in a way that reflects the structure of the loop. It's just not quite what
I thought I'd see.

>> (define (fill-sack items volume-left weight-left sack sack-value)
>>   (match items
>>     ['() (values (list sack) sack-value)]
>>     [(cons (and (item _ _ item-val weight volume) item) items)
>>      (define max-q-wgt (floor (/ weight-left weight)))
>>      (define max-q-vol (floor (/ volume-left volume)))
>>      (for/fold ([best (list sack)] [best-val sack-value])
>>                ([n (exact-round (add1 (min max-q-vol max-q-wgt)))])
>>        (define-values [best* best-val*]
>>          (fill-sack items
>>                     (- volume-left (* n volume))
>>                     (- weight-left (* n weight))
>>                     (cons (cons n item) sack)
>>                     (+ sack-value (* n item-val))))
>>        (cond [(> best-val* best-val) (values best* best-val*)]
>>              [(= best-val* best-val) (values (append best best*) best-val*)]
>>              [else                   (values best best-val)]))]))

As an aside; is there any fundamentally better algorithm for doing this?
Is there any more "appropriate" racket construct for making this happen?

Thanks to both of you for the attention to this code and revisions you have
made. As ever, this has been a learning experience; both in Style and
content. I'll go back through my other RC submissions at some time and
Style them up.

Tim





Posted on the users mailing list.