[racket] Struct fields in struct-info + match enhancements

From: J. Ian Johnson (ianj at ccs.neu.edu)
Date: Wed May 14 12:35:06 EDT 2014

tl;dr if you use struct-info in your programs, I might break them. Please continue reading.

I had a PR a while ago suggesting a change to struct-copy due to its unhygienic treatment of fields. It did not go through, since there wasn't enough information in the struct-info to separate the struct-name from the field-name. Because struct-info does not have a procedural-only interface, changing it to hold the individual field identifiers, either instead of or in addition to what it holds now, would be backwards incompatible. However, I also expect that struct-info manipulation outside of core Racket is rare.
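To make the limitation concrete, here is a minimal sketch of what a macro can currently read out of struct-info. The macro name accessor-names is hypothetical, made up for this illustration; extract-struct-info and syntax-local-value are the existing compile-time APIs. The sketch assumes a simple struct with no supertype.

```racket
#lang racket/base
(require (for-syntax racket/base racket/struct-info))

(struct A (w x y z))

;; Hypothetical macro (not part of Racket): expands to a quoted list of the
;; accessor names recorded in a struct's compile-time struct-info.
(define-syntax (accessor-names stx)
  (syntax-case stx ()
    [(_ struct-id)
     (let* ([info (extract-struct-info (syntax-local-value #'struct-id))]
            ;; The fourth element lists accessor identifiers in reverse
            ;; field order; it may contain #f when some are unknown.
            [accessors (reverse (list-ref info 3))])
       (with-syntax ([(acc ...) (filter values accessors)])
         #''(acc ...)))]))

;; Only whole accessor names such as A-w are available here; the bare
;; field names w, x, y, z are not stored separately.
(define a-accessors (accessor-names A))
```

A client that wants the field name has to guess where the struct name ends inside each accessor identifier, which is the ambiguity described above.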

Is there anyone out there who would be affected by this change and would be unwilling to make slight modifications to support the new struct-info?
I ask not because of struct-copy itself, but for an additional enhancement to racket/match: named field selection from structs instead of positional only.
I'm getting bitten by pervasive refactoring woes whenever I add fields to structs. All of my match patterns must change to have an extra _ somewhere.
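A small sketch of the refactoring problem, using today's positional patterns. The struct and function names (point2, point3, x-of-2, x-of-3) are invented for illustration; the two structs stand for the same struct before and after a field is added.

```racket
#lang racket/base
(require racket/match)

;; The struct as first written, with a positional pattern over it.
(struct point2 (x y))
(define (x-of-2 p)
  (match p [(point2 x _) x]))

;; The same struct after adding a field z: every positional pattern
;; needs one more wildcard, even though only x is of interest.
(struct point3 (x y z))
(define (x-of-3 p)
  (match p [(point3 x _ _) x]))
```

Each additional field repeats this edit in every pattern that mentions the struct.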

My proposal for match is to change the pattern language in a backwards-compatible way:
The two forms
(struct-id pat ...)
(struct struct-id (pat ...))
will get optional keywords #:first and #:last that allow named fields to be matched. Unnamed patterns will be matched positionally, starting from either the first field (with #:first) or the (#fields - #unnamed patterns)th field (with #:last).

(struct-id op-first-or-last-kw pat-or-named-pat ...)
(struct struct-id op-first-or-last-kw pat-or-named-pat ...)
op-first-or-last-kw ::= first-or-last-kw | (empty)
first-or-last-kw ::= #:first | #:last
pat-or-named-pat ::= [#:field field-id pat] | pat

If neither #:first nor #:last is given, named patterns are only allowed if all patterns are named.
If one of them is given, there do not have to be as many patterns as there are fields.

Named patterns that clash with positional ones will be handled in one of the following ways, depending on popular opinion:
(A) the name will be treated as positionally correct, and further patterns skip past the clashing named patterns.
Ex: for (struct A (w x y z)), 
  (match (A 0 1 #t 2) [(A #:first 0 [#:field y y*] 1) y*]   [_ #f]) ==> #t
  (match (A 0 1 #t 2) [(A #:first 0 [#:field y y*] 1 2) y*] [_ #f]) ==> #t
  (match (A #f 0 #t 1) [(A #:last [#:field y y*] 0 1) y*] [_ #f]) ==> #t
  (match (A 'x 0 #t 1) [(A #:last [#:field y y*] x 0 1) (cons x y*)] [_ #f]) ==> '(x . #t)
(B) clashes are a syntax error, since they have confusing behavior when refactoring.
Ex: all of the above would be errors, but the following are still allowed:
  (match (A 0 1 2 3) [(A #:first 0 [#:field y 2]) #t] [_ #f]) ==> #t
  (match (A 0 1 2 3) [(A #:last 3 [#:field y 2]) #t] [_ #f]) ==> #t
(C) some kind of option to match to prefer (A) or (B) behavior?
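
One possible reading of option (A), written out as a plain function over quoted data rather than as a change to match. This is only a sketch: align-patterns and its representation of named patterns as (field name pat) lists are hypothetical, and it assumes that named fields are removed from the pool before positional patterns are assigned.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical helper: align patterns with fields under option (A).
;; fields : list of symbols, in declaration order
;; mode   : 'first or 'last
;; pats   : list whose elements are either (list 'field name pat) for a
;;          named pattern, or any other datum for a positional pattern
;; Returns an association list mapping field names to patterns.
(define (align-patterns fields mode pats)
  (define (named? p) (and (pair? p) (eq? (car p) 'field)))
  (define named (filter named? pats))
  (define positional (filter (lambda (p) (not (named? p))) pats))
  (define named-fields (map cadr named))
  ;; Fields left over for positional matching once named ones are skipped.
  (define free (filter (lambda (f) (not (memq f named-fields))) fields))
  (when (> (length positional) (length free))
    (error 'align-patterns "too many positional patterns"))
  (define targets
    (if (eq? mode 'first)
        (take free (length positional))
        (take-right free (length positional))))
  (append (map (lambda (p) (cons (cadr p) (caddr p))) named)
          (map cons targets positional)))

;; Mirrors (A #:first 0 [#:field y y*] 1): w gets 0, x gets 1.
(define first-example
  (align-patterns '(w x y z) 'first '(0 (field y y*) 1)))

;; Mirrors (A #:last [#:field y y*] 0 1): x gets 0, z gets 1.
(define last-example
  (align-patterns '(w x y z) 'last '((field y y*) 0 1)))
```

Under option (B), the same function would instead reject any named field that falls inside the range the positional patterns would otherwise cover.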

Hygiene-wise, field identifiers are interpreted in the lexical context of the struct identifier, not the local context. Thus we can bind x to #t and still name the x field of the A struct. This might be better dealt with via delta-transformers, but I'm not sure. Matthew, Carl or Ryan would be better judges of that.

Thanks,
-Ian

Posted on the users mailing list.