[racket-dev] consistency in names and signatures

From: Matthias Felleisen (matthias at ccs.neu.edu)
Date: Wed Mar 28 10:32:13 EDT 2012


This post is minimally related to the topic of this thread. 
If you continue discussing this idea, please move it elsewhere. 

(I happen to agree with Eli's perspective here, which is why I 
started this particular thread. Racket makes it particularly 
convenient to avoid contracts here and Typed Racket makes it even
possible to type check such things AND to assign meaning to #f
results beyond the function body. BUT this is barely related to
my question.) 



On Mar 28, 2012, at 9:34 AM, Eli Barzilay wrote:

> Yesterday, Andy Gocke wrote:
>> [...]  This is especially relevant for functions like string->number
>> because the most obvious implementation checks validity during
>> parsing -- checking the validity and parsing basically duplicate the
>> function.
> 
> And that makes most of my point.
> 
> The thing is that `string->url' is basically *just* a parser -- it
> does very little after matching the regexp.  I therefore view the
> commit as adding a contract to, say, `read-xml', where the contract
> runs the function to see that the input is valid.
> 
> An even more extreme example would be `get-pure-port': if you really
> want a complete specification of the domain in a contract, then the
> contract should make sure that the server is reachable, and that it
> returns a valid page.  Combine this with parsing the page, and how
> this is not really a great way to run code (ATM!) should be clear.
> Besides the issue of doing a bunch of work twice, the contract would
> still be broken since having a valid server and/or a page now doesn't
> mean that it's going to be valid on the next attempt.  To make this
> practical, you'd need some way to expose values that are computed
> as part of the contract.  ("Reify" feels wrong to me in this
> context...)
> 
> That's why I added the above "ATM".  There is an obvious appeal in
> doing this -- having all error handling in specific pieces of code and
> "floating" them upwards sounds tempting *if* there's some way to do it
> right.  I suspect that such an exposure of the contract results is
> just one small step in getting this.  I'm also not sure that it's
> doable in a way that actually leads to a practical benefit.  This is
> similar to me doubting the theoretical utility in running a parser
> twice: on one level you get your guaranteed, nicely total function,
> but on the level of providing that guarantee, you get the original
> problem.  (And in terms that I'm used to, this is switching the same
> work to your well-formedness goal, and that buys nothing in terms of
> getting things done.)
> 
> IMO, this problem is fundamental enough that it shows up in many
> contexts.  One of them is already visible in the `string->url'
> example.  The new documentation reads:
> 
>  | url-regexp : regexp?
>  |
>  |   This is a regular expression based on the one in Appendix B of
>  |   RFC 3986 for recognizing urls.  This is the precise regexp:
>  |
>  |   ^(?:([^:/?#]*):)?(?://(?:([^/?#@]*)@)?([^/?#:]*)?(?::([0-9]*))?)
>  |   ?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$
> 
> (Pre-disclaimer: the following is not said in a negative way.)
> 
> At least in my view, this documentation is is useless.  It's true that
> it's precise, but as a user of this code, I get nothing out of it.  I
> can't even *use* that regexp (the one quoted in the docs) since it
> looks like something that can easily change, so I better use the
> `url-regexp' binding and not the quoted regexp.
> 
> But the deeper reason that this is not useful to me is that it
> essentially spells out the parser code -- and documenting a function
> using its own code is (IMO) often a sign that the abstraction is
> questionable.
> 
> 
> But there are a few additional problems with this change that I see:
> 
> * Beyond quoting it in the documentation, exposing the regexp means
>  that it becomes part of the interface.  This means that I now cannot
>  re-implement the code in any way other than matching a regexp.
> 
> * It is still partial.  For example, this
> 
>    -> (string->url "1:/")
>    ; Invalid URL string; bad scheme "1": "1:/"
> 
>  is still not a contract error.  (And I can't see an obvious way to
>  add it to the regexp, maybe with some lookahead tricks.)
> 
>  Another example is the host part, which is not even checked, but
>  this is just sloppiness (= deferring it to network errors that will
>  happen with malformed hosts).  And BTW, doing that means that the
>  contract becomes platform dependent:
> 
>    -> (file-url-path-convention-type 'unix)
>    -> (url-host (string->url "file://x:&x/baz"))
>    "x"
>    -> (file-url-path-convention-type 'windows)
>    -> (url-host (string->url "file://x:&x/baz"))
>    ""
> 
> * More importantly, and possibly related to the first bullet, it
>  stands in the way of improving this code.  There is a major problem
>  in the design of the code -- it parses all urls as `http'.  A proper
>  way to deal with it is to choose a specific parser based on the
>  schema.  For example, as it looks now, I can't change it to properly
>  treat "mailto:..." urls.
> 
>  That's not theoretical -- I planned on doing that extension, and now
>  it is impossible to do it in a nice way.
> 
> -- 
>          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                    http://barzilay.org/                   Maze is Life!


Posted on the dev mailing list.