[racket-dev] consistency in names and signatures

From: Eli Barzilay (eli at barzilay.org)
Date: Wed Mar 28 09:34:40 EDT 2012

Previous message: [racket-dev] consistency in names and signatures
Next message: [racket-dev] consistency in names and signatures
Messages sorted by: [date] [thread] [subject] [author]

Yesterday, Andy Gocke wrote:
> [...]  This is especially relevant for functions like string->number
> because the most obvious implementation checks validity during
> parsing -- checking the validity and parsing basically duplicate the
> function.

And that makes most of my point.

The thing is that `string->url' is basically *just* a parser -- it
does very little after matching the regexp.  I therefore view the
commit as adding a contract to, say, `read-xml', where the contract
runs the function to see that the input is valid.

An even more extreme example would be `get-pure-port': if you really
want a complete specification of the domain in a contract, then the
contract should make sure that the server is reachable, and that it
returns a valid page.  Combine this with parsing the page, and how
this is not really a great way to run code (ATM!) should be clear.
Besides the issue of doing a bunch of work twice, the contract would
still be broken since having a valid server and/or a page now doesn't
mean that it's going to be valid on the next attempt.  To make this
practical, you'd need some way to expose values that are computed
as part of the contract.  ("Reify" feels wrong to me in this
context...)

That's why I added the above "ATM".  There is an obvious appeal in
doing this -- having all error handling in specific pieces of code and
"floating" them upwards sounds tempting *if* there's some way to do it
right.  I suspect that such an exposure of the contract results is
just one small step in getting this.  I'm also not sure that it's
doable in a way that actually leads to a practical benefit.  This is
similar to me doubting the theoretical utility in running a parser
twice: on one level you get your guaranteed, nicely total function,
but on the level of providing that guarantee, you get the original
problem.  (And in terms that I'm used to, this is switching the same
work to your well-formedness goal, and that buys nothing in terms of
getting things done.)

IMO, this problem is fundamental enough that it shows up in many
contexts.  One of them is already visible in the `string->url'
example.  The new documentation reads:

  | url-regexp : regexp?
  |
  |   This is a regular expression based on the one in Appendix B of
  |   RFC 3986 for recognizing urls.  This is the precise regexp:
  |
  |   ^(?:([^:/?#]*):)?(?://(?:([^/?#@]*)@)?([^/?#:]*)?(?::([0-9]*))?)
  |   ?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$

(Pre-disclaimer: the following is not said in a negative way.)

At least in my view, this documentation is is useless.  It's true that
it's precise, but as a user of this code, I get nothing out of it.  I
can't even *use* that regexp (the one quoted in the docs) since it
looks like something that can easily change, so I better use the
`url-regexp' binding and not the quoted regexp.

But the deeper reason that this is not useful to me is that it
essentially spells out the parser code -- and documenting a function
using its own code is (IMO) often a sign that the abstraction is
questionable.

But there are a few additional problems with this change that I see:

* Beyond quoting it in the documentation, exposing the regexp means
  that it becomes part of the interface.  This means that I now cannot
  re-implement the code in any way other than matching a regexp.

* It is still partial.  For example, this

    -> (string->url "1:/")
    ; Invalid URL string; bad scheme "1": "1:/"

  is still not a contract error.  (And I can't see an obvious way to
  add it to the regexp, maybe with some lookahead tricks.)

  Another example is the host part, which is not even checked, but
  this is just sloppiness (= deferring it to network errors that will
  happen with malformed hosts).  And BTW, doing that means that the
  contract becomes platform dependent:

    -> (file-url-path-convention-type 'unix)
    -> (url-host (string->url "file://x:&x/baz"))
    "x"
    -> (file-url-path-convention-type 'windows)
    -> (url-host (string->url "file://x:&x/baz"))
    ""

* More importantly, and possibly related to the first bullet, it
  stands in the way of improving this code.  There is a major problem
  in the design of the code -- it parses all urls as `http'.  A proper
  way to deal with it is to choose a specific parser based on the
  schema.  For example, as it looks now, I can't change it to properly
  treat "mailto:..." urls.

  That's not theoretical -- I planned on doing that extension, and now
  it is impossible to do it in a nice way.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.

Previous message: [racket-dev] consistency in names and signatures
Next message: [racket-dev] consistency in names and signatures
Messages sorted by: [date] [thread] [subject] [author]