[racket] require url?

From: Eli Barzilay (eli at barzilay.org)
Date: Sat Apr 9 00:12:21 EDT 2011

Earlier today, Noel Welsh wrote:
> It has been talked about a lot, but no-one has implemented this
> feature AFAIK. Should be possible, as require is extensible.

I suggested it a while ago, and there are still some questions that
are difficult to answer.  One issue is caching -- you'll obviously
need some cache, since files can be accessed many times (e.g., use a
simple macro that prints a line and run it in DrRacket), and the
question is what caching policy should be used.  The main problem is
how to differentiate the developer from the user -- in the latter case
it's probably fine to poll the URL every so often (maybe once an
hour, possibly with a way for the author to control the frequency),
but the developer should be able to have a very high refresh frequency
to allow fixing bugs and checking the fixes.  That's probably not too
problematic if the system is completely transparent and works exactly
the same for files from a URL and for local files -- this way the
developer can test files directly on the FS, while users get the
hourly (or whatever) updates.
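
To make the policy concrete, the refresh check I have in mind is
something like this sketch (the parameter name and the hourly default
are made up, of course):

  #lang racket

  ;; Poll at most once per interval; a development session can lower
  ;; the interval to near-zero.  Illustration only.
  (define refresh-interval
    (make-parameter (* 60 60))) ; users: at most once an hour

  (define (needs-refresh? last-fetch-seconds)
    (> (- (current-seconds) last-fetch-seconds)
       (refresh-interval)))

  ;; a developer would run with something like:
  ;;   (parameterize ([refresh-interval 1]) ...)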

Another minor issue is making people aware of it all: what I wanted to
get to is a system that makes it easy to share code for running *and*
for reading.  This means that the cache should live in some visible
place, unlike Planet, which stores files in an awkward place and in a
complex hierarchy.  Ideally, this could even be hooked into DrRacket
so that instead of opening a file you'd open a URL, and that would
show you the cached file, which you can read through and run (as well
as requiring it from another file, of course).  But a more difficult
problem here is what to do in case of edits -- the obvious thing to do
is to "break" the connection to the URL, so the file will no longer
get updated from the web.  But in that case, how should the user know
about all of this?  I really wanted this to be a super lightweight way
for distributing code, which rules out additional command-line tools
etc.  Also, the user should probably be made aware of the polling
behavior, which could imply privacy issues (I write some very useful
utility, and now I can look at my logs and see when you're working and
what your IP is) -- although it's becoming more common to just do
these polls anyway...
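
For concreteness, a visible cache layout could simply mirror the URL
structure as a plain directory tree, something like this sketch (the
cache root is made up):

  #lang racket
  (require net/url)

  ;; Map http://example.org/utils/blah.rkt to a readable location
  ;; like ~/url-cache/example.org/utils/blah.rkt, instead of an
  ;; opaque hash-based hierarchy.
  (define cache-root
    (build-path (find-system-path 'home-dir) "url-cache"))

  (define (url->cache-path u)
    (define url (string->url u))
    (apply build-path cache-root (url-host url)
           (map path/param-path (url-path url))))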

And probably the most difficult part is how to deal with multiple
files: I write one file that looks like:

  #lang racket
  (require "blah.rkt")

and Racket should know that "blah.rkt" should be retrieved from the
same URL base.  I'm not sure if this is possible to do without any
low-level support -- but I don't want to end up with yet another
specialized set of hooks for this functionality.  (Planet ended up
having hooks in too many places, IMO -- especially the setup code,
which has a lot of code just for it.)  "Obviously", one way to get
this is if Racket would treat URLs as files at the lower level, doing
the fetching and the polling so that everything works transparently,
but it is also "obviously" a bad idea to do that kind of work at that
level (and that's without even trying to make up reasons; I'm sure
that Matthew can write books on why this would be a bad idea).
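
Just to make Noel's extensibility point concrete: ignoring the
relative-path problem and the caching policy above, a bare-bones
version could be a require transformer along these lines (the cache
layout and the helper are made up; no polling, no error handling):

  #lang racket
  (require racket/require-syntax
           (for-syntax racket/base racket/file racket/port
                       net/url file/sha1))

  (begin-for-syntax
    ;; Fetch the URL into a cache file (named by a hash of the URL,
    ;; just to keep the sketch short) and return the local path.
    (define (fetch-to-cache url-str)
      (define cache-dir
        (build-path (find-system-path 'home-dir) ".url-cache"))
      (make-directory* cache-dir)
      (define local
        (build-path cache-dir (sha1 (open-input-string url-str))))
      (unless (file-exists? local)
        (call-with-output-file local
          (lambda (out)
            (copy-port (get-pure-port (string->url url-str)) out))))
      (path->string local)))

  ;; (require (url "http://example.org/blah.rkt")) expands into a
  ;; plain `file' require on the cached copy.
  (define-require-syntax (url stx)
    (syntax-case stx ()
      [(_ u)
       (string? (syntax-e #'u))
       (datum->syntax stx `(file ,(fetch-to-cache (syntax-e #'u))))]))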


Earlier today, Neil Van Dyke wrote:
> 
> Could be useful.  You'd have to use caution in what URLs you
> reference, since it's pretty much executing arbitrary code.  And,
> unless you were always on a trusted network including trusted DNS,
> you'd want to use HTTPS and a trusted CA list.  At that point, it
> becomes less lightweight.

Good point.  At some point I considered this a non-problem, since it
shouldn't be different from my putting out some code in any form and
suggesting that you run it.  That's of course true for any kind of
executable code -- but a little less problematic in this case, since
you can see the source.  However, the real problem is malicious
third-party hacking -- I put out a useful piece of code which many
people use, then someone hacks my web pages and replaces it with
malicious code, and everyone gets their copy updated and runs it
without their knowledge.

There is a good solution for this that I recently saw -- the way that
Chrome extensions are distributed.  The summary of what I understand
from it is: an extension is a zip file with a prefix holding two
things -- a public key for the extension, and a signature for the zip
file, made with the matching private key.  Clients can now use the
public key to verify that the zip file was written by whoever holds
the private key.  This doesn't help against malicious code that I
explicitly install -- but of course there's almost nothing you can do
against that if you want to avoid a central blessing authority (which
Google implements for Chrome extensions).

The nice thing is how extensions are updated: when I want to update an
extension I wrote, I put out a file with the new contents, attach the
*same* public key, and sign the new contents with the corresponding
private key.  So now clients can be sure that the new file was created
by the same author (assuming the private key was not stolen).  So
essentially that public key becomes the global identifier for the
extension, and is actually used in extension URLs.
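
In code, the check that clients do on an update amounts to something
like this (the package layout mirrors the description above, and
`rsa-verify' is a hypothetical placeholder -- there is no such thing
in the standard distribution):

  #lang racket

  ;; A package carries its public key, a signature, and the signed
  ;; contents, following the Chrome extension scheme.
  (struct pkg (public-key signature contents))

  (define (rsa-verify key sig data)
    ;; placeholder: a real implementation would call into an
    ;; external crypto library
    (error 'rsa-verify "not implemented in this sketch"))

  (define (trusted-update? new-pkg known-public-key)
    ;; accept an update only if it carries the same public key as
    ;; the original install, and its signature checks out
    (and (equal? (pkg-public-key new-pkg) known-public-key)
         (rsa-verify known-public-key
                     (pkg-signature new-pkg)
                     (pkg-contents new-pkg))))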

AFAICT (and that's not saying much) this works well for the security
aspect -- and for a commercial setting you can use the usual HTTP
security for the initial installation, or have a different way to
install it.  The thing is that it fits planet(2) well, but it won't
work for the lightweight style that this (require (url ...)) is
supposed to have, and I don't have a good idea how to do that.  (...
in a way that doesn't require a special packaging tool etc. -- just
the minimal copying of the file(s) to the webserver's directory.)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

