[racket-dev] Packaging

From: Jay McCarthy (jay.mccarthy at gmail.com)
Date: Sat Feb 19 10:39:36 EST 2011

I've batched my responses to yesterday's questions together.

As a general note, I'd like my document to be an accurate reflection
of what I will do when I start coding, so if you think I should
update it to clarify the answers to these questions, please let me
know. I'm a bit blinded by my intentions when I read it.

2011/2/18 Robby Findler <robby at eecs.northwestern.edu>:
> Minor comment: why encourage names like "libgtk" and "libgtk2" instead
> of a major and minor version number (à la PLaneT)? Don't we want those
> two libraries to be associated somehow (at least loosely)?

I think having libgtk and libgtk2 avoids a confusion with the
"most recent version" policy. For example, I write my package when
there is only libgtk and I don't specify which version I want, because
any will do. Later libgtk gets upgraded and its second major version
is not fully backwards compatible. I don't think it is plausible for
the reader of my package metadata to know that I didn't know about the
new version and SHOULD have written a "major version 1 requirement",
but didn't.

I think a strict separation between major versions is kinder to
package metadata that is less specific. (If you look around PLaneT,
very few packages actually specify their dependencies in much
detail.)
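To make the naming policy concrete, here is a minimal Racket sketch, assuming a hypothetical catalog that maps each package name to the versions published under it. Because an incompatible major version is published under its own name, a "most recent version" lookup for libgtk cannot silently pick up libgtk2.

```racket
#lang racket/base
;; Hypothetical catalog: package name -> versions published under that name.
;; The incompatible second major version lives under its own name, "libgtk2".
(define catalog
  (hash "libgtk"  '(1 2 3)
        "libgtk2" '(1 2)))

;; most-recent : string? -> (or/c exact-nonnegative-integer? #f)
;; The "most recent version" policy, applied within a single name only.
(define (most-recent name)
  (define versions (hash-ref catalog name '()))
  (and (pair? versions) (apply max versions)))

;; A dependency written as just "libgtk" keeps resolving within libgtk,
;; even after libgtk2 appears.
(most-recent "libgtk")   ; => 3
(most-recent "libgtk2")  ; => 2
```

Under this scheme, vague metadata such as a bare "libgtk" stays safe: the author never has to anticipate a future incompatible release.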

As a bit of social engineering, this policy also makes it easier to
provide alternatives to dead packages, because upgrading to an
alternative vendor is no harder than upgrading to the official
version. Also, it may discourage new major versions and therefore
encourage more compatibility.

> Also, it isn't clear which of the complaints about PLaneT you're
> actually dealing with. I don't see anything about security for example
> or the discovery issue.

By moving package dependencies outside of modules and allowing a
system-wide installation, I attack the second-class-ness of PLaneT.
With a different default policy, I attack the upgrade path. With a
more exposed directory structure and looser package hosting
constraints, I attack the improvement problem. On the social site, I
hope reputation, reviews, and tags will solve the discovery problem.
By allowing any URL (without a custom server) as a package source, I
hope to solve the centralization problem. By making the packaging
system more capable, I hope to decrease the size of the core; once
that happens, the packages split out of it can be upgraded
independently. As for security defaults, using URLs means we can use
HTTPS for slightly better security. I also mention putting checksums
on package files to ensure that the right file was downloaded, and by
allowing private repositories, you can control your own security
destiny. I solve the unintended installation problem by making
package installation separate from the runtime, where we could
provide an apt-get-style listing of "these are the other packages
that will be installed". I solve the module privacy problem with the
linking directories.
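The checksum check mentioned above could look roughly like the following minimal sketch. It assumes SHA-1 (via Racket's file/sha1 library) and that the expected digest is published alongside the package, for example in its metadata; the proposal does not fix either choice, and verify-package-file is a hypothetical name.

```racket
#lang racket/base
(require file/sha1) ; provides sha1 : input-port -> hex-encoded digest string

;; verify-package-file : path-string? string? -> boolean?
;; Hypothetical helper: #t when the downloaded file's SHA-1 hex digest
;; matches the expected digest (compared case-insensitively).
(define (verify-package-file path expected)
  (define actual (call-with-input-file path sha1))
  (string=? actual (string-downcase expected)))
```

An installer could refuse to unpack any file for which this returns #f; the same shape works for a stronger hash if one is chosen instead.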

I have no solution for expensive installations (indeed, in my final
open problems I mention how we might store compiled code on the
server). The other part of installation, documentation, could be
handled the way we do now (an explicit option not to build it),
along with better searching on the package site.

My solution for untrustworthy packages is a rather weak form of
"blackballing". I think sandboxing compilation is outside the
short-term scope of this project.

> And one fairly abstract worry: can I end up in a situation where I had
> a working system with dependencies between packages being "obvious"
> but install a new package and fall out of the "obvious" realm in some
> potentially confusing way?

I think this is the most confusing situation:

- Install package A that requires B
- Develop a packageless program that uses B
- Upgrade to a new provider of B's features... B2

Your packageless program uses B2, but A still uses B, even though
you may think of them as the same package. I think the alternative
behavior, that A starts using a different provider, is worse though.

2011/2/18 Jos Koot <jos.koot at telefonica.net>:
> I read your contribution with great interest. One problem that is not
> addressed, as far as I have seen, is that any idiot, like me, can install
> his/her contributions (modules/collections/packages or whatever)
>
> For a simple Windows 7 user like me it is rather difficult to use command line
> instructions. I plead for an easy to use gui for making
> contributions.

I think this is orthogonal to the project, but I do mention how
important it is to have an easy way of getting packages
post-installation. I was imagining that this would be a separate GUI
tool that was auto-launched on Windows installation and suggested on
"first launch" of DrRacket.

> I
> also plead for an option to delete contributions that have never been
> downloaded by anyone other than the contributor (would that be possible?) I
> once contributed a module wrongly. I am sure it has never been downloaded,
> but I have no option to correct my silly mistake. All I could do is to
> upload an empty version, but the former version is still there.

In the distribution system section, I write: "Provide a simple
interface to remove and replace packages (including old versions)
arbitrarily."

> As for security, would it be possible to sandbox a download with some
> restrictions/options of permissions (opening files, creating new files,
> deleting files, renaming files, calling foreign applications, enter internet
> or email, and so on)???

As I mention above, I don't have any solution to this, but I think
that it could be solved independently.


2011/2/18 YC <yinso.chen at gmail.com>:
> Thanks for having this out - this is a great start and a very important
> problem to solve
>
> Is it correct that *heap* maps to the account name in planet?  Such as
> jaymccarthy, schematics, or bzlib?

No. When I say a "user" heap, I mean a user on your
computer. Currently every account on a system-wide Racket installation
has their own set of PLaneT package installations (one for each
version of Racket they've used). That's what I mean. This part of the
proposed implementation is meant to suggest:
- System installations are possible (big deal for courses)
- User installations are not versioned
- The number of installations is flexible for exotic requirements

> There is always tension between the naming by capability or author in
> package systems.  Do we have a preference one way or another?

I suggest that the PLT-supported server require package names to be
prefixed by the author's name, to avoid conflicting package names.
The packaging system itself makes no requirement or suggestion beyond
that.

> For all the modules that are currently in core, it might make sense to
> simply lump them under a *core* heap to simplify the reorganization, but
> it's clear that not all packages are strictly required, so it would
> eventually be great to separate them out:
>
> absolute core (scheme & racket language, no GUI)
> GUI
> other languages
> etc

Modulo the confusion of the heaps, yes the core should be split. I
write: "Once this infrastructure is in place, we should spend a good
amount of time breaking the core into packages that are distributed
optionally and separately. I believe this requires a good "post
install" to keep the batteries included, so to speak."

I think your list is a good start.

> Agreed that the installation/compilation takes way too long.  Would like the
> ability to turn off doc installation - I believe docs needs to be online,
> and easily updated, i.e. wiki format

In the short-run, I think we should solve this problem by having an
option to not install docs. Once we figure out the right way to get
trusted ZOs at the server[*] we could fold downloading compiled docs
into the same solution.

[*] BTW, while I don't want to get derailed on this point, I think the
main problem is that ZOs are version-specific, so the server would
have to store many of them, and it is not reasonable to ask package
maintainers to go through a manual process to update them for each
new version.

> Another thing to improve the time of installation is to have pre-compiled
> binaries.  If this is a possibility, then perhaps binary-only package should
> also be considered.  That might not be the open source ideal, but can
> increase more adoption

Yes, mentioned above and in the document a few times.

> Planet packages also have problems with dependency breakage when the
> dependency is a development link.  So that will need to be
> addressed

I don't think this problem would apply anymore, because (at least in
my proposed simple filesystem layout) development links are trivial
to create in the filesystem, and packages are not necessarily public
in general.

> It's hard to understand the Glue section - are you suggesting having the URL
> embedded into the require spec?  I hope I am reading it incorrectly, as such
> hardcoding should be avoided.

No. I am suggesting having an optional URL specified in the package
metadata. For example, in the Web Server package I have a module,
xexpr.rkt. It contains:

(require xml)

But the Web Server's info.rkt contains:

(define dependencies
 (list (pkg "xml" 40 "http://planet.racket-lang.org/pkgs/xml")))
; where the URL is a default location if the package is not already installed

rather than

(define dependencies
 (list (pkg "xml" 40)))
; where the default location is assumed to be the installation's
; default (à la apt-get's repos)

or

(define dependencies
 (list (pkg "xml")))
; where the default is as above and the version is "most recent"
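Here is a minimal sketch of how such dependency entries might be resolved, assuming a hypothetical dep struct, a hypothetical table of installed packages, and a hypothetical default repository. It treats the version number as a minimum version (the proposal does not say), and it only decides where a missing package would come from; it does not model upgrades.

```racket
#lang racket/base
;; Hypothetical representation of one entry in an info.rkt dependencies list.
(struct dep (name version url) #:transparent)

;; Hypothetical constructor matching the (pkg name [version] [url]) forms above.
;; A missing version means "most recent"; a missing URL means "use the
;; installation's default repositories".
(define (pkg name [version #f] [url #f])
  (dep name version url))

;; Hypothetical local state: installed package name -> installed version.
(define installed (hash "xml" 40))

;; Hypothetical installation-wide default repository.
(define default-repo "http://planet.racket-lang.org/pkgs/")

;; resolve : dep? -> (or/c 'installed string?)
;; 'installed if a suitable version is already present; otherwise the URL
;; to fetch from. The URL in the entry is only a fallback location.
(define (resolve d)
  (define have (hash-ref installed (dep-name d) #f))
  (cond
    [(and have (or (not (dep-version d)) (>= have (dep-version d))))
     'installed]
    [(dep-url d) (dep-url d)]
    [else (string-append default-repo (dep-name d))]))
```

The key design point is that the URL never overrides an installed package; it is consulted only when the dependency is missing.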


> The Linux repo (such as apt or yum) should be the preferred approach here -
> having a repo definition that defines where to search for packages.  Such
> approach allows for having planet mirrors and private repos

I agree.
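An apt-style repository list might be searched as in this minimal sketch, assuming each repository is described by a base URL and an index of package names. Both repositories and the find-package helper are hypothetical; a real index would presumably be fetched from each server.

```racket
#lang racket/base
;; Hypothetical repository description: a base URL plus the package names
;; its index lists.
(struct repo (base-url index) #:transparent)

;; Hypothetical configuration, searched in order: a private course
;; repository first, then a public PLaneT mirror.
(define repos
  (list (repo "https://pkgs.example.edu/course/" '("course-utils"))
        (repo "https://mirror.example.org/planet/" '("xml" "web-server"))))

;; find-package : string? (listof repo?) -> (or/c string? #f)
;; Returns the download URL from the first repository whose index lists
;; `name`, or #f if no configured repository has it.
(define (find-package name repos)
  (for/first ([r (in-list repos)]
              #:when (member name (repo-index r)))
    (string-append (repo-base-url r) name)))
```

Ordering the list is what lets a private repository or a mirror take precedence over the public server.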

2011/2/18 Robby Findler <robby at eecs.northwestern.edu>:
> One more comment: one of PLaneT's design goals was that if you have a
> working system and you install a new planet package, then you didn't
> break any of the working parts from before. The new system doesn't
> seem to have that as a design goal anymore (I noticed automatic
> upgrades and "freezing" being an explicit operation but there may be
> other places).
>
> Do you have a rationale for deviating from this seemingly nice
property?

I agree with Sam's explanation. I do think it is valuable to be able
to do this sometimes, which is why I think there should be an explicit

pkg freeze web-server

that ensures that the EXACT versions, with no upgrades, are used for
the currently installed web-server package until it is unfrozen.
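The effect of freezing on an upgrade run could be modeled as in this minimal sketch, assuming the installation tracks installed versions, the newest available versions, and a set of frozen package names (all hypothetical). Whether freezing web-server also pins its dependencies is not specified; this sketch pins only the named package.

```racket
#lang racket/base
(require racket/set)

;; upgrade-plan : hash? hash? set? -> hash?
;; installed : package name -> currently installed version
;; latest    : package name -> newest available version
;; frozen    : names of packages pinned with a hypothetical `pkg freeze`
;; Frozen packages keep their exact installed version; every other package
;; moves to the newest available version (never downgrading).
(define (upgrade-plan installed latest frozen)
  (for/hash ([(name version) (in-hash installed)])
    (values name
            (if (set-member? frozen name)
                version
                (max version (hash-ref latest name version))))))
```

For example, with web-server 3 and xml 40 installed, web-server 5 and xml 42 available, and web-server frozen, the plan keeps web-server at 3 and moves xml to 42.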

2011/2/18 Gregory Woodhouse <gregwoodhouse at me.com>:
> If you do this, please at least make installing the full
> documentation an option. Not everyone can ensure that they will always
> be online when working with Racket.

Absolutely.

-- 
Jay McCarthy <jay at cs.byu.edu>
Assistant Professor / Brigham Young University
http://faculty.cs.byu.edu/~jay

"The glory of God is Intelligence" - D&C 93



Posted on the dev mailing list.