10 aug 2014
source and binary distribution in linux distributions is outdated and nobody is doing anything [tm].
today i was packaging prosody [0] for nixos and wondered why
lua-socket 2.0.2 wasn’t working as expected. the solution was pretty
easy: it seems that 2.0.2 is just outdated and other distributions like
ubuntu [2] and fedora [3] use
[luasocket_3.0~rc1.orig.tar.gz]
or
lua-socket-3.0-0.6rc1.fc21.x86_64.rpm
respectively. also, neither repo states where they got the source code
from, which is not only annoying but also dangerous since i now don’t
know what modifications either distribution included!
the issue: [1] does not offer a 3.x release, nor can i find the revision control system used for development.
diego’s homepage [1] states:
Last modified by Diego Nehab on
Wed Oct 3 02:07:59 BRT 2007
now, what can we do?
seems like an abandoned project, as so often…
or a different example:
pkgs/development/libraries/minmay/default.nix states:
meta = {
  homepage = "https://github.com/mazhe/minmay";
  license = stdenv.lib.licenses.lgpl21Plus;
  description = "An XMPP library (forked from the iksemel project)";
};
mazhe removed minmay, probably not knowing that the nixos distribution was still using the minmay 1.0.0 release from that repo, see https://github.com/mazhe/minmay/. or maybe mazhe didn’t find any other release of minmay in the first place, forked it on github.com and made a release himself.
in case [1] is abandoned it would be a good idea to:
in fact, we as a community don’t have any good, reliable way to access virtually any project over a long period of time.
first, let’s have a look at how software distribution is done in different linux distributions:
as shown in the picture above:
and most horrible yet: shipping source/binary data must be redone.
i thought about mixing GIT with torrent technology, so that every time a new release happens, we just add it to a giant GIT database. i also propose that we need a meta layer between upstream (the developers) and downstream (the distributions or end users). i’d like to call this midstream, and ubuntu’s launchpad, github.com and similar platforms are pretty close to what midstream would do.
favoured technologies:
wanted features:
free and open source (FLOSS)
we could standardize the deployment workflow this way
easy to scale
high reliability
signatures (like debian uses for their packages)
could be used for ‘binary distribution’ as well (my focus here is ‘source deployment’ though)
all linux distributions should download from there instead of directly from upstream. also, if a distribution downloads the code, it effectively becomes (like in torrents) a new node that also hosts the code for as long as users access it.
we could also host metadata and QA there, i thought about:
git clone --depth 1 --branch <branch> <url>
this avoids the usage of tar.gz|bz2|xz containers completely.
i want to outline that there is apt-p2p [6] already, which is AFAIK used for binary distribution of deb files and not used for source distribution. i could be wrong though.
when packaging software for nixos, we do ‘source deployment’ on the developer’s machine. when the nix-expression gets into nixpkgs, hydra will do ‘source deployment’ again. and in most cases hydra will produce a binary substitute (a NAR file) which is like a DEB file on ubuntu.
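to make that concrete: a NAR file is simply nix’s canonical serialisation of a store path. a rough sketch of producing one by hand with nix-store --dump (the store path below is made up for illustration):
# serialise a store path into a NAR file; the path is a made-up example
nix-store --dump /nix/store/<hash>-prosody > prosody.nar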
when doing ‘source deployment’, most often we would be using fetchurl, like below:
src = fetchurl {
  url = "http://downloads.sourceforge.net/project/pdfgrep/${version}/${name}.tar.gz";
  sha256 = "6e8bcaf8b219e1ad733c97257a97286a94124694958c27506b2ea7fc8e532437";
};
the sha256 checksum is given by the packager, and every time someone needs to do ‘source deployment’ again, this sha256sum is used to verify the download from the url. ebuilds used by portage on gentoo also implement checksums of downloads this way.
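for completeness, here is a rough sketch of how a packager typically obtains such a checksum in the first place, assuming the nix-prefetch-url tool is available; the concrete version in the url is only an example:
# download the tarball once and print the sha256 that goes into the nix expression
nix-prefetch-url http://downloads.sourceforge.net/project/pdfgrep/1.3.0/pdfgrep-1.3.0.tar.gz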
github.com, sourceforge.net and similar sites often create/recreate containers like .tar.gz or .zip on demand (cause unknown) and therefore the same container changes hashes over time, although it still contains the same files. github.com features a new release system [4] which addresses this problem.
what we really need:
since we have to checksum the input for sanity/reproducibility, we can’t just rely on a md5sum/sha256sum of the (compressed?) container. maybe we could unpack (‘deserialize’) the (compressed?) archives into /nix/store, then build a NAR file from the result and use the NAR’s sha256 hash as the checksum instead. doing so would remove all superfluous attributes like timestamps, ownership, file order in the container, compression artifacts, compression mechanism and container format issues.
advantages/disadvantages:
pro:
con:
note: a different interesting approach would be to maintain a list of files per container with a respective checksum per file. the overall checksum would then be computed from all the single checksums.
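a minimal sketch of that per-file idea in plain shell; here the overall checksum is taken over the sorted list of per-file checksums rather than a literal sum, and the manifest layout is just an assumption:
# checksum every file individually, then checksum the sorted list of checksums
cd luasocket-3.0-rc1
find . -type f -print0 | sort -z | xargs -0 sha256sum > /tmp/MANIFEST
sha256sum /tmp/MANIFEST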
we do need distribution-agnostic source code storage systems which scale well and have high reliability. we also need to alter the way we build the checksums of source code containers.
update: (18.8.2014) i've been talking to aszlig and he mentions that we have <tarballs.nixos.org>, which i didn't know of: if hydra was able to build the nix expression, it would cache the tarball there. but i still don't think that these tarballs can be accessed by a normal user doing source deployment on his development machine. so in case a tarball changes hashes again, we still have to alter the checksum in the nix expression.
i would love to see more direct GIT usage in nixpkgs, since i think that doing explicit releases via containers is a waste of human time. this should be automated, and container releases should only be used to make source or binary deployment more efficient.
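as a closing sketch of what more direct GIT usage could look like for the lua-socket case: clone a tag shallowly, drop the .git metadata and hash the checkout like above. this is roughly what the fetchgit fetcher in nixpkgs automates; the tag name and repository location are assumptions, not verified:
# pin a release tag directly from git instead of downloading a generated tarball
git clone --depth 1 --branch v3.0-rc1 https://github.com/diegonehab/luasocket.git
rm -rf luasocket/.git    # the .git directory is not deterministic, drop it before hashing
nix-hash --type sha256 --base32 luasocket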