16 feb 2011
‘binary deployment’ seems to be a good and fast solution nowadays (i’m talking about open source here). but how can i check whether the source code was modified before it was compiled and signed (say by downstream::debian)?
Note: you can replace debian with any other distribution doing ‘binary deployment’ (it is just an example).
this is very much distribution dependent. in general this workflow is used:
downstream could simply add another ‘evil’ patch in step (3), and once the package has been created, the source patch could be removed to hide the modification. this has happened already, see [2]. if the user wants to prevent such a situation, there is a limited set of options. he could:
i’ve been playing with nix lately, and as nix is a ‘purely functional package manager’, the side effects of step (2) are minimized because components don’t interfere with each other. as a result: if you clone the original build chain, you can expect the same output from the same input. so i experimented with two components:
the results are very promising as:
Edit: it turns out that there has already been some research on this topic, see [3], page 30. I quote it and highlight some passages:
To ascertain how well these measures work in preventing impurities in NixOS, we performed two builds of the Nixpkgs collection on two different NixOS machines. This consisted of building 485 non-fetchurl derivations. The output consisted of 165927 files and directories. Of these, there was only one file name that differed between the two builds, namely in mono-1.1.4: a directory gac/IBM.Data.DB2/1.0.3008.37160 7c307b91aa13d208 versus 1.0.3008.40191 7c307b91aa13d208. The differing number is likely derived from the system time. We then compared the contents of each file. There were differences in 5059 files, or 3.4% of all regular files. We inspected the nature of the differences: almost all were caused by timestamps being encoded in files, such as in Unix object file archives or compiled Python code. 1048 compiled Emacs Lisp files differed because the hostname of the build machines were stored in the output. Filtering out these and other file types that are known to contain timestamps, we were left with 644 files, or 0.4%. However, most of these differences (mostly in executables and libraries) are likely to be due to timestamps as well (such as a build process inserting the build time in a C string). This hypothesis is strongly supported by the fact that of those, only 42 (or 0.03%) had different file sizes. None of these content differences have ever caused an observable difference in behaviour.
i used a prefix installation of nix on gentoo. i set the store path to something like ‘~/mynix/store’ so that every program needs to be recompiled (nix limitation/feature). afterwards i did:
nix-env -i apache-httpd
ls store| grep apache-httpd
cp -R store/gyp2arhqcglbq6iq1hndclljs7v9n30k-apache-httpd-2.2.17/ apache1
nix-env -e apache-httpd
nix-env --delete-generations old
nix-store --delete store/gyp2arhqcglbq6iq1hndclljs7v9n30k-apache-httpd-2.2.17/
and then i did the whole thing again, copying to apache2/ instead. after that, the comparison can start.
as timestamps seem to be the only problem, here are some thoughts on how to overcome this:
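the comparison of the two copies can be sketched in plain shell. this is only a minimal sketch, not the exact tooling used for the experiment: `compare_trees` and `diff_regions` are hypothetical helper names, and only `find`, `sort`, `cmp` and `awk` are assumed to be available. the first helper lists files that differ or exist in only one tree, the second prints the byte ranges in which two files differ, which helps to spot embedded timestamps.

```shell
# sketch: compare two copies of a store path, e.g. apache1/ and apache2/

# print which regular files differ or exist in only one of the two trees
compare_trees() {
    a=$1
    b=$2
    # walk the first tree and check every file against the second tree
    (cd "$a" && find . -type f | sort) | while IFS= read -r f; do
        if [ ! -f "$b/$f" ]; then
            echo "only in $a: $f"
        elif ! cmp -s "$a/$f" "$b/$f"; then
            echo "differs: $f"
        fi
    done
    # walk the second tree to catch files the first tree does not have
    (cd "$b" && find . -type f | sort) | while IFS= read -r f; do
        [ -f "$a/$f" ] || echo "only in $b: $f"
    done
}

# print the differing byte ranges (1-based offsets) of two files
diff_regions() {
    # cmp -l prints one line per differing byte and exits non-zero on
    # differences, so its exit status is ignored here
    { cmp -l "$1" "$2" || true; } | awk '
        NR == 1        { start = $1; prev = $1; next }
        $1 == prev + 1 { prev = $1; next }
                       { print start "-" prev; start = $1; prev = $1 }
        END            { if (NR > 0) print start "-" prev }
    '
}
```

with the two copies from above this would be called as `compare_trees apache1 apache2`, followed by `diff_regions` on any pair of files reported as differing. a short range of a few bytes is a good hint for a timestamp, a file that differs all over the place is not.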
write a compare utility which ignores timestamps (of course one has to find such regions first)
always freeze the clock while compiling by setting it to a fixed time: this could be done by putting an indirection layer in front of the libc functions used for time/date queries, for example with LD_PRELOAD (or with a modified libc picked up via LD_LIBRARY_PATH). intercepting library calls like this is nothing new (‘trickle is a portable lightweight userspace bandwidth shaper’ uses it). NOTE: this might have unknown side effects and needs to be evaluated, as a fixed time will interfere with:
adding a PACKAGE_MANAGER_BUILD_TIME variable to the build environment. this implies one would either have to alter the build chain (gcc timestamps) or patch upstream’s source, depending on where that timestamp is applied. the effect would be that the same timestamp is always used, resulting in a 1:1 match
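to illustrate the last idea, here is a minimal sketch of how a build script could pick up such a variable. PACKAGE_MANAGER_BUILD_TIME is the hypothetical variable proposed above, not something an existing package manager sets, and `build_timestamp`, `write_build_header` and the `BUILD_TIME` macro are made-up names. the assumption is that the value is given in seconds since the epoch and that `date +%s` is available as a fallback.

```shell
# sketch: PACKAGE_MANAGER_BUILD_TIME is the hypothetical variable from the text

# print the timestamp a build step should embed (seconds since the epoch)
build_timestamp() {
    if [ -n "${PACKAGE_MANAGER_BUILD_TIME:-}" ]; then
        # fixed time handed in by the package manager: reproducible
        printf '%s\n' "$PACKAGE_MANAGER_BUILD_TIME"
    else
        # no fixed time given: fall back to the current time
        date +%s
    fi
}

# write a C header carrying the timestamp, to be used by the source
# instead of the compiler's __DATE__/__TIME__ macros
write_build_header() {
    printf '#define BUILD_TIME %sL\n' "$(build_timestamp)" > "$1"
}
```

a package manager would then export the variable once, e.g. `PACKAGE_MANAGER_BUILD_TIME=1297814400` (2011-02-16 00:00:00 UTC), before starting the build, so that two builds of the same input embed the same value. upstream code that reads the clock directly would still need to be patched to use it.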