Friday, November 29, 2013

On PDiffs Usefulness for Stable

Axel just said that PDiffs are really useful for stable because it changes seldom. The truth is that there are no PDiffs for stable. I consider this a bug because you need to download a huge blob on every point release even though only few stanzas changed.

The real problem of PDiffs is the count that needs to be applied. We run "dinstall" much more often now than we used to and we cannot currently skip diffs in the middle when looking at a dak-maintained repository. It would be more useful if you could fetch a single diff from some revision to the current state and apply this instead of those (download, apply, hash check)+ cycles. 56 available index diffs are also not particularly helpful in this regard. They cover 14 days with four dinstalls each, but it takes a very long time to apply them, even on modern machines.

I know that there was ongoing work improving PDiffs that wanted to use git to generate the intermediate diffs. The current algorithm used in dak does not require old versions except one to be kept, but keeping everything is the far other end of the spectrum where you have a git repository that potentially gets huge unless you rewrite it. It should be possible to just implement a rolling ring buffer of Packages files to diff against, reprepro already generates the PDiffs in the right way, the Index format is flexible enough to support this. So someone would just need to make that bit of work in dak.