Support for custom snap delta algorithm #16295
base: master
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
…ormats Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
The version in 22.04/24.04 has a buffer overflow bug when working with pseudo file definitions and snaps of 40MB or bigger. Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Tue Dec 16 17:30:43 UTC 2025: No spread failures reported
zyga left a comment
First partial pass. I spent some time on the new bits in squashfs, but I think you need to redo it without the whole wait group and goroutines. Please feel free to ping me for an interactive session.
Separately from this, it needs a design review. I would suggest booking a meeting with @pedronis, @alfonsosanchezbeato to discuss that.
func (s stat) User() string { return s.user }
func (s stat) Group() string { return s.group }

func ParseCompression(id uint16, mksqfsArgs []string) ([]string, error) {
All of those can be trivial function assignments:
var (
ParseCompression = parseCompression
...
)
build-aux/snap/snapcraft.yaml
source-subdir: squashfs-tools
make-parameters:
- INSTALL_PREFIX=${CRAFT_PART_INSTALL}/usr
override-pull: |
My go-to solution for override-pull on my projects is:
override-pull: |
  craftctl default
  # Set defaults
  grade=devel
  tag="$(git describe --tags --abbrev=0)" || true
  hash="$(git rev-parse --short HEAD)"
  # Check for tagged version
  if [ -n "$tag" ]; then
    count="$(git rev-list "$tag".. --count)"
    if [ "$count" -eq 0 ]; then
      version="$tag"
      grade=stable
    else
      version="$tag+git$count.$hash"
    fi
  else
    count="$(git rev-list HEAD --count)"
    version="0+git$count.$hash"
  fi
  # Relay back to snapcraft
  craftctl set grade="$grade"
  craftctl set version="$version"
  echo "$version" >.version
Here we could drop the entire logic and just keep the checkout to the given SHA that corresponds to the tagged release.
I'm sharing the snippet for information purposes only. Please delete override-pull entirely.
Setting the grade is a really nice touch. I will use that elsewhere.
I am using ls-remote because some of the repos are too large for a full clone, so it's worth listing tags, especially if I am only interested in the tagged version. So I copied the snippet from some of my other snaps...
build-aux/snap/snapcraft.yaml
echo "building tag: ${tag}"
git checkout "${tag}"
stage:
- usr/bin
We need to strip the binaries. Please handle that in override stage or similar.
snap/squashfs/squashfs.go
})
}

// run unsquashfs source
Why do we need the two goroutines? Just make a pair of processes, pipe them directly or through your FIFOs, and run both. No threads required. There's also no need for osutil.RunWithContext, as this is just os/exec.CommandContext (available since Go 1.7).
snap/squashfs/squashfs.go
deltaPipe := pipePaths[1]

// Run concurrent processes
var wg sync.WaitGroup
Same story. You just don't need this.
snap/squashfs/squashfs.go
}

// handleApplyDelta applies the smart delta file.
func ApplySnapDelta(sourceSnap, delta, targetSnap string) error {
This should take a context
I now create a context at the start of the function.
None of the calling functions (store download, snap client) seem to have a context within scope.
…ented support Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
…e kernel deltas Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
…elper Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
We are deliberately avoiding staging the delta-generating tool (hdiffz), as it is not required on the device; this saves us ~1MB compressed. Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
…r convention Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
This reverts commit 66f644f.
Adding support for a custom snap delta algorithm
SNAPDENG-36094
Updated on 16th December 2025: added hdiffz support
Currently, delta updates between two snap revisions are produced with xdelta3. While xdelta3 is an efficient delta tool, it does not perform well on compressed packages such as SquashFS.
Why is that? Ideally, a compressed package has close to the maximum theoretical entropy, so comparing two "ideal" random sets is bound to produce a delta close to the size of the source or target set.
xdelta3 acknowledges this: if it recognizes the source and target as compressed packages, it automatically unpacks them, computes the delta, and then compresses the resulting delta.
However, xdelta3 does not treat SquashFS as a "compressed package", perhaps because of its complexity, or because the SquashFS pseudo file definition (more on that later) was not supported back then.
As a result, snap deltas produced with xdelta3 vary widely. They can be very small for snaps with lots of small files (base snaps), but quickly degrade once more changes within the snap shift the data. At the extreme are snaps with a few big files (snapd), where xdelta3 fails badly and the resulting delta is often close to the size of the source snap.
So how can we improve, taking into consideration embedded use cases, and the fact that the reassembled snap has to be bit-identical to the target so assertions or dm-verity merkle tree are still valid?
squashfs-tools supports the so-called pseudo file definition, an uncompressed representation of the SquashFS content. It includes file names, owner, mode, and time, plus each file's binary content. This is a much more suitable input for xdelta3. In fact, the resulting deltas are usually between 50% and less than 10% of the size of what we get today.
But this approach has a few challenges:
Another problem can be fixed by introducing our own custom delta superblock, carrying the information required to restore the target SquashFS with exactly the same parameters as the original. This information can be lifted from the target SquashFS superblock. At the same time, the fact that snaps are packed with snap pack ... means that the supported use cases are relatively limited. This allows us to omit support for custom compression arguments, e.g., a custom zstd dictionary.
Custom snap-delta algorithm
Implementation
Introducing our own delta header file resolves most of the SquashFS limitations. The proposal is based on the PoC from
https://github.com/kubiko/squashfs-delta
Custom snap-delta header structure:
The magic number and version identify the snap-delta format and its version.
The delta tool field identifies which delta tool was used to generate the delta.
The timestamp, compression, and superblock flags are copied directly from the target snap, ensuring the same mksquashfs parameters are used when recreating the snap.
The implementation automatically detects a plain xdelta3 delta based on its own magic number and applies it, providing clean backwards compatibility.
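xdelta3 writes VCDIFF (RFC 3284) output, whose first four bytes are 0xD6 0xC3 0xC4 0x00. That makes the format check a simple prefix test. In the sketch below, the snap-delta magic "SDLT" is a hypothetical placeholder and not the value used by the PR.

```go
package main

import "bytes"

// vcdiffMagic: 'V', 'C', 'D' with the high bit set, then version 0 (RFC 3284).
var vcdiffMagic = []byte{0xD6, 0xC3, 0xC4, 0x00}

// snapDeltaMagic is a HYPOTHETICAL placeholder for the custom header magic.
var snapDeltaMagic = []byte("SDLT")

type deltaFormat int

const (
	formatUnknown deltaFormat = iota
	formatXdelta3
	formatSnapDelta
)

// detectDeltaFormat inspects the first bytes of a delta file so that a plain
// xdelta3 delta can still be applied unchanged (backwards compatibility).
func detectDeltaFormat(head []byte) deltaFormat {
	switch {
	case bytes.HasPrefix(head, vcdiffMagic):
		return formatXdelta3
	case bytes.HasPrefix(head, snapDeltaMagic):
		return formatSnapDelta
	default:
		return formatUnknown
	}
}
```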
Limitations
Other options considered
bsdiff:
projg2-squashdelta:
bsdiff size comparison:
projg2 size comparison: (snap-delta refers to the proposed solution)
Comparison to plain xdelta3
In general, the proposed solution produces deltas of between 50% and less than 10% of the size of comparable xdelta3 deltas. Observations from various tests:
Tuning of direct xdelta3 on two snaps
No noticeable improvement was observed for various options: source window size, input window size, secondary compression algorithm, compression level.
Tuning of xdelta3 on proposed solution
Time requirements
Delta Generation
Delta generation time varies with the nature of the input snap, but is generally higher than with xdelta3.
Applying Delta
The time to apply a delta can vary significantly, mostly because of the diversity of target hardware, from single- or dual-core low-power Arm systems to typical x86 systems.
Use-case considerations
Considering the significant time needed to apply the delta on low-end systems, the proposed delta algorithm would not always be the right answer. On standard desktop systems, the time to apply the delta is still significantly higher than with xdelta3, possibly impacting user experience. Further tests should be done to determine the percentage impact on the whole refresh experience.
Taking the mentioned constraints into account, perhaps the following approach could be considered:
snap refresh ..) would use the existing xdelta3 for a better user experience.
Some other wins
Possible future improvement
As the delta works on the pseudo file definition and the target snap is essentially recreated with the correct arguments, we can consider a future where the target device chooses a compression more suitable for the application (e.g., zstd over xz), as long as the snap store generates a corresponding snap revision assertion for the given compression.
Added on 16th of December 2025
hdiffz/hpatchz support
hdiffz promises further improvement over xdelta3 without the heavy processing and memory demands of bsdiff. It also provides a slew of tuning options and even a streaming option, though the source still has to be available as a file.
Streaming support is internal only: a portion of the source file is loaded into working memory at a time. This limits our options for processing the squashfs pseudo definition stream. The solution is to first parse the pseudo definition header, which contains the size and offset of each file within the stream. This way, we can compare each file individually and feed the result into the delta or target stream. This strategy avoids a large temporary file for the entire pseudo definition and keeps memory usage down. The memory required is proportional to the size of the largest file within the squashfs.
This solution still has limitations in detecting candidates for comparison when a library version changes, resulting in two different file names. This is often the case with browser snaps. A similar problem occurs with kernel snaps, where the kernel modules path contains the kernel version while the file name stays the same. To improve delta efficiency, a fuzzy matching score is introduced. It is built from the dirname, basename, size, and offset from the current position in the stream. If the resulting score is high enough, the files are considered related and a delta is calculated between them.
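Below is a sketch of such a score. The weights, the 0.7 threshold, the 64 MiB offset window, and the per-component similarity measures are all hypothetical choices for illustration. They are not the values used in the PR.

```go
package main

import (
	"path"
	"strings"
)

// candidate describes a file in the pseudo definition stream (HYPOTHETICAL).
type candidate struct {
	Path   string
	Size   int64
	Offset int64
}

// Hypothetical weights (summing to 1, so the score is in [0, 1]) and limits.
const (
	wDir, wBase, wSize, wOffset = 0.3, 0.4, 0.2, 0.1
	matchThreshold              = 0.7
	offsetWindow                = 64 << 20 // offsets further apart score 0
)

// componentSim compares directory components position by position, so one
// differing version component (lib/modules/<ver>/...) costs one component.
func componentSim(a, b string) float64 {
	ca, cb := strings.Split(a, "/"), strings.Split(b, "/")
	longest, shortest := len(ca), len(cb)
	if shortest > longest {
		longest, shortest = shortest, longest
	}
	same := 0
	for i := 0; i < shortest; i++ {
		if ca[i] == cb[i] {
			same++
		}
	}
	return float64(same) / float64(longest)
}

// prefixSim is the common-prefix length over the longer name, so that e.g.
// libfoo.so.1.2 and libfoo.so.1.3 still score highly.
func prefixSim(a, b string) float64 {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	if n == 0 {
		return 1
	}
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return float64(i) / float64(n)
}

// ratio returns min/max of two non-negative sizes (1 if equal).
func ratio(a, b int64) float64 {
	if a == b {
		return 1
	}
	if a > b {
		a, b = b, a
	}
	return float64(a) / float64(b)
}

// matchScore combines the four similarities into a single weighted score.
func matchScore(src, dst candidate) float64 {
	dirS := componentSim(path.Dir(src.Path), path.Dir(dst.Path))
	baseS := prefixSim(path.Base(src.Path), path.Base(dst.Path))
	d := src.Offset - dst.Offset
	if d < 0 {
		d = -d
	}
	offS := 0.0
	if d < offsetWindow {
		offS = 1 - float64(d)/float64(offsetWindow)
	}
	return wDir*dirS + wBase*baseS + wSize*ratio(src.Size, dst.Size) + wOffset*offS
}

// isRelated reports whether two files are close enough to diff against each other.
func isRelated(src, dst candidate) bool { return matchScore(src, dst) >= matchThreshold }
```

Take a kernel module whose directory differs only in the version component, with a similar size and the same offset. It scores about 0.94 here (0.3·0.8 + 0.4 + 0.2·0.99 + 0.1), while an unrelated binary scores close to 0.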
Gained delta improvements over xdelta3
streamed option.
It is also important to consider the increase in size of the snapd and snap binaries, as well as of the snapd snap. For the current draft PR, the increases are as follows: