@kubiko kubiko commented Nov 21, 2025

Adding support for a custom snap delta algorithm

SNAPDENG-36094

Updated on 16th December 2025: added hdiffz support

Currently, delta updates between two snap revisions are computed with xdelta3. While xdelta3 is an efficient delta tool, it does not perform well on compressed packages such as SquashFS.
Why is that? A well-compressed package has close to the maximum theoretical entropy, so two compressed revisions look like two unrelated random byte streams, and the delta between them is bound to be close to the size of the source or target.
xdelta3 acknowledges this: if it recognizes the source and target as compressed packages, it automatically unpacks them, computes the delta, and compresses the resulting delta.

However, xdelta3 does not support SquashFS as a "compressed package", perhaps because of its complexity, or because the SquashFS pseudo file definition (more on that later) was not supported back then.
As a result, snap deltas produced with xdelta3 vary widely. They can be very small for snaps with lots of small files (base snaps), but quickly degrade when more changes within the snap cause the data to shift. For snaps with a few big files (snapd), xdelta3 fails badly, and the resulting delta is often close to the size of the source snap.

So how can we improve on this, taking into consideration embedded use cases and the fact that the reassembled snap has to be bit-identical to the target, so that assertions and the dm-verity Merkle tree remain valid?
squashfs-tools supports the so-called pseudo file definition, which is an uncompressed representation of the SquashFS content. It includes file names, owner, mode, and time, plus each file's binary content. This is much more suitable input for xdelta3. As a matter of fact, the resulting deltas are usually between 50% and less than 10% of the size of what we have today.

But this approach has two challenges:

  • Uncompressed snap content is big, usually about 3x the snap size, and we would need three times that amount.
  • The pseudo file definition has no information about the compression used by the SquashFS, or its time of creation.

Luckily, the first problem can be sidestepped: xdelta3 can use pipes for both input and output, and so can mksquashfs and unsquashfs when operating on the pseudo file definition. This allows us to run xdelta3 on the pseudo file definition without ever unpacking the whole thing.

The second problem can be fixed by introducing a custom delta header that carries the information required to restore the target SquashFS with exactly the same parameters as the original. This information can be lifted from the target SquashFS superblock. At the same time, the fact that snaps are packed with snap pack ... means that the supported use cases are relatively limited. This allows us to omit support for custom compression arguments, e.g., a custom zstd dictionary.


Custom snap-delta algorithm

Implementation

Introducing a custom delta header resolves most of the limitations of the SquashFS pseudo file definition. The proposal is based on the PoC at
https://github.com/kubiko/squashfs-delta
Custom snap-delta header structure:

# |       32b    |   16b   |     16b    |    32b     |     16b     |        16b        |
# | magic number | version | delta tool | time stamp | compression | super block flags |

The magic number and version identify the snap-delta format and its version.
The delta tool field identifies which delta tool was used to generate the delta.
Time stamp, compression, and super block flags are copied directly from the target snap, ensuring that the same mksquashfs parameters are used when recreating the snap.
The implementation automatically detects a plain xdelta3 delta by xdelta3's own magic number and applies it, providing clean backwards compatibility.
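
To make the layout concrete, here is a minimal Go sketch of reading and writing the 16-byte header and of telling a snap-delta apart from a plain xdelta3 delta. Several things are assumptions and not taken from the PR: the magic value `snapDeltaMagic` is a placeholder, the little-endian byte order is a guess (chosen because the SquashFS superblock is little-endian), and the names `Header`, `ParseHeader` and `DetectFormat` are hypothetical. The xdelta3 signature bytes are the VCDIFF file signature.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// snapDeltaMagic is a placeholder: the real magic number is not stated above.
const snapDeltaMagic uint32 = 0x53444c54

// xdelta3Magic is the VCDIFF file signature written by xdelta3.
var xdelta3Magic = []byte{0xD6, 0xC3, 0xC4}

// byteOrder is an assumption (little-endian, like the SquashFS superblock).
var byteOrder = binary.LittleEndian

// Header mirrors the field order and widths of the table above (16 bytes).
type Header struct {
	Magic       uint32
	Version     uint16
	DeltaTool   uint16
	Timestamp   uint32
	Compression uint16
	Flags       uint16
}

// Marshal encodes the header into its 16-byte on-disk form.
func (h Header) Marshal() []byte {
	buf := new(bytes.Buffer)
	// Writing a fixed-size struct into a bytes.Buffer cannot fail.
	_ = binary.Write(buf, byteOrder, h)
	return buf.Bytes()
}

// ParseHeader decodes the first 16 bytes of b and checks the magic number.
func ParseHeader(b []byte) (Header, error) {
	var h Header
	if len(b) < binary.Size(h) {
		return h, errors.New("short header")
	}
	if err := binary.Read(bytes.NewReader(b), byteOrder, &h); err != nil {
		return h, err
	}
	if h.Magic != snapDeltaMagic {
		return h, errors.New("not a snap-delta header")
	}
	return h, nil
}

// DetectFormat reports which kind of delta the leading bytes belong to.
func DetectFormat(b []byte) string {
	if bytes.HasPrefix(b, xdelta3Magic) {
		return "xdelta3"
	}
	if _, err := ParseHeader(b); err == nil {
		return "snap-delta"
	}
	return "unknown"
}

func main() {
	h := Header{Magic: snapDeltaMagic, Version: 1, DeltaTool: 1,
		Timestamp: 1700000000, Compression: 4, Flags: 0x01c0}
	raw := h.Marshal()
	fmt.Printf("%d bytes: % x\n", len(raw), raw)
	fmt.Println(DetectFormat(raw), DetectFormat([]byte{0xD6, 0xC3, 0xC4, 0x00}))
}
```

Checking for the xdelta3 signature first means existing deltas keep working without any header at all, which is the backwards-compatibility property described above.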

Limitations

  • We made a conscious decision to omit support for custom SquashFS compression arguments, as snaps do not use them at the moment and the added complexity does not justify the effort. This does not limit support for compression algorithms: xz, zstd, lzo, and so on are all supported.
  • The squashfs-tools included in the snapd snap (Ubuntu 22.04 archive) have a buffer overflow bug when working on snaps larger than $\approx$20MB. The upstream version of squashfs-tools (4.7.4) has this fixed, so the PR vendors the latest stable version. The version in the Ubuntu 24.04 archive still has the same bug.
  • CPU heavy: both generating and applying the delta use considerably more CPU than xdelta3.

Other options considered

bsdiff:

  • Unsuitable for anything larger than a few MB.
  • Very slow encoder (15 to 30 minutes for 500MB), slow to medium decoder.
  • 17x input size RAM requirement (1GB snap $\rightarrow$ 10GB RAM)
  • The decoder still requires 4x input size RAM, but can work under memory pressure at the cost of increased run time.
  • bsdiff does not support streaming, so the RAM requirement cannot be sidestepped.
  • But probably the worst: the delta size on snaps is no different from what xdelta3 produces in a fraction of the time.
  • When running bsdiff on the pseudo file definition, the delta for the snapd snap is about half the size of the one xdelta3 produces on the same input (18MB -> 9MB)

projg2-squashdelta:

  • Supports only lzo compression, though support for other compression could be added.
  • Uses large temporary files: 2x the uncompressed SquashFS content on the encoder side, 1x on the decoder side.
  • A 3rd party solution without a clear advantage.
  • Produces delta sizes comparable to the proposed solution, but seems less predictable in tests on sample lzo snaps.

bsdiff size comparison:

delta-core20-2603-2672-bsdiff              16M
delta-core20-2603-2672-xdelta3             16M
delta-core22-2134-2140-bsdiff              6.9M
delta-core22-2134-2140-xdelta3             6.8M
delta-core24-1197-1226-bsdiff              11M
delta-core24-1197-1226-xdelta3             11M
delta-snapd-25585-25839-bsdiff             40M
delta-snapd-25585-25839-xdelta3            40M

projg2 size comparison: (snap-delta refers to the proposed solution)

# delta sizes
delta-chromium-3286-3293-projg2            15M
delta-chromium-3286-3293-snap-delta        19M
delta-chromium-3286-3293-xdelta3           21M

delta-chromium-3293-3287-projg2            89M
delta-chromium-3293-3287-snap-delta        89M
delta-chromium-3293-3287-xdelta3          134M

delta-firefox-7177-7250-projg2           239M   <- outlier +30%
delta-firefox-7177-7250-snap-delta       188M
delta-firefox-7177-7250-xdelta3          156M   <- outlier (xdelta3 the best)

delta-firefox-7177-7259-projg2            42M
delta-firefox-7177-7259-snap-delta        42M
delta-firefox-7177-7259-xdelta3           73M

# source, target sizes
chromium_3286.snap                        184M
chromium_3287.snap                        176M
chromium_3293.snap                        184M
firefox_7177.snap                         250M
firefox_7250.snap                         285M
firefox_7259.snap                         251M

Comparison to plain xdelta3

In general, the proposed solution produces deltas between 50% and less than 10% of the size of the comparable xdelta3 delta. Observations from various tests:

  • Snap with many small files (core{20,22,24...}): around 10% of the reference xdelta3 delta.
  • Snap with few large files (go binaries, e.g., snapd): under 50% of the reference xdelta3 delta.
  • Snap with already compressed files (pc-kernel, kernel container, modules, firmware all compressed): negligible gain.
  • Kernel snap: uncompressed kernel image and modules: $\approx$50% of the reference xdelta3 delta.
  • Kernel snap: no initrd, uncompressed kernel image and modules: <25% of the reference xdelta3 delta.
  • Gadget snap: <25% of the reference xdelta3 delta.
  • Example size comparison: (snap-delta refers to the proposed solution)
# delta sizes
delta-core20-2603-2672-snap-delta              1.5M
delta-core20-2603-2672-xdelta3                 16M
delta-core22-2134-2140-snap-delta              986K
delta-core22-2134-2140-xdelta3                 6.8M
delta-core24-1197-1226-snap-delta              541K
delta-core24-1197-1226-xdelta3                 11M
delta-arm-gadget-4-5-snap-delta                238K
delta-arm-gadget-4-5-xdelta3                   1.1M
delta-arm-kernel-1006-1008-snap-delta          13M
delta-arm-kernel-1006-1008-xdelta3             35M
delta-arm-kernel-no-initrd-1006-1008-snap-delta  2.5M
delta-arm-kernel-no-initrd-1006-1008-xdelta3   12M
delta-snapd-25585-25839-snap-delta             18M
delta-snapd-25585-25839-xdelta3                40M

# source, target sizes
core20_2603.snap                               60M
core20_2672.snap                               60M
core22_2134.snap                               69M
core22_2140.snap                               69M
core24_1197.snap                               62M
core24_1226.snap                               62M
arm-gadget_4.snap                              1.2M
arm-gadget_5.snap                              1.2M
arm-kernel-no-initrd_1006.snap                 22M
arm-kernel-no-initrd_1008.snap                 22M
arm-kernel_1006.snap                           48M
arm-kernel_1008.snap                           48M
snapd_25585.snap                               45M
snapd_25839.snap                               45M

Tuning of direct xdelta3 on two snaps

No noticeable improvement was observed for various options: source window size, input window size, secondary compression algorithm, compression level.

# -9: compression level, -Sdjw/-Slzma: secondary compression alg, -Bxxxx: source window size, -Wxxxx: input window size
delta-arm-kernel-1006-1008-xdelta3                      36560987
delta-arm-kernel-1006-1008-xdelta3-9                    36559138
delta-arm-kernel-1006-1008-xdelta3-9-B67108864          36559067
delta-arm-kernel-1006-1008-xdelta3-9-B67108864-W16777216  36558866
delta-arm-kernel-1006-1008-xdelta3-9-Sdjw               36559391
delta-arm-kernel-1006-1008-xdelta3-9-Slzma              36559067
delta-arm-kernel-1006-1008-xdelta3-9-W16777216          36558866

Tuning of xdelta3 on proposed solution

  • Only the compression level brings a meaningful improvement ($\approx$10% smaller) when raised from the default 3 to 7; going beyond 7 does not provide additional gain.
  • Source window size and input window size do not seem to have a meaningful effect.
  • On Ubuntu, the default lzma secondary compression is superior to the alternative djw.
# -{3,5,6,7,9}: compression level, -Sdjw/-Slzma: secondary compression alg
# -Bxxxx: source window size, -Wxxxx: input window size, -Pxxxx: compression duplicates window
delta-snapd-25585-25839-snap-delta-3                    20M
delta-snapd-25585-25839-snap-delta-5                    20M
delta-snapd-25585-25839-snap-delta-6                    18M
delta-snapd-25585-25839-snap-delta-7                    18M
delta-snapd-25585-25839-snap-delta-7-P67108864          18M
delta-snapd-25585-25839-snap-delta-9                    18M
delta-snapd-25585-25839-snap-delta-9-B67108864          18M
delta-snapd-25585-25839-snap-delta-9-B67108864-W16777216  18M
delta-snapd-25585-25839-snap-delta-9-Sdjw               20M
delta-snapd-25585-25839-snap-delta-9-Slzma              18M
delta-snapd-25585-25839-snap-delta-9-Snone              24M
delta-snapd-25585-25839-snap-delta-9-W16777216          18M

Time requirements

Delta Generation

Time to generate a delta varies with the nature of the input snap. In the tests below it ranged from slightly faster than xdelta3 to almost twice as slow.

# xdelta3 vs snap-delta
pc-kernel:                     40.7s vs 41.3s
snapd:                          9s vs 16s
core24:                         2.7s vs 2.4s
arm-kernel:                     8.3s vs 6.4s

Applying Delta

Time to apply a delta can vary significantly, mostly because of the diversity of the target hardware, from single/dual-core low-power arm systems to typical x86 systems.

# xdelta3 vs snap-delta
# typical x86 system
pc-kernel:                     0.6s vs 8.6s
snapd:                         0.2s vs 3.7s
core24:                        0.1s vs 4.4s
arm-kernel:                    0.1s vs 2.4s

# low end arm system (2 x Cortex A55)
snapd:                         0.4s vs 1m 36s
core24:                        0.4s vs 2m 8s
arm-kernel:                    0.6s vs 1m

# compared sizes
pc-kernel                      208M
core24                          62M
snapd                           45M
arm-kernel                      48M
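
One way to read the timing tables against the delta sizes is to ask at which link speed the extra apply time cancels out the download saving. The sketch below is a rough estimate only, under the assumptions that download and apply happen one after the other, that "M" means 10^6 bytes, and that nothing else in the refresh changes; the function name `BreakEvenMbit` is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// BreakEvenMbit returns the link speed in Mbit/s below which downloading and
// applying the smaller delta finishes sooner than the larger one.
// Sizes are in MB (10^6 bytes), apply times in seconds.
func BreakEvenMbit(largeDeltaMB, smallDeltaMB, fastApplySec, slowApplySec float64) float64 {
	savedMbit := (largeDeltaMB - smallDeltaMB) * 8
	extraSec := slowApplySec - fastApplySec
	if savedMbit <= 0 {
		return 0 // the "small" delta is not smaller, so it never wins
	}
	if extraSec <= 0 {
		return math.Inf(1) // smaller and not slower: wins at any link speed
	}
	return savedMbit / extraSec
}

func main() {
	// snapd 25585 -> 25839: 40M (xdelta3) vs 18M (snap-delta).
	fmt.Printf("low-end arm: %.2f Mbit/s\n", BreakEvenMbit(40, 18, 0.4, 96))
	fmt.Printf("typical x86: %.2f Mbit/s\n", BreakEvenMbit(40, 18, 0.2, 3.7))
}
```

With the snapd figures above, the break-even lands at roughly 1.8 Mbit/s on the low-end arm system and at about 50 Mbit/s on the x86 system. On slower links than that, the smaller delta finishes first despite the longer apply time.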

Use-case considerations

Considering the significant time to apply a delta on low-end systems, the proposed delta algorithm is not always the right answer. Even on standard desktop systems, the time to apply the delta is significantly higher than with xdelta3, possibly impacting user experience. Further tests should be done to determine the relative impact on the whole refresh experience.
Taking these constraints into account, perhaps the following approach could be considered:

  • All automatic background refreshes would use the new, more optimised approach, as there is no visible user impact.
  • User invoked refreshes (e.g., snap refresh ..) would use the existing xdelta3 for a better user experience.
  • If a very slow network is detected (e.g., <1 Mbit/s), the new approach could still be attempted.
  • Snapd would have a setting to choose from 3 options: xdelta3, snap-delta, or auto mode, allowing users to lock to a particular algorithm based on their own preferences.
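
The selection rules in the list above could be sketched roughly as follows. This is an illustration only: the type and function names (`Mode`, `Refresh`, `Choose`) are hypothetical, the 1 Mbit/s threshold is just the example figure from the list, and how snapd would estimate link speed is left open.

```go
package main

import "fmt"

// Algorithm is the delta format used for a refresh.
type Algorithm string

const (
	XDelta3   Algorithm = "xdelta3"
	SnapDelta Algorithm = "snap-delta"
)

// Mode is the hypothetical snapd setting with its three options.
type Mode string

const (
	ModeAuto      Mode = "auto"
	ModeXDelta3   Mode = "xdelta3"
	ModeSnapDelta Mode = "snap-delta"
)

// Refresh describes what is known about a refresh when choosing the format.
type Refresh struct {
	UserInvoked bool    // true for e.g. an explicit "snap refresh"
	LinkMbit    float64 // estimated link speed; 0 means unknown
}

// slowLinkMbit is the example threshold from the list above.
const slowLinkMbit = 1.0

// Choose picks the delta algorithm for a refresh.
func Choose(mode Mode, r Refresh) Algorithm {
	switch mode {
	case ModeXDelta3:
		return XDelta3
	case ModeSnapDelta:
		return SnapDelta
	}
	// Auto mode: background refreshes take the smaller delta.
	if !r.UserInvoked {
		return SnapDelta
	}
	// User-invoked refreshes prefer speed unless the link is very slow.
	if r.LinkMbit > 0 && r.LinkMbit < slowLinkMbit {
		return SnapDelta
	}
	return XDelta3
}

func main() {
	fmt.Println(Choose(ModeAuto, Refresh{UserInvoked: false}))
	fmt.Println(Choose(ModeAuto, Refresh{UserInvoked: true, LinkMbit: 50}))
	fmt.Println(Choose(ModeAuto, Refresh{UserInvoked: true, LinkMbit: 0.5}))
}
```

An explicit setting always wins, so a user who locks the mode to one algorithm never sees the automatic rules.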

Some other wins

  • The new approach generates a tiny delta, down to a few bytes, if the snap only changes its compression method (e.g., xz $\rightarrow$ lzo). This is a case where xdelta3 would fail badly.
  • No new tooling is introduced.

Possible future improvement

As the delta works on the pseudo file definition, and the target snap is essentially recreated with the correct arguments, we can consider a future where the target device chooses a compression that is more suitable for the application (e.g., zstd over xz), as long as the snap store generates a corresponding snap revision assertion for the given compression.

Added on 16th of December 2025

hdiffz/hpatchz support

hdiffz promises further improvement over xdelta3 without the heavy processing and memory demands of bsdiff, while also providing a slew of tuning options and even a streaming option, though the source still has to be available as a file.
Streaming is only supported internally, in the sense that only a portion of the source file is loaded into working memory at a time. This limits our options for processing the squashfs pseudo definition stream. The solution is to first parse the pseudo definition header, which contains the size and offset of each file within the stream. This way, we can compare each file individually and feed the result into the delta or target stream. This strategy avoids a large temp file for the entire pseudo definition and keeps memory usage down: the memory required is proportional to the size of the largest file within the squashfs.
This solution still has limitations in detecting candidates for comparison when a library version changes, resulting in two different file names. This is often the case with browser snaps. A similar problem occurs with kernel snaps, where the kernel modules path contains the kernel version while the file name remains the same. To improve delta efficiency, a fuzzy matching score is introduced, built from the dirname, basename, size, and offset from the current place in the stream. If the resulting score is high enough, the files are matched as related, and a delta is calculated between them.
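
As an illustration of the fuzzy matching idea, here is a minimal Go sketch. It is not the scoring used in the PR: the weights, the 0.7 threshold, the prefix-based string similarity, and all names (`Entry`, `Score`, `BestMatch`) are assumptions. The position in the stream is simplified to an entry index instead of a byte offset.

```go
package main

import (
	"fmt"
	"path"
)

// Entry is one regular file from the pseudo definition header.
type Entry struct {
	Path  string
	Size  int64
	Index int // position in the stream, standing in for the byte offset
}

// prefixSimilarity is the shared-prefix length over the longer length.
func prefixSimilarity(a, b string) float64 {
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	if longest == 0 {
		return 1
	}
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return float64(n) / float64(longest)
}

// sizeSimilarity is the smaller size over the larger one.
func sizeSimilarity(a, b int64) float64 {
	if a == b {
		return 1
	}
	if a > b {
		a, b = b, a
	}
	return float64(a) / float64(b)
}

// Score combines basename, dirname, size and stream distance into [0, 1].
// The weights are arbitrary illustration values.
func Score(src, dst Entry) float64 {
	dist := src.Index - dst.Index
	if dist < 0 {
		dist = -dist
	}
	return 0.4*prefixSimilarity(path.Base(src.Path), path.Base(dst.Path)) +
		0.2*prefixSimilarity(path.Dir(src.Path), path.Dir(dst.Path)) +
		0.3*sizeSimilarity(src.Size, dst.Size) +
		0.1/float64(1+dist)
}

// BestMatch returns the highest-scoring candidate at or above the threshold.
func BestMatch(dst Entry, candidates []Entry, threshold float64) (Entry, bool) {
	var best Entry
	bestScore, found := threshold, false
	for _, c := range candidates {
		if s := Score(c, dst); s >= bestScore {
			best, bestScore, found = c, s, true
		}
	}
	return best, found
}

func main() {
	dst := Entry{Path: "lib/modules/6.8.0-35/kernel/fs/ext4.ko", Size: 2048, Index: 10}
	src := []Entry{{Path: "lib/modules/6.8.0-31/kernel/fs/ext4.ko", Size: 2040, Index: 10}}
	m, ok := BestMatch(dst, src, 0.7)
	fmt.Println(m.Path, ok)
}
```

A file that matches is diffed against its counterpart. A file with no match above the threshold would be stored without a source to diff against.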
Delta size improvements gained over the streamed xdelta3 option (sd-* entries are snap-delta results using the named delta tool):

delta-core20-2603-2672-sd-hdiffz               792K
delta-core20-2603-2672-sd-xdelta3              1.5M
delta-core20-2603-2672-xdelta3                 16M

delta-core22-2134-2140-sd-hdiffz               758K
delta-core22-2134-2140-sd-xdelta3              986K
delta-core22-2134-2140-xdelta3                 6.8M

delta-core24-1197-1226-sd-hdiffz               425K
delta-core24-1197-1226-sd-xdelta3              541K
delta-core24-1197-1226-xdelta3                 11M

delta-core26-72-77-sd-hdiffz                   3.3M
delta-core26-72-77-sd-xdelta3                  4.1M
delta-core26-72-77-xdelta3                     16M

delta-firefox-7421-7474-sd-hdiffz              22M
delta-firefox-7421-7474-sd-xdelta3             35M
delta-firefox-7421-7474-xdelta3                63M

delta-firefox-7421-7503-sd-hdiffz              32M
delta-firefox-7421-7503-sd-xdelta3             43M
delta-firefox-7421-7503-xdelta3                76M

delta-firefox-7421-7504-sd-hdiffz              110M
delta-firefox-7421-7504-sd-xdelta3             202M
delta-firefox-7421-7504-xdelta3                151M

delta-gnome-46-2404-125-145-sd-hdiffz          43M
delta-gnome-46-2404-125-145-sd-xdelta3         422M
delta-gnome-46-2404-125-145-xdelta3            404M

delta-gadget-4-5-sd-new-hdiffz                 161K
delta-gadget-4-5-sd-xdelta3                    238K
delta-gadget-4-5-xdelta3                       1.1M

delta-kernel-1006-1008-sd-hdiffz               16M
delta-kernel-1006-1008-sd-xdelta3              13M
delta-kernel-1006-1008-xdelta3                 35M

# kernel without initrd
delta-kernel-mini-1006-1008-sd-hdiffz          1.8M
delta-kernel-mini-1006-1008-sd-xdelta3         2.5M
delta-kernel-mini-1006-1008-xdelta3            12M

delta-snapd-25585-25839-sd-hdiffz              7.9M
delta-snapd-25585-25839-sd-xdelta3             18M
delta-snapd-25585-25839-xdelta3                40M

It is also important to consider the increase in size of the snap and snapd binaries, as well as of the snapd snap. For the current draft PR, the increases are as follows:

usr/bin/snap:         21760k -> 21824k -> +64k
usr/lib/snapd/snapd:  28672k -> 28736k -> +64k
snapd snap:           44240k -> 44676k -> +412k

Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
…ormats

Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
The version in 22.04/24.04 has a buffer overflow bug when working with the pseudo file definition
and snaps of 40MB or bigger

Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
@github-actions github-actions bot added the Needs Documentation -auto- Label automatically added which indicates the change needs documentation label Nov 21, 2025
github-actions bot commented Nov 21, 2025

Tue Dec 16 17:30:43 UTC 2025
The following results are from: https://github.com/canonical/snapd/actions/runs/20276755857

No spread failures reported

@kubiko kubiko marked this pull request as draft November 21, 2025 19:11
@zyga zyga left a comment

First partial pass. I spent some time on the new bits in squashfs, but I think you need to redo it without the whole wait group and goroutines. Please feel free to ping me for an interactive session.

Separately from this, it needs a design review. I would suggest booking a meeting with @pedronis and @alfonsosanchezbeato to discuss that.

func (s stat) User() string { return s.user }
func (s stat) Group() string { return s.group }

func ParseCompression(id uint16, mksqfsArgs []string) ([]string, error) {
Contributor

All of those can be trivial function assignments:

var (
 ParseCompression = parseCompression
 ...
)

source-subdir: squashfs-tools
make-parameters:
- INSTALL_PREFIX=${CRAFT_PART_INSTALL}/usr
override-pull: |
Contributor

My go-to solution for override-pull on my projects is:

        override-pull: |
            craftctl default
            # Set defaults
            grade=devel
            tag="$(git describe --tags --abbrev=0)" || true
            hash="$(git rev-parse --short HEAD)"
            # Check for tagged version
            if [ -n "$tag" ]; then
                count="$(git rev-list "$tag".. --count)"
                if [ "$count" -eq 0 ]; then
                    version="$tag"
                    grade=stable
                else
                    version="$tag+git$count.$hash"
                fi
            else
                count="$(git rev-list HEAD --count)"
                version="0+git$count.$hash"
            fi
            # Relay back to snapcraft
            craftctl set grade="$grade"
            craftctl set version="$version"
            echo "$version" >.version

Here we could drop the entire logic and just keep the checkout to the given sha, that corresponds with the tagged release.

I'm sharing the snippet for information purpose only. Please delete override-pull entirely.

Contributor Author

Setting the grade is a really nice touch. I will use that elsewhere.
I am using ls-remote because some of the repos are too large for a full clone, so it is worth listing the tags first, especially if I am only interested in the tagged version. So I copied the snippet from some of my other snaps...

echo "building tag: ${tag}"
git checkout "${tag}"
stage:
- usr/bin
Contributor

We need to strip the binaries. Please handle that in override stage or similar.

})
}

// run unsquashfs source
Contributor

Why do we need the two goroutines? Just make a pair of processes, pipe them directly or with your fifos and run both. No threads required. There's also no need for osutil.RunWithContext as this is just os/exec.CommandContext (since go 1.7)
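
For illustration, a pair of piped processes without goroutines or a wait group in the calling code might look roughly like the sketch below. It is only a sketch: `Pipe` and `demo` are hypothetical names, and `printf` and `tr` stand in for the real producer and consumer commands, whose exact arguments are not shown here.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"
	"os/exec"
)

// Pipe runs "producer | consumer" as two child processes connected by an
// OS pipe and returns the consumer's standard output.
func Pipe(ctx context.Context, producer, consumer []string) ([]byte, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	p := exec.CommandContext(ctx, producer[0], producer[1:]...)
	p.Stdout = w
	c := exec.CommandContext(ctx, consumer[0], consumer[1:]...)
	c.Stdin = r
	var out bytes.Buffer
	c.Stdout = &out

	if err := p.Start(); err != nil {
		r.Close()
		w.Close()
		return nil, fmt.Errorf("start producer: %w", err)
	}
	// The producer holds its own copy of the write end now; closing ours
	// lets the consumer see EOF once the producer exits.
	w.Close()
	if err := c.Start(); err != nil {
		r.Close()
		p.Process.Kill()
		p.Wait()
		return nil, fmt.Errorf("start consumer: %w", err)
	}
	r.Close()

	// Both children run on their own; we only wait for them to finish.
	perr := p.Wait()
	cerr := c.Wait()
	if perr != nil {
		return nil, fmt.Errorf("producer: %w", perr)
	}
	if cerr != nil {
		return nil, fmt.Errorf("consumer: %w", cerr)
	}
	return out.Bytes(), nil
}

// demo pipes a short string through tr to upper-case it.
func demo() (string, error) {
	out, err := Pipe(context.Background(),
		[]string{"printf", "hello"}, []string{"tr", "a-z", "A-Z"})
	return string(out), err
}

func main() {
	out, err := demo()
	fmt.Println(out, err)
}
```

Cancelling the context kills both children, so the caller needs no additional synchronisation of its own.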

deltaPipe := pipePaths[1]

// Run concurrent processes
var wg sync.WaitGroup
Contributor

Same story. You just don't need this.

}

// handleApplyDelta applies the smart delta file.
func ApplySnapDelta(sourceSnap, delta, targetSnap string) error {
Contributor

This should take a context

Contributor Author

I now create a context at the start of the function.
None of the calling functions (store download, snap client) seem to have a context within scope.

@kubiko kubiko changed the title Support for custom snap delta algorythm Support for custom snap delta algorithm Nov 27, 2025
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
…ented support

Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
…e kernel deltas

Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
…elper

Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
we are deliberately avoiding staging the delta generating tool (hdiffz)
as that is something we do not require on the device and it saves us ~1MB compressed

Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
…r convention

Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>