@d-sonuga
Contributor

@d-sonuga d-sonuga commented Sep 18, 2025

Resolves bytecodealliance/wasmtime#11545 and bytecodealliance/wasmtime#11544.

  • Add support for branch arguments with any, fixed-reg, or stack-only constraints that are defined in the branch instruction itself.
  • Remove over-constraint on reservation of registers for operands with any-reg constraints. Instead, use counters to decide when it is safe to allocate registers to operands with no constraints.
  • Correct allocd_within_constraint to account for allocation of clobbers to late phase registers.
  • Correct select_suitable_reg_in_lru so that early defs and late uses are only allocated pregs that are available in both the early and late phases.
  • Allocate late operands first, followed by early operands, instead of defs then uses.
  • Remove over-constraint on available registers by removing clobbers only from the late available register set.
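
To make the last few points concrete, here is a minimal sketch of per-phase availability sets. It is not the fastalloc code: the `Avail` type, the `candidates` method, and the use of a `u64` bitmask for pregs are all hypothetical names and choices made for illustration. It assumes clobbers only occupy registers in the late phase, and that early defs and late uses must hold their register across both phases.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind { Use, Def }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Pos { Early, Late }

/// Hypothetical per-instruction availability sets, one bit per preg.
struct Avail {
    early: u64,
    late: u64,
}

impl Avail {
    /// Clobbers are removed only from the late set (assumption: a clobber
    /// takes effect in the late phase and leaves the early phase untouched).
    fn new(all_regs: u64, clobbers: u64) -> Self {
        Avail { early: all_regs, late: all_regs & !clobbers }
    }

    /// Registers that may be handed to an operand of the given kind/position.
    fn candidates(&self, kind: Kind, pos: Pos) -> u64 {
        match (kind, pos) {
            // An early use only needs its register during the early phase.
            (Kind::Use, Pos::Early) => self.early,
            // A late def only needs its register during the late phase.
            (Kind::Def, Pos::Late) => self.late,
            // Early defs and late uses span both phases, so the register
            // must be available in both.
            (Kind::Def, Pos::Early) | (Kind::Use, Pos::Late) => self.early & self.late,
        }
    }
}
```

With four registers and the low two clobbered, an early use can still pick any of the four, while late defs, early defs, and late uses are limited to the two unclobbered ones.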

Previously, operands with any-reg constraints all had their registers reserved as if they were late uses. The goal was to avoid a situation where a register needed by an operand that is live in both the early and late phases is instead allocated to an operand that is live in only one phase, potentially leaving no valid register for the operand that spans both phases.
This was an over-constraint, and it led to bytecodealliance/wasmtime#11544.
It is resolved by dropping the reservation of registers for any-reg operands entirely. Instead, counters are used to determine whether it is safe to allocate registers to operands with no constraints.
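
As a rough illustration of the counter idea (a sketch under my own assumptions, not the code in this PR): keep a count of free registers and a count of register-requiring operands that have not been allocated yet, and only give a register to an unconstrained operand when doing so cannot starve the pending ones. `RegBudget`, `Loc`, and the method names are hypothetical, and a single pool of registers stands in for the per-phase bookkeeping.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Loc { Reg, Stack }

/// Hypothetical counters for one instruction.
struct RegBudget {
    /// Registers not yet handed out.
    free_regs: usize,
    /// Operands that must get a register and are still unallocated.
    pending_reg_only: usize,
}

impl RegBudget {
    /// An unconstrained operand may take a register only if every pending
    /// register-requiring operand would still be able to get one afterwards.
    fn safe_to_use_reg_for_any(&self) -> bool {
        self.free_regs > self.pending_reg_only
    }

    /// Allocate an operand with no constraint: register if safe, else stack.
    fn alloc_any(&mut self) -> Loc {
        if self.safe_to_use_reg_for_any() {
            self.free_regs -= 1;
            Loc::Reg
        } else {
            Loc::Stack
        }
    }

    /// Allocate an operand that requires a register; None if nothing is left
    /// or no such operand is pending.
    fn alloc_reg_only(&mut self) -> Option<Loc> {
        if self.free_regs == 0 || self.pending_reg_only == 0 {
            return None;
        }
        self.free_regs -= 1;
        self.pending_reg_only -= 1;
        Some(Loc::Reg)
    }
}
```

Compared with reserving registers up front, nothing is set aside in advance here: an unconstrained operand falls back to the stack only at the point where taking a register would actually leave a pending operand without one.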

Another issue:

use v0 fixed(p0), def v1 fixed(p0), use late v0 any

In this scenario, p0 is fixed to both v1 and v0, but that shouldn't be a problem because they are in different phases. Prior to this PR, this was problematic because all defs were allocated first, then uses, resulting in an allocation order for the above example that looked like this:

p0 -> v1 (this is a def, so it's freed and vreg_allocs[v1] is set to none)
p0 -> v0 (vreg_allocs[v0] = p0)
p0 -> v0 (vreg_allocs[v0] is p0, which is within constraints, so it is selected)

This is incorrect. The root cause is that, during allocation, vreg_allocs[vi] gives the current allocation of some vreg vi. When the late v0 operand is being allocated, however, vreg_allocs[v0] holds the allocation of v0 in the early phase of the instruction, not the late phase. Since allocation proceeds in reverse, this is the wrong order: within an instruction, allocation should always proceed from the late phase to the early phase. To resolve this, late operands are now allocated first, followed by early operands, instead of all defs followed by all uses. This is still safe: the reason defs were allocated first is that registers allocated to late def operands can be reused by early use operands, and that relative order is preserved in the new processing order.
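
The difference between the two orders can be sketched on the example above. This is an illustrative model only: the `Operand` struct and the function names are hypothetical, and it assumes that the unqualified `use` is an early use and the unqualified `def` is a late def.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind { Use, Def }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Pos { Early, Late }

/// Hypothetical, simplified operand: just a vreg index, a kind and a position.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Operand {
    vreg: u32,
    kind: Kind,
    pos: Pos,
}

/// `use v0 fixed(p0), def v1 fixed(p0), use late v0 any`, constraints omitted.
fn example_operands() -> Vec<Operand> {
    vec![
        Operand { vreg: 0, kind: Kind::Use, pos: Pos::Early },
        Operand { vreg: 1, kind: Kind::Def, pos: Pos::Late },
        Operand { vreg: 0, kind: Kind::Use, pos: Pos::Late },
    ]
}

/// Previous order: all defs, then all uses (stable within each group).
fn defs_then_uses(ops: &[Operand]) -> Vec<Operand> {
    let mut sorted = ops.to_vec();
    sorted.sort_by_key(|o| match o.kind {
        Kind::Def => 0,
        Kind::Use => 1,
    });
    sorted
}

/// New order: all late operands, then all early operands.
fn late_then_early(ops: &[Operand]) -> Vec<Operand> {
    let mut sorted = ops.to_vec();
    sorted.sort_by_key(|o| match o.pos {
        Pos::Late => 0,
        Pos::Early => 1,
    });
    sorted
}
```

Under these assumptions, `defs_then_uses` visits v1, then the early v0, then the late v0, which is the problematic sequence listed above. `late_then_early` visits v1, then the late v0, then the early v0, so the late v0 operand is handled before vreg_allocs[v0] is overwritten with its early-phase allocation, and the late def of v1 is still handled before the early use.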

Fuzzed overnight for 8-9 hours.
I also ran Wasmtime's tests. Most pass, and the failures don't appear to be caused by register allocation. For example, the disas test checks against hardcoded output.

…ers to late phase registers.

Ditch reservation of registers for reg-only operands in favor of using counters to determine whether registers should be allocated to operands with Any constraints
…ly from the late available register set.

Allocate in order of late operands then early instead of def then use
Member

@cfallin cfallin left a comment

Thanks very much for these fixes!

The "move in successor if predecessor def'd value in branch at fixed reg" exception to the always-in-spillslots-at-block-start invariant is kind of unfortunate, but I think I agree that it's the most pragmatic solution here. The only other way out seems to be to generalize the inter-block state, so we actually track locations of every value, which gives up a lot of the efficiency advantages that SSRA has.

…g a branch arg on the branch & improve code clarity of decrementing counters when allocating scratch registers
@cfallin cfallin merged commit 5c78ae4 into bytecodealliance:main Sep 18, 2025
6 checks passed
cfallin added a commit to cfallin/regalloc2 that referenced this pull request Sep 18, 2025
Includes fastalloc fixes from bytecodealliance#240 as well as a few miscellaneous
refactors/cleanups.
@cfallin cfallin mentioned this pull request Sep 18, 2025
abrown pushed a commit that referenced this pull request Sep 18, 2025
Includes fastalloc fixes from #240 as well as a few miscellaneous
refactors/cleanups.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Sep 18, 2025
github-merge-queue bot pushed a commit to bytecodealliance/wasmtime that referenced this pull request Sep 18, 2025
Pulls in fix for fastalloc from bytecodealliance/regalloc2#240
(thanks!).

Fixes #11544.
Fixes #11545.
bongjunj pushed a commit to prosyslab/wasmtime that referenced this pull request Oct 20, 2025
Development

Successfully merging this pull request may close these issues.

single-pass regalloc: Corruption (?) with throw.wast

2 participants