
lmb commented Dec 1, 2025

The ebpf-go CI has been plagued by a non-deterministic hang of unit tests. It affects all packages and manifests as a write to stdout getting stuck, followed by the test timing out. This triggers a goroutine dump, which in turn unblocks the stuck write to stdout.

It's possible to reproduce this behaviour with the following command line:

    taskset -c 0 vimto -smp cpus=2 -kernel ghcr.io/cilium/ci-kernels:6.15.3 \
      exec -- sh -c 'seq 1 1000000 | while read i; do echo "line $i"; done'

After a few seconds the output freezes. Inspecting the kernel stack of the stuck process shows something like the following:

    [<0>] wait_port_writable+0x139/0x2d0
    [<0>] port_fops_write+0x88/0x130
    [<0>] vfs_write+0xf3/0x450
    [<0>] ksys_write+0x6d/0xe0
    [<0>] do_syscall_64+0x9e/0x1a0
    [<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f
    1 0x1 0x7ffdf4878c80 0x9 0x0 0x0 0x0 0x7ffdf4878c20 0x7f592daed77e
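The last line has the layout of `/proc/<pid>/syscall` as documented in proc(5): the syscall number, six argument registers, then the stack pointer and program counter. A small Go sketch for collecting and decoding both pieces is below. The program, `parseProcSyscall` and `syscallState` are hypothetical helpers, and the "1 means write" decoding assumes an x86-64 guest. Decoded this way, the sample line reads as `write(fd=1, count=9)`, i.e. a write to stdout.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// syscallState is the decoded content of /proc/<pid>/syscall (see proc(5)).
type syscallState struct {
	Nr     int64     // syscall number (-1 when blocked outside a syscall)
	Args   [6]uint64 // argument registers
	SP, PC uint64    // stack pointer and program counter
}

// parseProcSyscall decodes the "in a syscall" form of the file. The
// "running" and "-1 sp pc" forms are reported as errors.
func parseProcSyscall(s string) (syscallState, error) {
	var st syscallState
	f := strings.Fields(s)
	if len(f) != 9 {
		return st, fmt.Errorf("expected 9 fields, got %d (not blocked in a syscall?)", len(f))
	}
	nr, err := strconv.ParseInt(f[0], 10, 64)
	if err != nil {
		return st, err
	}
	st.Nr = nr
	var vals [8]uint64
	for i, field := range f[1:] {
		v, err := strconv.ParseUint(field, 0, 64) // base 0 accepts the 0x prefix
		if err != nil {
			return st, err
		}
		vals[i] = v
	}
	copy(st.Args[:], vals[:6])
	st.SP, st.PC = vals[6], vals[7]
	return st, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: inspect <pid>")
		return
	}
	pid := os.Args[1]
	// Reading another process's kernel stack usually requires root.
	if stack, err := os.ReadFile("/proc/" + pid + "/stack"); err == nil {
		fmt.Print(string(stack))
	} else {
		fmt.Fprintln(os.Stderr, "stack:", err)
	}
	raw, err := os.ReadFile("/proc/" + pid + "/syscall")
	if err != nil {
		fmt.Fprintln(os.Stderr, "syscall:", err)
		return
	}
	st, err := parseProcSyscall(string(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, "syscall:", err)
		return
	}
	// On x86-64, syscall number 1 is write(fd, buf, count).
	if st.Nr == 1 {
		fmt.Printf("blocked in write(fd=%d, count=%d)\n", st.Args[0], st.Args[2])
	} else {
		fmt.Printf("blocked in syscall %d\n", st.Nr)
	}
}
```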

As far as I can tell, the hang only reproduces when execution is restricted to a single CPU on the host side (taskset -c 0) while QEMU presents two vCPUs to the VM (-smp cpus=2).

Passing ioeventfd=off to the serial console device works around this problem.

See cilium/ebpf#1734 for more details.

lmb force-pushed the stdout-write-hang branch from 1f2b09a to d883528 on December 1, 2025 19:00
lmb force-pushed the stdout-write-hang branch from d883528 to 4e07dbd on December 1, 2025 19:05
lmb merged commit 475850d into main on December 1, 2025
4 checks passed
lmb deleted the stdout-write-hang branch on December 1, 2025 19:11