@lmb lmb commented Dec 5, 2025

ebpf-go CI has been plagued by sporadic hangs, where tests simply time out while trying to write status information to stdout.

The bug manifests when issuing blocking writes to a virtio console while also polling it. The way we trigger the bug is quite involved:

  • init opens the port via os.OpenFile. This sets O_NONBLOCK on the fd, and registers the os.File with the poller.
  • The port is passed to the child process via exec.Cmd.Stdout. This internally calls os.File.Fd(), which clears O_NONBLOCK but doesn't remove the file from the poller.
  • The child process receives a blocking stdout. Writing to it will issue a blocking write to the virtio-console port, specifically port_fops_write() in virtio_console.c.
  • port_fops_write() calls wait_port_writable(). This puts the calling thread to sleep if the virtqueue is full, by waiting on port->waitqueue.

We now enter the race window.

  • The host processes the guest's write, frees up some space in the virtqueue and issues an interrupt to the guest.
  • This interrupt races with a call to port_fops_poll() issued by the init process's Go runtime. That function invokes will_write_block(), which consumes all used buffers from the virtqueue.
  • The interrupt handler vring_interrupt() checks whether the virtqueue has any used buffers left to process via more_used(). Since all of them have just been consumed by port_fops_poll(), the interrupt is dropped without waking the writer.

At this point we still have a writer stuck in port_fops_write() waiting for a wakeup that never comes.
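
To make the ordering dependence concrete, here is a toy model of the race in Go. It is only a sketch of the sequence described above, not kernel code: the `port` type, its fields and `writerStuck` are hypothetical names, and the model assumes that the interrupt path is the only thing that wakes the writer.

```go
package main

import "fmt"

// port is a toy model of the state involved in the race (hypothetical,
// not the kernel's struct port).
type port struct {
	usedBuffers  int  // buffers the host handed back that the guest has not reclaimed yet
	writerAsleep bool // a writer is parked on port->waitqueue
}

// hostCompletes models the host processing a write and returning n buffers.
func (p *port) hostCompletes(n int) { p.usedBuffers += n }

// poll models port_fops_poll() calling will_write_block(): it reclaims every
// used buffer but does not wake a sleeping writer.
func (p *port) poll() { p.usedBuffers = 0 }

// interrupt models vring_interrupt(): if no used buffers are pending it
// returns early, otherwise the buffers are reclaimed and the writer is woken.
// It reports whether the interrupt was handled.
func (p *port) interrupt() bool {
	if p.usedBuffers == 0 { // more_used() would be false here
		return false
	}
	p.usedBuffers = 0
	p.writerAsleep = false
	return true
}

// writerStuck replays one ordering of the two racing events and reports
// whether the writer is still asleep afterwards.
func writerStuck(pollFirst bool) bool {
	p := &port{writerAsleep: true}
	p.hostCompletes(1)
	if pollFirst {
		p.poll()
		p.interrupt()
	} else {
		p.interrupt()
		p.poll()
	}
	return p.writerAsleep
}

func main() {
	fmt.Println("poll before interrupt, writer stuck:", writerStuck(true))
	fmt.Println("interrupt before poll, writer stuck:", writerStuck(false))
}
```

In this model the writer only stays asleep when the poll reclaims the buffers before the interrupt handler looks at the queue, which is the window described above.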

The workaround for this issue is to close the stdio file in init, thereby removing it from the runtime poller.

Fixes: #29

@lmb lmb merged commit fc25e09 into main Dec 5, 2025
6 of 8 checks passed
@lmb lmb deleted the virtio-console-race-workaround branch December 5, 2025 17:42