RTX 5060 Ti eGPU unable to init, falls off the bus immediately #974

@Cyrille37

NVIDIA Open GPU Kernel Modules Version

nvidia-headless-580-open

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Ubuntu 24.04.3 LTS

Kernel Release

Linux NS5x-NS7xAU 6.14.0-36-generic #36~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct 15 15:45:17 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

PNY OC 16G GeForce RTX 5060 Ti

Describe the bug

Preamble: I use the PCI bridge "Cyid TB3-HL7" connected with a Thunderbolt 4 cable. It works fine with a Gigabyte Windforce OC 12G GeForce RTX 3060 using the driver packages nvidia-headless-580-open and nvidia-dkms-580-open.

I have just bought a PNY OC 16GB GeForce RTX 5060 Ti to replace the previous card, but it does not work, as you can see in nvidia-bug-report.log.gz.

To Reproduce

  • power on the laptop
  • plug the 5060 Ti into the PCI bridge (Cyid TB3-HL7)
  • plug the power cable into the 5060 Ti
  • plug the power cable into the PCI bridge
  • plug the Thunderbolt 4 cable into the bridge, then into the laptop
    then
  • nvidia-smi does not find the card
  • many errors appear in the system journal

Bug Incidence

Always

nvidia-bug-report.log.gz

More Info

Installed packages:

$ dpkg --get-selections | grep -i nvidia
libnvidia-cfg1-580:amd64			install
libnvidia-common-580				install
libnvidia-compute-535:amd64			deinstall
libnvidia-compute-580:amd64			install
libnvidia-decode-580:amd64			install
libnvidia-gpucomp-580:amd64			install
libnvidia-ml-dev:amd64				install
nvidia-cuda-dev:amd64				install
nvidia-dkms-580-open				install
nvidia-driver-assistant				install
nvidia-firmware-580				install
nvidia-headless-580-open			install
nvidia-headless-no-dkms-580-open		install
nvidia-kernel-common-580			install
nvidia-kernel-source-580-open			install
nvidia-modprobe					install
nvidia-persistenced				install
nvidia-utils-580				install

Here are some strange lines that I don't understand:

kernel: pci 0000:03:00.0: bridge window [mem size 0x24400000 64bit pref]: can't assign; no space
...
kernel: pci 0000:03:00.0: bridge window [io  size 0x3000]: can't assign; no space
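To make the two messages above easier to read, here is a minimal Python sketch that pulls out the bridge address, the window type and the requested size. The regular expression is only inferred from the two lines quoted above, so it is an assumption about the message format, not a kernel-defined grammar, and the helper name `failed_windows` is hypothetical.

```python
import re

# Pattern inferred from the two journal lines quoted above (an assumption,
# not an official format): PCI address, window kind, hex size, optional flags.
WINDOW_RE = re.compile(
    r"pci (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"bridge window \[(?P<kind>mem|io)\s+size (?P<size>0x[0-9a-f]+)(?P<flags>[^\]]*)\]: "
    r"can't assign; no space"
)

def failed_windows(lines):
    """Yield (bdf, kind, size_in_bytes, flags) for each failed bridge window."""
    for line in lines:
        m = WINDOW_RE.search(line)
        if m:
            yield m["bdf"], m["kind"], int(m["size"], 16), m["flags"].strip()

# The two lines from the journal, copied from this report.
sample = [
    "kernel: pci 0000:03:00.0: bridge window [mem size 0x24400000 64bit pref]: can't assign; no space",
    "kernel: pci 0000:03:00.0: bridge window [io  size 0x3000]: can't assign; no space",
]

for bdf, kind, size, flags in failed_windows(sample):
    print(f"{bdf} {kind:>3} window: {size} bytes ({size / 2**20:.2f} MiB) {flags}")
```

Read this way, bridge 0000:03:00.0 asked for a 0x24400000-byte (580 MiB) 64-bit prefetchable memory window and a 0x3000-byte (12 KiB) I/O window, and neither could be assigned.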

I've identified these error lines:

kernel: [drm] [nvidia-drm] [GPU ID 0x00000500] Loading driver
kernel: NVRM: GPU at PCI:0000:05:00: GPU-ab296f23-e6a6-a23b-b6c1-33f9b813df84
kernel: NVRM: Xid (PCI:0000:05:00): 79, GPU has fallen off the bus.
kernel: NVRM: GPU 0000:05:00.0: GPU has fallen off the bus.
kernel: NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all channels for critical error 79.
...
kgspBootstrap_GH100: GSP-FMC reported an error while attempting to boot GSP: 0xffffffff
_kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
_kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset)
RmInitAdapter: Cannot initialize GSP firmware RM
...
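For anyone who wants to pick the Xid events out of a longer journal, here is a minimal Python sketch. The pattern is based only on the `NVRM: Xid (...)` line quoted above, so treat it as an assumption about the message layout; the helper name `xid_events` is hypothetical.

```python
import re

# Pattern inferred from the single Xid line quoted above (an assumption):
# "NVRM: Xid (PCI:<address>): <number>, <message>"
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<addr>[0-9a-f:.]+)\): (?P<xid>\d+), (?P<msg>.*)"
)

def xid_events(lines):
    """Yield (pci_address, xid_number, message) for each Xid line found."""
    for line in lines:
        m = XID_RE.search(line)
        if m:
            yield m["addr"], int(m["xid"]), m["msg"].strip()

# Lines copied from the journal excerpt in this report.
sample = [
    "kernel: NVRM: GPU at PCI:0000:05:00: GPU-ab296f23-e6a6-a23b-b6c1-33f9b813df84",
    "kernel: NVRM: Xid (PCI:0000:05:00): 79, GPU has fallen off the bus.",
    "kernel: NVRM: GPU 0000:05:00.0: GPU has fallen off the bus.",
]

for addr, xid, msg in xid_events(sample):
    print(f"Xid {xid} on {addr}: {msg}")
```

To scan a real journal instead of the sample, the same function can be fed the lines of `sys.stdin`, for example from the output of `journalctl -k -b`.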
