
RTX 5080 (GB203): Random Xid 79 with zero precursor — any load level, instant atomic GPU death, open kernel module 595.71.05 #1151

@luciaLebrun

NVIDIA Open GPU Kernel Modules Version

kmod-nvidia-595.71.05-1.fc44.x86_64

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Bazzite

Kernel Release

Linux bazzite 6.19.14

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

NVIDIA GeForce RTX 5080

Describe the bug

Symptom

Random Xid 79 (GPU has fallen off the bus) at completely unpredictable intervals — minutes to 9+ hours after boot. Reproducible under:

  • Heavy Vulkan/Proton gaming (Diablo IV)
  • Light/indie gaming (Hyper Light Drifter)
  • Desktop idle with only Spotify running
  • Locked screen with no active GPU workload

Crash presents as black screen + audio artifacts. Wayland compositor collapses. Hard reboot required.

Key log — Xid 79 with zero precursor

  May 18 19:02:39 kernel: NVRM: GPU at PCI:0000:2b:00: GPU-0b11d014-1685-04bd-26ed-5fc0bc1c44ae
  May 18 19:02:39 kernel: NVRM: Xid (PCI:0000:2b:00): 79, GPU has fallen off the bus.
  May 18 19:02:39 kernel: NVRM: GPU 0000:2b:00.0: GPU has fallen off the bus.
  May 18 19:02:39 kernel: NVRM: krcRcAndNotifyAllChannels_IMPL: RC all channels for critical error 79.
  May 18 19:02:39 kernel: NVRM: _threadNodeCheckTimeout: API_GPU_ATTACHED_SANITY_CHECK failed!
  [x35 repetitions]
  May 18 19:02:47 kernel: NVRM: GPU0 _issueRpcAndWait: rpcSendMessage failed with status 0x0000000f for fn 10 sequence 137124
  [cascade of GspRmFree failures with status=0x0000000f]
  May 18 19:02:47 kernel: nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices

Zero kernel events were logged in the 10 seconds before 19:02:39, and no AER errors appear anywhere in that boot. The GPU disappears atomically with no software-visible trigger.
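
The zero-precursor claim can be checked mechanically against a saved kernel log. The following is a minimal sketch, assuming syslog-style timestamps as in the excerpt above (for example, output saved from `journalctl -k`); the helper names `parse_ts` and `precursors` are hypothetical and not part of any NVIDIA tooling.

```python
import re
from datetime import datetime, timedelta

# Matches the NVRM Xid line format shown in the log excerpt.
XID_RE = re.compile(r"NVRM: Xid \(PCI:[0-9a-fA-F:.]+\): (\d+),")
TS_FORMAT = "%b %d %H:%M:%S"  # syslog-style stamp; carries no year


def parse_ts(line):
    """Return the timestamp at the start of a log line, or None."""
    try:
        return datetime.strptime(line[:15], TS_FORMAT)
    except ValueError:
        return None


def precursors(lines, window_s=10):
    """Find the first Xid line and the lines logged shortly before it.

    Returns (xid_code, xid_time, earlier_lines), or None if no Xid is found.
    earlier_lines holds lines stamped within window_s seconds before the Xid
    and strictly earlier than it. With one-second resolution, lines sharing
    the Xid's own second are treated as part of the event, not as precursors.
    """
    stamped = []
    for raw in lines:
        line = raw.strip()
        ts = parse_ts(line)
        if ts is not None:
            stamped.append((ts, line))
    for ts, line in stamped:
        match = XID_RE.search(line)
        if match:
            start = ts - timedelta(seconds=window_s)
            earlier = [l for t, l in stamped if start <= t < ts]
            return int(match.group(1)), ts, earlier
    return None
```

An empty `earlier_lines` list for the crash boot would correspond to the observation above. Because the stamps carry no year, this sketch is only meaningful within a single boot's log.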

In a different boot the same crash was preceded only by 9+ hours of clean operation (last NVIDIA log entry was Steam GPU detection at
session start).

What has been ruled out

Fix attempted                                         Result
NVreg_EnableGpuFirmware=0                             Ignored; open module on Blackwell enforces GSP
pcie_aspm=off kernel arg                              No effect
PCIe Gen 3 forced in BIOS                             No effect
GPU power limit 300 W (nvidia-smi -pl 300)            No effect
nvidia-persistenced + min clock lock (500 MHz floor)  No effect
PCIe AER hardware errors                              None present; physical link never drops
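
The AER row above could be cross-checked against the kernel's per-device AER counters. This is a rough sketch, assuming the kernel exposes the `aer_dev_correctable`, `aer_dev_nonfatal`, and `aer_dev_fatal` files under sysfs for the device (they exist only when AER is enabled and the device advertises the capability); the function names are hypothetical. The address `0000:2b:00.0` is taken from the log above, and the upstream port may be worth reading the same way.

```python
from pathlib import Path

AER_FILES = ("aer_dev_correctable", "aer_dev_nonfatal", "aer_dev_fatal")


def parse_aer_stats(text):
    """Parse 'NAME COUNT' lines from an aer_dev_* file into a dict."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def read_aer(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read all three AER counter files for one PCI device.

    A value of None means the file could not be read (AER not exposed,
    or the device is no longer reachable).
    """
    result = {}
    for name in AER_FILES:
        path = Path(sysfs_root) / bdf / name
        try:
            result[name] = parse_aer_stats(path.read_text())
        except OSError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, stats in read_aer("0000:2b:00.0").items():
        if stats is None:
            print(f"{name}: unavailable")
        else:
            nonzero = {k: v for k, v in stats.items() if v}
            print(f"{name}: {nonzero or 'all zero'}")
```

Sampling these counters periodically before a crash is more informative than reading them afterwards, since the files may not be readable once the GPU has dropped off the bus.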

Additional observations

  • nvidia-powerd exits at boot with ERROR! Running on an unsupported system (PCI device Id: 0x2c02) — RTX 5080 is absent from its
    supported device list, leaving Dynamic Boost entirely unmanaged
  • No Xid other than 79 has ever appeared
  • The failure mode matches issue #1080, "RTX 5090 (GB202): GSP heartbeat timeout -> Xid 109/8 under Vulkan load via Proton (595.58.03, 590.48.01)" (GB202, GSP silent death under Proton/Vulkan), but extends to near-zero GPU load, suggesting the
    GSP firmware on GB20x can die spontaneously regardless of command queue pressure
  • status=0x0000000f (NV_ERR_GPU_IS_LOST) in all post-crash GspRmFree calls

The crash also occurs on Windows 11 with the official NVIDIA Windows driver, with the identical black screen and hard reboot required. The issue is absent on an RTX 2070 in the same system (same slot, same PSU, same motherboard), confirming this is specific to the RTX 5080 (GB203/Blackwell) and not an environment issue.

To Reproduce

  1. Use the system normally or leave it idle
  2. After an unpredictable interval (minutes to 9+ hours), Xid 79 fires with no preceding warning — display goes black, hard reboot required

Notes:

  • There is no reliable on-demand trigger — the crash occurs under any load level including near-zero (locked screen, Spotify in background)
  • Crash does not reproduce on RTX 2070 in the same system (same slot, PSU, and motherboard), confirming the issue is specific to GB203/Blackwell silicon or this GPU's GSP firmware
  • Zero kernel events precede the Xid 79 — the GPU vanishes atomically with no AER errors, no GSP heartbeat timeout log, and no PCIe link drop
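
Since there is no on-demand trigger, one way to capture the GPU's last known state before a crash is a small telemetry logger that syncs every sample to disk. This is only a sketch under stated assumptions: it relies on the `nvidia-smi --query-gpu` fields listed in `FIELDS` being available on this driver, and the function names and log path are hypothetical.

```python
import os
import subprocess
import time

# nvidia-smi query fields; availability is assumed for this driver version.
FIELDS = (
    "timestamp",
    "pstate",
    "power.draw",
    "temperature.gpu",
    "clocks.gr",
    "pcie.link.gen.current",
    "pcie.link.width.current",
)


def build_command(fields=FIELDS):
    """Build the nvidia-smi query command for the given fields."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]


def parse_sample(csv_line, fields=FIELDS):
    """Split one CSV output line into a field-name -> string-value dict."""
    values = [v.strip() for v in csv_line.split(",")]
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))


def log_forever(path, interval_s=1.0):
    """Append one raw sample per interval, syncing each line to disk."""
    with open(path, "a") as out:
        while True:
            try:
                proc = subprocess.run(
                    build_command(), capture_output=True, text=True, timeout=10
                )
                line = proc.stdout.strip() or f"nvidia-smi failed: rc={proc.returncode}"
            except (OSError, subprocess.TimeoutExpired) as exc:
                line = f"nvidia-smi failed: {exc}"
            out.write(line + "\n")
            out.flush()
            os.fsync(out.fileno())  # so the last sample survives a hard reboot
            time.sleep(interval_s)
```

Calling `log_forever("gpu-telemetry.csv")` in a background session would leave the final pre-crash sample on disk; `parse_sample` can then be applied to the last line of the file after reboot.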

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

No response

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working)
