Description
An example ringbuf:
nathanael@sam ~ $ pfexec humility -p 0483:3754:0039001D4741500920383733 -a /data/local/images/sidecar/d/sp/build-sidecar-d-image-default-v1.0.52.zip ringbuf sequencer
humility: attached to 0483:3754:0039001D4741500920383733 via ST-Link V3
humility: ring buffer drv_oxide_vpd::__RINGBUF in sequencer:
humility: ring buffer drv_packrat_vpd_loader::__RINGBUF in sequencer:
humility: ring buffer drv_sidecar_seq_server::__RINGBUF in sequencer:
NDX LINE GEN COUNT PAYLOAD
2 904 1 1 MainboardControllerId(0x1de5bae)
3 918 1 1 MainboardControllerChecksum(0x6407475e)
4 950 1 1 MainboardControllerVersion(0x283)
5 951 1 1 MainboardControllerSha(0x3c8d1c33)
6 952 1 1 FpgaInitComplete
7 31 1 1 LoadingClockConfiguration
8 977 1 1 ClockConfigurationComplete
9 228 1 1 FrontIOBoardPowerEnable(true)
10 245 1 1 FrontIOBoardPowerGood
11 982 1 1 FrontIOBoardPresent
12 81 1 1 LoadingFrontIOControllerBitstream { fpga_id: 0x0 }
13 91 1 1 FrontIOControllerIdent { fpga_id: 0x0, ident: 0x1deaa55 }
14 98 1 1 FrontIOControllerChecksum { fpga_id: 0x0, checksum: [ 0xd4, 0xaa, 0x2a, 0x16 ], expected: [ 0xd4, 0xaa, 0x2a, 0x16 ] }
15 81 1 1 LoadingFrontIOControllerBitstream { fpga_id: 0x1 }
16 91 1 1 FrontIOControllerIdent { fpga_id: 0x1, ident: 0x1deaa55 }
17 98 1 1 FrontIOControllerChecksum { fpga_id: 0x1, checksum: [ 0xd4, 0xaa, 0x2a, 0x16 ], expected: [ 0xd4, 0xaa, 0x2a, 0x16 ] }
18 340 1 1 TofinoSequencerTick(LatchOffOnFault, A2 { error: None })
19 154 1 1 FanModuleLedUpdate(Zero, On)
20 154 1 1 FanModuleLedUpdate(One, On)
21 154 1 1 FanModuleLedUpdate(Two, On)
22 154 1 1 FanModuleLedUpdate(Three, On)
23 340 1 3 TofinoSequencerTick(LatchOffOnFault, A2 { error: None })
24 245 1 1 FrontIOBoardPowerGood
25 328 1 1 FrontIOBoardPhyPowerEnable(true)
26 550 1 1 FrontIOBoardPhyOscGood
27 340 1 1 TofinoSequencerTick(LatchOffOnFault, A2 { error: None })
28 81 1 1 TofinoPowerUp
29 89 1 1 TofinoVidAttempt(0x0)
30 50 1 1 SetVddCoreVout(Volts(0.79))
31 107 1 1 TofinoVidAck
0 796 2 1 TofinoSequencerError(FpgaError)
1 340 2 2713 TofinoSequencerTick(LatchOffOnFault, A0 { pcie_link: false })
Near the end we got an "FpgaError" but stayed up in A0. The Tofino is essentially unusable in this state, because it has not been properly configured for SRIS and PCIe.
@mkeeter put some investigation into the internal ticket:
I inspected the system on `sam`, building a custom image with additional logging.

The failing call is this `write_direct`. It's failing because the `TOFINO_DEBUG_PORT_STATE` is invalid: it has a value of 0x24, which corresponds to `receive_buffer_empty | address_nack_error`. In `write_direct`, we also require `write_buffer_empty` (bit 0) to be set; this is not the case, so it exits with an error.

I'm not sure why this is happening, though. Seeing `address_nack_error` seems suspicious, but I'm not sure what kind of hardware issue would cause this problem.

(See also #1763, where failing to reset this register caused issues on Sidecar hot-resets. I don't think this is relevant, because I see the failure when powering on the Sidecar from off.)
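The check described above can be sketched as follows. This is a minimal illustration, not the actual `write_direct` code: only the position of `write_buffer_empty` (bit 0) and the observed value 0x24 come from the investigation, while the function name `check_write_ready` and the `InvalidPortState` type are hypothetical.

```rust
/// Bit 0 of the debug port state register (`write_buffer_empty`),
/// as stated in the investigation above.
const WRITE_BUFFER_EMPTY: u8 = 1 << 0;

/// Hypothetical error type for illustration; it carries the raw register
/// value so that a log entry can show more than a bare "FpgaError".
#[derive(Debug, PartialEq, Eq)]
struct InvalidPortState(u8);

/// Hypothetical helper: succeeds only if the write buffer is reported empty.
fn check_write_ready(state: u8) -> Result<(), InvalidPortState> {
    if state & WRITE_BUFFER_EMPTY != 0 {
        Ok(())
    } else {
        Err(InvalidPortState(state))
    }
}
```

With the observed state, `check_write_ready(0x24)` returns `Err(InvalidPortState(0x24))`, because 0x24 has bit 0 clear.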
I have partially worked around this manifestation of the issue in PR #2325.
We should:
- Increase the logging fidelity to show what is failing (not just "FpgaError").
- Consider retrying or otherwise clearing the transaction that was NACKed.
- Improve the overall system response to show that we're in an invalid Tofino state, probably by de-sequencing, since the Tofino is unusable in this state. Without this, we continue to allow software further up the stack to keep trying and eventually tip over for confusing reasons.
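One way the first two items could fit together is a bounded retry that clears the failed transaction between attempts and reports a detailed error if it gives up. The sketch below is generic and hedged: `WriteError`, `write_with_retry`, and the `write` and `clear` closures are hypothetical stand-ins, not names from the sequencer or the FPGA driver, and how a NACKed transaction is actually cleared is not covered here.

```rust
/// Hypothetical detailed error; carries the raw port state for logging.
#[derive(Debug, PartialEq, Eq)]
enum WriteError {
    InvalidPortState(u8),
}

/// Try `write` up to `max_attempts` times (always at least once), calling
/// `clear` after each failed attempt that will be retried.
/// Returns the number of attempts used on success, or the last error.
fn write_with_retry<W, C>(
    mut write: W,
    mut clear: C,
    max_attempts: u32,
) -> Result<u32, WriteError>
where
    W: FnMut() -> Result<(), WriteError>,
    C: FnMut(),
{
    let mut attempt = 0;
    loop {
        attempt += 1;
        match write() {
            Ok(()) => return Ok(attempt),
            // Out of attempts: surface the detailed error to the caller,
            // which could then log it and de-sequence.
            Err(e) if attempt >= max_attempts => return Err(e),
            // Otherwise clear the failed transaction and try again.
            Err(_) => clear(),
        }
    }
}
```

On a final `Err`, the caller has the raw state available to record in the ring buffer before deciding to de-sequence, rather than staying in A0 with an unconfigured Tofino.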