@hawkw hawkw commented Jan 6, 2026

Currently, ereport metadata contains a `hubris_archive_id` field with the 8-byte image ID. At time of writing, upstack software does not use this field for anything, and @cbiffle would like to ensure that it continues not to. Therefore, this commit removes it.

I'll replace it with a better identifier, such as the image name and Git SHA, in a subsequent commit.

@hawkw hawkw requested a review from cbiffle January 6, 2026 17:31
@hawkw hawkw enabled auto-merge (squash) January 6, 2026 18:25
@hawkw hawkw added the service processor, psc, gimlet, cosmo, and fault-management labels Jan 6, 2026
@cbiffle cbiffle left a comment

<3

@hawkw hawkw merged commit 3a2e358 into master Jan 6, 2026
168 checks passed
@hawkw hawkw deleted the eliza/archivent branch January 6, 2026 19:01
hawkw added a commit that referenced this pull request Jan 6, 2026
Follow-up from #2343

PR #2343 removes the `hubris_archive_id` field from ereport metadata, as
we have determined that this ought not be used to identify Hubris except
in the case of firmware updates. If this is being removed, though, we
really ought to have other metadata fields identifying the Hubris image.
Therefore, this commit adds fields from the caboose (in particular, the
`BORD`, `VERS`, and `GITC` tags) to the ereport metadata message. These
fields are read from the caboose every time metadata is refreshed rather
than being buffered in packrat. Buffering them would duplicate data
already in flash, and doesn't seem necessary, as metadata refreshes occur
infrequently (on SP reset/MGS restart).
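
As a rough illustration of the read-on-refresh approach described above, here
is a minimal host-side sketch. It is not the code from this commit: the
`CabooseReader` trait, `CabooseError`, `CabooseMeta`, `caboose_meta`, and
`FakeCaboose` are all hypothetical names, and the mapping of `BORD`, `GITC`,
and `VERS` to board, commit, and version is inferred from the example output
later in this message.

```rust
/// Errors from the hypothetical caboose reader below.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CabooseError {
    /// The image has no caboose at all.
    MissingCaboose,
    /// The caboose exists but does not contain the requested tag.
    NoSuchTag,
}

/// Hypothetical stand-in for however packrat reads tags out of flash.
pub trait CabooseReader {
    fn get(&self, tag: [u8; 4]) -> Result<&[u8], CabooseError>;
}

/// Borrowed view of the three tags; nothing is copied or cached.
#[derive(Debug, PartialEq, Eq)]
pub struct CabooseMeta<'a> {
    pub board: Option<&'a str>,
    pub commit: Option<&'a str>,
    pub version: Option<&'a str>,
}

/// Read one tag. A missing tag or invalid UTF-8 becomes `Ok(None)`;
/// only a missing caboose is reported as an error.
fn read_tag<'a, R: CabooseReader>(
    reader: &'a R,
    tag: [u8; 4],
) -> Result<Option<&'a str>, CabooseError> {
    match reader.get(tag) {
        Ok(bytes) => Ok(core::str::from_utf8(bytes).ok()),
        Err(CabooseError::NoSuchTag) => Ok(None),
        Err(CabooseError::MissingCaboose) => Err(CabooseError::MissingCaboose),
    }
}

/// Called on every metadata refresh. Returns `None` if there is no caboose.
pub fn caboose_meta<R: CabooseReader>(reader: &R) -> Option<CabooseMeta<'_>> {
    Some(CabooseMeta {
        board: read_tag(reader, *b"BORD").ok()?,
        commit: read_tag(reader, *b"GITC").ok()?,
        version: read_tag(reader, *b"VERS").ok()?,
    })
}

/// In-memory fake used only to exercise the sketch.
pub enum FakeCaboose<'a> {
    Missing,
    Present(&'a [([u8; 4], &'a [u8])]),
}

impl CabooseReader for FakeCaboose<'_> {
    fn get(&self, tag: [u8; 4]) -> Result<&[u8], CabooseError> {
        match self {
            FakeCaboose::Missing => Err(CabooseError::MissingCaboose),
            FakeCaboose::Present(tags) => tags
                .iter()
                .find(|(t, _)| *t == tag)
                .map(|(_, v)| *v)
                .ok_or(CabooseError::NoSuchTag),
        }
    }
}
```

Because `CabooseMeta` only borrows from the reader, the sketch keeps no copy
of the tag values between refreshes, which is the trade-off argued for above.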

All of these fields are optional, and if any of them are not present or
could not be read successfully, we send a CBOR `null`. Additionally,
I've nested all of them under a `hubris_caboose` field, which is `null`
if the image has no caboose whatsoever. This way, we can differentiate
between images with no caboose and images where none of the tags we read
into the metadata message could be found. I'm open to being convinced
this is unnecessary, but it seemed worthwhile, and since the metadata
message doesn't compete for space in the ereport ringbuffer, we can be a
bit more verbose here, provided it fits in a UDP datagram.
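
To make the two levels of `null` concrete, here is a small hand-rolled
sketch of how the `hubris_caboose` entry could be laid out in CBOR. It is an
illustration under assumptions, not the encoder used in this commit: the
helper names (`put_text`, `put_opt_text`, `put_caboose`) and the
`CabooseMeta` struct are hypothetical, it allocates a `Vec` for convenience
on the host, and the key names are taken from the example output below.

```rust
/// The three caboose-derived fields; `None` means "absent or unreadable".
pub struct CabooseMeta<'a> {
    pub board: Option<&'a str>,
    pub commit: Option<&'a str>,
    pub version: Option<&'a str>,
}

/// CBOR simple value 22 (`null`).
const CBOR_NULL: u8 = 0xf6;

/// Append a definite-length CBOR text string (major type 3).
pub fn put_text(out: &mut Vec<u8>, s: &str) {
    let len = s.len();
    if len < 24 {
        // Length fits in the initial byte.
        out.push(0x60 | len as u8);
    } else if len <= 0xff {
        // One-byte length follows.
        out.push(0x78);
        out.push(len as u8);
    } else {
        // Two-byte big-endian length follows; longer strings are out of
        // scope for this sketch.
        assert!(len <= 0xffff, "string too long for this sketch");
        out.push(0x79);
        out.extend_from_slice(&(len as u16).to_be_bytes());
    }
    out.extend_from_slice(s.as_bytes());
}

/// Append either a text string or `null`.
pub fn put_opt_text(out: &mut Vec<u8>, value: Option<&str>) {
    match value {
        Some(s) => put_text(out, s),
        None => out.push(CBOR_NULL),
    }
}

/// Append the `"hubris_caboose"` key and its value, for use inside an
/// enclosing metadata map. No caboose at all yields `null`; a caboose with
/// missing tags yields a 3-entry map whose values are individually `null`.
pub fn put_caboose(out: &mut Vec<u8>, caboose: Option<&CabooseMeta<'_>>) {
    put_text(out, "hubris_caboose");
    match caboose {
        None => out.push(CBOR_NULL),
        Some(c) => {
            out.push(0xa3); // map (major type 5) with 3 key/value pairs
            put_text(out, "board");
            put_opt_text(out, c.board);
            put_text(out, "commit");
            put_opt_text(out, c.commit);
            put_text(out, "version");
            put_opt_text(out, c.version);
        }
    }
}
```

With this layout a consumer can tell the cases apart from the value's type
alone: `null` means the image has no caboose, while a map whose entries are
all `null` means a caboose was found but none of the three tags could be read.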

Naturally, every `app.toml` where packrat produces ereports needed to be
updated to allow packrat to read from the caboose. Packrat also uses a
bit more stack in order to do this.

For example, here's output from a Gimletlet with caboose fields
(including a fake version tag) in its ereport metadata:

```console
eliza@hekate ~/Code/oxide/hubris $ faux-mgs --interface eno1np0 --discovery-addr '[fe80::0c1d:deff:fef0:d922]:11111' ereports
Jan 06 10:19:21.564 INFO creating SP handle on interface eno1np0, component: faux-mgs
Jan 06 10:19:21.565 INFO initial discovery complete, addr: [fe80::c1d:deff:fef0:d922%2]:11111, interface: eno1np0, socket: control-plane-agent, component: faux-mgs
restart ID: aecfcbd7-4637-8a9a-ed0b-0d5b60a884e8
restart IDs did not match (requested 00000000-0000-0000-0000-000000000000)
count: 1

ereports:
0x1: {
    "baseboard_part_number": String("LOLNO000000"),
    "baseboard_rev": Number(42),
    "baseboard_serial_number": String("69426661337"),
    "ereport_message_version": Number(0),
    "hubris_caboose": Object {
        "board": String("gimletlet-2"),
        "commit": String("51dac3ec71877d330981cd5167a4aef5fb48311c-dirty"),
        "version": String("42.69.420-eliza-test"),
    },
    "hubris_task_gen": Number(0),
    "hubris_task_name": String("packrat"),
    "hubris_uptime_ms": Number(0),
    "lost": Null,
}
```