@rical rical commented Dec 9, 2025

Description

Dump operational datastore to timestamped JSON snapshots every 5 minutes (in /var/lib/statd/). The operational.json symlink always points to the latest snapshot.
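
A minimal sketch of the naming and symlink step described above. The `operational-YYYYMMDD-HHMMSS.json` file name pattern, the helper names `snapshot_path()` and `symlink_update()`, and the temp-link-then-rename approach are all assumptions made for illustration; they are not taken from the statd code in this PR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Build "<dir>/operational-YYYYMMDD-HHMMSS.json" (UTC) in buf.
 * The file name pattern is hypothetical.
 * Returns 0 on success, -1 on error or truncation.
 */
int snapshot_path(char *buf, size_t len, const char *dir, time_t when)
{
    struct tm tm;
    char stamp[32];
    int n;

    assert(buf != NULL && dir != NULL);

    if (!gmtime_r(&when, &tm))
        return -1;
    if (strftime(stamp, sizeof(stamp), "%Y%m%d-%H%M%S", &tm) == 0)
        return -1;

    n = snprintf(buf, len, "%s/operational-%s.json", dir, stamp);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Repoint link_path at target without a window where the link is
 * missing: create a temporary symlink, then rename() it over the
 * old one. Returns 0 on success, -1 on error.
 */
int symlink_update(const char *target, const char *link_path)
{
    char tmp[4096];
    int n;

    assert(target != NULL && link_path != NULL);

    n = snprintf(tmp, sizeof(tmp), "%s.tmp", link_path);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;

    unlink(tmp); /* a stale temp link may or may not exist */
    if (symlink(target, tmp) != 0)
        return -1;
    if (rename(tmp, link_path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

A caller would build the path for the current time, write the snapshot there, and then call `symlink_update()` with the new file as the target and `/var/lib/statd/operational.json` as the link. Readers following the link then see either the previous snapshot or the new one, never a missing link.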

Implement hierarchical retention policy that keeps the first snapshot of each time period (hour/day/week/month/year), providing fine-grained recent history while preventing unbounded disk usage.
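
One way to picture the retention policy is the sketch below. It is not the code in this PR. It assumes a tier layout that the description does not spell out: keep every snapshot younger than one hour, then the first per hour up to a day old, the first per day up to a week, the first per week up to a month, the first per month up to a year, and the first per year after that. It also uses fixed-width, epoch-aligned UTC buckets (30-day months, 365-day years) instead of calendar periods. The name `retention_mark()` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define HOUR  3600LL
#define DAY   (24 * HOUR)
#define WEEK  (7 * DAY)
#define MONTH (30 * DAY)   /* approximation, not a calendar month */
#define YEAR  (365 * DAY)  /* approximation, ignores leap years */

/* Bucket width in seconds for a snapshot of the given age; 0 = keep all. */
static long long bucket_width(long long age)
{
    if (age < HOUR)  return 0;
    if (age < DAY)   return HOUR;
    if (age < WEEK)  return DAY;
    if (age < MONTH) return WEEK;
    if (age < YEAR)  return MONTH;
    return YEAR;
}

/*
 * ts[] holds snapshot times sorted oldest first. Sets keep[i] to true
 * for every snapshot that is the first one seen in its bucket and
 * returns the number of snapshots kept.
 */
size_t retention_mark(const time_t *ts, size_t n, time_t now, bool *keep)
{
    long long prev_width = -1;
    long long prev_bucket = -1;
    size_t kept = 0;

    assert(n == 0 || (ts != NULL && keep != NULL));

    for (size_t i = 0; i < n; i++) {
        long long width = bucket_width((long long)now - (long long)ts[i]);
        long long bucket = width ? (long long)ts[i] / width : 0;

        /* Because ts[] is ascending, a new bucket means "first in period". */
        keep[i] = width == 0 || width != prev_width || bucket != prev_bucket;
        if (keep[i])
            kept++;

        prev_width = width;
        prev_bucket = bucket;
    }
    return kept;
}
```

Every snapshot with `keep[i] == false` would be a candidate for deletion. Because the tier is chosen from the snapshot's age at the time of the call, a file kept in the hourly tier today can be dropped later once it ages into the daily tier and is no longer the first in its day.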

This will allow us to plot / track how the system state evolves as well as give us somewhat fine-grained info in the case of an event, such as a crash.

Add a unit test that simulates months of snapshots to verify retention behavior, using a statd stub that runs only the retention code locally.

Checklist

Tick relevant boxes, this PR is-a or has-a:

  • Bugfix
    • Regression tests
    • ChangeLog updates (for next release)
  • Feature
    • YANG model change => revision updated?
    • Regression tests added?
    • ChangeLog updates (for next release)
    • Documentation added?
  • Test changes
    • Checked in changed Readme.adoc (make test-spec)
    • Added new test to group Readme.adoc and yaml file
    • New Unit Test
  • Code style update (formatting, renaming)
  • Refactoring (please detail in commit messages)
  • Build related changes
  • Documentation content changes
    • ChangeLog updated (for major changes)
  • Other (please describe):

Dump operational datastore to timestamped JSON snapshots every 5 minutes
(in /var/lib/statd/). The operational.json symlink always points to the
latest snapshot.

Implement hierarchical retention policy that keeps the first snapshot of
each time period (hour/day/week/month/year), providing fine-grained recent
history while preventing unbounded disk usage.

This will allow us to plot / track how the system state evolves as
well as give us somewhat fine-grained info in the case of an event,
such as a crash.

Add a unit test that simulates months of snapshots to verify retention
behavior, using a statd stub that runs only the retention code locally.

Signed-off-by: Richard Alpe <richard@bit42.se>

@wkz wkz left a comment

Very nice! 🔥

Comment on lines 52 to 53
ret = lyd_print_path(timestamp_path, tree, LYD_JSON,
                     LYD_PRINT_WITHSIBLINGS);

Did some quick testing on a 28-port system with more or less an empty config:

root@test-05-25-f8:/tmp$ sysrepocfg -X -d operational -f json >oper.json
root@test-05-25-f8:/tmp$ ll -h oper.json
-rw-r--r--    1 root     root      237.9K Dec  9 22:36 oper.json
root@test-05-25-f8:/tmp$ gzip oper.json
root@test-05-25-f8:/tmp$ ll -h oper.json.gz
-rw-r--r--    1 root     root       19.6K Dec  9 22:37 oper.json.gz

Should we pass this through libz before going to disk? I'd hate for this storage to become an issue on smaller systems or as the amount of generated data grows. With compression, disk usage after one year should go from 14MB down to 1MB with today's feature-set. That should set us up to not have to think about this for a long time.
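
The arithmetic behind that estimate can be sketched as follows. The count of roughly 60 retained snapshots after one year is an assumption, back-computed from the 14MB figure and the 237.9K sample size above; it is not a number stated anywhere in this PR. The helper names are hypothetical.

```c
#include <assert.h>

/* Disk usage in MiB for `count` retained snapshots of `size_kib` KiB each. */
double retained_mib(unsigned count, double size_kib)
{
    return count * size_kib / 1024.0;
}

/* How many times smaller the compressed file is than the raw one. */
double compression_ratio(double raw_kib, double gz_kib)
{
    assert(gz_kib > 0.0);
    return raw_kib / gz_kib;
}
```

With the sizes measured above, `retained_mib(60, 237.9)` comes to a little under 14 MiB and `retained_mib(60, 19.6)` to a little over 1 MiB, a ratio of about 12 to 1. Real numbers will vary with port count and configuration, since both change the size of the operational datastore.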

Store historical snapshots as compressed .json.gz files to reduce disk
usage. The operational.json file remains uncompressed for easy access.

Signed-off-by: Richard Alpe <richard@bit42.se>

@wkz wkz left a comment

🚢 it!

@rical rical merged commit 7fcbc88 into main Dec 15, 2025
7 checks passed
@rical rical deleted the statd-journaling branch December 15, 2025 15:55