@Quantumplation
Contributor

@Quantumplation Quantumplation commented Dec 5, 2025

Allows you to truncate your database to the start of some epoch.

Works by manipulating the RocksDB database directly: it deletes all epochs after the target epoch, then copies the previous epoch's snapshot to "live", so the ledger "starts" at the beginning of the target epoch.

Summary by CodeRabbit

  • New Features
    • Added a new CLI command "reset_to_epoch" to reset ledger snapshots to a specified epoch.
    • Command accepts a ledger directory and target epoch, validates safety (min history requirement, no future resets, epoch bounds), preserves the penultimate epoch for restoration, and restores it as the live snapshot while removing others.

Signed-off-by: Pi Lanningham <pi@sundae.fi>
@coderabbitai
Contributor

coderabbitai bot commented Dec 5, 2025

Walkthrough

Adds a new reset_to_epoch CLI command that parses ledger directory and epoch, enumerates snapshots, validates epoch safety, and performs filesystem operations to remove obsolete snapshots and restore the predecessor snapshot as the live ledger.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Reset command module: crates/amaru/src/bin/amaru/cmd/mod.rs | Added module declaration: pub(crate) mod reset_to_epoch; |
| Reset to Epoch implementation: crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs | New CLI command implementation. Adds public Args struct (ledger_dir: PathBuf with default "ledger.db", epoch: u64), a private Epoch enum (Live / Past(u64)), copy_dir_recursive helper, snapshot enumeration and validation logic, filesystem removal/copy operations, and a public async run(args: Args) -> Result<(), Box<dyn Error>>. Robust error handling present. |
| CLI integration: crates/amaru/src/bin/amaru/main.rs | Added ResetToEpoch(cmd::reset_to_epoch::Args) variant to Command enum and dispatch to cmd::reset_to_epoch::run(args).await in main match. |

sequenceDiagram
    actor User
    participant CLI as CLI Parser
    participant FS as Filesystem
    participant Validator as Snapshot Validator
    participant Ops as File Ops

    User->>CLI: run reset_to_epoch --ledger-dir DIR --epoch N
    CLI->>CLI: parse Args
    CLI->>FS: list snapshot folders
    FS-->>CLI: snapshot list
    CLI->>Validator: validate target epoch N against snapshots
    Validator-->>CLI: validation result
    alt valid
        CLI->>Ops: delete `live/`
        Ops->>FS: remove live folder
        CLI->>Ops: delete past folders >= N (keep N-1)
        Ops->>FS: remove obsolete past/* folders
        CLI->>Ops: copy past/(N-1) -> live/
        Ops->>FS: copy directory recursively
        Ops-->>CLI: success
        CLI-->>User: completion message
    else invalid
        Validator-->>CLI: error
        CLI-->>User: error message
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay extra attention to crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs (epoch validation logic, ordering of delete/copy, error handling).
  • Verify CLI integration in main.rs follows existing dispatch patterns.
  • Review copy_dir_recursive for edge cases (permissions, partial copies, symlinks).
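To make those copy_dir_recursive edge cases concrete, here is a minimal std-only sketch of a recursive directory copy. The name is taken from the walkthrough, but the body is an illustrative assumption, not the PR's code. It rejects symlinks and other non-regular entries rather than following them, and it does not clean up on failure, so a mid-way error leaves a partially populated destination.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copy the directory `src` into `dst`, creating `dst` if needed.
/// Sketch only: symlinks and special files are rejected, and a failure part-way
/// through leaves `dst` partially populated (the caller would need to remove it).
fn copy_dir_recursive(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        // `DirEntry::file_type` does not follow symlinks, so they show up here as-is.
        let file_type = entry.file_type()?;
        let target = dst.join(entry.file_name());
        if file_type.is_dir() {
            copy_dir_recursive(&entry.path(), &target)?;
        } else if file_type.is_file() {
            // `fs::copy` only handles regular files, hence the recursion above.
            fs::copy(entry.path(), &target)?;
        } else {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("refusing to copy non-regular entry {}", entry.path().display()),
            ));
        }
    }
    Ok(())
}
```

Failing loudly on symlinks is a deliberate choice here: silently copying a link target (or the link itself) into `live/` would be a hard-to-spot source of a corrupted snapshot.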

"A cheeky reset like rewinding to a fav game save,
snapshots stacked like film reels in an indie rave.
Keep the penultimate — don't let the ledger fall,
copy, delete, validate — cue triumphant call! 🎮📼"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Implements reset to epoch' accurately and concisely describes the main change: adding a complete reset-to-epoch feature across multiple files, including a new command module, CLI integration, and execution logic. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs (2)

47-47: Consider more idiomatic parameter types.

Small Rust idiom tip, like knowing the difference between a flat white and a latte: &Path is generally preferred over &PathBuf, and &[Folder] over &Vec<Folder>, for function parameters. It's more flexible: callers holding a &PathBuf or &Vec<Folder> still work unchanged thanks to deref coercion, and callers can also pass a plain &Path or a sub-slice.

-fn get_ledger_db_snapshots(ledger_dir: &PathBuf) -> Result<Vec<Folder>, Box<dyn Error>> {
+fn get_ledger_db_snapshots(ledger_dir: &Path) -> Result<Vec<Folder>, Box<dyn Error>> {
-fn check_safe_to_reset(epoch: u64, folders: &Vec<Folder>) -> Result<(), Box<dyn Error>> {
+fn check_safe_to_reset(epoch: u64, folders: &[Folder]) -> Result<(), Box<dyn Error>> {

You'd need to add use std::path::Path; at the top.

Also applies to: 66-66
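A small sketch of why the borrowed types are more flexible. Folder here is a hypothetical stand-in (the PR's real type is not shown), and describe is an illustrative function, not part of the PR: one signature taking &Path and &[Folder] accepts owned PathBuf/Vec arguments via deref coercion, as well as borrowed paths and sub-slices.

```rust
use std::path::Path;

// Hypothetical stand-in for the PR's `Folder`; `None` represents `live/`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Folder {
    epoch: Option<u64>,
}

/// Hypothetical helper: taking `&Path` and `&[Folder]` (rather than `&PathBuf`
/// and `&Vec<Folder>`) lets callers pass owned or borrowed data alike.
fn describe(ledger_dir: &Path, folders: &[Folder]) -> String {
    let past = folders.iter().filter(|f| f.epoch.is_some()).count();
    format!("{}: {} past snapshot(s)", ledger_dir.display(), past)
}
```

With this signature, `describe(&path_buf, &vec)` and `describe(Path::new("ledger.db"), &vec[1..])` both compile, which is exactly what Clippy's `&Vec`/`&PathBuf` warnings are nudging toward.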


110-111: Potential underflow if epoch is 0.

Now, your safety checks upstream should prevent args.epoch from being 0 (the min_epoch + 3 check), but if those checks ever change, this subtraction would panic in debug builds and silently wrap to u64::MAX in release builds. It's a bit like trusting the save point in Dark Souls: usually fine, but...

Consider using checked_sub or saturating_sub for extra safety, or add an explicit epoch > 0 check in check_safe_to_reset.

-                if epoch == args.epoch - 1 {
+                if Some(epoch) == args.epoch.checked_sub(1) {
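A minimal sketch of the suggested comparison, isolated from the PR. The function names predecessor and is_predecessor are hypothetical; the point is how checked_sub behaves at epoch 0 compared with plain or saturating subtraction.

```rust
/// Epoch whose snapshot would be promoted to `live/` when resetting to the
/// start of `target`, or `None` if `target` is 0 (there is no predecessor).
fn predecessor(target: u64) -> Option<u64> {
    target.checked_sub(1)
}

/// Mirrors the suggested `Some(epoch) == args.epoch.checked_sub(1)` test:
/// for `target == 0` nothing matches, instead of panicking or wrapping.
fn is_predecessor(epoch: u64, target: u64) -> bool {
    Some(epoch) == target.checked_sub(1)
}
```

`saturating_sub(1)` would also avoid the panic, but it maps target 0 to 0, so snapshot 0 would wrongly be treated as its own predecessor; `checked_sub` makes the "no predecessor" case explicit.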
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58f23bf and 0e090fe.

📒 Files selected for processing (3)
  • crates/amaru/src/bin/amaru/cmd/mod.rs (1 hunks)
  • crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs (1 hunks)
  • crates/amaru/src/bin/amaru/main.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-20T13:02:25.763Z
Learnt from: jeluard
Repo: pragma-org/amaru PR: 387
File: crates/amaru-stores/src/lib.rs:40-40
Timestamp: 2025-08-20T13:02:25.763Z
Learning: In the amaru-stores crate, amaru_slot_arithmetic types like Epoch and EraHistory are used throughout the main crate code in modules like in_memory/mod.rs, rocksdb/consensus.rs, and rocksdb/ledger/columns/, not just in tests. This means amaru-slot-arithmetic should be a regular dependency, not a dev-dependency.

Applied to files:

  • crates/amaru/src/bin/amaru/cmd/mod.rs
  • crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs
📚 Learning: 2025-01-21T15:32:17.911Z
Learnt from: jeluard
Repo: pragma-org/amaru PR: 69
File: crates/amaru/src/ledger/state/diff_epoch_reg.rs:112-117
Timestamp: 2025-01-21T15:32:17.911Z
Learning: When suggesting code changes in Rust, always verify that the types align correctly, especially when dealing with references and Options. The `Fold::Registered` variant in `diff_epoch_reg.rs` expects a reference `&'a V`, so unwrapping an `Option<&V>` requires only a single `.expect()`.

Applied to files:

  • crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs
🧬 Code graph analysis (1)
crates/amaru/src/bin/amaru/main.rs (1)
crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs (1)
  • run (97-127)
🪛 GitHub Actions: Coding Practices
crates/amaru/src/bin/amaru/cmd/mod.rs

[error] 24-24: Rustfmt formatting changes detected in this file. Run 'cargo fmt' to apply formatting.

crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs

[error] 80-80: Rustfmt formatting changes detected in this file. Run 'cargo fmt' to apply formatting.


[error] 105-105: Rustfmt formatting changes detected in this file. Run 'cargo fmt' to apply formatting.


[error] 114-114: Rustfmt formatting changes detected in this file. Run 'cargo fmt' to apply formatting.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Test the CLI
  • GitHub Check: Build riscv32
  • GitHub Check: Build wasm32
  • GitHub Check: Build x86_64/windows
  • GitHub Check: Build x86_64/linux
  • GitHub Check: Build aarch64/linux
  • GitHub Check: Benches
  • GitHub Check: Build aarch64/macos
  • GitHub Check: Test coverage
  • GitHub Check: Analyze (rust)
🔇 Additional comments (3)
crates/amaru/src/bin/amaru/main.rs (1)

103-105: Looks mint, mate!

The new ResetToEpoch command variant slots in perfectly with the existing command structure. Clean integration following the established pattern – like fitting the last piece of a Tetris puzzle right where it belongs.

Also applies to: 192-192

crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs (2)

66-95: Solid validation logic!

The safety checks here are well thought out, mate – covering past limits, future limits, and ensuring enough historical epochs remain. Like a proper pre-flight checklist before takeoff!

One tiny theoretical edge case: max_epoch + 1 on line 82 could overflow if max_epoch is u64::MAX, but that's about as likely as Half-Life 3 releasing, so probably not worth worrying about in practice.


97-127: The flow is solid (aside from the copy bug).

The overall structure makes sense: enumerate → validate → cleanup → restore. Like a proper speedrun strat, you've got the steps down pat.

One observation: this async fn doesn't actually await anything. Not necessarily a problem since it maintains consistency with other commands, but worth knowing if you ever want to make this sync for simplicity.

@codecov

codecov bot commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 0% with 79 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs | 0.00% | 78 Missing ⚠️ |
| crates/amaru/src/bin/amaru/main.rs | 0.00% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| crates/amaru/src/bin/amaru/cmd/mod.rs | 0.00% <ø> (ø) |
| crates/amaru/src/bin/amaru/main.rs | 0.00% <0.00%> (ø) |
| crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs | 0.00% <0.00%> (ø) |

... and 10 files with indirect coverage changes

@jeluard
Contributor

jeluard commented Dec 5, 2025

This makes several assumptions about internal implementation details tied to our current usage of rocksdb. Mainly, that snapshots are complete live state copies and can be swapped around.
This makes it problematic IMO to enshrine in the API as those might not hold anymore if we optimize snapshots and will probably break with non-rocksdb implementations (we don't strictly need full copies of the current epoch data for previous epochs).
In general, I think we should just stick with what is offered at the Store trait level.

Specifically here, I think that an extra script would be more sensible.

@Quantumplation
Contributor Author

IMO if the details change, we should just update the code 😅 You can just do things.

fs::copy won't copy directories, and we don't want to add a dependency for this, so I just implemented a simple recursive directory copy utility
Signed-off-by: Pi Lanningham <pi@sundae.fi>
Signed-off-by: Pi Lanningham <pi@sundae.fi>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs (3)

36-44: Tiny doc nit: “chain database” vs “ledger snapshots”

The arg doc says “chain database”, but the rest of this module explicitly talks about the ledger DB snapshots under ledger_dir. To avoid confusing future-you at 2am, I’d align the wording to “ledger database” or “ledger snapshots directory”.


64-81: Snapshot discovery is fine, but consider a small robustness tweak

The directory scan / parse logic is neat and defensive (skips non-dirs and non-numeric names), but it currently uses file_stem(). If snapshot dirs ever grow suffixes like 123.backup, file_stem() strips the extension and that directory is silently picked up as epoch 123. Using file_name() instead of file_stem() would make this a bit more future‑proof (such names simply fail to parse and are skipped) while keeping behaviour identical for live and bare numeric names.

Something like:

-            let stem = path.file_stem()?.to_str()?;
+            let stem = path.file_name()?.to_str()?;

Up to you if you want to harden it now or wait until the format evolves.
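A quick sketch of the difference between the two std::path accessors. The parse functions are hypothetical helpers, not the PR's code; only Path::file_name and Path::file_stem are real std APIs.

```rust
use std::path::Path;

/// Parse a snapshot directory's epoch from its whole final component.
/// `123` parses; `123.backup` and `live` do not, so they are skipped.
fn parse_epoch(path: &Path) -> Option<u64> {
    path.file_name()?.to_str()?.parse().ok()
}

/// The `file_stem` variant for comparison: the last extension is stripped
/// first, so `123.backup` is silently read as epoch 123.
fn parse_epoch_from_stem(path: &Path) -> Option<u64> {
    path.file_stem()?.to_str()?.parse().ok()
}
```

For bare numeric names and `live` both variants agree; they only diverge once a name contains a dot, which is the case worth guarding against.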


118-151: Overall flow is sound, but tightly couples the CLI to the current RocksDB layout

Logic-wise, the reset flow hangs together nicely:

  • delete live/
  • retain epoch-1, trash epoch >= target_epoch
  • rehydrate live/ from epoch-1

Once the epoch-1 existence check above is in place, behaviour is predictable and the failure modes are at least explicit.
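The three-step flow above can be separated from the filesystem as a pure "plan" step, which makes it easy to unit test. This is a sketch under assumptions: the Epoch enum shape follows the walkthrough's description (Live / Past(u64)), while ResetPlan and plan_reset are hypothetical names. The min_epoch + 3 history rule is left to check_safe_to_reset.

```rust
/// A snapshot directory under the ledger dir: `live/` or a numbered past epoch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Epoch {
    Live,
    Past(u64),
}

/// What a reset to the start of `target` would do, without touching the disk.
#[derive(Debug, PartialEq, Eq)]
struct ResetPlan {
    delete: Vec<Epoch>, // `live/` plus every past snapshot >= target
    promote: u64,       // snapshot copied (not moved) into `live/`
}

/// Hypothetical planner: delete `live/` and every past snapshot >= `target`,
/// keep everything older, and promote `target - 1` to `live/`.
fn plan_reset(snapshots: &[Epoch], target: u64) -> Result<ResetPlan, String> {
    let promote = target
        .checked_sub(1)
        .ok_or_else(|| "cannot reset to epoch 0".to_string())?;
    // Fail up front if the snapshot we intend to promote is missing (e.g. a gap).
    if !snapshots.contains(&Epoch::Past(promote)) {
        return Err(format!("no snapshot found for epoch {promote}"));
    }
    let delete = snapshots
        .iter()
        .copied()
        .filter(|s| match s {
            Epoch::Live => true,
            Epoch::Past(e) => *e >= target,
        })
        .collect();
    Ok(ResetPlan { delete, promote })
}
```

The `run` step would then only execute the plan (remove each listed folder, then recursively copy `promote` into `live/`), so every decision about which epochs survive is testable without a real ledger directory.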

The bigger-picture caveat (which jeluard already hinted at in the PR) is that this command bakes in assumptions about how the ledger store lays out snapshots on disk (folder names, the meaning of “live”, how many epochs you “need”, etc). If snapshot semantics change, or you switch away from RocksDB, this CLI either becomes a footgun or needs to be quietly neutered.

If you want to future‑proof it a bit without a big refactor right now, a couple of options:

  • Make it very explicit in the help text and logs that this is a low‑level, RocksDB‑layout‑aware operation intended for power users/ops.
  • Longer term, consider pushing this logic behind a store‑level abstraction (or a dedicated maintenance tool) so the CLI isn’t the one hard‑coding on‑disk invariants.

Right now it’s fine for “I know what I’m doing” workflows, but it’s got a bit of a Dark Souls vibe: powerful, unforgiving, and very tied to the current engine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e090fe and 08c32b0.

📒 Files selected for processing (2)
  • crates/amaru/src/bin/amaru/cmd/mod.rs (1 hunks)
  • crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: abailly
Repo: pragma-org/amaru PR: 228
File: crates/amaru-stores/src/rocksdb/consensus.rs:89-128
Timestamp: 2025-05-21T18:58:48.631Z
Learning: The InMemConsensusStore implementation in crates/amaru-stores/src/rocksdb/consensus.rs will be fleshed out incrementally on a by-need basis, driven by test requirements rather than implementing all functionality upfront.
📚 Learning: 2025-08-20T13:02:25.763Z
Learnt from: jeluard
Repo: pragma-org/amaru PR: 387
File: crates/amaru-stores/src/lib.rs:40-40
Timestamp: 2025-08-20T13:02:25.763Z
Learning: In the amaru-stores crate, amaru_slot_arithmetic types like Epoch and EraHistory are used throughout the main crate code in modules like in_memory/mod.rs, rocksdb/consensus.rs, and rocksdb/ledger/columns/, not just in tests. This means amaru-slot-arithmetic should be a regular dependency, not a dev-dependency.

Applied to files:

  • crates/amaru/src/bin/amaru/cmd/mod.rs
  • crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs
📚 Learning: 2025-01-21T15:32:17.911Z
Learnt from: jeluard
Repo: pragma-org/amaru PR: 69
File: crates/amaru/src/ledger/state/diff_epoch_reg.rs:112-117
Timestamp: 2025-01-21T15:32:17.911Z
Learning: When suggesting code changes in Rust, always verify that the types align correctly, especially when dealing with references and Options. The `Fold::Registered` variant in `diff_epoch_reg.rs` expects a reference `&'a V`, so unwrapping an `Option<&V>` requires only a single `.expect()`.

Applied to files:

  • crates/amaru/src/bin/amaru/cmd/mod.rs
  • crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs
🪛 GitHub Actions: Coding Practices
crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs

[error] 83-83: clippy: writing &Vec instead of &[_] involves a new object where a slice will do. Change &Vec<Folder> to &[Folder]. (Suggestion: 'change this to: &[Folder]')

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Test the CLI
  • GitHub Check: Build riscv32
  • GitHub Check: Benches
  • GitHub Check: Build wasm32
  • GitHub Check: Build aarch64/macos
  • GitHub Check: Build x86_64/windows
  • GitHub Check: Build x86_64/linux
  • GitHub Check: Build aarch64/linux
  • GitHub Check: Test coverage
  • GitHub Check: Analyze (rust)
🔇 Additional comments (1)
crates/amaru/src/bin/amaru/cmd/mod.rs (1)

19-29: Wiring reset_to_epoch into the cmd module looks solid

Nice clean addition, mate – module export is in alphabetical order with the rest and keeps the CLI wiring consistent with other commands. Nothing else to tweak here.

Comment on lines +83 to +116
fn check_safe_to_reset(epoch: u64, folders: &Vec<Folder>) -> Result<(), Box<dyn Error>> {
    let epoch_numbers = folders.iter().filter_map(|f| f.epoch.epoch_no());
    let (min_epoch, max_epoch) = (
        epoch_numbers
            .clone()
            .min()
            .ok_or("no epochs to roll back to")?,
        epoch_numbers.max().ok_or("no epochs to roll back to")?,
    );

    if epoch < min_epoch {
        return Err(format!("cannot reset to an epoch that far in the past. We've only kept snapshots as far back as {}", min_epoch).into());
    }

    // The +1 here is because if we're resetting to 175, and the max epoch is 174,
    // we're *in* epoch 175, and we can just delete `live/` and copy `174` to `live`
    if epoch > max_epoch + 1 {
        return Err(format!(
            "cannot reset to an epoch in the future. We're currently in epoch {}",
            max_epoch + 1
        )
        .into());
    }

    // We need 3 previous epochs *plus* the "live" epoch, to function
    // so if we try to reset to the start of 165, but our earliest epoch is 163
    // this will break: we can keep 163 and 164, and copy 164 to live, but
    // that leaves us with only 2 epochs
    if epoch < min_epoch + 3 {
        return Err(format!("resetting to epoch {} would leave us with too few historical epochs to proceed. The earliest epoch you can reset to is {}", epoch, min_epoch + 3).into());
    }

    Ok(())
}
Contributor

⚠️ Potential issue | 🟠 Major

Fix Clippy’s &Vec warning and tighten the safety check contract

Two birds with one stone here:

  1. Clippy warning (&Vec → &[_])
    The function only needs a slice, so you can happily take &[Folder] and keep the call site unchanged (auto‑coercion from &Vec<_>). That’ll shut Clippy up.

  2. Safety check doesn’t guarantee epoch - 1 exists
    run assumes there’s always a snapshot for epoch - 1, but check_safe_to_reset only reasons about min and max. If the snapshot set ever has gaps (e.g. {10, 14, 15} and you reset to 14), the safety check passes but pentultimate_epoch stays None and you hit the “invariant violated” error.
    It’s not a crash, but it’s a pretty “what the hell just happened?” moment for an operator.

You can fix both with something like:

-fn check_safe_to_reset(epoch: u64, folders: &Vec<Folder>) -> Result<(), Box<dyn Error>> {
-    let epoch_numbers = folders.iter().filter_map(|f| f.epoch.epoch_no());
+fn check_safe_to_reset(epoch: u64, folders: &[Folder]) -> Result<(), Box<dyn Error>> {
+    let epoch_numbers = folders.iter().filter_map(|f| f.epoch.epoch_no());
@@
-    if epoch < min_epoch + 3 {
+    if epoch < min_epoch + 3 {
         return Err(format!("resetting to epoch {} would leave us with too few historical epochs to proceed. The earliest epoch you can reset to is {}", epoch, min_epoch + 3).into());
     }
 
+    // Ensure there is actually a snapshot for epoch-1, which `run` needs
+    // to promote to `live`. This also guards against unexpected gaps in
+    // the snapshot sequence.
+    if !folders
+        .iter()
+        .filter_map(|f| f.epoch.epoch_no())
+        .any(|e| e == epoch - 1)
+    {
+        return Err(
+            format!(
+                "no snapshot found for epoch {} (required to reset to epoch {})",
+                epoch - 1,
+                epoch
+            )
+            .into(),
+        );
+    }
+
     Ok(())
 }

Call site stays as‑is:

check_safe_to_reset(args.epoch, &folders)?;

That way the “safety check” really does guarantee the preconditions run relies on, and CI stops whinging.
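Combining the review's suggestions (slice parameter, explicit epoch - 1 existence check) with the checked max_epoch + 1 from the earlier review, a tightened check might look like the sketch below. It is not the PR's code: as a simplification it takes the past epoch numbers directly as &[u64] instead of &[Folder], and returns String errors instead of Box<dyn Error>.

```rust
/// Sketch of the tightened safety check. Simplification: takes past snapshot
/// epoch numbers directly (not the PR's `&[Folder]`) and returns `String` errors.
fn check_safe_to_reset(epoch: u64, past_epochs: &[u64]) -> Result<(), String> {
    let min_epoch = *past_epochs.iter().min().ok_or("no epochs to roll back to")?;
    let max_epoch = *past_epochs.iter().max().ok_or("no epochs to roll back to")?;

    if epoch < min_epoch {
        return Err(format!(
            "cannot reset to an epoch that far in the past. We've only kept snapshots as far back as {min_epoch}"
        ));
    }

    // Checked add sidesteps the (theoretical) overflow when max_epoch == u64::MAX.
    let current_epoch = max_epoch.checked_add(1).ok_or("epoch counter overflow")?;
    if epoch > current_epoch {
        return Err(format!(
            "cannot reset to an epoch in the future. We're currently in epoch {current_epoch}"
        ));
    }

    // 3 historical epochs plus `live/` are required afterwards.
    let earliest = min_epoch.saturating_add(3);
    if epoch < earliest {
        return Err(format!(
            "resetting to epoch {epoch} would leave us with too few historical epochs to proceed. The earliest epoch you can reset to is {earliest}"
        ));
    }

    // Here `epoch >= earliest >= 3`, so `epoch - 1` cannot underflow.
    let predecessor = epoch - 1;
    if !past_epochs.contains(&predecessor) {
        return Err(format!(
            "no snapshot found for epoch {predecessor} (required to reset to epoch {epoch})"
        ));
    }

    Ok(())
}
```

With snapshots {10, 14, 15}, resetting to 14 now fails in the check itself with an explicit "no snapshot found for epoch 13" rather than later as an invariant violation in `run`.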

🧰 Tools
🪛 GitHub Actions: Coding Practices

[error] 83-83: clippy: writing &Vec instead of &[_] involves a new object where a slice will do. Change &Vec<Folder> to &[Folder]. (Suggestion: 'change this to: &[Folder]')

🤖 Prompt for AI Agents
In crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs around lines 83–116, change
the function signature to accept a slice (&[Folder]) instead of &Vec<Folder> to
satisfy Clippy, and tighten the safety checks: after computing min and max, also
ensure that a snapshot for epoch - 1 exists in the folder epoch set (i.e.
epoch_numbers contains epoch - 1) and return a clear error if it does not; keep
the existing min/max and min+3 checks but update error messages to reflect the
new precondition so callers know we require an explicit snapshot for epoch - 1.
