-
Notifications
You must be signed in to change notification settings - Fork 18
Implements 'reset to epoch' #586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Allows you to truncate your database to the start of some epoch; Works by manipulating the rocksdb database, deleting all epochs after the target epoch, and then copying the previous one as "live" to "start" at the beginning of the epoch. Signed-off-by: Pi Lanningham <pi@sundae.fi>
WalkthroughAdds a new Changes
sequenceDiagram
actor User
participant CLI as CLI Parser
participant FS as Filesystem
participant Validator as Snapshot Validator
participant Ops as File Ops
User->>CLI: run reset_to_epoch --ledger-dir DIR --epoch N
CLI->>CLI: parse Args
CLI->>FS: list snapshot folders
FS-->>CLI: snapshot list
CLI->>Validator: validate target epoch N against snapshots
Validator-->>CLI: validation result
alt valid
CLI->>Ops: delete `live/`
Ops->>FS: remove live folder
CLI->>Ops: delete past folders > N and not (N-1)
Ops->>FS: remove obsolete past/* folders
CLI->>Ops: copy past/(N-1) -> live/
Ops->>FS: copy directory recursively
Ops-->>CLI: success
CLI-->>User: completion message
else invalid
Validator-->>CLI: error
CLI-->>User: error message
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs (2)
47-47: Consider more idiomatic parameter types.Small Rust idiom tip, like knowing the difference between a flat white and a latte –
&Pathis generally preferred over&PathBufand&[Folder]over&Vec<Folder>for function parameters. It's more flexible and accepts both owned and borrowed types automatically throughDeref.-fn get_ledger_db_snapshots(ledger_dir: &PathBuf) -> Result<Vec<Folder>, Box<dyn Error>> { +fn get_ledger_db_snapshots(ledger_dir: &Path) -> Result<Vec<Folder>, Box<dyn Error>> {-fn check_safe_to_reset(epoch: u64, folders: &Vec<Folder>) -> Result<(), Box<dyn Error>> { +fn check_safe_to_reset(epoch: u64, folders: &[Folder]) -> Result<(), Box<dyn Error>> {You'd need to add
use std::path::Path;at the top.Also applies to: 66-66
110-111: Potential underflow if epoch is 0.Now, your safety checks upstream should prevent
args.epochfrom being 0 (themin_epoch + 3check), but this subtraction could underflow in debug mode if those checks ever change. It's a bit like trusting the save point in Dark Souls – usually fine, but...Consider using
checked_suborsaturating_subfor extra safety, or add an explicitepoch > 0check incheck_safe_to_reset.- if epoch == args.epoch - 1 { + if Some(epoch) == args.epoch.checked_sub(1) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
crates/amaru/src/bin/amaru/cmd/mod.rs(1 hunks)crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs(1 hunks)crates/amaru/src/bin/amaru/main.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-20T13:02:25.763Z
Learnt from: jeluard
Repo: pragma-org/amaru PR: 387
File: crates/amaru-stores/src/lib.rs:40-40
Timestamp: 2025-08-20T13:02:25.763Z
Learning: In the amaru-stores crate, amaru_slot_arithmetic types like Epoch and EraHistory are used throughout the main crate code in modules like in_memory/mod.rs, rocksdb/consensus.rs, and rocksdb/ledger/columns/, not just in tests. This means amaru-slot-arithmetic should be a regular dependency, not a dev-dependency.
Applied to files:
crates/amaru/src/bin/amaru/cmd/mod.rscrates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs
📚 Learning: 2025-01-21T15:32:17.911Z
Learnt from: jeluard
Repo: pragma-org/amaru PR: 69
File: crates/amaru/src/ledger/state/diff_epoch_reg.rs:112-117
Timestamp: 2025-01-21T15:32:17.911Z
Learning: When suggesting code changes in Rust, always verify that the types align correctly, especially when dealing with references and Options. The `Fold::Registered` variant in `diff_epoch_reg.rs` expects a reference `&'a V`, so unwrapping an `Option<&V>` requires only a single `.expect()`.
Applied to files:
crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs
🧬 Code graph analysis (1)
crates/amaru/src/bin/amaru/main.rs (1)
crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs (1)
run(97-127)
🪛 GitHub Actions: Coding Practices
crates/amaru/src/bin/amaru/cmd/mod.rs
[error] 24-24: Rustfmt formatting changes detected in this file. Run 'cargo fmt' to apply formatting.
crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs
[error] 80-80: Rustfmt formatting changes detected in this file. Run 'cargo fmt' to apply formatting.
[error] 105-105: Rustfmt formatting changes detected in this file. Run 'cargo fmt' to apply formatting.
[error] 114-114: Rustfmt formatting changes detected in this file. Run 'cargo fmt' to apply formatting.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Test the CLI
- GitHub Check: Build riscv32
- GitHub Check: Build wasm32
- GitHub Check: Build x86_64/windows
- GitHub Check: Build x86_64/linux
- GitHub Check: Build aarch64/linux
- GitHub Check: Benches
- GitHub Check: Build aarch64/macos
- GitHub Check: Test coverage
- GitHub Check: Analyze (rust)
🔇 Additional comments (3)
crates/amaru/src/bin/amaru/main.rs (1)
103-105: Looks mint, mate!The new
ResetToEpochcommand variant slots in perfectly with the existing command structure. Clean integration following the established pattern – like fitting the last piece of a Tetris puzzle right where it belongs.Also applies to: 192-192
crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs (2)
66-95: Solid validation logic!The safety checks here are well thought out, mate – covering past limits, future limits, and ensuring enough historical epochs remain. Like a proper pre-flight checklist before takeoff!
One tiny theoretical edge case:
max_epoch + 1on line 82 could overflow ifmax_epochisu64::MAX, but that's about as likely as Half-Life 3 releasing, so probably not worth worrying about in practice.
97-127: The flow is solid (aside from the copy bug).The overall structure makes sense: enumerate → validate → cleanup → restore. Like a proper speedrun strat, you've got the steps down pat.
One observation: this
async fndoesn't actuallyawaitanything. Not necessarily a problem since it maintains consistency with other commands, but worth knowing if you ever want to make this sync for simplicity.
Codecov Report❌ Patch coverage is
... and 10 files with indirect coverage changes 🚀 New features to boost your workflow:
|
|
This makes several assumptions about internal implementation details tied to our current usage of Specifically here, I think that an extra script would be more sensible. |
|
IMO if the details change, we should just update the code 😅 You can just do things. |
fs: :copy won't copy directories, and we don't want to add a dependency for this, so I just implemented a simple copy copy dir utility Signed-off-by: Pi Lanningham <pi@sundae.fi>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs (3)
36-44: Tiny doc nit: “chain database” vs “ledger snapshots”The arg doc says “chain database”, but the rest of this module explicitly talks about the ledger DB snapshots under
ledger_dir. To avoid confusing future-you at 2am, I’d align the wording to “ledger database” or “ledger snapshots directory”.
64-81: Snapshot discovery is fine, but consider a small robustness tweakThe directory scan / parse logic is neat and defensive (skips non-dirs and non-numeric names), but it currently uses
file_stem(). If snapshot dirs ever grow suffixes like123.backupor similar, that’ll silently change the parsed epoch. Usingfile_name()instead offile_stem()would make this a bit more future‑proof while keeping behaviour identical forliveand bare numeric names.Something like:
- let stem = path.file_stem()?.to_str()?; + let stem = path.file_name()?.to_str()?;Up to you if you want to harden it now or wait until the format evolves.
118-151: Overall flow is sound, but tightly couples the CLI to the current RocksDB layoutLogic-wise, the reset flow hangs together nicely:
- delete
live/- retain
epoch-1, trashepoch >= target_epoch- rehydrate
live/fromepoch-1Once the
epoch-1existence check above is in place, behaviour is predictable and the failure modes are at least explicit.The bigger-picture caveat (which jeluard already hinted at in the PR) is that this command bakes in assumptions about how the ledger store lays out snapshots on disk (folder names, the meaning of “live”, how many epochs you “need”, etc). If snapshot semantics change, or you switch away from RocksDB, this CLI either becomes a footgun or needs to be quietly neutered.
If you want to future‑proof it a bit without a big refactor right now, a couple of options:
- Make it very explicit in the help text and logs that this is a low‑level, RocksDB‑layout‑aware operation intended for power users/ops.
- Longer term, consider pushing this logic behind a store‑level abstraction (or a dedicated maintenance tool) so the CLI isn’t the one hard‑coding on‑disk invariants.
Right now it’s fine for “I know what I’m doing” workflows, but it’s got a bit of a Dark Souls vibe: powerful, unforgiving, and very tied to the current engine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
crates/amaru/src/bin/amaru/cmd/mod.rs(1 hunks)crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: abailly
Repo: pragma-org/amaru PR: 228
File: crates/amaru-stores/src/rocksdb/consensus.rs:89-128
Timestamp: 2025-05-21T18:58:48.631Z
Learning: The InMemConsensusStore implementation in crates/amaru-stores/src/rocksdb/consensus.rs will be fleshed out incrementally on a by-need basis, driven by test requirements rather than implementing all functionality upfront.
📚 Learning: 2025-08-20T13:02:25.763Z
Learnt from: jeluard
Repo: pragma-org/amaru PR: 387
File: crates/amaru-stores/src/lib.rs:40-40
Timestamp: 2025-08-20T13:02:25.763Z
Learning: In the amaru-stores crate, amaru_slot_arithmetic types like Epoch and EraHistory are used throughout the main crate code in modules like in_memory/mod.rs, rocksdb/consensus.rs, and rocksdb/ledger/columns/, not just in tests. This means amaru-slot-arithmetic should be a regular dependency, not a dev-dependency.
Applied to files:
crates/amaru/src/bin/amaru/cmd/mod.rscrates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs
📚 Learning: 2025-01-21T15:32:17.911Z
Learnt from: jeluard
Repo: pragma-org/amaru PR: 69
File: crates/amaru/src/ledger/state/diff_epoch_reg.rs:112-117
Timestamp: 2025-01-21T15:32:17.911Z
Learning: When suggesting code changes in Rust, always verify that the types align correctly, especially when dealing with references and Options. The `Fold::Registered` variant in `diff_epoch_reg.rs` expects a reference `&'a V`, so unwrapping an `Option<&V>` requires only a single `.expect()`.
Applied to files:
crates/amaru/src/bin/amaru/cmd/mod.rscrates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs
🪛 GitHub Actions: Coding Practices
crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs
[error] 83-83: clippy: writing &Vec instead of &[_] involves a new object where a slice will do. Change &Vec<Folder> to &[Folder]. (Suggestion: 'change this to: &[Folder]')
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Test the CLI
- GitHub Check: Build riscv32
- GitHub Check: Benches
- GitHub Check: Build wasm32
- GitHub Check: Build aarch64/macos
- GitHub Check: Build x86_64/windows
- GitHub Check: Build x86_64/linux
- GitHub Check: Build aarch64/linux
- GitHub Check: Test coverage
- GitHub Check: Analyze (rust)
🔇 Additional comments (1)
crates/amaru/src/bin/amaru/cmd/mod.rs (1)
19-29: Wiringreset_to_epochinto the cmd module looks solidNice clean addition, mate – module export is in alphabetical order with the rest and keeps the CLI wiring consistent with other commands. Nothing else to tweak here.
| fn check_safe_to_reset(epoch: u64, folders: &Vec<Folder>) -> Result<(), Box<dyn Error>> { | ||
| let epoch_numbers = folders.iter().filter_map(|f| f.epoch.epoch_no()); | ||
| let (min_epoch, max_epoch) = ( | ||
| epoch_numbers | ||
| .clone() | ||
| .min() | ||
| .ok_or("no epochs to roll back to")?, | ||
| epoch_numbers.max().ok_or("no epochs to roll back to")?, | ||
| ); | ||
|
|
||
| if epoch < min_epoch { | ||
| return Err(format!("cannot reset to an epoch that far in the past. We've only kept snapshots as far back as {}", min_epoch).into()); | ||
| } | ||
|
|
||
| // The +1 here is because if we're resetting to 175, and the max epoch is 174, | ||
| // we're *in* epoch 175, and we can just delete `live/` and copy `174` to `live` | ||
| if epoch > max_epoch + 1 { | ||
| return Err(format!( | ||
| "cannot reset to an epoch in the future. We're currently in epoch {}", | ||
| max_epoch + 1 | ||
| ) | ||
| .into()); | ||
| } | ||
|
|
||
| // We need 3 previous epochs *plus* the "live" epoch, to function | ||
| // so if we try to reset to the start of 165, but our earliest epoch is 163 | ||
| // this will break: we can keep 163 and 164, and copy 164 to live, but | ||
| // that leaves us with only 2 epochs | ||
| if epoch < min_epoch + 3 { | ||
| return Err(format!("resetting to epoch {} would leave us with too few historical epochs to proceed. The earliest epoch you can reset to is {}", epoch, min_epoch + 3).into()); | ||
| } | ||
|
|
||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix Clippy’s &Vec warning and tighten the safety check contract
Two birds with one stone here:
-
Clippy warning (
&Vec→&[_])
The function only needs a slice, so you can happily take&[Folder]and keep the call site unchanged (auto‑coercion from&Vec<_>). That’ll shut Clippy up. -
Safety check doesn’t guarantee
epoch - 1exists
runassumes there’s always a snapshot forepoch - 1, butcheck_safe_to_resetonly reasons aboutminandmax. If the snapshot set ever has gaps (e.g.{10, 14, 15}and you reset to14), the safety check passes butpentultimate_epochstaysNoneand you hit the “invariant violated” error.
It’s not a crash, but it’s a pretty “what the hell just happened?” moment for an operator.
You can fix both with something like:
-fn check_safe_to_reset(epoch: u64, folders: &Vec<Folder>) -> Result<(), Box<dyn Error>> {
- let epoch_numbers = folders.iter().filter_map(|f| f.epoch.epoch_no());
+fn check_safe_to_reset(epoch: u64, folders: &[Folder]) -> Result<(), Box<dyn Error>> {
+ let epoch_numbers = folders.iter().filter_map(|f| f.epoch.epoch_no());
@@
- if epoch < min_epoch + 3 {
+ if epoch < min_epoch + 3 {
return Err(format!("resetting to epoch {} would leave us with too few historical epochs to proceed. The earliest epoch you can reset to is {}", epoch, min_epoch + 3).into());
}
+ // Ensure there is actually a snapshot for epoch-1, which `run` needs
+ // to promote to `live`. This also guards against unexpected gaps in
+ // the snapshot sequence.
+ if !folders
+ .iter()
+ .filter_map(|f| f.epoch.epoch_no())
+ .any(|e| e == epoch - 1)
+ {
+ return Err(
+ format!(
+ "no snapshot found for epoch {} (required to reset to epoch {})",
+ epoch - 1,
+ epoch
+ )
+ .into(),
+ );
+ }
+
Ok(())
}Call site stays as‑is:
check_safe_to_reset(args.epoch, &folders)?;That way the “safety check” really does guarantee the preconditions run relies on, and CI stops whinging.
🧰 Tools
🪛 GitHub Actions: Coding Practices
[error] 83-83: clippy: writing &Vec instead of &[_] involves a new object where a slice will do. Change &Vec<Folder> to &[Folder]. (Suggestion: 'change this to: &[Folder]')
🤖 Prompt for AI Agents
In crates/amaru/src/bin/amaru/cmd/reset_to_epoch.rs around lines 83–116, change
the function signature to accept a slice (&[Folder]) instead of &Vec<Folder> to
satisfy Clippy, and tighten the safety checks: after computing min and max, also
ensure that a snapshot for epoch - 1 exists in the folder epoch set (i.e.
epoch_numbers contains epoch - 1) and return a clear error if it does not; keep
the existing min/max and min+3 checks but update error messages to reflect the
new precondition so callers know we require an explicit snapshot for epoch - 1.
Allows you to truncate your database to the start of some epoch;
Works by manipulating the rocksdb database, deleting all epochs after the target epoch, and then copying the previous one as "live" to "start" at the beginning of the epoch.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.