Conversation

@ZocoLini
Collaborator

@ZocoLini ZocoLini commented Dec 26, 2025

The idea of this PR is to split responsibilities in the storage system: instead of a single large trait handling all storage operations, we now have multiple traits, each one specialized in a specific subsystem (BlockHeaders, Filters, etc.).

The reason for this change is the need to refactor how peer reputation is stored, along with other subsystems we may need down the line. This approach makes the code more scalable and testable, following good design patterns.

To make this change as smooth as possible, I kept the StorageManager trait and made it require all of the new specialized traits; this way DiskStorageManager can still be used as a facade over all the specialized storage structs.
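Roughly, the composite trait ends up shaped like the sketch below; the block and filter subtrait names match the new modules, while the names for the chain state, masternode, metadata, and transaction subtraits are illustrative only:

#[async_trait]
pub trait StorageManager:
    blocks::BlockHeaderStorage
    + filters::FilterHeaderStorage
    + filters::FilterStorage
    + chainstate::ChainStateStorage
    + masternode::MasternodeStorage
    + metadata::MetadataStorage
    + transactions::TransactionStorage
    + Send
    + Sync
{
}

// DiskStorageManager implements all of the subtraits plus the composite trait,
// so it keeps working as a facade for callers that want all of storage in one place.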

I also removed redundant documentation like:

/// This method stores filter headers in the storage
fn store_filter_headers() {}

If you find any change that you think needs documentation or comments because it is not clear from reading the code, feel free to request changes.

This PR is built on top of:
#278
#292

Summary by CodeRabbit

  • New Features

    • Introduced modular persistent storage components for headers, filters, chain state, and masternode data with improved reliability and scalability.
  • Bug Fixes

    • Fixed chain lock validation to correctly fetch headers from persistent storage.
    • Improved tip height tracking by sourcing from storage instead of in-memory state.
  • Refactor

    • Removed clear_filters() public method from client API.
    • Simplified chain state by removing in-memory header/filter storage; now sourced from persistent storage.
    • Streamlined error handling across sync operations with sensible defaults.


@github-actions github-actions bot added the merge-conflict The PR conflicts with the target branch. label Dec 26, 2025
@github-actions

This PR has merge conflicts with the base branch. Please rebase or merge the base branch into your branch to resolve them.

@coderabbitai
Contributor

coderabbitai bot commented Dec 26, 2025

📝 Walkthrough


This PR refactors the SPV storage architecture from in-memory state management to a persistent, trait-based storage system. It removes the headers and filter_headers fields from ChainState, introduces modular storage components for blocks, filters, chainstate, and masternode state, replaces the old DiskStorageManager implementation, and updates client and sync logic to use async storage calls instead of in-memory state access.

Changes

  • FFI Type Cleanup (dash-spv-ffi/src/types.rs, dash-spv-ffi/tests/unit/test_type_conversions.rs): Removed the header_height and filter_header_height fields from the FFIChainState struct and their initialization in type conversion; adjusted test assertions.
  • ChainState Simplification (dash-spv/src/types.rs): Removed the headers and filter_headers fields and all associated public methods (tip_height, tip_hash, header_at_height, add_headers, etc.); removed genesis header seeding from initialization.
  • New Storage Modules: Block Headers (dash-spv/src/storage/blocks.rs): Added the new BlockHeaderStorage trait and PersistentBlockHeaderStorage struct with async methods for storing, loading, and querying block headers; includes a reverse-lookup index for hash-to-height mapping and a persistent disk backend.
  • New Storage Modules: Filters (dash-spv/src/storage/filters.rs): Added the FilterHeaderStorage and FilterStorage traits with PersistentFilterHeaderStorage and PersistentFilterStorage structs; implements async header/filter persistence backed by SegmentCache.
  • New Storage Modules: ChainState, Masternode, Metadata, Transactions (dash-spv/src/storage/chainstate.rs, dash-spv/src/storage/masternode.rs, dash-spv/src/storage/metadata.rs, dash-spv/src/storage/transactions.rs): Added persistent storage traits and implementations for chain state (JSON-based), masternode state, metadata (key-value), and mempool transactions; all use atomic writes for durability.
  • Storage Architecture Refactor (dash-spv/src/storage/mod.rs): Rebuilt DiskStorageManager with the new architecture; it now aggregates per-component storage handles via Arc<RwLock<>> and implements the composite StorageManager trait; added a background worker for periodic persistence and lifecycle controls (start_worker, stop_worker, clear, shutdown).
  • Storage Backend Changes (dash-spv/src/storage/segments.rs): Updated SegmentCache and the Persistable trait to use segment_file_name() instead of FOLDER_NAME; added an evicted map for tracking segments pending persistence; refactored persistence to use explicit persist() and persist_evicted() calls instead of WorkerCommand.
  • Removed Storage Modules (dash-spv/src/storage/headers.rs, dash-spv/src/storage/manager.rs, dash-spv/src/storage/state.rs): Deleted the old header storage implementation, the legacy DiskStorageManager struct and methods, and the state persistence module; all functionality migrated to the new modular storage components.
  • Client Core Updates (dash-spv/src/client/core.rs): Changed tip_hash and tip_height to read from storage instead of in-memory state; removed the public clear_filters() method.
  • Client Lifecycle Updates (dash-spv/src/client/lifecycle.rs): Reworked tip/header loading to use storage directly; removed explicit error handling for storage calls in favor of unwrap_or() defaults; simplified checkpoint initialization and storage sync calls.
  • Client Progress & Status Display (dash-spv/src/client/progress.rs, dash-spv/src/client/status_display.rs): Updated storage API calls from Result<Option<T>> to Option<T>; removed nested unwrap patterns and adjusted error handling accordingly.
  • Sync Coordinator Updates (dash-spv/src/client/sync_coordinator.rs): Replaced error handling for tip height retrieval with a permissive unwrap_or(0), reducing error propagation; defaults to height 0 on storage failures.
  • Sync Manager & Headers Manager (dash-spv/src/sync/manager.rs, dash-spv/src/sync/headers/manager.rs): Refactored to use async storage calls for header/tip queries; removed the get_chain_height() method; changed load_headers_from_storage to return unit instead of a count; updated method signatures to accept a storage parameter; removed the dependency on the in-memory total_headers_synced field.
  • Sync Filters & Transitions (dash-spv/src/sync/filters/headers.rs, dash-spv/src/sync/filters/retry.rs, dash-spv/src/sync/transitions.rs): Simplified tip height retrieval by replacing error-mapped patterns with direct unwrap_or(0) calls; updated storage API calls to handle the changed return types (Result<Option<T>> to Option<T>).
  • Sync Masternodes & Phase Execution (dash-spv/src/sync/masternodes/manager.rs, dash-spv/src/sync/phase_execution.rs): Updated tip height retrieval to use Option-based patterns; removed verbose pre-computation and logging in the masternode list download phase; added a storage parameter to request_headers() calls.
  • Sync Message & Validation (dash-spv/src/sync/message_handlers.rs, dash-spv/src/chain/chainlock_manager.rs): Updated blockchain height retrieval to handle the changed return types; refactored chainlock validation to use an async storage fetch with error propagation for missing headers.
  • Test Updates (dash-spv/tests/*.rs, 15+ test files): Adapted tests to the new storage API signatures; replaced double-unwrap patterns for Result<Option<T>> with a single unwrap for Option<T>; removed fixed delays and adjusted fixture setup to use store_headers() instead of direct ChainState field assignment.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Refactor: storage segments cleanup #244: Overlapping refactoring of storage subsystem (segments, headers, filters) with removal/renaming of header/filter storage methods and reorganization of segment persistence.
  • Clean/remove memory storage #275: Related storage subsystem changes including removal of MemoryStorageManager and integration of disk-backed storage with updated DiskStorageManager interfaces.

Suggested reviewers

  • xdustinface

Poem

🐰 Hoppy changes, persistent and true,
Storage now thrives in the disk's deepest brew,
Headers no longer in memory stay,
Async and modular, a cleaner way!
Let's sync and persist, the refactor's here.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 73.42%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The pull request title accurately describes the main change: refactoring a storage manager trait into multiple subtraits for better separation of concerns.
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings


@ZocoLini ZocoLini marked this pull request as draft December 26, 2025 21:29
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dash-spv/src/storage/segments.rs (1)

413-413: Segment size violates documented storage architecture: change from 10,000 to 50,000 items per segment is not reflected in guidelines.

The coding guideline specifies "Store headers in 10,000-header segments", but the implementation uses const ITEMS_PER_SEGMENT: u32 = 50_000; (line 413). This 5x increase impacts memory usage, disk I/O patterns, and persistence granularity. Either revert to 10,000 items per segment to match the documented architecture, or update CLAUDE.md to reflect and justify the 50,000-item design decision.
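If the revert option is chosen, the fix is a one-line change to the constant in segments.rs (shown here for reference):

const ITEMS_PER_SEGMENT: u32 = 10_000;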

🧹 Nitpick comments (15)
dash-spv/src/client/progress.rs (1)

41-43: Consider logging storage errors for observability.

The API change from Result<Option<u32>> to Option<u32> means storage errors are now silently ignored when retrieving the tip height. While acceptable for stats collection, consider logging storage failures at the debug or trace level to aid diagnostics.

🔎 Example with error logging
-if let Some(header_height) = storage.get_tip_height().await {
+match storage.get_tip_height().await {
+    Some(header_height) => {
         stats.header_height = header_height;
+    }
+    None => {
+        tracing::trace!("Could not retrieve tip height from storage for stats");
+    }
 }
dash-spv/src/client/core.rs (1)

192-198: Simplify the double ?? pattern.

The chained ?? operators on line 195 are harder to read than necessary. Consider flattening this logic:

🔎 Suggested refactor
 pub async fn tip_hash(&self) -> Option<dashcore::BlockHash> {
     let storage = self.storage.lock().await;
     
     let tip_height = storage.get_tip_height().await?;
-    let header = storage.get_header(tip_height).await.ok()??;
+    let header = storage.get_header(tip_height).await.ok().flatten()?;
     
     Some(header.block_hash())
 }

Alternatively, since .await cannot be used inside an Option::and_then closure, the same logic can be written with a trailing map for clarity:

 pub async fn tip_hash(&self) -> Option<dashcore::BlockHash> {
     let storage = self.storage.lock().await;
     
     let tip_height = storage.get_tip_height().await?;
-    let header = storage.get_header(tip_height).await.ok()??;
-    
-    Some(header.block_hash())
+    storage
+        .get_header(tip_height)
+        .await
+        .ok()
+        .flatten()
+        .map(|header| header.block_hash())
 }
dash-spv/src/sync/transitions.rs (1)

180-180: Inconsistent error handling between get_tip_height and get_filter_tip_height.

get_tip_height().await.unwrap_or(0) silently defaults to 0 on storage errors, while get_filter_tip_height() (line 412-416) still maps and propagates errors. This inconsistency could mask storage issues during header sync while surfacing them during filter sync.

Consider either:

  1. Using consistent error handling for both (propagate errors or use defaults)
  2. Adding a log/trace when defaulting to 0 to aid debugging
🔎 Suggestion to add trace logging
-                let start_height = storage.get_tip_height().await.unwrap_or(0);
+                let start_height = storage.get_tip_height().await.unwrap_or_else(|| {
+                    tracing::trace!("No tip height in storage, starting from height 0");
+                    0
+                });
dash-spv/src/storage/chainstate.rs (2)

74-97: Consider using serde derive for ChainState deserialization.

Manual JSON field extraction is verbose and error-prone. If ChainState derives Serialize and Deserialize, you can simplify this significantly and ensure field consistency between store and load.

🔎 Suggested simplification

If ChainState derives serde traits:

-        let value: serde_json::Value = serde_json::from_str(&content).map_err(|e| {
-            crate::error::StorageError::Serialization(format!("Failed to parse chain state: {}", e))
-        })?;
-
-        let state = ChainState {
-            last_chainlock_height: value
-                .get("last_chainlock_height")
-                .and_then(|v| v.as_u64())
-                .map(|h| h as u32),
-            // ... many more lines
-        };
+        let mut state: ChainState = serde_json::from_str(&content).map_err(|e| {
+            crate::error::StorageError::Serialization(format!("Failed to parse chain state: {}", e))
+        })?;
+        // Reset runtime-only field
+        state.masternode_engine = None;

43-61: Manual JSON construction could be replaced with serde serialization.

If ChainState derives Serialize, use serde_json::to_string(&state) for consistency and automatic handling of all fields.
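As a rough sketch of that direction, assuming ChainState can derive the serde traits and runtime-only fields are skipped (the serialize_chain_state helper below is hypothetical):

// Sketch only: runtime-only fields such as the masternode engine would be
// marked #[serde(skip)] so they are reset on load.
#[derive(serde::Serialize, serde::Deserialize)]
pub struct ChainState {
    pub last_chainlock_height: Option<u32>,
    // ... remaining persisted fields ...
}

// The store path then reduces to a single serde call, with the JSON still
// written atomically as the existing code does.
fn serialize_chain_state(state: &ChainState) -> Result<String, crate::error::StorageError> {
    serde_json::to_string(state).map_err(|e| {
        crate::error::StorageError::Serialization(format!("Failed to serialize chain state: {}", e))
    })
}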

dash-spv/src/storage/transactions.rs (1)

39-42: Naming may be misleading for non-persistent storage.

PersistentTransactionStorage doesn't actually persist data (as noted in line 57). Consider renaming to InMemoryTransactionStorage or adding a doc comment explaining the design rationale.

🔎 Suggested documentation
+/// In-memory transaction storage for mempool data.
+/// 
+/// Note: Despite the "Persistent" prefix (for trait conformance), this storage
+/// intentionally does not persist mempool data to disk, as mempool state is
+/// transient and rebuilds from network on restart.
 pub struct PersistentTransactionStorage {
     mempool_transactions: HashMap<Txid, UnconfirmedTransaction>,
     mempool_state: Option<MempoolState>,
 }
dash-spv/src/storage/blocks.rs (1)

117-127: persist_dirty does not update the hash-to-height index.

If the application crashes after calling persist_dirty but before a full persist, the index file may be stale on next startup. The fallback in load() (line 94) rebuilds the index from segments, so this is safe but potentially slow on restart.

Consider persisting the index in persist_dirty as well, or documenting this as intentional behavior.
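A hedged sketch of the first option; the segments field and persist_index helper below are illustrative names rather than the actual PersistentBlockHeaderStorage internals:

// Flush dirty segments and refresh the hash-to-height index in the same pass,
// so a crash before the next full persist() never leaves a stale index on disk.
pub async fn persist_dirty(&mut self, storage_path: &Path) -> StorageResult<()> {
    self.segments.persist_dirty(storage_path).await?;
    self.persist_index(storage_path).await?;
    Ok(())
}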

dash-spv/src/sync/headers/manager.rs (4)

143-153: Consider combining tip retrieval into a single storage operation.

The current implementation makes two separate async calls to get the tip height and then the header at that height. If storage state changes between calls (e.g., in concurrent scenarios), this could lead to inconsistency. While unlikely in the current architecture, a combined get_tip_header() method would be more robust.
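A possible shape for such an accessor, as a sketch only (the BlockHeader type and the default-method approach are assumptions, not the PR's actual API):

#[async_trait]
pub trait BlockHeaderStorage: Send + Sync {
    async fn get_tip_height(&self) -> Option<u32>;
    async fn get_header(&self, height: u32) -> StorageResult<Option<BlockHeader>>;

    /// Single entry point for "tip height plus its header"; implementations can
    /// override this to read both values under one lock.
    async fn get_tip_header(&self) -> Option<(u32, BlockHeader)> {
        let height = self.get_tip_height().await?;
        let header = self.get_header(height).await.ok().flatten()?;
        Some((height, header))
    }
}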


496-505: Checkpoint header is fetched unconditionally but only used in some branches.

The checkpoint_header is retrieved at the start of the method but is only used when effective_tip_height is None with checkpoint sync, or when at checkpoint height. This adds unnecessary I/O overhead for normal non-checkpoint sync paths.

🔎 Proposed refactor: fetch checkpoint header lazily

Move the checkpoint header retrieval inside the branches that actually need it, or use a lazy evaluation pattern to avoid the upfront I/O cost when not needed.
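One way the lazy variant could read; the condition and variable names below are illustrative, mirroring the description above:

// Only hit storage for the checkpoint header on the branches that use it.
let checkpoint_header = if effective_tip_height.is_none() && syncing_from_checkpoint {
    storage.get_header(checkpoint_height).await.map_err(|e| {
        SyncError::Storage(format!("Failed to load checkpoint header: {}", e))
    })?
} else {
    None
};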


523-541: Redundant get_header(0) calls.

storage.get_header(0) is called at line 523 and again at lines 530-535 for the same purpose. The second call checks if the header exists to decide whether to store it, but the first call already retrieved it.

🔎 Proposed fix
-                    if let Some(genesis_header) = storage.get_header(0).await.map_err(|e| {
-                        SyncError::Storage(format!(
-                            "Error trying to get genesis block from storage: {}",
-                            e
-                        ))
-                    })? {
-                        // Store genesis in storage if not already there
-                        if storage
-                            .get_header(0)
-                            .await
-                            .map_err(|e| {
-                                SyncError::Storage(format!("Failed to check genesis: {}", e))
-                            })?
-                            .is_none()
-                        {
-                            tracing::info!("Storing genesis block in storage");
-                            storage.store_headers(&[genesis_header]).await.map_err(|e| {
-                                SyncError::Storage(format!("Failed to store genesis: {}", e))
-                            })?;
-                        }
+                    let existing_genesis = storage.get_header(0).await.map_err(|e| {
+                        SyncError::Storage(format!(
+                            "Error trying to get genesis block from storage: {}",
+                            e
+                        ))
+                    })?;
+                    
+                    if let Some(genesis_header) = existing_genesis {
                         let genesis_hash = genesis_header.block_hash();
                         tracing::info!("Starting from genesis block: {}", genesis_hash);
                         Some(genesis_hash)

655-679: Checkpoint header retrieved unconditionally; consider lazy loading.

Similar to prepare_sync, the checkpoint header is fetched at the start of the timeout handling path but is only used when current_tip_height is None and syncing from a checkpoint. For most timeout recovery scenarios (when we have a tip), this I/O is unnecessary.

dash-spv/src/storage/mod.rs (4)

45-57: Consider adding documentation clarifying thread-safety expectations.

Based on learnings, the StorageManager trait pattern with &mut self methods combined with Send + Sync bounds can be confusing. The implementations use interior mutability (Arc<RwLock<_>>), so explicit documentation would help clarify the thread-safety model and API design rationale.

🔎 Suggested documentation
 #[async_trait]
+/// Composite trait for SPV storage operations.
+///
+/// Implementations use interior mutability (e.g., `Arc<RwLock<_>>`) for thread-safe
+/// concurrent access while maintaining the `Send + Sync` bounds required for async contexts.
 pub trait StorageManager:
     blocks::BlockHeaderStorage
     + filters::FilterHeaderStorage

78-87: Synchronous I/O in async context.

std::fs::create_dir_all at line 84 performs blocking I/O. In an async context, this could briefly block the executor. Consider using tokio::fs::create_dir_all for consistency with the rest of the async operations.

🔎 Proposed fix
     pub async fn new(storage_path: impl Into<PathBuf> + Send) -> StorageResult<Self> {
-        use std::fs;
-
         let storage_path = storage_path.into();
 
         // Create directories if they don't exist
-        fs::create_dir_all(&storage_path)?;
+        tokio::fs::create_dir_all(&storage_path).await?;

156-161: stop_worker does not clear worker_handle after aborting.

After calling abort(), the worker_handle field remains Some(...) with an aborted handle. This could cause confusion if stop_worker is called multiple times or if code checks worker_handle.is_some(). The shutdown method correctly uses take().

🔎 Proposed fix
     /// Stop the background worker without forcing a save.
-    pub(super) fn stop_worker(&self) {
-        if let Some(handle) = &self.worker_handle {
+    pub(super) fn stop_worker(&mut self) {
+        if let Some(handle) = self.worker_handle.take() {
             handle.abort();
         }
     }

218-227: Consider logging or propagating errors from persist().

The persist() method silently ignores all errors. During shutdown, failure to persist data could result in data loss. Consider logging warnings or returning a result to allow callers to handle failures.

🔎 Proposed fix: log persist errors
     async fn persist(&self) {
         let storage_path = &self.storage_path;
 
-        let _ = self.block_headers.write().await.persist(storage_path).await;
-        let _ = self.filter_headers.write().await.persist(storage_path).await;
-        let _ = self.filters.write().await.persist(storage_path).await;
-        let _ = self.transactions.write().await.persist(storage_path).await;
-        let _ = self.metadata.write().await.persist(storage_path).await;
-        let _ = self.chainstate.write().await.persist(storage_path).await;
+        if let Err(e) = self.block_headers.write().await.persist(storage_path).await {
+            tracing::error!("Failed to persist block headers on shutdown: {}", e);
+        }
+        if let Err(e) = self.filter_headers.write().await.persist(storage_path).await {
+            tracing::error!("Failed to persist filter headers on shutdown: {}", e);
+        }
+        if let Err(e) = self.filters.write().await.persist(storage_path).await {
+            tracing::error!("Failed to persist filters on shutdown: {}", e);
+        }
+        if let Err(e) = self.transactions.write().await.persist(storage_path).await {
+            tracing::error!("Failed to persist transactions on shutdown: {}", e);
+        }
+        if let Err(e) = self.metadata.write().await.persist(storage_path).await {
+            tracing::error!("Failed to persist metadata on shutdown: {}", e);
+        }
+        if let Err(e) = self.chainstate.write().await.persist(storage_path).await {
+            tracing::error!("Failed to persist chainstate on shutdown: {}", e);
+        }
     }

