refactor_create/seal_shard #340
base: main
Conversation
// +1 here means the created shard should have the capacity to hold at least one blob, otherwise we refuse this
// request.
const auto v_chunk_id = v_chunkID.value();
const static uint64_t shard_super_blk_count{
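For context, a minimal sketch of the intent behind the "+1" in the comment above, using a hypothetical can_create_shard() helper (the names below are illustrative, not from the PR):

#include <cstdint>

// Sketch: refuse the create_shard request unless the chunk has room for the
// shard's super blks (header/footer metadata) plus at least one blob blk.
bool can_create_shard(uint64_t free_blks, uint64_t shard_super_blk_count) {
    return free_blks >= shard_super_blk_count + 1;  // "+1" = room for one blob
}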
can we simply use get_reserved_blks()?
If we want to protect against the case where reserved_bytes_in_chunk is set to zero, move this logic into get_reserved_blks(), or simply force it to reserve a few MB.
according to our current logic, only seal_shard can use the reserved space (default 4 blks), which makes sure seal_shard can complete in any case.
if we allow create_shard to use the reserved space, won't it break the above assumption?
sorry, I am not quite clear about your suggestion. let me ask this in another way: how do we guarantee the blk can be successfully allocated if we use get_reserved_blks()?
I mean using get_reserved_blks(), which defaults to 16MB: if any chunk has less than get_reserved_blks() bytes, we simply don't use it. Then there is no need to calculate the size of the shard header.
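A minimal sketch of this alternative, treating the reserved threshold as bytes per the comment above (the hypothetical get_reserved_blks_bytes() stands in for the real accessor, whose signature may differ):

#include <cstdint>

// Per the review comment, the reserved threshold defaults to 16MB.
uint64_t get_reserved_blks_bytes();  // hypothetical: reserved space in bytes

// Sketch of the suggested filter: a chunk with less free space than the
// reserved threshold is simply not used, so the exact size of the shard
// header never needs to be calculated here.
bool chunk_usable(uint64_t available_bytes) {
    return available_bytes >= get_reserved_blks_bytes();
}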
here, I use const static on purpose so that it is not recalculated every time; it will be calculated only once.
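As a side note, a minimal sketch of the C++ pattern being defended here, with a hypothetical compute_shard_super_blk_count() standing in for the real initializer:

#include <cstdint>

uint64_t compute_shard_super_blk_count();  // hypothetical, potentially costly

uint64_t shard_header_blks() {
    // A function-local static is initialized exactly once (thread-safe since
    // C++11); subsequent calls reuse the cached value instead of recomputing.
    const static uint64_t shard_super_blk_count{compute_shard_super_blk_count()};
    return shard_super_blk_count;
}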
homestore::BlkAllocStatus alloc_status;
auto gc_mgr = gc_manager();

while (true) {
is it possible we fail to get enough space even after the first emergent GC (which has already cleared all written-but-not-referenced data)? It seems we don't need a while (true).
theoretically, there are some very corner cases where we fail to get enough space in the first emergent GC.
for example, suppose we have two stale push_data reqs, each of which will try to allocate 4 blks, and meanwhile the target chunk has exactly 4 blks of free space left, with all other blks valid:
1 the first stale push_data takes the 4 blks in the target chunk, which leads to no_space_left when we try to allocate a blk for the shard header, and we trigger emergent GC to free 4 blks
2 the second stale push_data arrives after the emergent GC and takes the 4 blks just freed, so when we try to allocate a blk for the shard header, a second no_space_left occurs.
Considering the leader will not create a shard on a chunk if the chunk's available space is lower than a given threshold, is this still possible?
> Considering the leader will not create a shard on a chunk if the chunk's available space is lower than a given threshold

the available space of a chunk when committing create_shard might be different from when the chunk was selected for create_shard, because of stale push_data requests (put_blob).
Actually, this corner case can happen for both leader and follower. The root cause is that we don't know exactly how many stale push_data requests are in flight, nor when they will try to allocate blks.
a while-true loop here will not bring much cost, since almost all alloc_blk calls will succeed on the first try.
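A minimal sketch of the retry loop being discussed, with stand-in declarations instead of the actual HomeStore API (alloc_blk() and trigger_emergent_gc() below are hypothetical):

#include <cstdint>

// Stand-in declarations for this sketch; the real HomeStore API differs.
enum class BlkAllocStatus { SUCCESS, NO_SPACE_LEFT };
BlkAllocStatus alloc_blk(uint64_t nblks);     // hypothetical allocator call
void trigger_emergent_gc(uint64_t chunk_id);  // hypothetical emergent-GC hook

void alloc_with_retry(uint64_t chunk_id, uint64_t nblks) {
    // Retry until allocation succeeds: a single emergent GC round may not be
    // enough, because a stale push_data request can grab the freed blks
    // between the GC pass and our retry.
    while (true) {
        if (alloc_blk(nblks) == BlkAllocStatus::SUCCESS) break;
        trigger_emergent_gc(chunk_id);  // reclaim written-but-not-referenced data
    }
}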
Codecov Report
❌ Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #340      +/-   ##
==========================================
- Coverage   63.15%   58.21%    -4.94%
==========================================
  Files          32       35        +3
  Lines        1900     4301     +2401
  Branches      204      519      +315
==========================================
+ Hits         1200     2504     +1304
- Misses        600     1533      +933
- Partials      100      264      +164
@xiaoxichen I have addressed your comments in the latest commit, ptal
xiaoxichen left a comment:
lgtm.
Let's bake it in SH before merging. The thinking is we might have more tiny fixes to merge soon, and we want to avoid taking this to production too early.

sure, I will let you know after I have run storage hammer for several rounds and am confident enough to merge it
1 change create/seal shard to log-only, so no push_data happens for these two requests.
2 add a check in on_commit of put_blob to avoid putting a blob into a sealed shard (see the sketch below).
3 move select_specific_chunk (which marks the chunk as in-use) to on_commit of create_shard.
4 write a shard footer for seal_shard in baseline resync.
5 add a sealed_lsn to the shard meta blk, which is not compatible with the previous version.
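A minimal sketch of the check in item 2, with hypothetical ShardInfo/ShardState types standing in for the actual HomeObject definitions:

#include <cstdint>

// Stand-in types for this sketch; the real HomeObject definitions differ.
enum class ShardState : uint8_t { OPEN, SEALED };
struct ShardInfo { uint64_t id; ShardState state; };

// In on_commit of put_blob: refuse to commit a blob into a sealed shard, so a
// stale put_blob that races with seal_shard cannot land after the seal.
bool can_commit_put_blob(const ShardInfo& shard) {
    return shard.state != ShardState::SEALED;
}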