Conversation

Collaborator

@JacksonYao287 JacksonYao287 commented Sep 1, 2025

1. Change create/seal shard to log-only, so no push_data happens for these two requests.
2. Add a check in on_commit of put_blob to avoid putting a blob into a sealed shard.
3. Move select_specific_chunk (which marks the chunk as in-use) to on_commit of create_shard.
4. Write the shard footer for seal_shard in baseline resync.
5. Add a sealed_lsn to the shard meta blk, which is not compatible with the previous version.

@JacksonYao287 JacksonYao287 marked this pull request as draft September 1, 2025 09:05
@JacksonYao287 JacksonYao287 force-pushed the refactor-create-shard branch 3 times, most recently from 78f3f1a to a6d7c1f Compare September 2, 2025 02:43
@JacksonYao287 JacksonYao287 marked this pull request as ready for review September 2, 2025 02:43
// +1 here means the created shard should have the capacity to hold at least one blob; otherwise we refuse
// this request.
const auto v_chunk_id = v_chunkID.value();
const static uint64_t shard_super_blk_count{
Collaborator

Can we simply use get_reserved_blks()?
If we want to protect against the case where reserved_bytes_in_chunk is set to zero, move that logic into get_reserved_blks(), or simply force it to reserve a few MB.

Collaborator Author

According to our current logic, only seal_shard can use the reserved space (4 blks by default), which makes sure seal_shard can be completed in any case.

If we allow create_shard to use the reserved space, won't it break the above assumption?
Sorry, I am not quite clear about your suggestion. Let me ask in another way: how do we guarantee the blk can be successfully allocated if we use get_reserved_blks()?

Collaborator

I mean using get_reserved_blks(), which defaults to 16MB: if any chunk has less than get_reserved_blks() bytes free, we simply don't use it. There is no need to calculate the size of the shard header.

Collaborator Author

Here I use const static on purpose so that it is not calculated every time; it will be calculated only once.

homestore::BlkAllocStatus alloc_status;
auto gc_mgr = gc_manager();

while (true) {
Collaborator

Is it possible that we fail to get enough space in the first emergent GC (which has already cleared all written-but-not-referenced data)? It seems like we don't need a while(true).

Collaborator Author

Theoretically, there are some very rare corner cases where we fail to get enough space in the first emergent GC.

For example, suppose we have two stale push_data requests, each of which will try to allocate 4 blks, and meanwhile the target chunk has exactly 4 blks of free space left, with all other blks valid:

1. The first stale push_data takes the 4 blks in the target chunk, which leads to no_space_left when we try to allocate a blk for the shard header, so we trigger emergent GC to free 4 blks.

2. The second stale push_data arrives after the emergent GC and takes the 4 blks just freed; then, when we try to allocate a blk for the shard header, a second no_space_left occurs.

Collaborator

Considering the leader will not create a shard on a chunk if the chunk's available space is lower than a given threshold, is this still possible?

Collaborator Author
@JacksonYao287 JacksonYao287 Sep 3, 2025

> Considering the leader will not create a shard on a chunk if the chunk's available space is lower than a given threshold

The available space of a chunk when committing create_shard might differ from when the chunk was selected for create_shard, because of stale push_data requests (put_blob).

Actually, this corner case might happen for both leader and follower. The root cause is that we don't know exactly how many stale push_data requests are in flight, or when they will try to allocate blks.

A while-true loop here will not add much cost, since almost all alloc_blk calls will succeed on the first try.

@JacksonYao287 JacksonYao287 force-pushed the refactor-create-shard branch 2 times, most recently from cfce547 to 4fd93bd Compare September 2, 2025 04:50
@codecov-commenter

codecov-commenter commented Sep 2, 2025

Codecov Report

❌ Patch coverage is 51.74129% with 97 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.21%. Comparing base (1746bcc) to head (c1d3e02).
⚠️ Report is 120 commits behind head on main.

Files with missing lines Patch % Lines
src/lib/homestore_backend/hs_shard_manager.cpp 51.66% 50 Missing and 8 partials ⚠️
src/lib/homestore_backend/hs_blob_manager.cpp 18.51% 19 Missing and 3 partials ⚠️
src/lib/homestore_backend/replication_message.hpp 46.15% 6 Missing and 1 partial ⚠️
...ib/homestore_backend/replication_state_machine.cpp 50.00% 4 Missing and 1 partial ⚠️
src/lib/homestore_backend/hs_pg_manager.cpp 0.00% 4 Missing ⚠️
...lib/homestore_backend/snapshot_receive_handler.cpp 95.23% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #340      +/-   ##
==========================================
- Coverage   63.15%   58.21%   -4.94%     
==========================================
  Files          32       35       +3     
  Lines        1900     4301    +2401     
  Branches      204      519     +315     
==========================================
+ Hits         1200     2504    +1304     
- Misses        600     1533     +933     
- Partials      100      264     +164     


@JacksonYao287
Collaborator Author

@xiaoxichen I have addressed your comments in the latest commit, PTAL.

@JacksonYao287 JacksonYao287 force-pushed the refactor-create-shard branch 2 times, most recently from c7e7dfd to 3247d3c Compare September 3, 2025 08:20
Collaborator
@xiaoxichen xiaoxichen left a comment


LGTM.

Let's bake it in SH before merging. The thinking is we might have more tiny fixes to merge soon, and we want to avoid taking this out to production on its own.

@JacksonYao287
Collaborator Author

JacksonYao287 commented Sep 5, 2025

> LGTM.
>
> Let's bake it in SH before merging. The thinking is we might have more tiny fixes to merge soon, and we want to avoid taking this out to production on its own.

Sure, I will let you know after I have run storage hammer for several rounds and am confident to merge it.

@JacksonYao287 JacksonYao287 force-pushed the refactor-create-shard branch 10 times, most recently from 7338428 to a0a5afa Compare September 8, 2025 08:31