add latch lsn #839

JacksonYao287 · 2025-12-08T10:55:41Z

There is a rare corner case, as follows:

    T1: L1 is the leader and pushes data to follower F1. F1 generates rreq1 and allocates a blk for this data.
    Let's say the lsn of this data is lsn-100.

    T2: The leader switches to L2. L2 sends a log to F1; F1 generates rreq2 and tries to allocate a blk but gets
    no_space_left. F1 then sets the quiesce flag and waits for the commit of lsn-99 (100 - 1).

    T3: The leader switches back to L1, and L1 sends the log (lsn-100) to F1. When F1 tries to create an rreq for
    lsn-100, rreq1 (created at T1) is found, since its rkey{server,term,dsn} is the same. So F1 skips creating a
    new rreq. Since the data for lsn-100 was already written at T1, F1 starts appending log entries to its log
    store, which calls logstore::append_log_entries and thus localize_journal_entry_finish.

    T4: lsn-99 is committed at F1 and clear_chunk_req is called, so all the rreqs in F1, including rreq1, are
    cleared.

    T5: F1 calls localize_journal_entry_finish. rreq1 is not found since it was cleared at T4, so F1 tries to
    create a new rreq for lsn-100. But F1 is now in the quiesce state and all rreq creation is rejected, so
    nullptr is returned, which causes the RELEASE_ASSERT(rreq != nullptr) failure.
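
The T1 to T5 sequence can be replayed deterministically in a small model. This is only a sketch under stated assumptions: `RKey`, `RReq`, `Follower`, `get_or_create` and `replay_t1_to_t5` are hypothetical names invented for illustration, not the real HomeStore types, and the real code paths are asynchronous.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical stand-in for rkey{server,term,dsn}.
struct RKey {
    int32_t server;
    uint64_t term;
    uint64_t dsn;
    bool operator<(const RKey& o) const {
        return std::tie(server, term, dsn) < std::tie(o.server, o.term, o.dsn);
    }
};

// Hypothetical stand-in for a replication request.
struct RReq {
    int64_t lsn;
};

class Follower {
public:
    // Returns the existing rreq for this key, or creates one unless quiesced.
    std::shared_ptr<RReq> get_or_create(const RKey& key, int64_t lsn) {
        auto it = m_rreqs.find(key);
        if (it != m_rreqs.end()) return it->second;
        if (m_quiesced) return nullptr; // all rreq creation is rejected while quiesced
        auto rreq = std::make_shared<RReq>(RReq{lsn});
        m_rreqs.emplace(key, rreq);
        return rreq;
    }

    void quiesce() { m_quiesced = true; }         // models the no_space_left handling at T2
    void clear_chunk_req() { m_rreqs.clear(); }   // models the cleanup at T4

private:
    std::map<RKey, std::shared_ptr<RReq>> m_rreqs;
    bool m_quiesced{false};
};

// Replays T1..T5 and returns true if the final lookup yields nullptr,
// i.e. the condition that would trip RELEASE_ASSERT(rreq != nullptr).
bool replay_t1_to_t5() {
    Follower f1;
    const RKey key{1, 1, 7};           // arbitrary illustrative key
    f1.get_or_create(key, 100);        // T1: rreq1 created for lsn-100
    f1.quiesce();                      // T2: no_space_left, quiesce flag set
    f1.get_or_create(key, 100);        // T3: rreq1 is found, no new rreq needed
    f1.clear_chunk_req();              // T4: lsn-99 committed, all rreqs cleared
    return f1.get_or_create(key, 100) == nullptr; // T5: lookup misses, creation rejected
}
```

In this model the failure only appears because the T3 lookup succeeds before the T4 cleanup while the T5 lookup happens after it.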

When we get no_space_left in the log channel at a follower, we set latch_lsn. If any lsn in an incoming batch is >= latch_lsn, the whole batch is rejected. After we successfully handle no_space_left and call resume_accepting_reqs, latch_lsn is reset to the max value.
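
A minimal sketch of the latch check, assuming the latch is an atomic 64-bit LSN. `FollowerLatch`, `latch` and `accept_batch` are hypothetical names for illustration; only `latch_lsn` and `resume_accepting_reqs` come from the description above, and which lsn is actually latched in the real change is not shown here.

```cpp
#include <atomic>
#include <cstdint>
#include <limits>
#include <vector>

using lsn_t = int64_t; // assumed LSN type for this sketch

class FollowerLatch {
public:
    // Called when the log channel hits no_space_left (the latched value is an assumption).
    void latch(lsn_t lsn) { m_latch_lsn.store(lsn, std::memory_order_release); }

    // Called after no_space_left has been handled: reset the latch to the max value.
    void resume_accepting_reqs() { m_latch_lsn.store(k_no_latch, std::memory_order_release); }

    // Accepts the batch only if every lsn is below the latch.
    // A single lsn >= latch_lsn rejects the whole batch.
    bool accept_batch(const std::vector<lsn_t>& batch_lsns) const {
        const lsn_t latch = m_latch_lsn.load(std::memory_order_acquire);
        for (lsn_t lsn : batch_lsns) {
            if (lsn >= latch) return false;
        }
        return true;
    }

private:
    static constexpr lsn_t k_no_latch = std::numeric_limits<lsn_t>::max();
    std::atomic<lsn_t> m_latch_lsn{k_no_latch};
};
```

With a latch at 100, a batch containing lsn-99 and lsn-100 is rejected as a whole, so the follower never reaches the T3 append path for lsn-100 while it is quiesced.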

xiaoxichen

In general lgtm. The corner case explanation can go into the commit message.

xiaoxichen

Also consider adding more logs before latch_lsn.store() for the sake of debuggability (better to wrap it in a function).
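
One possible shape for such a wrapper, as a sketch only: `LatchedLsn` and `set` are hypothetical names, and `std::clog` stands in for whatever logging macro the project actually uses.

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>
#include <limits>
#include <sstream>
#include <string>

using lsn_t = int64_t; // assumed LSN type for this sketch

class LatchedLsn {
public:
    // Single entry point for every update, so each change is logged with its reason.
    // Returns the emitted log line, which also makes the wrapper easy to test.
    std::string set(lsn_t new_lsn, const std::string& reason) {
        const lsn_t old_lsn = m_latch_lsn.exchange(new_lsn, std::memory_order_acq_rel);
        std::ostringstream msg;
        msg << "latch_lsn changed from " << old_lsn << " to " << new_lsn << ", reason=" << reason;
        std::clog << msg.str() << '\n';
        return msg.str();
    }

    lsn_t get() const { return m_latch_lsn.load(std::memory_order_acquire); }

private:
    std::atomic<lsn_t> m_latch_lsn{std::numeric_limits<lsn_t>::max()};
};
```

Routing both the latch and the reset through one function keeps the old and new values in the same log line, which helps when correlating a rejected batch with the no_space_left event that caused it.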

xiaoxichen · 2025-12-10T02:41:50Z

src/lib/logstore/log_dev.cpp

        if (!get_pending_request_num()) break;
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
    }
-    {


I doubt these lines can be removed, as we still go through async mode for non-repl-dev cases (UT).

OK, this is a good suggestion. I will keep it here and add a comment to notify later comers that if they want to use async mode, they should take m_pending_callback into account.

xiaoxichen · 2025-12-10T02:44:01Z

src/lib/replication/repl_dev/raft_repl_dev.cpp

+        // There is a rare corner case, as follows:
+        // T1: L1 is the leader and pushes data to follower F1. F1 generates rreq1 and allocates a blk for this
+        // data. Let's say the lsn of this data is lsn-100.
+
+        // T2: The leader switches to L2. L2 sends a log to F1; F1 generates rreq2 and tries to allocate a blk but
+        // gets no_space_left. F1 then sets the quiesce flag and waits for the commit of lsn-99 (100 - 1).
+
+        // T3: The leader switches back to L1, and L1 sends the log (lsn-100) to F1. When F1 tries to create an rreq
+        // for lsn-100, rreq1 (created at T1) is found, since its rkey{server,term,dsn} is the same. So F1 skips
+        // creating a new rreq. Since the data for lsn-100 was already written at T1, F1 starts appending log
+        // entries to its log store, which calls logstore::append_log_entries and thus localize_journal_entry_finish.
+
+        // T4: lsn-99 is committed at F1 and clear_chunk_req is called, so all the rreqs in F1, including rreq1, are
+        // cleared.
+
+        // T5: F1 calls localize_journal_entry_finish. rreq1 is not found since it was cleared at T4, so F1 tries to
+        // create a new rreq for lsn-100. But F1 is now in the quiesce state and all rreq creation is rejected, so
+        // nullptr is returned, which causes the RELEASE_ASSERT(rreq != nullptr) failure.
+


Suggest moving these lines into the commit message.

Let's keep it here, and I will put it into the commit message when merging.

codecov-commenter · 2025-12-11T09:21:41Z

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 28.57143% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.65%. Comparing base (1a0cef8) to head (1cdebad).
⚠️ Report is 295 commits behind head on master.

Files with missing lines	Patch %	Lines
src/lib/replication/repl_dev/raft_repl_dev.cpp	29.41%	5 Missing and 7 partials ⚠️
src/include/homestore/replication/repl_dev.h	0.00%	2 Missing ⚠️
src/lib/logstore/log_dev.cpp	0.00%	0 Missing and 1 partial ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #839      +/-   ##
==========================================
- Coverage   56.51%   49.65%   -6.86%     
==========================================
  Files         108      110       +2     
  Lines       10300    11324    +1024     
  Branches     1402     5334    +3932     
==========================================
- Hits         5821     5623     -198     
+ Misses       3894     2082    -1812     
- Partials      585     3619    +3034

xiaoxichen

lgtm

JacksonYao287 requested review from Besroy, koujl and xiaoxichen December 8, 2025 10:56

xiaoxichen reviewed Dec 9, 2025

JacksonYao287 force-pushed the add-latch-lsn branch 5 times, most recently from 2c68776 to 10b9c93 Compare December 9, 2025 10:11

JacksonYao287 requested a review from xiaoxichen December 9, 2025 10:11

xiaoxichen reviewed Dec 10, 2025

add latch lsn and make logstore flush inline

1cdebad

JacksonYao287 force-pushed the add-latch-lsn branch from 10b9c93 to 1cdebad Compare December 11, 2025 08:39

JacksonYao287 requested review from sanebay and xiaoxichen December 15, 2025 01:34

xiaoxichen approved these changes Dec 15, 2025

JacksonYao287 merged commit e03d114 into eBay:master Dec 15, 2025
40 of 41 checks passed

JacksonYao287 deleted the add-latch-lsn branch December 15, 2025 11:09
