-
Notifications
You must be signed in to change notification settings - Fork 24
add latch lsn #839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add latch lsn #839
Conversation
xiaoxichen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general lgtm. The corner case explanation can goes into commit message.
2c68776 to
10b9c93
Compare
xiaoxichen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also considering adding more logs before latch_lsn.store() for the sake of debugability (better to wrap it into a function ).
| if (!get_pending_request_num()) break; | ||
| std::this_thread::sleep_for(std::chrono::milliseconds(1000)); | ||
| } | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt these lines can be removed as we still go through async mode for non-repl-dev cases(UT)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok , this is good suggestion, I will keep it here and add a comment to notify the later comers that if they want to use aysnc mode, they should take m_pending_callback into account
| // there is a very corner case like following: | ||
| // T1: L1 is leader, push data to follower F1, F1 generate rreq1 and allocate blk for this data. let`s say the | ||
| // lsn of this data is lsn-100 | ||
|
|
||
| // T2: leader switchs to L2, L2 send log to F1, F1 generate rreq2 and try to allocate blk but got | ||
| // no_space_left. then it will set the quiesce flag and waiting for the commit of lsn-99 (100 - 1). | ||
|
|
||
| // T3: leader switchs back to L1, L1 send log (lsn-100) to F1, when F1 tries to create rreq for lsn-100, rreq1 | ||
| // (which is created at T1) will be found since the rkey{server,term,dsn} is same as rreq1. so F1 will skip | ||
| // creating a new rreq. since the data for lsn-100 is already in written at T1, F1 start appending log entries | ||
| // to its log store, which will call logstore::append_log_entries and thus call localize_journal_entry_finish. | ||
|
|
||
| // T4: lsn-99 is committed at F1 and clear_chunk_req is called, so all the rreqs in F1 including rreq1 are | ||
| // cleared. | ||
|
|
||
| // T5: F1 call localize_journal_entry_finish, rreq1 is not found since it is cleared at T4, so F1 will try to | ||
| // create a new rreq for lsn-100. but now, F1 is in quiesce state, all the rreq creation will be rejected, so a | ||
| // nullptr will be returned. and cause RELEASE_ASSERT(rreq != nullptr) failure. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest move these lines into commit message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iet`s keep it here, and I will put it into commit message when merging
10b9c93 to
1cdebad
Compare
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #839 +/- ##
==========================================
- Coverage 56.51% 49.65% -6.86%
==========================================
Files 108 110 +2
Lines 10300 11324 +1024
Branches 1402 5334 +3932
==========================================
- Hits 5821 5623 -198
+ Misses 3894 2082 -1812
- Partials 585 3619 +3034 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
xiaoxichen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
there is a very corner case like following:
when we got no_space_left in log channel at follower, we set this latch_lsn, so if any lsn in this batch that is >= latch_lsn, the whole batch will be rejected. after we successfully handle no_space_left and call resume_accepting_reqs, latch_lsn will be reset to max value.