Skip to content

Conversation

@Besroy
Copy link
Contributor

@Besroy Besroy commented Mar 13, 2025

In previous implementation, there may be a concurrency issue between 'permanent_destroy' and 'cp_flush/flush_durable_commit_lsn': if m_rd_sb is destroyed at T1 then is written at T2, a NPE will occur.
This pr is a fix way to avoid this issue by adding m_sb_mtx for destroy.

In addition to the current fix, there are two other potential approaches to address this issue:

  1. support triggering specific CP: so that HO's pg_destroy won't trigger repl_dev's cp_flush, which acquiring m_rd_map's lock
  2. refine leave_group func: modify the leave_group within the nuraft_mesg component to only shut down servers. so that gc can call leave_group and upper destroy without lock, call permanent_destroy and remove it from map with lock

In previous implementation, there may be a concurrency issue between 'permanent_destroy' and 'cp_flush/flush_durable_commit_lsn':
if m_rd_sb is destroyed at T1 then is written at T2, a NPE will occur.
This pr is a fix way to avoid this issue by adding m_sb_mtx for destroy.
@Besroy Besroy requested a review from yuwmao March 13, 2025 09:15
@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 42.85714% with 4 lines in your changes missing coverage. Please review.

Project coverage is 66.18%. Comparing base (1a0cef8) to head (e36b56e).
Report is 146 commits behind head on master.

Files with missing lines Patch % Lines
src/lib/replication/repl_dev/raft_repl_dev.cpp 42.85% 3 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #661      +/-   ##
==========================================
+ Coverage   56.51%   66.18%   +9.67%     
==========================================
  Files         108      109       +1     
  Lines       10300    11288     +988     
  Branches     1402     1539     +137     
==========================================
+ Hits         5821     7471    +1650     
+ Misses       3894     3091     -803     
- Partials      585      726     +141     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@xiaoxichen
Copy link
Collaborator

thanks for the effort to refine the edge case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants