Hi guys,
While analyzing the use of a controller to manage Kvrocks clusters in a multi-AZ deployment, we identified a risk of split-brain scenarios.
For example, in the diagram below, when a network partition occurs:
- The connection between the Load Balancer (LB) and Node N1 remains functional
- But the connection between the Controller Master and Node N1 fails due to network issues
This causes the Controller Master to mistakenly mark N1 as faulty and trigger a failover, promoting N1's slave to master. However, N1 is actually healthy and continues to accept writes from the LB.
Result: two active masters (split-brain) exist for the same shard, leading to data inconsistency.
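To make the failure mode concrete, here is a minimal toy model of the partition described above. It is a sketch under stated assumptions, not the controller's actual code: the `Node`/`Shard` types, `controllerCheck`, `write`, and the two reachability maps are all hypothetical. The key point it captures is that the controller decides from its own view of reachability, while the LB routes writes from a different view, and nothing tells N1 it has been demoted.

```go
package main

import "fmt"

// Role of a node in a shard (hypothetical toy type, not the controller's real one).
type Role int

const (
	Slave Role = iota
	Master
)

// Node is a hypothetical in-memory stand-in for a Kvrocks instance.
type Node struct {
	Name string
	Role Role
	Data map[string]string
}

// Shard holds one shard's nodes plus two independent reachability views:
// what the controller can see and what the load balancer can see.
type Shard struct {
	Nodes             []*Node
	ControllerReaches map[string]bool
	LBReaches         map[string]bool
}

// controllerCheck treats an unreachable master as dead and promotes the first
// reachable slave. It cannot tell "unreachable from me" apart from "dead".
func (s *Shard) controllerCheck() {
	for _, n := range s.Nodes {
		if n.Role == Master && !s.ControllerReaches[n.Name] {
			for _, c := range s.Nodes {
				if c.Role == Slave && s.ControllerReaches[c.Name] {
					c.Role = Master // promoted; the old master is never told
					return
				}
			}
		}
	}
}

// write is accepted if the target believes it is master and the LB can reach it.
func (s *Shard) write(target, key, val string) bool {
	for _, n := range s.Nodes {
		if n.Name == target && n.Role == Master && s.LBReaches[n.Name] {
			n.Data[key] = val
			return true
		}
	}
	return false
}

// masters lists every node that currently believes it is master.
func (s *Shard) masters() []string {
	var out []string
	for _, n := range s.Nodes {
		if n.Role == Master {
			out = append(out, n.Name)
		}
	}
	return out
}

// newPartitionedShard builds the issue's scenario: N1 is master and N2 its slave;
// the controller cannot reach N1, but the LB still can.
func newPartitionedShard() *Shard {
	n1 := &Node{Name: "N1", Role: Master, Data: map[string]string{}}
	n2 := &Node{Name: "N2", Role: Slave, Data: map[string]string{}}
	return &Shard{
		Nodes:             []*Node{n1, n2},
		ControllerReaches: map[string]bool{"N1": false, "N2": true},
		LBReaches:         map[string]bool{"N1": true, "N2": true},
	}
}

func main() {
	s := newPartitionedShard()
	s.controllerCheck()                   // controller fails over to N2
	s.write("N1", "k", "from-old-route") // N1 still accepts: it thinks it is master
	s.write("N2", "k", "from-new-route") // N2 accepts as the newly promoted master
	fmt.Println("masters:", s.masters())
	fmt.Println("N1 k =", s.Nodes[0].Data["k"], "| N2 k =", s.Nodes[1].Data["k"])
}
```

Running it shows both N1 and N2 acting as master and holding different values for the same key, which is the data inconsistency described above.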
With the current design, this can happen in several scenarios besides network partitions. For example, an instance may hang for a while, recover while the failover is in progress, and then resume serving writes normally.