Master Fault Detection in Elasticsearch
In Elasticsearch, master fault detection lets the nodes of a cluster notice when the elected master has failed and react to it. Such failures have several causes, including network disconnects, transport issues, and node crashes.
Overview of MasterFaultDetection
The FaultDetection class is an abstract base class that provides the common implementation for both MasterFaultDetection and NodesFaultDetection. Because both subclasses inherit from it, they share the same settings: CONNECT_ON_NETWORK_DISCONNECT_SETTING, PING_INTERVAL_SETTING, PING_TIMEOUT_SETTING, and PING_RETRIES_SETTING.
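To make the role of the retry setting concrete, here is a minimal sketch of how a ping-retry counter might behave. PingRetryTracker and its methods are hypothetical names chosen for illustration, not Elasticsearch classes, and the sketch assumes that a node is only declared failed after a run of consecutive failed pings.

```java
// Hypothetical helper, not part of Elasticsearch: counts consecutive failed pings.
final class PingRetryTracker {
    private final int pingRetries;      // plays the role of PING_RETRIES_SETTING
    private int consecutiveFailures;

    PingRetryTracker(int pingRetries) {
        if (pingRetries < 1) {
            throw new IllegalArgumentException("pingRetries must be at least 1");
        }
        this.pingRetries = pingRetries;
    }

    /** A successful ping clears the failure streak. */
    void onPingSuccess() {
        consecutiveFailures = 0;
    }

    /**
     * Records a ping that failed or timed out.
     *
     * @return true once the number of consecutive failures reaches pingRetries
     */
    boolean onPingFailure() {
        consecutiveFailures++;
        return consecutiveFailures >= pingRetries;
    }
}
```

With new PingRetryTracker(3), the first two consecutive failures return false and the third returns true; a success in between resets the count.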
FaultDetection Class
The FaultDetection class implements the Closeable interface and defines an inner class, FDConnectionListener, which implements the TransportConnectionListener interface. Its onNodeDisconnected method is called when the transport connection to a node is lost, so that fault detection can react to the disconnect.
public abstract class FaultDetection implements Closeable {
    // ...
    private class FDConnectionListener implements TransportConnectionListener {
        // ...
        @Override
        public void onNodeDisconnected(DiscoveryNode node) {
            // ...
        }
    }
}
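One plausible way a disconnect handler could combine the disconnect callback with the connect-on-network-disconnect setting is sketched below. Everything here is an assumption made for illustration: DisconnectHandlerSketch, the tryReconnect callback, and the use of plain strings as node identifiers are hypothetical and do not mirror the actual Elasticsearch implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not Elasticsearch code: nodes are plain strings.
final class DisconnectHandlerSketch {
    private final boolean connectOnNetworkDisconnect;
    private final Predicate<String> tryReconnect;   // returns true if reconnecting worked
    private final List<String> reportedFailures = new ArrayList<>();

    DisconnectHandlerSketch(boolean connectOnNetworkDisconnect, Predicate<String> tryReconnect) {
        this.connectOnNetworkDisconnect = connectOnNetworkDisconnect;
        this.tryReconnect = tryReconnect;
    }

    /** Called when the connection to a node is lost. */
    void onNodeDisconnected(String nodeId) {
        // If reconnecting is enabled and succeeds, the node is not reported as failed.
        if (connectOnNetworkDisconnect && tryReconnect.test(nodeId)) {
            return;
        }
        reportedFailures.add(nodeId);
    }

    List<String> reportedFailures() {
        return reportedFailures;
    }
}
```

In this sketch a disconnect is reported as a failure unless reconnecting is both enabled and successful.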
MasterFaultDetection Class
The MasterFaultDetection class extends the FaultDetection class and adds the functionality specific to master node fault detection. It defines a nested Listener interface whose onMasterFailure method is called with the failed master node, the cause, and a reason string when a master failure is detected.
public class MasterFaultDetection extends FaultDetection {
    // ...
    public interface Listener {
        void onMasterFailure(DiscoveryNode masterNode, Throwable cause, String reason);
    }
}
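The listener pattern above can be sketched in isolation as follows. MasterFailureListener and MasterFailureNotifier are hypothetical stand-ins written for this example, with the master node reduced to a String; the sketch assumes listeners are registered up front and notified in registration order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the Listener contract; the node is a String here.
interface MasterFailureListener {
    void onMasterFailure(String masterNode, Throwable cause, String reason);
}

// Hypothetical registry that fans a failure out to every registered listener.
final class MasterFailureNotifier {
    // CopyOnWriteArrayList allows listeners to be added or removed while notifying.
    private final List<MasterFailureListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(MasterFailureListener listener) {
        listeners.add(listener);
    }

    void removeListener(MasterFailureListener listener) {
        listeners.remove(listener);
    }

    void notifyMasterFailure(String masterNode, Throwable cause, String reason) {
        for (MasterFailureListener listener : listeners) {
            listener.onMasterFailure(masterNode, cause, reason);
        }
    }
}
```

A caller would register a listener once and then react inside onMasterFailure, for example by starting a new master election.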
MasterPingRequestHandler Class
The MasterPingRequestHandler class handles incoming master ping requests. It checks that the request is addressed to the local node and that the cluster name matches. As the excerpt shows, a request addressed to a different node is rejected with a ThisIsNotTheMasterYouAreLookingForException.
public class MasterPingRequestHandler implements TransportRequestHandler<MasterPingRequest> {
    // ...
    @Override
    public void messageReceived(final MasterPingRequest request, final TransportChannel channel, Task task) throws Exception {
        // ...
        if (!request.masterNode.equals(nodes.getLocalNode())) {
            throw new ThisIsNotTheMasterYouAreLookingForException();
        }
        // ...
    }
}
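The two checks described above, target node and cluster name, can be sketched as a standalone validator. MasterPingValidator is a hypothetical class for illustration; it uses strings for node and cluster identity and throws IllegalStateException in place of the Elasticsearch-specific exception.

```java
// Hypothetical sketch, not Elasticsearch code.
final class MasterPingValidator {
    private final String localNodeId;
    private final String localClusterName;

    MasterPingValidator(String localNodeId, String localClusterName) {
        this.localNodeId = localNodeId;
        this.localClusterName = localClusterName;
    }

    /**
     * Accepts a ping only if it targets this node and this cluster.
     *
     * @throws IllegalStateException if either check fails
     */
    void validate(String requestedMasterId, String requestClusterName) {
        if (!localNodeId.equals(requestedMasterId)) {
            throw new IllegalStateException(
                "ping addressed to [" + requestedMasterId + "] but local node is [" + localNodeId + "]");
        }
        if (!localClusterName.equals(requestClusterName)) {
            throw new IllegalStateException(
                "ping from cluster [" + requestClusterName + "] but local cluster is [" + localClusterName + "]");
        }
    }
}
```

Rejecting the ping with an exception, rather than silently ignoring it, lets the sender learn right away that it is pinging the wrong node.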
ZenDiscovery Class
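The ZenDiscovery class processes committed cluster states. Its processNextCommittedClusterState method takes the next state from the pending states queue and returns false if there is none. If the local node is currently the elected master but the new state names a different master, the method calls handleAnotherMaster and returns false.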
public class ZenDiscovery extends AbstractLifecycleComponent implements Discovery, PingContextProvider, IncomingClusterStateListener {
    // ...
    public boolean processNextCommittedClusterState(String reason) {
        // ...
        final ClusterState newClusterState = pendingStatesQueue.getNextClusterStateToProcess();
        // ...
        if (newClusterState == null) {
            return false;
        }
        // ...
        if (currentState.nodes().isLocalNodeElectedMaster() && newClusterState.nodes().isLocalNodeElectedMaster() == false) {
            handleAnotherMaster(currentState, newClusterState.nodes().getMasterNode(), newClusterState.version(), "Via a new cluster state");
            return false;
        }
        // ...
    }
}
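The control flow of the excerpt above can be illustrated with a heavily simplified model. SimpleState and CommittedStateProcessor are hypothetical types invented for this sketch: a state carries only a version and the elected master's id, and the step where the real code calls handleAnotherMaster is reduced to declining the new state.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified cluster state: a version and the elected master's id.
final class SimpleState {
    final long version;
    final String masterId;

    SimpleState(long version, String masterId) {
        this.version = version;
        this.masterId = masterId;
    }
}

// Hypothetical processor mirroring the two early exits shown in the excerpt.
final class CommittedStateProcessor {
    private final String localNodeId;
    private final Deque<SimpleState> pending = new ArrayDeque<>();
    private SimpleState current;

    CommittedStateProcessor(String localNodeId, SimpleState initial) {
        this.localNodeId = localNodeId;
        this.current = initial;
    }

    void enqueue(SimpleState state) {
        pending.addLast(state);
    }

    /** @return true if a pending state was applied, false otherwise */
    boolean processNext() {
        SimpleState next = pending.pollFirst();
        if (next == null) {
            return false; // nothing to process
        }
        boolean wasMaster = localNodeId.equals(current.masterId);
        boolean staysMaster = localNodeId.equals(next.masterId);
        if (wasMaster && !staysMaster) {
            // The real code delegates to handleAnotherMaster here;
            // this sketch only declines to apply the state.
            return false;
        }
        current = next;
        return true;
    }

    SimpleState current() {
        return current;
    }
}
```

In this model, a node that believes it is master never silently accepts a state naming another master; the conflict is surfaced through the false return value.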
In summary, master fault detection in Elasticsearch detects and handles master node failures caused by network disconnects, transport issues, or node crashes. The abstract FaultDetection class provides the common implementation and settings for both MasterFaultDetection and NodesFaultDetection, MasterFaultDetection reports failures through its Listener interface, and the MasterPingRequestHandler class validates and answers master ping requests. The ZenDiscovery class processes committed cluster states and handles the case where another node has become master.