You can configure how the cluster handles node failures. Three modes are available:
- DRS (default). In this mode, virtual machines and containers which were running on a failed node are relocated to healthy nodes based on available RAM and license capacity. This mode can be used for nodes on which the pdrs service is running.
Note: If CPU pools are used, virtual machines and containers can only be relocated to other nodes in the same CPU pool. For details, see Section 8.7, “Managing CPU Pools”.
The DRS mode works as follows. The master DRS continuously collects the following data from each healthy node in the cluster via SNMP:
- total node RAM,
- total RAM used by virtual machines,
- total RAM used by containers,
- maximum running virtual machines allowed,
- maximum running containers allowed,
- maximum running virtual machines and containers allowed.
If a node fails, the shaman service sends a list of the virtual machines and containers that were running on that node to the master DRS, which sorts the list by required RAM in descending order. Using the collected data on node RAM and licenses, the master DRS then attempts to find the node with the most available RAM and a suitable license for the virtual environment at the top of the list (the one requiring the most RAM). If such a node exists, the master DRS marks the virtual environment for relocation to that node. Otherwise, it marks the virtual environment as broken. The master DRS then processes the next virtual environment on the list, adjusting the collected node data by the requirements of the previous virtual environment. Having processed all virtual environments on the list, the master DRS sends the list to the shaman service for actual relocation.
- Spare. In this mode, virtual machines and containers from a failed node are relocated to a target backup node: an empty node with enough resources and a license to host all virtual environments from any given node in the cluster. Such a node is required for high availability to work in this mode. To switch to this mode, run:
# shaman set-config RESOURCE_RELOCATION_MODE=spare
- Round-robin (default fallback). In this mode, virtual machines, containers, and iSCSI targets from a failed node are relocated to healthy nodes in a round-robin manner. To switch to this mode, run:
# shaman set-config RESOURCE_RELOCATION_MODE=round-robin
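The DRS and round-robin placement strategies described above can be illustrated with a minimal Python sketch. It is not the actual pdrs or shaman implementation. The class and function names (VE, Node, plan_drs, plan_round_robin), the use of MiB as the RAM unit, and the running_vms and running_cts counters used for the license check are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List, Optional


@dataclass
class VE:
    """A virtual environment (VM or container) from the failed node."""
    name: str
    kind: str  # "vm" or "ct"
    ram: int   # required RAM (MiB is an assumption of this sketch)


@dataclass
class Node:
    """Data collected per healthy node; running_* counters are assumed."""
    name: str
    total_ram: int
    vm_ram_used: int
    ct_ram_used: int
    max_vms: int
    max_cts: int
    max_total: int
    running_vms: int = 0
    running_cts: int = 0

    @property
    def free_ram(self) -> int:
        return self.total_ram - self.vm_ram_used - self.ct_ram_used

    def can_host(self, ve: VE) -> bool:
        # Enough RAM, and the license limits still leave room for this VE.
        if ve.ram > self.free_ram:
            return False
        if self.running_vms + self.running_cts >= self.max_total:
            return False
        if ve.kind == "vm":
            return self.running_vms < self.max_vms
        return self.running_cts < self.max_cts

    def reserve(self, ve: VE) -> None:
        # Adjust the collected data by the requirements of the placed VE.
        if ve.kind == "vm":
            self.vm_ram_used += ve.ram
            self.running_vms += 1
        else:
            self.ct_ram_used += ve.ram
            self.running_cts += 1


def plan_drs(ves: List[VE], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Map each VE name to a target node name, or None if marked as broken.

    Note: this updates the Node objects in place as VEs are placed.
    """
    plan: Dict[str, Optional[str]] = {}
    # Process the VE that requires the most RAM first.
    for ve in sorted(ves, key=lambda v: v.ram, reverse=True):
        candidates = [n for n in nodes if n.can_host(ve)]
        if not candidates:
            plan[ve.name] = None  # no suitable node: marked as broken
            continue
        target = max(candidates, key=lambda n: n.free_ram)
        target.reserve(ve)
        plan[ve.name] = target.name
    return plan


def plan_round_robin(resources: List[str], node_names: List[str]) -> Dict[str, str]:
    """Assign resources to healthy nodes in turn, ignoring RAM and licenses."""
    if not node_names:
        raise ValueError("no healthy nodes available")
    targets = cycle(node_names)
    return {name: next(targets) for name in resources}


nodes = [
    Node("n1", 16384, 4096, 2048, max_vms=10, max_cts=10, max_total=15,
         running_vms=2, running_cts=1),
    Node("n2", 8192, 1024, 1024, max_vms=10, max_cts=10, max_total=15,
         running_vms=1, running_cts=1),
]
failed = [VE("ct-b", "ct", 4096), VE("vm-a", "vm", 8192), VE("vm-c", "vm", 4096)]
print(plan_drs(failed, nodes))
# {'vm-a': 'n1', 'ct-b': 'n2', 'vm-c': None}
print(plan_round_robin(["vm-a", "ct-b", "iscsi-1"], ["n1", "n2"]))
# {'vm-a': 'n1', 'ct-b': 'n2', 'iscsi-1': 'n1'}
```

In this example, vm-c ends up marked as broken because, after the two larger placements, neither node has 4096 MiB of free RAM left. Round-robin, by contrast, never consults the collected node data.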
Additionally, you can set a fallback relocation mode in case the chosen relocation mode fails. For example:
# shaman set-config RESOURCE_RELOCATION_MODE=drs,spare
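One way to picture a comma-separated value such as drs,spare is as an ordered chain of modes that are tried from left to right. The Python sketch below illustrates that reading only. The function relocate_with_fallback and the planner callables are hypothetical, and the text above does not specify what counts as a mode failing, so the sketch assumes a mode fails when its planner returns None.

```python
from typing import Callable, Dict, List, Optional, Tuple

Plan = Dict[str, str]
Planner = Callable[[List[str]], Optional[Plan]]


def relocate_with_fallback(
    mode_value: str, planners: Dict[str, Planner], resources: List[str]
) -> Tuple[Optional[str], Plan]:
    """Try each mode named in mode_value, left to right, until one yields a plan."""
    for mode in (m.strip() for m in mode_value.split(",")):
        planner = planners.get(mode)
        if planner is None:
            raise ValueError(f"unknown relocation mode: {mode!r}")
        plan = planner(resources)
        if plan is not None:  # assumption: None means the mode failed
            return mode, plan
    return None, {}


# Hypothetical planners: DRS fails here, so the spare mode takes over.
planners = {
    "drs": lambda resources: None,
    "spare": lambda resources: {r: "spare-node" for r in resources},
}
print(relocate_with_fallback("drs,spare", planners, ["vm-a", "ct-b"]))
# ('spare', {'vm-a': 'spare-node', 'ct-b': 'spare-node'})
```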