Introduction
This feature introduces the capability to automatically disable KVM Hosts when a customizable health check fails, and to automatically re-enable them when the health check subsequently succeeds. This ensures that existing Hosts remain stable and operational, and that new Instances are not deployed on disabled Hosts (existing Instances running on disabled or auto-disabled Hosts are not affected).
This feature is particularly beneficial in scenarios where maintaining the health of a large number of Hosts is crucial. For instance, in a data centre with numerous Instances spread across multiple Hosts, this feature can help administrators by automatically isolating Hosts that are experiencing issues.
Feature Description
The auto enable / disable functionality for KVM hosts is controlled by a new cluster setting in CloudStack, `enable.kvm.host.auto.enable.disable`, which is disabled by default. Administrators can enable this functionality for specific clusters or globally.
In clusters where the auto enable / disable functionality is enabled, administrators must first provide an executable file or script that will be used for the health check. They must then set the location of this file or script in the `agent.health.check.script.path` property, which is found in the /etc/cloudstack/agent/agent.properties file on the KVM host.
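As an illustration only, the logic of a health check could be written in Python along the following lines. The check itself (free disk space), the threshold, and the names `MIN_FREE_FRACTION`, `has_enough_free_space` and `main` are hypothetical choices for this sketch. The only convention taken from CloudStack is that an exit code of 0 signals a passed check and an exit code of 1 a failed one.

```python
#!/usr/bin/env python3
"""Hypothetical KVM host health check: fails when free disk space runs low."""
import shutil

# Assumption for this sketch: at least 10% of the filesystem must be free.
MIN_FREE_FRACTION = 0.10


def has_enough_free_space(path: str = "/",
                          min_free_fraction: float = MIN_FREE_FRACTION) -> bool:
    """Return True when the filesystem containing `path` has enough free space."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def main() -> int:
    """Return the exit code for the check: 0 when it passes, 1 when it fails."""
    return 0 if has_enough_free_space() else 1
```

To use this as the health check script, the file would end with `raise SystemExit(main())`, be made executable, and have its path set in `agent.health.check.script.path`. Note that an uncaught Python exception also ends the process with exit code 1, so it would be reported as a failed check; catch exceptions and exit with a different code if such errors should not disable the Host.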
Functionality Specification
Health Check Results
The following actions are taken by CloudStack based on the health check result for a KVM host:
Result | Description |
Null | No action is taken. This scenario may occur if the health check script is not specified, inaccessible, or non-executable. |
True | A Host that was auto-disabled due to a health check failure is auto-enabled after passing the next check. However, a Host that was manually disabled by an administrator is not auto-enabled, even after passing a health check. |
False | The host is auto-disabled if it was previously enabled or auto-enabled by previous successful health checks. This automatic disabling prevents the Host from being utilized while it is experiencing health issues. |
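The table can be read as a small decision function. The sketch below is not CloudStack code; the function `decide_action` and its parameters are hypothetical names used to restate the three rows, with `None` standing in for the null result.

```python
from typing import Optional


def decide_action(result: Optional[bool], enabled: bool, manually_disabled: bool) -> str:
    """Return 'enable', 'disable' or 'none' for a single health check result."""
    if result is None:
        # Null result: the script was missing, not runnable, or errored.
        return "none"
    if result:
        # A passed check only re-enables Hosts that were auto-disabled.
        if not enabled and not manually_disabled:
            return "enable"
        return "none"
    # A failed check disables a Host that is currently enabled.
    return "disable" if enabled else "none"


# A Host disabled by an administrator stays disabled after a passed check.
print(decide_action(True, enabled=False, manually_disabled=True))  # prints none
```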
Automatic Alerts and Annotations
Whenever an auto-enable or disable event occurs, the CloudStack management server sends an email alert to the Administrator, notifying them about the change in the Host’s resource state. Additionally, an automatic annotation or comment is added or updated on the specific Host, providing a record of the event.
Priority of Manual Disabling
It’s essential to note that if a Host is manually disabled by the administrator, it takes precedence over auto-enabling triggered by health check success. In such cases, even if the health check subsequently succeeds, the Host remains disabled until explicitly enabled by the administrator.
Auto-Enable or Auto-Disable
CloudStack uses a Host detail record with the key `autoenablekvmhost` to control the auto-enable and auto-disable behavior. The following scenarios outline how this record influences the feature:
• If a host is auto-enabled or auto-disabled, and there is no Host detail record with the key `autoenablekvmhost` for that Host, a new Host detail record is created with the key `autoenablekvmhost` set to ‘true’. This indicates that the Host can be automatically enabled or disabled based on health check results.
• If the administrator manually disables a Host and a host detail record with the key `autoenablekvmhost` exists, the value of `autoenablekvmhost` is set to ‘false’. This ensures that the Host does not get auto-enabled when the health check succeeds, giving manual Host disabling priority.
• If the administrator manually enables a Host and a Host detail record with the key `autoenablekvmhost` exists, the value of `autoenablekvmhost` is set to ‘true’. This allows the Host to be auto-disabled if the health check fails.
• When the cluster setting `enable.kvm.host.auto.enable.disable` is disabled, and the administrator enables or disables Hosts within the Cluster, no Host detail record with the key `autoenablekvmhost` is created. This preserves the usual behavior without the auto-enable and auto-disable feature.
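The record updates described above can be modelled with a plain dictionary standing in for a Host's details. This is a sketch of the stated rules only, not the CloudStack implementation: the function names are hypothetical, and the case of a manual change when no record exists yet is not covered by the description, so the sketch leaves the details untouched there.

```python
from typing import Dict

DETAIL_KEY = "autoenablekvmhost"


def on_auto_state_change(details: Dict[str, str]) -> None:
    """Host was auto-enabled or auto-disabled: create the record if it is missing."""
    if DETAIL_KEY not in details:
        details[DETAIL_KEY] = "true"


def on_manual_state_change(details: Dict[str, str], enabled_by_admin: bool) -> None:
    """Administrator enabled or disabled the Host: update an existing record only."""
    if DETAIL_KEY in details:
        details[DETAIL_KEY] = "true" if enabled_by_admin else "false"


def auto_enable_allowed(details: Dict[str, str]) -> bool:
    """Assumed reading: a passed check may re-enable the Host only when the value is 'true'."""
    return details.get(DETAIL_KEY) == "true"


details: Dict[str, str] = {}
on_auto_state_change(details)           # health check failed, Host auto-disabled
on_manual_state_change(details, False)  # administrator then disables the Host
print(auto_enable_allowed(details))     # prints False
```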
KVM Host Health Checks
The health check result for each KVM host is determined based on the execution of a specified script and its resulting exit code:
• The health check result is true if the script executes successfully and returns an exit code of 0, indicating a successful health check.
• The health check result is false if the script executes successfully but returns an exit code of 1, indicating a failed health check.
• The health check result is null under several conditions, typically indicating that the health check script could not be executed for some reason:
o The script file is not specified in the `agent.health.check.script.path` property in the /etc/cloudstack/agent/agent.properties file.
o The script file specified in the `agent.health.check.script.path` property does not exist.
o The script file exists but is not accessible by the user of the cloudstack-agent process, possibly due to file permissions.
o The script file exists and is accessible but is not executable, possibly due to missing executable permissions.
o The script fails during execution, for example because of syntax errors, or it returns an exit code other than 0 or 1.
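The rules above can be summarised in a short Python sketch that maps a script path to a true, false or null (`None`) result. It is an approximation for illustration, not the agent's actual code: the function names are hypothetical, and the timeout is an assumption that the description does not mention.

```python
import os
import subprocess
from typing import Optional


def classify_exit_code(code: int) -> Optional[bool]:
    """Map an exit code to a health check result: 0 -> True, 1 -> False, else None."""
    if code == 0:
        return True
    if code == 1:
        return False
    return None


def run_health_check(script_path: Optional[str],
                     timeout_seconds: float = 60.0) -> Optional[bool]:
    """Run the script and return True, False, or None when it could not be evaluated."""
    if not script_path:
        return None  # property not set
    if not os.path.isfile(script_path):
        return None  # file does not exist, or cannot be seen by this user
    if not os.access(script_path, os.X_OK):
        return None  # file is not executable for this user
    try:
        completed = subprocess.run(
            [script_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,  # assumption: give up on a hanging script
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # the script could not be started or did not finish
    return classify_exit_code(completed.returncode)
```

Running such a helper as the same user as the cloudstack-agent process can help confirm that a script is accepted before the cluster setting is enabled.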
If a Host is auto-disabled due to a health check failure, the administrator can manually re-enable the Host. However, unless the health check is also disabled for that specific Host, it will be auto-disabled again if the health check fails.
This feature relies on the Host being accessible. If a Host is unreachable, for example due to network problems, the health check script is also unreachable, and the automatic disable / re-enable feature will not work.
Conclusion
The auto enable / disable functionality enhances CloudStack’s KVM host management by automatically enabling or disabling hosts based on the results of customizable health checks. Administrators gain greater control over host resource states, and alerts and annotations provide clear visibility of the automatic changes.
This feature will be available from CloudStack version 4.19.0 onwards.
Nicolas Vazquez is a Senior Software Engineer at ShapeBlue and a PMC member of the Apache CloudStack project. He spends his time designing and implementing features in Apache CloudStack and also acts as a release manager. Nicolas is based in Uruguay and is the father of a young girl. He is a sports fan and enjoys playing tennis and football. In his free time, he also enjoys reading and listening to material on economics and politics.