How to deploy a multi-region failover cluster with Keepalived
Configure a high-availability setup across two Ubuntu 24.04 nodes using Keepalived for VIP management, VRRP synchronization, and automatic failover to ensure business continuity during outages.
You will configure a high-availability cluster using Keepalived to manage a virtual IP address that floats between two physical nodes. These steps target Ubuntu 24.04 LTS nodes with a private network interface for VRRP communication and a public interface for the virtual IP. You will install the required packages, configure the VRRP instance, set up health checks, and enable automatic failover across regions.
Prerequisites
- Two Ubuntu 24.04 LTS servers, each with a static private IP address on an internal interface (e.g., eth1).
- One public IP address per node or a single shared VIP routed to the primary node.
- SSH access to both nodes as root or a user with sudo privileges.
- Firewall rules configured to allow VRRP protocol (IP protocol 112) between nodes.
- A static DNS record pointing the domain name to the virtual IP address.
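Before installing anything, it is worth confirming that the private interfaces can reach each other, since all VRRP traffic will travel over them. A quick check, assuming 10.0.0.1 (primary) and 10.0.0.2 (secondary) on eth1 — substitute your own addresses:

```shell
# From the primary node: the secondary's private address must answer.
ping -c 3 -I eth1 10.0.0.2

# The interface must be up with a carrier for VRRP multicast to flow.
ip -brief link show eth1
```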
Step 1: Install Keepalived and dependencies
Update the package index and install the keepalived package. This package includes the VRRP daemon and the configuration tools needed to manage the virtual IP address.
apt update
apt install keepalived -y
After installation, verify the service is active. You should see keepalived running as an enabled service.
systemctl status keepalived
The output should show active (running). On a fresh install the unit may instead show as inactive or failed until a valid /etc/keepalived/keepalived.conf exists; it will start cleanly after the next step.
Step 2: Configure the primary node
Edit the main configuration file for Keepalived. This file defines the virtual router instance, the virtual IP address, and the priority settings that determine which node owns the VIP.
vim /etc/keepalived/keepalived.conf
Add the following configuration to the file. Replace 192.168.1.100 with your virtual IP, replace eth1 with the interface that should carry it, and adjust virtual_router_id if 51 is already in use on your network.
global_defs {
    router_id LVS_DEVEL
    enable_script_security
    script_user root
}

vrrp_script check_health {
    script "/usr/local/bin/check_health.sh"
    interval 2
    weight -20
}

vrrp_instance VI_1 {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.1.100 dev eth1
    }
    track_script {
        check_health
    }
}
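Before touching the running daemon, Keepalived can lint the file itself; the config-test mode is available in the 2.x builds that Ubuntu 24.04 ships:

```shell
# Exits non-zero and reports the offending line if the syntax is invalid.
keepalived -t -f /etc/keepalived/keepalived.conf && echo "config OK"
```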
Create the health check script referenced in the configuration. This script checks if a critical service, like Nginx, is running on the primary node. If the service stops, Keepalived reduces the priority and triggers failover.
vim /usr/local/bin/check_health.sh
Insert the following content into the script. This simple bash script checks for the Nginx process. Adjust the process name if you use Apache or another service.
#!/bin/bash
if pgrep -x "nginx" > /dev/null; then
    exit 0
else
    exit 1
fi
Make the script executable so Keepalived can run it. Keeping it owned by root and not writable by other users avoids Keepalived's script security warnings.
chmod 755 /usr/local/bin/check_health.sh
chown root:root /usr/local/bin/check_health.sh
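You can exercise the same pgrep pattern against a throwaway process before trusting it with nginx. A small sketch (the check function here is a stand-in, not part of the real script):

```shell
# check() mirrors the logic in check_health.sh, but prints instead of exiting.
check() { pgrep -x "$1" > /dev/null && echo running || echo stopped; }

sleep 30 &                        # stand-in for the monitored service
pid=$!
check sleep                       # prints "running"
kill "$pid"; wait "$pid" 2>/dev/null || true
check sleep                       # prints "stopped" (assuming no other sleep processes)
```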
Step 3: Configure the secondary node
Copy the configuration file from the primary node to the secondary node, replacing SECONDARY_IP with the secondary node's private address. You will then change the state from MASTER to BACKUP and lower the priority to a value such as 90.
scp /etc/keepalived/keepalived.conf root@SECONDARY_IP:/etc/keepalived/keepalived.conf
Edit the configuration on the secondary node to reflect its role as the backup.
vim /etc/keepalived/keepalived.conf
Update the following lines in the file for the secondary node:
vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 90
    ...
}
Ensure the virtual_ipaddress line remains identical to the primary node. The virtual IP must be the same on both nodes, but only the primary should initially hold it.
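If you prefer not to edit by hand, the two changed lines can be rewritten with sed. A sketch that assumes the exact state MASTER and priority 100 lines from Step 2 (run it on the secondary node):

```shell
# Flip the role and lower the priority in place.
sed -i \
    -e 's/state MASTER/state BACKUP/' \
    -e 's/priority 100/priority 90/' \
    /etc/keepalived/keepalived.conf

# Confirm both lines changed.
grep -E 'state|priority' /etc/keepalived/keepalived.conf
```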
Step 4: Configure firewall rules
Allow the VRRP protocol through the firewall on both nodes. The default Ubuntu firewall (UFW) blocks IP protocol 112, and its command line only accepts a fixed set of protocol names, so add the rule to /etc/ufw/before.rules instead. Insert the following line in the *filter section, before the final COMMIT, replacing 10.0.0.0/24 with your private subnet:
-A ufw-before-input -p 112 -s 10.0.0.0/24 -j ACCEPT
Reload the firewall to apply the change.
ufw reload
Verify the rule is active by listing the underlying chain.
iptables -L ufw-before-input -n | grep -i -e 112 -e vrrp
You should see an ACCEPT rule for protocol 112 (VRRP) from the private subnet.
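However you open protocol 112, you can confirm advertisements are actually flowing with tcpdump on the private interface (requires root; interrupt with Ctrl-C):

```shell
# VRRP advertisements arrive once per advert_int (1 second in this guide).
# 'ip proto 112' is the BPF filter expression for VRRP.
tcpdump -i eth1 -n ip proto 112
```

If the primary is up you should see packets from its private address roughly once per second; silence here means the firewall or the interface configuration is still wrong.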
Step 5: Start and enable the service
Restart the Keepalived service on both nodes to apply the new configurations. The virtual IP address should now be assigned to the primary node.
systemctl restart keepalived
systemctl enable keepalived
Check the logs to ensure there are no errors during startup.
journalctl -u keepalived -f
Look for a line indicating the instance has taken the MASTER role, such as:
(VI_1) Entering MASTER STATE
You can confirm the address assignment directly with ip addr show eth1; the output should list 192.168.1.100 on the primary node.
Verify the installation
Keepalived does not provide a status subcommand, so check the interface directly for the virtual IP.
ip -brief addr show eth1
On the primary node the output should include 192.168.1.100; on the secondary it should not. If the VIP is present on both nodes at once (a split-brain condition), check your firewall rules, priority settings, and authentication passwords.
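A simple failover drill, assuming the Nginx-based health check from Step 2: make the check fail on the primary and watch the VIP move.

```shell
# On the primary node: stopping the service makes check_health.sh exit 1,
# dropping the effective priority from 100 to 80 (below the backup's 90).
systemctl stop nginx

# On the secondary node: within a few advertisement intervals the VIP
# should appear on the private interface.
ip -brief addr show eth1 | grep 192.168.1.100

# Back on the primary: restoring the service restores priority 100, and
# with default preemption the primary reclaims the VIP.
systemctl start nginx
```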
Troubleshooting
If the virtual IP does not appear on the expected interface, check for strict-mode conflicts: Keepalived's vrrp_strict option (set in global_defs) rejects configurations that deviate from the VRRP RFCs, including the authentication block used in this guide, so remove it if present. If a service needs to bind to the VIP while the node is still in the BACKUP state, allow non-local binds:
sysctl -w net.ipv4.ip_nonlocal_bind=1
Restart the service after changing kernel parameters.
systemctl restart keepalived
If the secondary node fails to take over, inspect the logs for authentication errors. Mismatched passwords in the auth_pass field will prevent the nodes from communicating. Verify that the private network interface has a static IP and is not using DHCP, as dynamic IPs break VRRP synchronization.
Ensure the health check script returns exit code 0 when the service is running and exit code 1 when it is stopped. If the script fails, the node will not advertise the VIP even if the service is healthy. Test the script manually to confirm it works as intended.
Finally, verify that the DNS record for the domain points to the virtual IP address. If the DNS record points to the primary node's private IP, clients will not reach the service during a failover. Update the DNS record to point to the public virtual IP address.
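A quick way to confirm the record, with example.com standing in for your actual domain:

```shell
# The A record should resolve to the public virtual IP, never to an
# individual node's address.
dig +short example.com A
```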