MySQL Apache Failover System with DRBD, Pacemaker, Corosync
Environment
To set up a failover system with Pacemaker, Corosync, and DRBD, we need two servers, three IP addresses in the same subnet, and a partition of the same size on both servers. In this setup, MySQL and Apache are assumed to be already installed and configured.
Software requirements:
- DRBD & drbdlinks
- Pacemaker & Corosync
- psmisc package (needed by pacemaker)
It is highly recommended to use a separate network adapter for synchronization, but the setup also works with a single network adapter.
In this example, we use the following configuration:
- Node 1, hostname: fo2, ip: 10.0.0.2, synchronization partition: /dev/sdb1
- Node 2, hostname: fo3, ip: 10.0.0.3, synchronization partition: /dev/sdb1
- IP Cluster: 10.0.0.1
- Domain: test.ok
- Synchronization folder: /sync
Add these hostname entries to /etc/hosts on both nodes:
10.0.0.2 fo2.test.ok
10.0.0.3 fo3.test.ok
DRBD Setup
Installation:
aptitude install drbd8-utils drbdlinks
Configuration:
Configure each node to use an NTP server; this is important for DRBD because of the filesystem timestamps (a short NTP sketch follows below). We also need to disable the DRBD init script, because Pacemaker will handle starting and stopping DRBD:
update-rc.d -f drbd remove
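A minimal sketch for the NTP part, assuming the standard Debian ntp package is acceptable (run on both nodes):
aptitude install ntp
ntpq -p
ntpq -p should list at least one reachable time server once the daemon has synchronized.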
Edit /etc/drbd.d/global_common.conf so it contains:
global {
usage-count no;
}
common {
protocol C;
handlers {
pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
}
startup {
degr-wfc-timeout 120;
}
disk {
on-io-error detach;
}
syncer {
# rate after al-extents use-rle cpu-mask verify-alg csums-alg
rate 100M;
al-extents 257;
}
}
Create the file /etc/drbd.d/r0.res with the following configuration:
resource r0 {
protocol C;
device /dev/drbd0 minor 0;
disk /dev/sdb1;
flexible-meta-disk internal;
# per-node sections
on fo2 {
address 10.0.0.2:7801;
}
on fo3 {
address 10.0.0.3:7801;
}
net {
after-sb-0pri discard-younger-primary; #discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri call-pri-lost-after-sb;
}
}
Copy both configuration files to fo3:
fo2# scp /etc/drbd.d/* fo3:/etc/drbd.d/
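Optionally, verify that the configuration parses cleanly on both nodes before creating the metadata; drbdadm dump only prints the parsed resource definition and changes nothing:
drbdadm dump r0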
Metadata initialization
Run the following command on each node:
drbdadm create-md r0
If an error appears saying that a filesystem already exists, you must wipe it by zeroing the start of the partition:
dd if=/dev/zero bs=512 count=512 of=/dev/sdb1
After this, you can start DRBD on both nodes:
/etc/init.d/drbd start
If everything is OK, the command cat /proc/drbd will give a result similar to this:
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:524236
Initial Synchronization
Run the following command to initiate synchronization:
fo2# drbdadm -- --overwrite-data-of-peer primary r0
The above command makes fo2 the primary and starts the synchronization between the nodes. After that, you can create a filesystem; do this only on the primary node:
fo2# mkfs.ext4 /dev/drbd0
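You can watch the initial synchronization progress at any time:
watch -n1 cat /proc/drbd
The ds field changes from Inconsistent/Inconsistent to UpToDate/UpToDate once the synchronization has finished.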
drbdlinks configuration
Copy the drbdlinks init script to /etc/init.d/:
cp /usr/sbin/drbdlinks /etc/init.d/
Modify /etc/drbdlinks.conf so that it contains something like this:
mountpoint('/sync')
link('/var/www','/sync/www')
link('/var/lib/mysql','/sync/mysql')
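drbdlinks only manages the symlinks; the existing Apache and MySQL data still has to be copied into the synchronized partition once. A rough sketch, assuming the default Debian paths from the configuration above: create the /sync mount point on both nodes, then mount, copy, and unmount on the current primary (fo2) only, with Apache and MySQL stopped during the copy:
mkdir /sync
mount /dev/drbd0 /sync
/etc/init.d/apache2 stop
/etc/init.d/mysql stop
cp -a /var/www /sync/www
cp -a /var/lib/mysql /sync/mysql
umount /sync
Since Pacemaker will manage Apache and MySQL from now on, you will probably also want to remove their init scripts from the default runlevels (update-rc.d -f apache2 remove and update-rc.d -f mysql remove), as was done for DRBD.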
Pacemaker & Corosync Setup
Installation
aptitude install pacemaker
Configuration
Corosync must be at version 1.4 or later to use unicast messaging (the udpu transport). The Debian Squeeze stable distribution only provides version 1.2, so you must install version 1.4 from Debian testing or from squeeze-backports.
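One way to do this, assuming the squeeze-backports archive is used (adjust the mirror if necessary), run on both nodes:
echo "deb http://backports.debian.org/debian-backports squeeze-backports main" >> /etc/apt/sources.list
aptitude update
aptitude -t squeeze-backports install corosync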
Edit /etc/corosync/corosync.conf:
totem {
version: 2
token: 3000
token_retransmits_before_loss_const: 10
join: 60
consensus: 3600
vsftype: none
max_messages: 20
clear_node_high_bit: yes
secauth: off
threads: 0
rrp_mode: none
interface {
member {
memberaddr: 10.0.0.2
}
member {
memberaddr: 10.0.0.3
}
# The following values need to be set based on your environment
ringnumber: 0
bindnetaddr: 10.0.0.0
#mcastaddr: 226.94.1.1
mcastport: 5405
ttl: 1
}
transport: udpu
}
amf {
mode: disabled
}
service {
# Load the Pacemaker Cluster Resource Manager
ver: 0
name: pacemaker
}
aisexec {
user: root
group: root
}
logging {
fileline: off
to_stderr: yes
to_logfile: yes
logfile: /var/log/corosync/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
tags: enter|leave|trace1|trace2|trace3|trace4|trace6
}
}
Copy that config to fo3:
fo2# scp /etc/corosync/corosync.conf fo3:/etc/corosync/
Create a key for Corosync and copy it to fo3 (generating the key will take some time):
fo2# corosync-keygen
fo2# scp /etc/corosync/authkey fo3:/etc/corosync/
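On Debian, the Corosync init script will not start the daemon unless START=yes is set in /etc/default/corosync; check this on both nodes. Assuming the stock default file, one quick way:
sed -i 's/^START=no/START=yes/' /etc/default/corosync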
Start Corosync on both nodes:
/etc/init.d/corosync start
Check the cluster state; the result should look similar to this:
fo2# crm_mon -1f -V
crm_mon[1929]: 2022/02/21_00:14:30 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_mon[1929]: 2022/02/21_00:14:30 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_mon[1929]: 2022/02/21_00:14:30 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
============
Last updated: Mon Feb 21 00:14:30 2022
Stack: openais
Current DC: fo2 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ fo2 fo3 ]
The following crm commands must be run on only one node; the configuration is synchronized automatically between nodes. We don't need STONITH resources, so disable STONITH with this command:
fo2# crm configure property stonith-enabled="false"
Because this failover system has only two nodes, we must also disable the quorum policy; otherwise the resources will not move to the other node when the primary node fails.
fo2# crm configure property no-quorum-policy="ignore"
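You can verify that both properties were stored with:
fo2# crm configure show
The output should contain stonith-enabled="false" and no-quorum-policy="ignore".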
Failover configuration
Connect to the cluster configuration shell:
crm configure
Add this following configuration:
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip=10.0.0.1 \
op monitor interval=30s
primitive WebSite ocf:heartbeat:apache params configfile=/etc/apache2/apache2.conf op monitor interval=1min
primitive DBase ocf:heartbeat:mysql
primitive Links heartbeat:drbdlinks
primitive r0 ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="29s" role="Master" \
op monitor interval="31s" role="Slave"
ms ms_r0 r0 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
primitive WebFS ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/sync" fstype="ext4"
group WebServer ClusterIP WebFS Links DBase WebSite
colocation WebServer-with-ms_r0 inf: WebServer ms_r0:Master
order WebServer-after-ms_r0 inf: ms_r0:promote WebServer:start
location prefer-fo2 WebServer 50: fo2
commit
The above configuration makes fo2 the preferred node: when fo2 recovers from a failure, it takes the resources back.
Check the cluster state again; the resources should now be active on fo2.
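For a quick one-shot view of where the resources are running, the same tool used earlier works without the fail counts:
fo2# crm_mon -1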
Failover test
To test the failover, you can use this command:
fo2# crm node standby
The resources should move to another node.
Bring the node back online with:
fo2# crm node online
The resources should move back to fo2.
To move resources to another node:
crm resource move WebServer fo3
To give back the control to the Cluster system:
crm resource unmove WebServer
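To confirm that the services actually follow the cluster IP, you can test them from another machine in the same subnet. A rough check, assuming Apache serves a page on port 80 and MySQL has a user that is allowed to connect remotely (the user name here is only a placeholder):
curl http://10.0.0.1/
mysql -h 10.0.0.1 -u someuser -p
Repeat the check after moving the resources to make sure both nodes answer on 10.0.0.1.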