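MHA Failover Drill with Annotated Logs: Best Practices (Personally Tested OK, 2020-12-18)
MHA Failover Drill
t1: On the Master node 192.168.124.201, start a sysbench stress test to generate test data
t2: Stop the SQL thread on the Slave node 192.168.124.202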
mysql> stop slave sql_thread;
Query OK, 0 rows affected (0.02 sec)
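t3: The MHA Manager node sees that 202 has a large gap relative to the Master, so the expectation was that the Master would be switched automatically to 203. Note that the failover recorded below starts when MySQL on 201 becomes unreachable (see the connection errors), and the Manager promotes 192.168.124.202, not 203.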
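For step t2, it can help to confirm that only the SQL thread is stopped while the I/O thread keeps receiving events from the Master. The following is a minimal sketch that parses text in the vertical `SHOW SLAVE STATUS\G` layout. The sample text is illustrative (it was not captured during this drill), and `parse_slave_status` and `sql_thread_only_stopped` are hypothetical helper names.

```python
def parse_slave_status(text):
    """Parse the vertical (\\G) output of SHOW SLAVE STATUS into a dict."""
    status = {}
    for line in text.splitlines():
        # Rows look like "   Slave_IO_Running: Yes"; skip the "*** 1. row ***" banner.
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def sql_thread_only_stopped(status):
    """True when the I/O thread is running but the SQL thread is stopped."""
    return (status.get("Slave_IO_Running") == "Yes"
            and status.get("Slave_SQL_Running") == "No")


# Illustrative sample only (assumed, not taken from the drill).
SAMPLE = """\
*************************** 1. row ***************************
                  Master_Host: 192.168.124.201
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
"""

status = parse_slave_status(SAMPLE)
print(sql_thread_only_stopped(status))
```

A Slave in this state still receives relay log events, which is relevant when reading the "Latest slaves" section of the log.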
The complete failover process follows.
Fri Dec 18 02:26:21 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.  # The global configuration file /etc/masterha_default.cnf was not found, so this step is skipped
Fri Dec 18 02:26:21 2020 - [info] Reading application default configuration from /etc/mha/app1.conf..  # Application defaults are read from /etc/mha/app1.conf
Fri Dec 18 02:26:21 2020 - [info] Reading server configuration from /etc/mha/app1.conf..  # Server settings are read from the same file, /etc/mha/app1.conf
Fri Dec 18 02:26:03 2020 - [info] 192.168.124.202(192.168.124.202:3306)  # These are the 2 Slaves that were discovered
Fri Dec 18 02:26:03 2020 - [info] 192.168.124.203(192.168.124.203:3306)
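Fri Dec 18 02:26:03 2020 - [info] Alive Slaves:  # Slave servers whose status is alive
Details for 124.202, such as whether GTID is enabled, which master it replicates from, and whether it can be elected as the new master: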
Fri Dec 18 02:26:03 2020 - [info] 192.168.124.202(192.168.124.202:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Fri Dec 18 02:26:03 2020 - [info] GTID ON
Fri Dec 18 02:26:03 2020 - [info] Replicating from 192.168.124.201(192.168.124.201:3306)
Fri Dec 18 02:26:03 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Details for 124.203, such as whether GTID is enabled, which master it replicates from, and whether it can be elected as the new master:
Fri Dec 18 02:26:03 2020 - [info] 192.168.124.203(192.168.124.203:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Fri Dec 18 02:26:03 2020 - [info] GTID ON
Fri Dec 18 02:26:03 2020 - [info] Replicating from 192.168.124.201(192.168.124.201:3306)
Fri Dec 18 02:26:03 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Dec 18 02:26:03 2020 - [warning] MySQL master is not currently alive!  # The current Master server is diagnosed as unavailable
Fri Dec 18 02:26:03 2020 - [info] Checking slave configurations..  # Check the Slave configuration
Fri Dec 18 02:26:03 2020 - [info] read_only=1 is not set on slave 192.168.124.202(192.168.124.202:3306).  # 202 is not configured as read_only
Fri Dec 18 02:26:03 2020 - [info] read_only=1 is not set on slave 192.168.124.203(192.168.124.203:3306).  # 203 is not configured as read_only
Fri Dec 18 02:26:03 2020 - [info] Checking replication filtering settings..  # Check the replication filter settings
Fri Dec 18 02:26:03 2020 - [info] Replication filtering check ok.  # The replication filter check passed
Fri Dec 18 02:26:03 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.  # GTID-based replication relies on transaction IDs exchanged between master and slaves, so no files need to be copied remotely and the SSH and Node package checks are skipped
Fri Dec 18 02:26:03 2020 - [info] Getting current master (maybe dead) info ..  # The current Master may already be dead
Fri Dec 18 02:26:03 2020 - [info] Identified master is 192.168.124.201(192.168.124.201:3306).  # Confirms again that the current Master is 201
Fri Dec 18 02:26:03 2020 - [info] Checking SSH publickey authentication settings on the current master..
Fri Dec 18 02:26:08 2020 - [info] HealthCheck: SSH to 192.168.124.201 is reachable.  # Host 201 is still reachable over SSH for the health check
Fri Dec 18 02:26:11 2020 - [info] Master MHA Node version is 0.58.  # The MHA Node version on the Master is 0.58
Fri Dec 18 02:26:11 2020 - [info]
192.168.124.201(192.168.124.201:3306) (current master)  # 201 is the current Master
+--192.168.124.202(192.168.124.202:3306)  # Current Slave
+--192.168.124.203(192.168.124.203:3306)  # Current Slave
Fri Dec 18 02:26:11 2020 - [info] Checking master_ip_failover_script status:  # Check the failover script
Fri Dec 18 02:26:11 2020 - [info] /etc/mha/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.124.201 --orig_master_ip=192.168.124.201 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens33:3 down==/sbin/ifconfig ens33:3 192.168.124.205 netmask 255.255.255.255 ;/sbin/arping -I ens33 -c 3 -s 192.168.124.205 192.168.124.1 >/dev/null 2>&1===
Checking the Status of the script.. OK  # The script status is OK
Fri Dec 18 02:26:15 2020 - [info] OK.
Fri Dec 18 02:26:15 2020 - [warning] shutdown_script is not defined.  # No shutdown script is configured
Fri Dec 18 02:26:15 2020 - [info] Set master ping interval 1 seconds.  # The Master ping interval is set to 1 second
Fri Dec 18 02:26:15 2020 - [info] Set secondary check script: masterha_secondary_check -s 192.168.124.201 -s 192.168.124.202 -s 192.168.124.203
Fri Dec 18 02:26:15 2020 - [info] Starting ping health check on 192.168.124.201(192.168.124.201:3306)..  # Start the health check against 201
Fri Dec 18 02:26:15 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.124.201' (111))  # MySQL on 201 has been shut down at this point, so it cannot be reached through port 3306
Fri Dec 18 02:26:15 2020 - [warning] Connection failed 1 time(s)..
Fri Dec 18 02:26:15 2020 - [info] Executing secondary network check script: masterha_secondary_check -s 192.168.124.201 -s 192.168.124.202 -s 192.168.124.203 --user=root --master_host=192.168.124.201 --master_ip=192.168.124.201 --master_port=3306 --master_user=ha_monitor --master_password=123456 --ping_type=SELECT
Fri Dec 18 02:26:15 2020 - [info] Executing SSH check script: exit 0
Fri Dec 18 02:26:16 2020 - [info] HealthCheck: SSH to 192.168.124.201 is reachable.
Monitoring server 192.168.124.201 is reachable, Master is not reachable from 192.168.124.201. OK.  # Only the MySQL service on 201 was stopped, so port 3306 is unreachable, but the host's network is still up
Fri Dec 18 02:26:16 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.124.201' (111))
Fri Dec 18 02:26:16 2020 - [warning] Connection failed 2 time(s)..
Fri Dec 18 02:26:17 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.124.201' (111))
Fri Dec 18 02:26:17 2020 - [warning] Connection failed 3 time(s)..
Fri Dec 18 02:26:18 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.124.201' (111))
Fri Dec 18 02:26:18 2020 - [warning] Connection failed 4 time(s)..  # All of the MySQL connection attempts above fail, as expected
Monitoring server 192.168.124.202 is reachable, Master is not reachable from 192.168.124.202. OK.  # MySQL on 201 cannot be reached from 202
Monitoring server 192.168.124.203 is reachable, Master is not reachable from 192.168.124.203. OK.  # MySQL on 201 cannot be reached from 203
Fri Dec 18 02:26:20 2020 - [info] Master is not reachable from all other monitoring servers. Failover should start.  # The Master is diagnosed as unreachable, so failover begins
Fri Dec 18 02:26:20 2020 - [warning] Master is not reachable from health checker!  # The health checker has noticed that the Master is unreachable
Fri Dec 18 02:26:20 2020 - [warning] Master 192.168.124.201(192.168.124.201:3306) is not reachable!
Fri Dec 18 02:26:20 2020 - [warning] SSH is reachable.  # SSH still works, which is expected
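The decision recorded above can be summarized as: the Manager fails to connect several times, asks the other monitoring servers whether they can reach the Master, and starts failover only if none of them can. Below is a simplified sketch of that rule, not MHA's actual implementation. The function name `should_start_failover` and the threshold of 4 are assumptions (4 simply matches the number of failed attempts in this log).

```python
def should_start_failover(failed_pings, secondary_results, max_failures=4):
    """Decide whether failover should start (simplified illustration).

    failed_pings: consecutive failed MySQL connection attempts from the manager.
    secondary_results: maps each monitoring server to True if it can still
        reach MySQL on the master, False otherwise.
    max_failures: assumed threshold, taken from the count seen in this log.
    """
    if failed_pings < max_failures:
        return False
    # If any monitoring server still reaches the master, the problem is more
    # likely a network issue near the manager, so failover is not started.
    return not any(secondary_results.values())


# Values read from the log: 4 failed connections, and the master is
# unreachable from all three monitoring servers.
secondary = {
    "192.168.124.201": False,
    "192.168.124.202": False,
    "192.168.124.203": False,
}
print(should_start_failover(4, secondary))
```

The secondary check is what separates "the Master is down" from "the Manager lost its own network path to the Master".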
Fri Dec 18 02:26:20 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.conf again, and trying to connect to all servers to check server status..
Fri Dec 18 02:26:20 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Dec 18 02:26:20 2020 - [info] Reading application default configuration from /etc/mha/app1.conf..
Fri Dec 18 02:26:20 2020 - [info] Reading server configuration from /etc/mha/app1.conf..
Fri Dec 18 02:26:21 2020 - [info] GTID failover mode = 1
Fri Dec 18 02:26:21 2020 - [info] Dead Servers:  # Servers that are down
Fri Dec 18 02:26:21 2020 - [info] 192.168.124.201(192.168.124.201:3306)  # 201
Fri Dec 18 02:26:21 2020 - [info] Alive Servers:  # Servers that are alive
Fri Dec 18 02:26:21 2020 - [info] 192.168.124.202(192.168.124.202:3306)  # 202
Fri Dec 18 02:26:21 2020 - [info] 192.168.124.203(192.168.124.203:3306)  # 203
Fri Dec 18 02:26:21 2020 - [info] Alive Slaves:
Fri Dec 18 02:26:21 2020 - [info] 192.168.124.202(192.168.124.202:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Fri Dec 18 02:26:21 2020 - [info] GTID ON
Fri Dec 18 02:26:21 2020 - [info] Replicating from 192.168.124.201(192.168.124.201:3306)
Fri Dec 18 02:26:21 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Dec 18 02:26:21 2020 - [info] 192.168.124.203(192.168.124.203:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Fri Dec 18 02:26:21 2020 - [info] GTID ON
Fri Dec 18 02:26:21 2020 - [info] Replicating from 192.168.124.201(192.168.124.201:3306)
Fri Dec 18 02:26:21 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Dec 18 02:26:21 2020 - [info] Checking slave configurations..
Fri Dec 18 02:26:21 2020 - [info] read_only=1 is not set on slave 192.168.124.202(192.168.124.202:3306).
Fri Dec 18 02:26:21 2020 - [info] read_only=1 is not set on slave 192.168.124.203(192.168.124.203:3306).
Fri Dec 18 02:26:21 2020 - [info] Checking replication filtering settings..
Fri Dec 18 02:26:21 2020 - [info] Replication filtering check ok.
Fri Dec 18 02:26:21 2020 - [info] Master is down!
Fri Dec 18 02:26:21 2020 - [info] Terminating monitoring script.
Fri Dec 18 02:26:23 2020 - [info] The latest binary log file/position on all slaves is log-bin.000017:101635431  # The most recent binlog file and position received by the Slaves
Fri Dec 18 02:26:23 2020 - [info] Retrieved Gtid Set: 6c26c068-a952-11ea-ac80-000c29407201:5-259,  # The most recent retrieved GTID set
6c26c068-a952-11ea-ac80-000c29407203:1994-2110
Fri Dec 18 02:26:23 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Dec 18 02:26:23 2020 - [info] 192.168.124.202(192.168.124.202:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled  # 202 is listed among the latest Slaves (203 is listed below as well)
Fri Dec 18 02:26:23 2020 - [info] GTID ON
Fri Dec 18 02:26:23 2020 - [info] Replicating from 192.168.124.201(192.168.124.201:3306)
Fri Dec 18 02:26:23 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Dec 18 02:26:23 2020 - [info] 192.168.124.203(192.168.124.203:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Fri Dec 18 02:26:23 2020 - [info] GTID ON
Fri Dec 18 02:26:23 2020 - [info] Replicating from 192.168.124.201(192.168.124.201:3306)
Fri Dec 18 02:26:23 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Dec 18 02:26:23 2020 - [info] The oldest binary log file/position on all slaves is log-bin.000017:101635431
Fri Dec 18 02:26:23 2020 - [info] Retrieved Gtid Set: 6c26c068-a952-11ea-ac80-000c29407201:5-259,
6c26c068-a952-11ea-ac80-000c29407203:1994-2110
Fri Dec 18 02:26:23 2020 - [info] Oldest slaves:
Fri Dec 18 02:26:23 2020 - [info] 192.168.124.202(192.168.124.202:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Fri Dec 18 02:26:23 2020 - [info] GTID ON
Fri Dec 18 02:26:23 2020 - [info] Replicating from 192.168.124.201(192.168.124.201:3306)
Fri Dec 18 02:26:23 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Dec 18 02:26:23 2020 - [info] 192.168.124.203(192.168.124.203:3306) Version=5.7.26-log (oldest major version between slaves) log-bin:enabled
Fri Dec 18 02:26:23 2020 - [info] GTID ON
Fri Dec 18 02:26:23 2020 - [info] Replicating from 192.168.124.201(192.168.124.201:3306)
Fri Dec 18 02:26:23 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Dec 18 02:26:23 2020 - [info]
Fri Dec 18 02:26:23 2020 - [info] Searching from candidate_master slaves which have received the latest relay log events..  # Among the Slaves eligible for election, the one holding the latest relay log events is 202
Fri Dec 18 02:26:23 2020 - [info] New master is 192.168.124.202(192.168.124.202:3306)
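The selection step above can be pictured as: keep the Slaves that have received the most recent binlog position, then prefer one flagged candidate_master. The sketch below is a simplified illustration under that assumption. MHA's real selection checks more conditions, and the `Slave` class, `pick_new_master`, and the tie-break by list order are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slave:
    host: str
    candidate_master: bool
    received: Tuple[str, int]  # (master binlog file, position) received so far


def pick_new_master(slaves: List[Slave]) -> Optional[Slave]:
    """Pick a new master among the slaves that received the latest events."""
    if not slaves:
        return None
    # Tuples compare file name first, then position.
    latest = max(s.received for s in slaves)
    latest_slaves = [s for s in slaves if s.received == latest]
    # Prefer the first candidate_master in list order (assumed tie-break).
    for s in latest_slaves:
        if s.candidate_master:
            return s
    return latest_slaves[0]


# Both slaves in the log report log-bin.000017:101635431 and candidate_master.
slaves = [
    Slave("192.168.124.202", True, ("log-bin.000017", 101635431)),
    Slave("192.168.124.203", True, ("log-bin.000017", 101635431)),
]
print(pick_new_master(slaves).host)
```

The comparison uses what each Slave has received, not what it has applied. That would be consistent with 202 still appearing as a latest Slave even though its SQL thread was stopped in step t2.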
Fri Dec 18 02:26:23 2020 - [info] Starting master failover..
Fri Dec 18 02:26:23 2020 - [info]
From:  # Switch from the current Master 201 to the new Master 202
192.168.124.201(192.168.124.201:3306) (current master)
+--192.168.124.202(192.168.124.202:3306)
+--192.168.124.203(192.168.124.203:3306)
To:
192.168.124.202(192.168.124.202:3306) (new master)
+--192.168.124.203(192.168.124.203:3306)
Fri Dec 18 02:26:23 2020 - [info]
Fri Dec 18 02:26:23 2020 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Dec 18 02:26:23 2020 - [info]
Fri Dec 18 02:26:23 2020 - [info] Waiting all logs to be applied..
Fri Dec 18 02:26:23 2020 - [info] done.
Fri Dec 18 02:26:24 2020 - [info] Getting new master's binlog name and position..
Fri Dec 18 02:26:24 2020 - [info] log-bin.000021:161675086  # The new Master's binlog file and position
Fri Dec 18 02:26:24 2020 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.124.202', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';  # Every remaining Slave re-runs CHANGE MASTER to point at the new Master
Fri Dec 18 02:26:24 2020 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: log-bin.000021, 161675086, 6c26c068-a952-11ea-ac80-000c29407201:1-259,
6c26c068-a952-11ea-ac80-000c29407202:1-623,
6c26c068-a952-11ea-ac80-000c29407203:1-2110
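One way to read the GTID sets in this log is to check that everything the Slaves retrieved from the old Master is contained in the set executed on the new Master. The sketch below does this with a small hand-written parser. `parse_gtid_set` and `is_subset` are hypothetical helpers, and the containment test is deliberately simple (it does not merge adjacent intervals).

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3-9' into {uuid: [(start, end), ...]}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        # The UUID contains hyphens but no colons, so ':' separates fields.
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid, [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))
    return result


def is_subset(inner, outer):
    """True if each interval in `inner` fits inside one interval of `outer`."""
    for uuid, ranges in inner.items():
        for start, end in ranges:
            if not any(o_start <= start and end <= o_end
                       for o_start, o_end in outer.get(uuid, [])):
                return False
    return True


# Sets copied from the log ("Retrieved Gtid Set" and "Exec_Gtid_Set").
retrieved = parse_gtid_set(
    "6c26c068-a952-11ea-ac80-000c29407201:5-259,\n"
    "6c26c068-a952-11ea-ac80-000c29407203:1994-2110"
)
executed = parse_gtid_set(
    "6c26c068-a952-11ea-ac80-000c29407201:1-259,\n"
    "6c26c068-a952-11ea-ac80-000c29407202:1-623,\n"
    "6c26c068-a952-11ea-ac80-000c29407203:1-2110"
)
print(is_subset(retrieved, executed))
```

On a live server, MySQL's built-in `GTID_SUBSET()` function performs this comparison directly.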
Fri Dec 18 02:26:24 2020 - [info] Executing master IP activate script:
Fri Dec 18 02:26:24 2020 - [info] /etc/mha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.124.201 --orig_master_ip=192.168.124.201 --orig_master_port=3306 --new_master_host=192.168.124.202 --new_master_ip=192.168.124.202 --new_master_port=3306 --new_master_user='ha_monitor' --new_master_password=xxx
IN SCRIPT TEST====/sbin/ifconfig ens33:3 down==/sbin/ifconfig ens33:3 192.168.124.205 netmask 255.255.255.255 ;/sbin/arping -I ens33 -c 3 -s 192.168.124.205 192.168.124.1 >/dev/null 2>&1===
The new master had set read_only=0.
Enabling the VIP - 192.168.124.205 on the new master - 192.168.124.202  # Enable the VIP
Fri Dec 18 02:26:30 2020 - [info] OK.
Fri Dec 18 02:26:30 2020 - [info] ** Finished master recovery successfully.  # Master recovery is complete
Fri Dec 18 02:26:30 2020 - [info] * Phase 3: Master Recovery Phase completed.
Fri Dec 18 02:26:30 2020 - [info]
Fri Dec 18 02:26:30 2020 - [info] * Phase 4: Slaves Recovery Phase..
Fri Dec 18 02:26:30 2020 - [info]
Fri Dec 18 02:26:30 2020 - [info]
Fri Dec 18 02:26:30 2020 - [info] * Phase 4.1: Starting Slaves in parallel..  # Parallel recovery of the Slaves begins
Fri Dec 18 02:26:30 2020 - [info]
Fri Dec 18 02:26:30 2020 - [info] -- Slave recovery on host 192.168.124.203(192.168.124.203:3306) started, pid: 39423. Check tmp log /home/mha//192.168.124.203_3306_20201218022621.log if it takes time..  # Recovery of Slave 203 begins
Fri Dec 18 02:26:57 2020 - [info]
Fri Dec 18 02:26:57 2020 - [info] Log messages from 192.168.124.203 ...
Fri Dec 18 02:26:57 2020 - [info]
Fri Dec 18 02:26:30 2020 - [info] Resetting slave 192.168.124.203(192.168.124.203:3306) and starting replication from the new master 192.168.124.202(192.168.124.202:3306)..  # Replication on 203 is re-pointed to 202
Fri Dec 18 02:26:31 2020 - [info] Executed CHANGE MASTER.
Fri Dec 18 02:26:31 2020 - [info] Slave started.
Fri Dec 18 02:26:56 2020 - [info] gtid_wait(6c26c068-a952-11ea-ac80-000c29407201:1-259,
6c26c068-a952-11ea-ac80-000c29407202:1-623,
6c26c068-a952-11ea-ac80-000c29407203:1-2110) completed on 192.168.124.203(192.168.124.203:3306). Executed 13 events.  # Slave 203 executed 13 new events
Fri Dec 18 02:26:57 2020 - [info] -- Slave on host 192.168.124.203(192.168.124.203:3306) started.
Fri Dec 18 02:26:57 2020 - [info] All new slave servers recovered successfully.  # All Slaves have been recovered
Fri Dec 18 02:26:57 2020 - [info]
Fri Dec 18 02:26:57 2020 - [info] * Phase 5: New master cleanup phase..
Fri Dec 18 02:26:57 2020 - [info]
Fri Dec 18 02:26:57 2020 - [info] Resetting slave info on the new master..
Fri Dec 18 02:26:58 2020 - [info] 192.168.124.202: Resetting slave info succeeded.  # The leftover slave (replication) settings on the new Master are cleared
Fri Dec 18 02:26:58 2020 - [info] Master failover to 192.168.124.202(192.168.124.202:3306) completed successfully.  # Failover is complete
Fri Dec 18 02:26:58 2020 - [info] Deleted server1 entry from /etc/mha/app1.conf .  # The old Master's entry is removed from the configuration file
Fri Dec 18 02:26:58 2020 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.124.201(192.168.124.201:3306) to 192.168.124.202(192.168.124.202:3306) succeeded  # Failover succeeded
Master 192.168.124.201(192.168.124.201:3306) is down!  # The original Master 201 is down
Check MHA Manager logs at MHA-Manager:/home/mha/manager.log for details.  # The full failover log can be found at this location
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.124.201(192.168.124.201:3306)
Selected 192.168.124.202(192.168.124.202:3306) as a new master.
192.168.124.202(192.168.124.202:3306): OK: Applying all logs succeeded.  # 202 has applied all logs
192.168.124.202(192.168.124.202:3306): OK: Activated master IP address.  # The VIP is enabled on 202
192.168.124.203(192.168.124.203:3306): OK: Slave started, replicating from 192.168.124.202(192.168.124.202:3306)  # Slave 203 has started and now points at the new Master 202
192.168.124.202(192.168.124.202:3306): Resetting slave info succeeded.
Master failover to 192.168.124.202(192.168.124.202:3306) completed successfully.  # The failover finished cleanly
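To put a number on the drill, the timestamps in the manager log can be subtracted. The sketch below measures the time from the first failed connection to the "completed successfully" line, using two lines copied from the log above. `parse_ts` is a hypothetical helper, and parsing `%a`/`%b` assumes an English (C) locale.

```python
from datetime import datetime

# Layout of the timestamp at the start of each manager log line.
# Note: %a and %b are locale-dependent; an English/C locale is assumed.
FMT = "%a %b %d %H:%M:%S %Y"


def parse_ts(line):
    """Extract the leading timestamp of a manager log line."""
    stamp = line.split(" - ", 1)[0]
    return datetime.strptime(stamp, FMT)


first_failure = parse_ts(
    "Fri Dec 18 02:26:15 2020 - [warning] Connection failed 1 time(s).."
)
completed = parse_ts(
    "Fri Dec 18 02:26:58 2020 - [info] Master failover to "
    "192.168.124.202(192.168.124.202:3306) completed successfully."
)
elapsed = (completed - first_failure).total_seconds()
print(elapsed)  # 43.0
```

By this measure the drill took 43 seconds from the first detected failure to completion.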
Created: 2021-12-22 11:28