2012年9月20日木曜日

mysql-mhaを入れてみた! その1

mysql-mhaを入れてみた!

お品書き

・mysql-mhaのインストール


・mysql-mhaの設定

・mysql-mhaの動作確認
- 全部一個ですませる
  監視ノード→ServerA
  DBmaster  →ServerA
  DBslave   →ServerA
- 監視とDBmasterを分ける
  監視ノード→ServerB
  DBmaster  →ServerA
  DBslave   →ServerB

---------------------------------------------------------------------------------------------------

まず

mha4mysql-manager-0.53をダウンロードして、しかるべき場所(/usr/local/srcとか)に展開した後

インストール開始

# perl Makeperl.PL
Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Can.pm line 6.
BEGIN failed--compilation aborted at inc/Module/Install/Can.pm line 6.
Compilation failed in require at inc/Module/Install.pm line 307.
Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Makefile.pm line 4.
BEGIN failed--compilation aborted at inc/Module/Install/Makefile.pm line 4.
Compilation failed in require at inc/Module/Install.pm line 307.
Can't locate ExtUtils/MM_Unix.pm in @INC (@INC contains:   inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Metadata.pm line 316.
■perlのMM系モジュールがたりない?

# yum install perl-ExtUtils-MakeMaker.x86_64
================================================================================
 Package                    Arch      Version                     Repository
                                                                           Size
================================================================================
Installing:
 perl-ExtUtils-MakeMaker    x86_64    6.55-119.el6_1.1            base    290 k
Installing for dependencies:
 perl-ExtUtils-ParseXS      x86_64    1:2.2003.0-119.el6_1.1      base     42 k
 perl-Test-Harness          x86_64    3.17-119.el6_1.1            base    229 k
 perl-devel                 x86_64    4:5.10.1-119.el6_1.1        base    420 k
Transaction Summary
================================================================================
Install       4 Package(s)
Total download size: 981 k
Installed size: 2.8 M
Is this ok [y/N]: y
------------------------------------------------------------------------------------------------------------------------
[root@MyMasterHost mha4mysql-manager-0.53]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
Can't locate CPAN.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/AutoInstall.pm line 279.
■む、CPANモジュール?

yum install perl-CPAN.x86_64
================================================================================
 Package               Arch         Version                    Repository  Size
================================================================================
Installing:
 perl-CPAN             x86_64       1.9402-119.el6_1.1         base       244 k
Installing for dependencies:
 perl-Digest-SHA       x86_64       1:5.47-119.el6_1.1         base        61 k
Transaction Summary
================================================================================
Install       2 Package(s)
Total download size: 305 k
Installed size: 748 k
Is this ok [y/N]: y

------------------------------------------------------------------------------------------------------------------------
[root@MyMasterHost mha4mysql-manager-0.53]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.609)
- DBD::mysql            ...loaded. (4.013)
- Time::HiRes           ...loaded. (1.9721)
- Config::Tiny          ...loaded. (2.12)
- Log::Dispatch         ...loaded. (2.26)
- Parallel::ForkManager ...loaded. (0.7.9)
- MHA::NodeConst        ...missing.
==> Auto-install the 1 mandatory module(s) from CPAN? [y]
■あ、先にNodeからインストールするのか。。。

------------------------------------------------------------------------------------------------------------------------
[root@MyMasterHost mha4mysql-node-0.53]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI        ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
*** Module::AutoInstall configuration finished.
Writing Makefile for mha4mysql::node
[root@MyMasterHost mha4mysql-node-0.53]# make
cp lib/MHA/NodeUtil.pm blib/lib/MHA/NodeUtil.pm
cp lib/MHA/BinlogManager.pm blib/lib/MHA/BinlogManager.pm
cp lib/MHA/BinlogPosFinderElp.pm blib/lib/MHA/BinlogPosFinderElp.pm
cp lib/MHA/SlaveUtil.pm blib/lib/MHA/SlaveUtil.pm
cp lib/MHA/NodeConst.pm blib/lib/MHA/NodeConst.pm
cp lib/MHA/BinlogPosFindManager.pm blib/lib/MHA/BinlogPosFindManager.pm
cp lib/MHA/BinlogPosFinderXid.pm blib/lib/MHA/BinlogPosFinderXid.pm
cp lib/MHA/BinlogHeaderParser.pm blib/lib/MHA/BinlogHeaderParser.pm
cp lib/MHA/BinlogPosFinder.pm blib/lib/MHA/BinlogPosFinder.pm
cp bin/filter_mysqlbinlog blib/script/filter_mysqlbinlog
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/filter_mysqlbinlog
cp bin/apply_diff_relay_logs blib/script/apply_diff_relay_logs
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/apply_diff_relay_logs
cp bin/purge_relay_logs blib/script/purge_relay_logs
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/purge_relay_logs
cp bin/save_binary_logs blib/script/save_binary_logs
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/save_binary_logs
Manifying blib/man1/filter_mysqlbinlog.1
Manifying blib/man1/apply_diff_relay_logs.1
Manifying blib/man1/purge_relay_logs.1
Manifying blib/man1/save_binary_logs.1
[root@MyMasterHost mha4mysql-node-0.53]# make install
Installing /usr/local/share/perl5/MHA/SlaveUtil.pm
Installing /usr/local/share/perl5/MHA/NodeConst.pm
Installing /usr/local/share/perl5/MHA/NodeUtil.pm
Installing /usr/local/share/perl5/MHA/BinlogManager.pm
Installing /usr/local/share/perl5/MHA/BinlogHeaderParser.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFindManager.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFinderElp.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFinder.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFinderXid.pm
Installing /usr/local/share/man/man1/purge_relay_logs.1
Installing /usr/local/share/man/man1/apply_diff_relay_logs.1
Installing /usr/local/share/man/man1/save_binary_logs.1
Installing /usr/local/share/man/man1/filter_mysqlbinlog.1
Installing /usr/local/bin/apply_diff_relay_logs
Installing /usr/local/bin/save_binary_logs
Installing /usr/local/bin/filter_mysqlbinlog
Installing /usr/local/bin/purge_relay_logs
Appending installation info to /usr/lib64/perl5/perllocal.pod
■うむ。すんなり。

------------------------------------------------------------------------------------------------------------------------

[root@MyMasterHost mha4mysql-manager-0.53]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI                   ...loaded. (1.609)
- DBD::mysql            ...loaded. (4.013)
- Time::HiRes           ...loaded. (1.9721)
- Config::Tiny          ...loaded. (2.12)
- Log::Dispatch         ...loaded. (2.26)
- Parallel::ForkManager ...loaded. (0.7.9)
- MHA::NodeConst        ...loaded. (0.53)
*** Module::AutoInstall configuration finished.
Writing Makefile for mha4mysql::manager
[root@MyMasterHost mha4mysql-manager-0.53]# make
cp lib/MHA/ManagerUtil.pm blib/lib/MHA/ManagerUtil.pm
cp lib/MHA/Config.pm blib/lib/MHA/Config.pm
cp lib/MHA/HealthCheck.pm blib/lib/MHA/HealthCheck.pm
cp lib/MHA/ServerManager.pm blib/lib/MHA/ServerManager.pm
cp lib/MHA/ManagerConst.pm blib/lib/MHA/ManagerConst.pm
cp lib/MHA/ManagerAdmin.pm blib/lib/MHA/ManagerAdmin.pm
cp lib/MHA/FileStatus.pm blib/lib/MHA/FileStatus.pm
cp lib/MHA/ManagerAdminWrapper.pm blib/lib/MHA/ManagerAdminWrapper.pm
cp lib/MHA/MasterFailover.pm blib/lib/MHA/MasterFailover.pm
cp lib/MHA/MasterMonitor.pm blib/lib/MHA/MasterMonitor.pm
cp lib/MHA/MasterRotate.pm blib/lib/MHA/MasterRotate.pm
cp lib/MHA/Server.pm blib/lib/MHA/Server.pm
cp lib/MHA/SSHCheck.pm blib/lib/MHA/SSHCheck.pm
cp lib/MHA/DBHelper.pm blib/lib/MHA/DBHelper.pm
cp bin/masterha_stop blib/script/masterha_stop
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_stop
cp bin/masterha_conf_host blib/script/masterha_conf_host
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_conf_host
cp bin/masterha_check_repl blib/script/masterha_check_repl
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_repl
cp bin/masterha_check_status blib/script/masterha_check_status
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_status
cp bin/masterha_master_monitor blib/script/masterha_master_monitor
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_monitor
cp bin/masterha_check_ssh blib/script/masterha_check_ssh
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_ssh
cp bin/masterha_master_switch blib/script/masterha_master_switch
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_switch
cp bin/masterha_secondary_check blib/script/masterha_secondary_check
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_secondary_check
cp bin/masterha_manager blib/script/masterha_manager
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_manager
Manifying blib/man1/masterha_stop.1
Manifying blib/man1/masterha_conf_host.1
Manifying blib/man1/masterha_check_repl.1
Manifying blib/man1/masterha_check_status.1
Manifying blib/man1/masterha_master_monitor.1
Manifying blib/man1/masterha_check_ssh.1
Manifying blib/man1/masterha_master_switch.1
Manifying blib/man1/masterha_secondary_check.1
Manifying blib/man1/masterha_manager.1
[root@MyMasterHost mha4mysql-manager-0.53]# make install
Installing /usr/local/share/perl5/MHA/MasterRotate.pm
Installing /usr/local/share/perl5/MHA/MasterMonitor.pm
Installing /usr/local/share/perl5/MHA/ServerManager.pm
Installing /usr/local/share/perl5/MHA/ManagerConst.pm
Installing /usr/local/share/perl5/MHA/FileStatus.pm
Installing /usr/local/share/perl5/MHA/SSHCheck.pm
Installing /usr/local/share/perl5/MHA/Config.pm
Installing /usr/local/share/perl5/MHA/Server.pm
Installing /usr/local/share/perl5/MHA/MasterFailover.pm
Installing /usr/local/share/perl5/MHA/DBHelper.pm
Installing /usr/local/share/perl5/MHA/HealthCheck.pm
Installing /usr/local/share/perl5/MHA/ManagerAdmin.pm
Installing /usr/local/share/perl5/MHA/ManagerAdminWrapper.pm
Installing /usr/local/share/perl5/MHA/ManagerUtil.pm
Installing /usr/local/share/man/man1/masterha_check_status.1
Installing /usr/local/share/man/man1/masterha_conf_host.1
Installing /usr/local/share/man/man1/masterha_master_monitor.1
Installing /usr/local/share/man/man1/masterha_check_ssh.1
Installing /usr/local/share/man/man1/masterha_secondary_check.1
Installing /usr/local/share/man/man1/masterha_check_repl.1
Installing /usr/local/share/man/man1/masterha_stop.1
Installing /usr/local/share/man/man1/masterha_master_switch.1
Installing /usr/local/share/man/man1/masterha_manager.1
Installing /usr/local/bin/masterha_master_monitor
Installing /usr/local/bin/masterha_conf_host
Installing /usr/local/bin/masterha_check_ssh
Installing /usr/local/bin/masterha_master_switch
Installing /usr/local/bin/masterha_check_repl
Installing /usr/local/bin/masterha_stop
Installing /usr/local/bin/masterha_check_status
Installing /usr/local/bin/masterha_manager
Installing /usr/local/bin/masterha_secondary_check
Appending installation info to /usr/lib64/perl5/perllocal.pod
■うむうむ。いいかんじ。

------------------------------------------------------------------------------------------------------------------------

[root@MyMasterHost mha4mysql-manager-0.53]# masterha_check_ssh --conf=/etc/app1.cnf
Mon Jun 25 13:06:19 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jun 25 13:06:19 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Mon Jun 25 13:06:19 2012 - [info] Reading server configurations from /etc/app1.cnf..
Mon Jun 25 13:06:19 2012 - [info] Starting SSH connection tests..
Mon Jun 25 13:06:20 2012 - [debug]
Mon Jun 25 13:06:19 2012 - [debug]  Connecting via SSH from root@127.0.0.1(127.0.0.1:22) to root@127.0.0.1(127.0.0.1:22)..
Warning: Permanently added '127.0.0.1' (RSA) to the list of known hosts.
Mon Jun 25 13:06:19 2012 - [debug]   ok.
Mon Jun 25 13:06:20 2012 - [debug]
Mon Jun 25 13:06:20 2012 - [debug]  Connecting via SSH from root@127.0.0.1(127.0.0.1:22) to root@127.0.0.1(127.0.0.1:22)..
Mon Jun 25 13:06:20 2012 - [debug]   ok.
Mon Jun 25 13:06:20 2012 - [info] All SSH connection tests passed successfully.
■事前にSSHのNonpass設定と、レプリケーションは構築済み
  Master -> localhost:3306
  Slave1 -> localhost:5002
  Slave2 -> localhost:5003

------------------------------------------------------------------------------------------------------------------------

[root@MyMasterHost mha4mysql-manager-0.53]# masterha_check_repl --conf=/etc/app1.cnf
Mon Jun 25 14:31:52 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jun 25 14:31:52 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Mon Jun 25 14:31:52 2012 - [info] Reading server configurations from /etc/app1.cnf..
Mon Jun 25 14:31:52 2012 - [info] MHA::MasterMonitor version 0.53.
Mon Jun 25 14:31:52 2012 - [error][/usr/local/share/perl5/MHA/Server.pm, ln277] Checking slave status failed on 127.0.0.1(127.0.0.1:5002). err=Got error when executing SHOW SLAVE STATUS. Access denied; you need (at least one of) the SUPER,REPLICATION CLIENT privilege(s) for this operation
Mon Jun 25 14:31:52 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations.  at /usr/local/share/perl5/MHA/ServerManager.pm line 269
Mon Jun 25 14:31:52 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln478] Error happened on monitoring servers.
Mon Jun 25 14:31:52 2012 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
■ん・・?なんか権限がおかしいの?
 app1.cnfのhostname指定を127.0.0.1→mogemasuに直してみる。

------------------------------------------------------------------------------------------------------------------------
[root@MyMasterHost mha4mysql-manager-0.53]# masterha_check_repl --conf=/etc/app1.cnf
Mon Jun 25 14:35:39 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jun 25 14:35:39 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Mon Jun 25 14:35:39 2012 - [info] Reading server configurations from /etc/app1.cnf..
Mon Jun 25 14:35:39 2012 - [info] MHA::MasterMonitor version 0.53.
Mon Jun 25 14:35:40 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln255] Got MySQL error when connecting MyMasterHost(49.212.145.136:5002) :1045:Access denied for user 'repl'@'MyMasterHost' (using password: YES), but this is not mysql crash. Check MySQL server settings.
 at /usr/local/share/perl5/MHA/ServerManager.pm line 251
Mon Jun 25 14:35:40 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln255] Got MySQL error when connecting MyMasterHost(49.212.145.136:5003) :1045:Access denied for user 'repl'@'MyMasterHost' (using password: YES), but this is not mysql crash. Check MySQL server settings.
 at /usr/local/share/perl5/MHA/ServerManager.pm line 251
Mon Jun 25 14:35:40 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln263] Got fatal error, stopping operations
Mon Jun 25 14:35:40 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations.  at /usr/local/share/perl5/MHA/MasterMonitor.pm line 298
Mon Jun 25 14:35:40 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln478] Error happened on monitoring servers.
Mon Jun 25 14:35:40 2012 - [info] Got exit code 1 (Not master dead).
■権限を与えたはずなのになぜ

------------------------------------------------------------------------------------------------------------------------

Tue Jun 26 12:53:52 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 26 12:53:52 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Jun 26 12:53:52 2012 - [info] Reading server configurations from /etc/app1.cnf..
Tue Jun 26 12:53:52 2012 - [info] MHA::MasterMonitor version 0.53.
Tue Jun 26 12:53:53 2012 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln622] Master 127.0.0.1:3306 from which slave 49.212.145.136(49.212.145.136:5002) replicates is not defined in the configuration file!
Tue Jun 26 12:53:53 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations.  at /usr/local/share/perl5/MHA/MasterMonitor.pm line 298
Tue Jun 26 12:53:53 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln478] Error happened on monitoring servers.
Tue Jun 26 12:53:53 2012 - [info] Got exit code 1 (Not master dead).
■メッセージが退化した・・・なんぞ。。。

------------------------------------------------------------------------------------------------------------------------
-- 五 年 後 --

今後もごちゃごちゃconfをいじる事数時間・・・
なるほど・・・
/etc/app1.cnfの書き方が間違っていた。
[server default]=[server1] のマスターDBの事かと思ったら全然違った。はずかしい(´・ω・)

[root@MyMasterHost ~]# more /etc/app1.cnf
[server default]
# mysql user and password
user=repl
password=repl
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
[server1]
hostname=127.0.0.1
[server2]
hostname=127.0.0.1
port=5002
[server3]
hostname=127.0.0.1
port=5003
■こんな感じで書いてやる必要があった。

[root@MyMasterHost ~]# masterha_check_repl --conf=/etc/app1.cnf
Tue Jun 26 14:04:16 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 26 14:04:16 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Jun 26 14:04:16 2012 - [info] Reading server configurations from /etc/app1.cnf..
Tue Jun 26 14:04:16 2012 - [info] MHA::MasterMonitor version 0.53.
Tue Jun 26 14:04:17 2012 - [info] Dead Servers:
Tue Jun 26 14:04:17 2012 - [info] Alive Servers:
Tue Jun 26 14:04:17 2012 - [info]   127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:04:17 2012 - [info]   127.0.0.1(127.0.0.1:5002)
Tue Jun 26 14:04:17 2012 - [info]   127.0.0.1(127.0.0.1:5003)
Tue Jun 26 14:04:17 2012 - [info] Alive Slaves:
Tue Jun 26 14:04:17 2012 - [info]   127.0.0.1(127.0.0.1:5002)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:04:17 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:04:17 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:04:17 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:04:17 2012 - [info] Current Alive Master: 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:04:17 2012 - [info] Checking slave configurations..
Tue Jun 26 14:04:17 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:5002).
Tue Jun 26 14:04:17 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:5003).
Tue Jun 26 14:04:17 2012 - [info] Checking replication filtering settings..
Tue Jun 26 14:04:17 2012 - [info]  binlog_do_db= , binlog_ignore_db=
Tue Jun 26 14:04:17 2012 - [info]  Replication filtering check ok.
Tue Jun 26 14:04:17 2012 - [info] Starting SSH connection tests..
Tue Jun 26 14:04:18 2012 - [info] All SSH connection tests passed successfully.
Tue Jun 26 14:04:18 2012 - [info] Checking MHA Node version..
Tue Jun 26 14:04:19 2012 - [info]  Version check ok.
Tue Jun 26 14:04:19 2012 - [info] Checking SSH publickey authentication settings on the current master..
Tue Jun 26 14:04:19 2012 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Jun 26 14:04:19 2012 - [info] Master MHA Node version is 0.53.
Tue Jun 26 14:04:19 2012 - [info] Checking recovery script configurations on the current master..
Tue Jun 26 14:04:19 2012 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000014
Tue Jun 26 14:04:19 2012 - [info]   Connecting to root@127.0.0.1(127.0.0.1)..
Failed to save binary log: Binlog not found from /var/lib/mysql,/var/log/mysql! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
 at /usr/local/bin/save_binary_logs line 117
        eval {...} called at /usr/local/bin/save_binary_logs line 66
        main::main() called at /usr/local/bin/save_binary_logs line 62
Tue Jun 26 14:04:19 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln152] Master setting check failed!
Tue Jun 26 14:04:19 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln364] Master configuration failed.
Tue Jun 26 14:04:19 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations.  at /usr/local/bin/masterha_check_repl line 48
Tue Jun 26 14:04:19 2012 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln478] Error happened on monitoring servers.
Tue Jun 26 14:04:19 2012 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
■エラーはでてるけど、凄い進歩!!やるきでた!!!
  please set "master_binlog_dir=/path/to/binlog_directory_of_the_master"
  と言われているので、app1.cnfに master_binlog_dir=/var/lib/mysql_s1/data"を追記(デフォルトと変えている場合は、confで明示的に指定してやる必要があったんですねぇ)

------------------------------------------------------------------------------------------------------------------------
[root@MyMasterHost ~]# masterha_check_repl --conf=/etc/app1.cnf
Tue Jun 26 14:19:40 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 26 14:19:40 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Jun 26 14:19:40 2012 - [info] Reading server configurations from /etc/app1.cnf..
Tue Jun 26 14:19:40 2012 - [info] MHA::MasterMonitor version 0.53.
Tue Jun 26 14:19:40 2012 - [info] Dead Servers:
Tue Jun 26 14:19:40 2012 - [info] Alive Servers:
Tue Jun 26 14:19:40 2012 - [info]   127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:19:40 2012 - [info]   127.0.0.1(127.0.0.1:5002)
Tue Jun 26 14:19:40 2012 - [info]   127.0.0.1(127.0.0.1:5003)
Tue Jun 26 14:19:40 2012 - [info] Alive Slaves:
Tue Jun 26 14:19:40 2012 - [info]   127.0.0.1(127.0.0.1:5002)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:19:40 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:19:40 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:19:40 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:19:40 2012 - [info] Current Alive Master: 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:19:40 2012 - [info] Checking slave configurations..
Tue Jun 26 14:19:40 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:5002).
Tue Jun 26 14:19:40 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:5003).
Tue Jun 26 14:19:40 2012 - [info] Checking replication filtering settings..
Tue Jun 26 14:19:40 2012 - [info]  binlog_do_db= , binlog_ignore_db=
Tue Jun 26 14:19:40 2012 - [info]  Replication filtering check ok.
Tue Jun 26 14:19:40 2012 - [info] Starting SSH connection tests..
Tue Jun 26 14:19:42 2012 - [info] All SSH connection tests passed successfully.
Tue Jun 26 14:19:42 2012 - [info] Checking MHA Node version..
Tue Jun 26 14:19:42 2012 - [info]  Version check ok.
Tue Jun 26 14:19:42 2012 - [info] Checking SSH publickey authentication settings on the current master..
Tue Jun 26 14:19:42 2012 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Jun 26 14:19:43 2012 - [info] Master MHA Node version is 0.53.
Tue Jun 26 14:19:43 2012 - [info] Checking recovery script configurations on the current master..
Tue Jun 26 14:19:43 2012 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql_s1/data --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000018
Tue Jun 26 14:19:43 2012 - [info]   Connecting to root@127.0.0.1(127.0.0.1)..
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql_s1/data, up to mysql-bin.000018
Tue Jun 26 14:19:43 2012 - [info] Master setting check done.
Tue Jun 26 14:19:43 2012 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Jun 26 14:19:43 2012 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=repl --slave_host=127.0.0.1 --slave_ip=127.0.0.1 --slave_port=5002 --workdir=/var/log/masterha/app1 --target_version=5.5.23-log --manager_version=0.53 --relay_log_info=/var/lib/mysql_s2/data/relay-log.info  --relay_dir=/var/lib/mysql_s2/data/  --slave_pass=xxx
Tue Jun 26 14:19:43 2012 - [info]   Connecting to root@127.0.0.1(127.0.0.1:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql_s2/data/relay-log.info ... ok.
    Relay log found at /var/lib/mysql_s2/data, up to mysql-relay-bin.000022
    Temporary relay log file is /var/lib/mysql_s2/data/mysql-relay-bin.000022
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Jun 26 14:19:43 2012 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=repl --slave_host=127.0.0.1 --slave_ip=127.0.0.1 --slave_port=5003 --workdir=/var/log/masterha/app1 --target_version=5.5.23-log --manager_version=0.53 --relay_log_info=/var/lib/mysql_s3/data/relay-log.info  --relay_dir=/var/lib/mysql_s3/data/  --slave_pass=xxx
Tue Jun 26 14:19:43 2012 - [info]   Connecting to root@127.0.0.1(127.0.0.1:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql_s3/data/relay-log.info ... ok.
    Relay log found at /var/lib/mysql_s3/data, up to mysql-relay-bin.000022
    Temporary relay log file is /var/lib/mysql_s3/data/mysql-relay-bin.000022
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Jun 26 14:19:44 2012 - [info] Slaves settings check done.
Tue Jun 26 14:19:44 2012 - [info]
127.0.0.1 (current master)
 +--127.0.0.1
 +--127.0.0.1
Tue Jun 26 14:19:44 2012 - [info] Checking replication health on 127.0.0.1..
Tue Jun 26 14:19:44 2012 - [info]  ok.
Tue Jun 26 14:19:44 2012 - [info] Checking replication health on 127.0.0.1..
Tue Jun 26 14:19:44 2012 - [info]  ok.
Tue Jun 26 14:19:44 2012 - [warning] master_ip_failover_script is not defined.
Tue Jun 26 14:19:44 2012 - [warning] shutdown_script is not defined.
Tue Jun 26 14:19:44 2012 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
■ねんがんの へるすいずおーけーを てにいれたぞ!

------------------------------------------------------------------------------------------------------------------------
# masterha_manager --conf=/etc/app1.cnf
■いってみよー。

------------------------------------------------------------------------------------------------------------------------
# masterha_check_status --conf=/etc/app1.cnf
app1 (pid:11691) is running(0:PING_OK), master:127.0.0.1
■別窓で確認してみたところ。
  走ってるっぽいじゃないか。

------------------------------------------------------------------------------------------------------------------------
# mysqld_multi stop 1
■マスターをぶち落としてみる。
  masterha_manager --conf=/etc/app1.cnf このプロセス叩いてた画面のプロンプトが、以下みたいなログをゲロって戻ってくる。
  どうやら一回切替たらmysqlmhaは停止する仕組みの模様
  勝手に行ったり来たりされても困るもんね。
Tue Jun 26 14:26:30 2012 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Jun 26 14:26:30 2012 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql_s1/data --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.53 --binlog_prefix=mysql-bin
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql_s1/data, up to mysql-bin.000018
Tue Jun 26 14:26:30 2012 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Jun 26 14:26:33 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 26 14:26:33 2012 - [warning] Connection failed 1 time(s)..
Tue Jun 26 14:26:36 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 26 14:26:36 2012 - [warning] Connection failed 2 time(s)..
Tue Jun 26 14:26:39 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 26 14:26:39 2012 - [warning] Connection failed 3 time(s)..
Tue Jun 26 14:26:39 2012 - [warning] Master is not reachable from health checker!
Tue Jun 26 14:26:39 2012 - [warning] Master 127.0.0.1(127.0.0.1:3306) is not reachable!
Tue Jun 26 14:26:39 2012 - [warning] SSH is reachable.
Tue Jun 26 14:26:39 2012 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Tue Jun 26 14:26:39 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 26 14:26:39 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Jun 26 14:26:39 2012 - [info] Reading server configurations from /etc/app1.cnf..
Tue Jun 26 14:26:39 2012 - [info] Dead Servers:
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:26:39 2012 - [info] Alive Servers:
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5002)
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5003)
Tue Jun 26 14:26:39 2012 - [info] Alive Slaves:
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5002)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:26:39 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:26:39 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:26:39 2012 - [info] Checking slave configurations..
Tue Jun 26 14:26:39 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:5002).
Tue Jun 26 14:26:39 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:5003).
Tue Jun 26 14:26:39 2012 - [info] Checking replication filtering settings..
Tue Jun 26 14:26:39 2012 - [info]  Replication filtering check ok.
Tue Jun 26 14:26:39 2012 - [info] Master is down!
Tue Jun 26 14:26:39 2012 - [info] Terminating monitoring script.
Tue Jun 26 14:26:39 2012 - [info] Got exit code 20 (Master dead).
Tue Jun 26 14:26:39 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 26 14:26:39 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Jun 26 14:26:39 2012 - [info] Reading server configurations from /etc/app1.cnf..
Tue Jun 26 14:26:39 2012 - [info] MHA::MasterFailover version 0.53.
Tue Jun 26 14:26:39 2012 - [info] Starting master failover.
Tue Jun 26 14:26:39 2012 - [info]
Tue Jun 26 14:26:39 2012 - [info] * Phase 1: Configuration Check Phase..
Tue Jun 26 14:26:39 2012 - [info]
Tue Jun 26 14:26:39 2012 - [info] Dead Servers:
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:26:39 2012 - [info] Checking master reachability via mysql(double check)..
Tue Jun 26 14:26:39 2012 - [info]  ok.
Tue Jun 26 14:26:39 2012 - [info] Alive Servers:
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5002)
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5003)
Tue Jun 26 14:26:39 2012 - [info] Alive Slaves:
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5002)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:26:39 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:26:39 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:26:39 2012 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Jun 26 14:26:39 2012 - [info]
Tue Jun 26 14:26:39 2012 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Jun 26 14:26:39 2012 - [info]
Tue Jun 26 14:26:39 2012 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Jun 26 14:26:39 2012 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Tue Jun 26 14:26:39 2012 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Jun 26 14:26:39 2012 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Jun 26 14:26:39 2012 - [info]
Tue Jun 26 14:26:39 2012 - [info] * Phase 3: Master Recovery Phase..
Tue Jun 26 14:26:39 2012 - [info]
Tue Jun 26 14:26:39 2012 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Jun 26 14:26:39 2012 - [info]
Tue Jun 26 14:26:39 2012 - [info] The latest binary log file/position on all slaves is mysql-bin.000018:107
Tue Jun 26 14:26:39 2012 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5002)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:26:39 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:26:39 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:26:39 2012 - [info] The oldest binary log file/position on all slaves is mysql-bin.000018:107
Tue Jun 26 14:26:39 2012 - [info] Oldest slaves:
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5002)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:26:39 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:26:39 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:26:39 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:26:39 2012 - [info]
Tue Jun 26 14:26:39 2012 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Jun 26 14:26:39 2012 - [info]
Tue Jun 26 14:26:39 2012 - [info] Fetching dead master's binary logs..
Tue Jun 26 14:26:39 2012 - [info] Executing command on the dead master 127.0.0.1(127.0.0.1:3306): save_binary_logs --command=save --start_file=mysql-bin.000018  --start_pos=107 --binlog_dir=/var/lib/mysql_s1/data --output_file=/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000018 pos 107 to mysql-bin.000018 EOF into /var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog ..
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /var/lib/mysql_s1/data/mysql-bin.000018 position 107 to tail(126).. ok.
 Concat succeeded.
saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog                                                                                                                  100%  126     0.1KB/s   00:00
Tue Jun 26 14:26:40 2012 - [info] scp from root@127.0.0.1:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog succeeded.
Tue Jun 26 14:26:40 2012 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Jun 26 14:26:40 2012 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] * Phase 3.3: Determining New Master Phase..
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Jun 26 14:26:41 2012 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Jun 26 14:26:41 2012 - [info] Searching new master from slaves..
Tue Jun 26 14:26:41 2012 - [info]  Candidate masters from the configuration file:
Tue Jun 26 14:26:41 2012 - [info]  Non-candidate masters:
Tue Jun 26 14:26:41 2012 - [info] New master is 127.0.0.1(127.0.0.1:5002)
Tue Jun 26 14:26:41 2012 - [info] Starting master failover..
Tue Jun 26 14:26:41 2012 - [info]
From:
127.0.0.1 (current master)
 +--127.0.0.1
 +--127.0.0.1
To:
127.0.0.1 (new master)
 +--127.0.0.1
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 26 14:26:41 2012 - [info] Sending binlog..
saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog                                                                                                                  100%  126     0.1KB/s   00:00
Tue Jun 26 14:26:41 2012 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog to root@127.0.0.1:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog succeeded.
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Jun 26 14:26:41 2012 - [info] Starting recovery on 127.0.0.1(127.0.0.1:5002)..
Tue Jun 26 14:26:41 2012 - [info]  Generating diffs succeeded.
Tue Jun 26 14:26:41 2012 - [info] Waiting until all relay logs are applied.
Tue Jun 26 14:26:41 2012 - [info]  done.
Tue Jun 26 14:26:41 2012 - [info] Getting slave status..
Tue Jun 26 14:26:41 2012 - [info] This slave(127.0.0.1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000018:107). No need to recover from Exec_Master_Log_Pos.
Tue Jun 26 14:26:41 2012 - [info] Connecting to the target slave host 127.0.0.1, running recover script..
Tue Jun 26 14:26:41 2012 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=repl --slave_host=127.0.0.1 --slave_ip=127.0.0.1  --slave_port=5002 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.23-log --timestamp=20120626142639 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Tue Jun 26 14:26:41 2012 - [info]
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog on 127.0.0.1:5002. This may take long time...
Applying log files succeeded.
Tue Jun 26 14:26:41 2012 - [info]  All relay logs were successfully applied.
Tue Jun 26 14:26:41 2012 - [info] Getting new master's binlog name and position..
Tue Jun 26 14:26:41 2012 - [info]  mysql-bin.000016:107
Tue Jun 26 14:26:41 2012 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=5002, MASTER_LOG_FILE='mysql-bin.000016', MASTER_LOG_POS=107, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Jun 26 14:26:41 2012 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Tue Jun 26 14:26:41 2012 - [info] ** Finished master recovery successfully.
Tue Jun 26 14:26:41 2012 - [info] * Phase 3: Master Recovery Phase completed.
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] * Phase 4: Slaves Recovery Phase..
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] -- Slave diff file generation on host 127.0.0.1(127.0.0.1:5003) started, pid: 12078. Check tmp log /var/log/masterha/app1/127.0.0.1_5003_20120626142639.log if it takes time..
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] Log messages from 127.0.0.1 ...
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 26 14:26:41 2012 - [info] End of log messages from 127.0.0.1.
Tue Jun 26 14:26:41 2012 - [info] -- 127.0.0.1(127.0.0.1:5003) has the latest relay log events.
Tue Jun 26 14:26:41 2012 - [info] Generating relay diff files from the latest slave succeeded.
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Jun 26 14:26:41 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] -- Slave recovery on host 127.0.0.1(127.0.0.1:5003) started, pid: 12080. Check tmp log /var/log/masterha/app1/127.0.0.1_5003_20120626142639.log if it takes time..
saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog                                                                                                                  100%  126     0.1KB/s   00:00
Tue Jun 26 14:26:42 2012 - [info]
Tue Jun 26 14:26:42 2012 - [info] Log messages from 127.0.0.1 ...
Tue Jun 26 14:26:42 2012 - [info]
Tue Jun 26 14:26:41 2012 - [info] Sending binlog..
Tue Jun 26 14:26:41 2012 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog to root@127.0.0.1:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog succeeded.
Tue Jun 26 14:26:41 2012 - [info] Starting recovery on 127.0.0.1(127.0.0.1:5003)..
Tue Jun 26 14:26:41 2012 - [info]  Generating diffs succeeded.
Tue Jun 26 14:26:41 2012 - [info] Waiting until all relay logs are applied.
Tue Jun 26 14:26:41 2012 - [info]  done.
Tue Jun 26 14:26:41 2012 - [info] Getting slave status..
Tue Jun 26 14:26:41 2012 - [info] This slave(127.0.0.1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000018:107). No need to recover from Exec_Master_Log_Pos.
Tue Jun 26 14:26:41 2012 - [info] Connecting to the target slave host 127.0.0.1, running recover script..
Tue Jun 26 14:26:41 2012 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=repl --slave_host=127.0.0.1 --slave_ip=127.0.0.1  --slave_port=5003 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.23-log --timestamp=20120626142639 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Tue Jun 26 14:26:42 2012 - [info]
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_3306_20120626142639.binlog on 127.0.0.1:5003. This may take long time...
Applying log files succeeded.
Tue Jun 26 14:26:42 2012 - [info]  All relay logs were successfully applied.
Tue Jun 26 14:26:42 2012 - [info]  Resetting slave 127.0.0.1(127.0.0.1:5003) and starting replication from the new master 127.0.0.1(127.0.0.1:5002)..
Tue Jun 26 14:26:42 2012 - [info]  Executed CHANGE MASTER.
Tue Jun 26 14:26:42 2012 - [info]  Slave started.
Tue Jun 26 14:26:42 2012 - [info] End of log messages from 127.0.0.1.
Tue Jun 26 14:26:42 2012 - [info] -- Slave recovery on host 127.0.0.1(127.0.0.1:5003) succeeded.
Tue Jun 26 14:26:42 2012 - [info] All new slave servers recovered successfully.
Tue Jun 26 14:26:42 2012 - [info]
Tue Jun 26 14:26:42 2012 - [info] * Phase 5: New master cleanup phease..
Tue Jun 26 14:26:42 2012 - [info]
Tue Jun 26 14:26:42 2012 - [info] Resetting slave info on the new master..
Tue Jun 26 14:26:42 2012 - [info]  127.0.0.1: Resetting slave info succeeded.
Tue Jun 26 14:26:42 2012 - [info] Master failover to 127.0.0.1(127.0.0.1:5002) completed successfully.
Tue Jun 26 14:26:42 2012 - [info]
----- Failover Report -----
app1: MySQL Master failover 127.0.0.1 to 127.0.0.1 succeeded
Master 127.0.0.1 is down!
Check MHA Manager logs at MyMasterHost for details.
Started automated(non-interactive) failover.
The latest slave 127.0.0.1(127.0.0.1:5002) has all relay logs for recovery.
Selected 127.0.0.1 as a new master.
127.0.0.1: OK: Applying all logs succeeded.
127.0.0.1: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
127.0.0.1: OK: Applying all logs succeeded. Slave started, replicating from 127.0.0.1.
127.0.0.1: Resetting slave info succeeded.
Master failover to 127.0.0.1(127.0.0.1:5002) completed successfully.

------------------------------------------------------------------------------------------------------------------------

マスターが切り替わった先の5002ポートのmysqlのステータスを確認してみるる。

# mysql -uroot -p -h127.0.0.1 -P5002
mysql> show slave status\G
Empty set (0.00 sec)
■なんとっ。

mysql> show master status\G
*************************** 1. row ***************************
            File: mysql-bin.000016
        Position: 107
    Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
■なんとっ。

mysql> show master status\G
*************************** 1. row ***************************
            File: mysql-bin.000016
        Position: 107
    Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
■ふむ。
------------------------------------------------------------------------------------------------------------------------

5003ポートのmysqlのレプリケーション元は5002になっているかな?

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: repl
                  Master_Port: 5002
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000016
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000016
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 409
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 2
1 row in set (0.00 sec)
■おおおおお。"Master_Port: 5002" 確認ができた。

------------------------------------------------------------------------------------------------------------------------

という事で、復旧作業を行ってみようと思う。
Master:127.0.0.1:5002 で引き続き
Slave1:127.0.0.1:5003
Slave2:127.0.0.1:3306 旧ますたー
のような感じにするため、change master to でSlave2をMasterにガシーン

mysql> CHANGE MASTER TO MASTER_HOST = 'localhost',
    -> MASTER_USER = 'repl',
    -> MASTER_PASSWORD = 'repl',
    -> MASTER_PORT = 5002,
    -> MASTER_LOG_FILE = 'mysql-bin.000016',
    -> MASTER_LOG_POS = 107;
Query OK, 0 rows affected (0.03 sec) 
mysql> start slave;
Query OK, 0 rows affected (0.00 sec) 
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: localhost
                  Master_User: repl
                  Master_Port: 5002
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000018
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysql-relay-bin.000004
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000018
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 555
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 2
1 row in set (0.00 sec)
■完璧にスレーブが動いて幸せな感じ。

------------------------------------------------------------------------------------------------------------------------
構成を変更したので、app1.cnfも書き換えませう
[server default]
# mysql user and password
user=repl
password=repl
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
master_binlog_dir=/var/lib/mysql_s2/data
#master_binlog_dir=/var/lib/mysql_s1/data
#master_binlog_dir=/var/lib/mysql_s1/data
[server1]                                      ### Master:5002ポートの情報に書き換え
hostname=127.0.0.1
port=5002
[server2]                                      ### Slave1:3306ポートの情報に書き換え
hostname=127.0.0.1

[server3]                                      ### Slave2:5003ポートの情報のまま
hostname=127.0.0.1
port=5003
■次に障害が発生したら、最初のマスターにマスターが引き継がれるというわけよ。
  という事でテスト。

# masterha_check_repl --conf=/etc/app1.cnf
Tue Jun 26 14:47:13 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 26 14:47:13 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Jun 26 14:47:13 2012 - [info] Reading server configurations from /etc/app1.cnf..
Tue Jun 26 14:47:13 2012 - [info] MHA::MasterMonitor version 0.53.
Tue Jun 26 14:47:13 2012 - [info] Dead Servers:
Tue Jun 26 14:47:13 2012 - [info] Alive Servers:
Tue Jun 26 14:47:13 2012 - [info]   127.0.0.1(127.0.0.1:5002)
Tue Jun 26 14:47:13 2012 - [info]   127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:47:13 2012 - [info]   127.0.0.1(127.0.0.1:5003)
Tue Jun 26 14:47:13 2012 - [info] Alive Slaves:
Tue Jun 26 14:47:13 2012 - [info]   127.0.0.1(127.0.0.1:3306)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:47:13 2012 - [info]     Replicating from localhost(127.0.0.1:5002)
Tue Jun 26 14:47:13 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:47:13 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:5002)
Tue Jun 26 14:47:13 2012 - [info] Current Alive Master: 127.0.0.1(127.0.0.1:5002)
Tue Jun 26 14:47:13 2012 - [info] Checking slave configurations..
Tue Jun 26 14:47:13 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:3306).
Tue Jun 26 14:47:13 2012 - [warning]  relay_log_purge=0 is not set on slave 127.0.0.1(127.0.0.1:3306).
Tue Jun 26 14:47:13 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:5003).
Tue Jun 26 14:47:13 2012 - [info] Checking replication filtering settings..
Tue Jun 26 14:47:13 2012 - [info]  binlog_do_db= , binlog_ignore_db=
Tue Jun 26 14:47:13 2012 - [info]  Replication filtering check ok.
Tue Jun 26 14:47:13 2012 - [info] Starting SSH connection tests..
Tue Jun 26 14:47:15 2012 - [info] All SSH connection tests passed successfully.
Tue Jun 26 14:47:15 2012 - [info] Checking MHA Node version..
Tue Jun 26 14:47:15 2012 - [info]  Version check ok.
Tue Jun 26 14:47:15 2012 - [info] Checking SSH publickey authentication settings on the current master..
Tue Jun 26 14:47:16 2012 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Jun 26 14:47:16 2012 - [info] Master MHA Node version is 0.53.
Tue Jun 26 14:47:16 2012 - [info] Checking recovery script configurations on the current master..
Tue Jun 26 14:47:16 2012 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql_s2/data --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000018
Tue Jun 26 14:47:16 2012 - [info]   Connecting to root@127.0.0.1(127.0.0.1)..
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql_s2/data, up to mysql-bin.000018
Tue Jun 26 14:47:16 2012 - [info] Master setting check done.
Tue Jun 26 14:47:16 2012 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Jun 26 14:47:16 2012 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=repl --slave_host=127.0.0.1 --slave_ip=127.0.0.1 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.23-log --manager_version=0.53 --relay_log_info=/var/lib/mysql_s1/data/relay-log.info  --relay_dir=/var/lib/mysql_s1/data/  --slave_pass=xxx
Tue Jun 26 14:47:16 2012 - [info]   Connecting to root@127.0.0.1(127.0.0.1:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql_s1/data/relay-log.info ... ok.
    Relay log found at /var/lib/mysql_s1/data, up to mysql-relay-bin.000004
    Temporary relay log file is /var/lib/mysql_s1/data/mysql-relay-bin.000004
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Jun 26 14:47:16 2012 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=repl --slave_host=127.0.0.1 --slave_ip=127.0.0.1 --slave_port=5003 --workdir=/var/log/masterha/app1 --target_version=5.5.23-log --manager_version=0.53 --relay_log_info=/var/lib/mysql_s3/data/relay-log.info  --relay_dir=/var/lib/mysql_s3/data/  --slave_pass=xxx
Tue Jun 26 14:47:16 2012 - [info]   Connecting to root@127.0.0.1(127.0.0.1:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql_s3/data/relay-log.info ... ok.
    Relay log found at /var/lib/mysql_s3/data, up to mysql-relay-bin.000008
    Temporary relay log file is /var/lib/mysql_s3/data/mysql-relay-bin.000008
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Jun 26 14:47:17 2012 - [info] Slaves settings check done.
Tue Jun 26 14:47:17 2012 - [info]
127.0.0.1 (current master)
 +--127.0.0.1
 +--127.0.0.1
Tue Jun 26 14:47:17 2012 - [info] Checking replication health on 127.0.0.1..
Tue Jun 26 14:47:17 2012 - [info]  ok.
Tue Jun 26 14:47:17 2012 - [info] Checking replication health on 127.0.0.1..
Tue Jun 26 14:47:17 2012 - [info]  ok.
Tue Jun 26 14:47:17 2012 - [warning] master_ip_failover_script is not defined.
Tue Jun 26 14:47:17 2012 - [warning] shutdown_script is not defined.
Tue Jun 26 14:47:17 2012 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
■チェックも大丈夫っぽい

------------------------------------------------------------------------------------------------------------------------

という事で、
Master:127.0.0.1:5002
Slave1:127.0.0.1:3306
Slave2:127.0.0.1:5002
の構成で、今度はバックグラウンドで動くように、managerを起動させてみるよ。

[root@MyMasterHost ~]# masterha_manager --conf=/etc/app1.cnf &
[1] 12753
[root@MyMasterHost ~]# Tue Jun 26 14:51:16 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 26 14:51:16 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Jun 26 14:51:16 2012 - [info] Reading server configurations from /etc/app1.cnf..
Tue Jun 26 14:51:16 2012 - [info] MHA::MasterMonitor version 0.53.
Tue Jun 26 14:51:16 2012 - [info] Dead Servers:
Tue Jun 26 14:51:16 2012 - [info] Alive Servers:
Tue Jun 26 14:51:16 2012 - [info]   127.0.0.1(127.0.0.1:5002)
Tue Jun 26 14:51:16 2012 - [info]   127.0.0.1(127.0.0.1:3306)
Tue Jun 26 14:51:16 2012 - [info]   127.0.0.1(127.0.0.1:5003)
Tue Jun 26 14:51:16 2012 - [info] Alive Slaves:
Tue Jun 26 14:51:16 2012 - [info]   127.0.0.1(127.0.0.1:3306)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:51:16 2012 - [info]     Replicating from localhost(127.0.0.1:5002)
Tue Jun 26 14:51:16 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 14:51:16 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:5002)
Tue Jun 26 14:51:16 2012 - [info] Current Alive Master: 127.0.0.1(127.0.0.1:5002)
Tue Jun 26 14:51:16 2012 - [info] Checking slave configurations..
Tue Jun 26 14:51:16 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:3306).
Tue Jun 26 14:51:16 2012 - [warning]  relay_log_purge=0 is not set on slave 127.0.0.1(127.0.0.1:3306).
Tue Jun 26 14:51:16 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:5003).
Tue Jun 26 14:51:16 2012 - [info] Checking replication filtering settings..
Tue Jun 26 14:51:16 2012 - [info]  binlog_do_db= , binlog_ignore_db=
Tue Jun 26 14:51:16 2012 - [info]  Replication filtering check ok.
Tue Jun 26 14:51:16 2012 - [info] Starting SSH connection tests..
Tue Jun 26 14:51:17 2012 - [info] All SSH connection tests passed successfully.
Tue Jun 26 14:51:17 2012 - [info] Checking MHA Node version..
Tue Jun 26 14:51:18 2012 - [info]  Version check ok.
Tue Jun 26 14:51:18 2012 - [info] Checking SSH publickey authentication settings on the current master..
Tue Jun 26 14:51:18 2012 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Jun 26 14:51:18 2012 - [info] Master MHA Node version is 0.53.
Tue Jun 26 14:51:18 2012 - [info] Checking recovery script configurations on the current master..
Tue Jun 26 14:51:18 2012 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql_s2/data --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000018
Tue Jun 26 14:51:18 2012 - [info]   Connecting to root@127.0.0.1(127.0.0.1)..
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql_s2/data, up to mysql-bin.000018
Tue Jun 26 14:51:18 2012 - [info] Master setting check done.
Tue Jun 26 14:51:18 2012 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Jun 26 14:51:18 2012 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=repl --slave_host=127.0.0.1 --slave_ip=127.0.0.1 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.23-log --manager_version=0.53 --relay_log_info=/var/lib/mysql_s1/data/relay-log.info  --relay_dir=/var/lib/mysql_s1/data/  --slave_pass=xxx
Tue Jun 26 14:51:18 2012 - [info]   Connecting to root@127.0.0.1(127.0.0.1:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql_s1/data/relay-log.info ... ok.
    Relay log found at /var/lib/mysql_s1/data, up to mysql-relay-bin.000004
    Temporary relay log file is /var/lib/mysql_s1/data/mysql-relay-bin.000004
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Jun 26 14:51:19 2012 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=repl --slave_host=127.0.0.1 --slave_ip=127.0.0.1 --slave_port=5003 --workdir=/var/log/masterha/app1 --target_version=5.5.23-log --manager_version=0.53 --relay_log_info=/var/lib/mysql_s3/data/relay-log.info  --relay_dir=/var/lib/mysql_s3/data/  --slave_pass=xxx
Tue Jun 26 14:51:19 2012 - [info]   Connecting to root@127.0.0.1(127.0.0.1:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql_s3/data/relay-log.info ... ok.
    Relay log found at /var/lib/mysql_s3/data, up to mysql-relay-bin.000008
    Temporary relay log file is /var/lib/mysql_s3/data/mysql-relay-bin.000008
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Tue Jun 26 14:51:19 2012 - [info] Slaves settings check done.
Tue Jun 26 14:51:19 2012 - [info]
127.0.0.1 (current master)
 +--127.0.0.1
 +--127.0.0.1
Tue Jun 26 14:51:19 2012 - [warning] master_ip_failover_script is not defined.
Tue Jun 26 14:51:19 2012 - [warning] shutdown_script is not defined.
Tue Jun 26 14:51:19 2012 - [info] Set master ping interval 3 seconds.
Tue Jun 26 14:51:19 2012 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Jun 26 14:51:19 2012 - [info] Starting ping health check on 127.0.0.1(127.0.0.1:5002)..
Tue Jun 26 14:51:19 2012 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
■どうやら上手くいったようです。

------------------------------------------------------------------------------------------------------------------------

ねんのため一回3306さんにますたーを返してみよう。
# mysqld_multi stop 2
Tue Jun 26 14:59:53 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 26 14:59:53 2012 - [warning] Connection failed 1 time(s)..
Tue Jun 26 14:59:56 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 26 14:59:56 2012 - [warning] Connection failed 2 time(s)..
Tue Jun 26 14:59:59 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 26 14:59:59 2012 - [warning] Connection failed 3 time(s)..
Tue Jun 26 14:59:59 2012 - [warning] Master is not reachable from health checker!
Tue Jun 26 14:59:59 2012 - [warning] Master 127.0.0.1(127.0.0.1:5002) is not reachable!
Tue Jun 26 14:59:59 2012 - [warning] SSH is reachable.
Tue Jun 26 14:59:59 2012 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Tue Jun 26 14:59:59 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 26 14:59:59 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Jun 26 14:59:59 2012 - [info] Reading server configurations from /etc/app1.cnf..
Tue Jun 26 15:00:00 2012 - [info] Dead Servers:
Tue Jun 26 15:00:00 2012 - [info]   127.0.0.1(127.0.0.1:5002)
Tue Jun 26 15:00:00 2012 - [info] Alive Servers:
Tue Jun 26 15:00:00 2012 - [info]   127.0.0.1(127.0.0.1:3306)
Tue Jun 26 15:00:00 2012 - [info]   127.0.0.1(127.0.0.1:5003)
Tue Jun 26 15:00:00 2012 - [info] Alive Slaves:
Tue Jun 26 15:00:00 2012 - [info]   127.0.0.1(127.0.0.1:3306)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:00:00 2012 - [info]     Replicating from localhost(127.0.0.1:5002)
Tue Jun 26 15:00:00 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:00:00 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:5002)
Tue Jun 26 15:00:00 2012 - [info] Checking slave configurations..
Tue Jun 26 15:00:00 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:3306).
Tue Jun 26 15:00:00 2012 - [warning]  relay_log_purge=0 is not set on slave 127.0.0.1(127.0.0.1:3306).
Tue Jun 26 15:00:00 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:5003).
Tue Jun 26 15:00:00 2012 - [info] Checking replication filtering settings..
Tue Jun 26 15:00:00 2012 - [info]  Replication filtering check ok.
Tue Jun 26 15:00:00 2012 - [info] Master is down!
Tue Jun 26 15:00:00 2012 - [info] Terminating monitoring script.
Tue Jun 26 15:00:00 2012 - [info] Got exit code 20 (Master dead).
Tue Jun 26 15:00:00 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 26 15:00:00 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Jun 26 15:00:00 2012 - [info] Reading server configurations from /etc/app1.cnf..
Tue Jun 26 15:00:00 2012 - [info] MHA::MasterFailover version 0.53.
Tue Jun 26 15:00:00 2012 - [info] Starting master failover.
Tue Jun 26 15:00:00 2012 - [info]
Tue Jun 26 15:00:00 2012 - [info] * Phase 1: Configuration Check Phase..
Tue Jun 26 15:00:00 2012 - [info]
Tue Jun 26 15:00:00 2012 - [info] Dead Servers:
Tue Jun 26 15:00:00 2012 - [info]   127.0.0.1(127.0.0.1:5002)
Tue Jun 26 15:00:00 2012 - [info] Checking master reachability via mysql(double check)..
Tue Jun 26 15:00:00 2012 - [info]  ok.
Tue Jun 26 15:00:00 2012 - [info] Alive Servers:
Tue Jun 26 15:00:00 2012 - [info]   127.0.0.1(127.0.0.1:3306)
Tue Jun 26 15:00:00 2012 - [info]   127.0.0.1(127.0.0.1:5003)
Tue Jun 26 15:00:00 2012 - [info] Alive Slaves:
Tue Jun 26 15:00:00 2012 - [info]   127.0.0.1(127.0.0.1:3306)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:00:00 2012 - [info]     Replicating from localhost(127.0.0.1:5002)
Tue Jun 26 15:00:00 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:00:00 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:5002)
Tue Jun 26 15:00:00 2012 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln295] Last failover was done at 2012/06/26 14:05:42. Current time is too early to do failover again. If you want to do failover, manually remove /var/log/masterha/app1/app1.failover.complete and run this script again.
Tue Jun 26 15:00:00 2012 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR:  at /usr/local/bin/masterha_manager line 65
■ん・・・?エラーが。
  /var/log/masterha/app1/app1.failover.complete を手動でどかして再キックしる。って?
  念のため確認…

# mysql -uroot -p -h127.0.0.1 -P5003
Enter password:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: repl
                  Master_Port: 5002
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000020
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysql-relay-bin.000012
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000020
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 3077
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 2
1 row in set (0.00 sec)
■5002みてるまんまっぽい

という事で、/var/log/masterha/app1/app1.failover.complete を消して、再度やってみよう。

Tue Jun 26 15:09:28 2012 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Jun 26 15:09:28 2012 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql_s2/data --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.53 --binlog_prefix=mysql-bin
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql_s2/data, up to mysql-bin.000020
Tue Jun 26 15:09:29 2012 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Jun 26 15:09:31 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 26 15:09:31 2012 - [warning] Connection failed 1 time(s)..
Tue Jun 26 15:09:34 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 26 15:09:34 2012 - [warning] Connection failed 2 time(s)..
Tue Jun 26 15:09:37 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 26 15:09:37 2012 - [warning] Connection failed 3 time(s)..
Tue Jun 26 15:09:37 2012 - [warning] Master is not reachable from health checker!
Tue Jun 26 15:09:37 2012 - [warning] Master 127.0.0.1(127.0.0.1:5002) is not reachable!
Tue Jun 26 15:09:37 2012 - [warning] SSH is reachable.
Tue Jun 26 15:09:37 2012 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Tue Jun 26 15:09:37 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 26 15:09:37 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Jun 26 15:09:37 2012 - [info] Reading server configurations from /etc/app1.cnf..
Tue Jun 26 15:09:37 2012 - [info] Dead Servers:
Tue Jun 26 15:09:37 2012 - [info]   127.0.0.1(127.0.0.1:5002)
Tue Jun 26 15:09:37 2012 - [info] Alive Servers:
Tue Jun 26 15:09:37 2012 - [info]   127.0.0.1(127.0.0.1:3306)
Tue Jun 26 15:09:37 2012 - [info]   127.0.0.1(127.0.0.1:5003)
Tue Jun 26 15:09:37 2012 - [info] Alive Slaves:
Tue Jun 26 15:09:37 2012 - [info]   127.0.0.1(127.0.0.1:3306)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:09:37 2012 - [info]     Replicating from localhost(127.0.0.1:5002)
Tue Jun 26 15:09:37 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:09:37 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:5002)
Tue Jun 26 15:09:37 2012 - [info] Checking slave configurations..
Tue Jun 26 15:09:37 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:3306).
Tue Jun 26 15:09:37 2012 - [warning]  relay_log_purge=0 is not set on slave 127.0.0.1(127.0.0.1:3306).
Tue Jun 26 15:09:37 2012 - [info]  read_only=1 is not set on slave 127.0.0.1(127.0.0.1:5003).
Tue Jun 26 15:09:37 2012 - [info] Checking replication filtering settings..
Tue Jun 26 15:09:37 2012 - [info]  Replication filtering check ok.
Tue Jun 26 15:09:37 2012 - [info] Master is down!
Tue Jun 26 15:09:37 2012 - [info] Terminating monitoring script.
Tue Jun 26 15:09:37 2012 - [info] Got exit code 20 (Master dead).
Tue Jun 26 15:09:37 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 26 15:09:37 2012 - [info] Reading application default configurations from /etc/app1.cnf..
Tue Jun 26 15:09:37 2012 - [info] Reading server configurations from /etc/app1.cnf..
Tue Jun 26 15:09:37 2012 - [info] MHA::MasterFailover version 0.53.
Tue Jun 26 15:09:37 2012 - [info] Starting master failover.
Tue Jun 26 15:09:37 2012 - [info]
Tue Jun 26 15:09:37 2012 - [info] * Phase 1: Configuration Check Phase..
Tue Jun 26 15:09:37 2012 - [info]
Tue Jun 26 15:09:38 2012 - [info] Dead Servers:
Tue Jun 26 15:09:38 2012 - [info]   127.0.0.1(127.0.0.1:5002)
Tue Jun 26 15:09:38 2012 - [info] Checking master reachability via mysql(double check)..
Tue Jun 26 15:09:38 2012 - [info]  ok.
Tue Jun 26 15:09:38 2012 - [info] Alive Servers:
Tue Jun 26 15:09:38 2012 - [info]   127.0.0.1(127.0.0.1:3306)
Tue Jun 26 15:09:38 2012 - [info]   127.0.0.1(127.0.0.1:5003)
Tue Jun 26 15:09:38 2012 - [info] Alive Slaves:
Tue Jun 26 15:09:38 2012 - [info]   127.0.0.1(127.0.0.1:3306)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:09:38 2012 - [info]     Replicating from localhost(127.0.0.1:5002)
Tue Jun 26 15:09:38 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:09:38 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:5002)
Tue Jun 26 15:09:38 2012 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Jun 26 15:09:38 2012 - [info]
Tue Jun 26 15:09:38 2012 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Jun 26 15:09:38 2012 - [info]
Tue Jun 26 15:09:38 2012 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Jun 26 15:09:38 2012 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Tue Jun 26 15:09:38 2012 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Jun 26 15:09:38 2012 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Jun 26 15:09:38 2012 - [info]
Tue Jun 26 15:09:38 2012 - [info] * Phase 3: Master Recovery Phase..
Tue Jun 26 15:09:38 2012 - [info]
Tue Jun 26 15:09:38 2012 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Jun 26 15:09:38 2012 - [info]
Tue Jun 26 15:09:38 2012 - [info] The latest binary log file/position on all slaves is mysql-bin.000020:107
Tue Jun 26 15:09:38 2012 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Jun 26 15:09:38 2012 - [info]   127.0.0.1(127.0.0.1:3306)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:09:38 2012 - [info]     Replicating from localhost(127.0.0.1:5002)
Tue Jun 26 15:09:38 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:09:38 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:5002)
Tue Jun 26 15:09:38 2012 - [info] The oldest binary log file/position on all slaves is mysql-bin.000020:107
Tue Jun 26 15:09:38 2012 - [info] Oldest slaves:
Tue Jun 26 15:09:38 2012 - [info]   127.0.0.1(127.0.0.1:3306)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:09:38 2012 - [info]     Replicating from localhost(127.0.0.1:5002)
Tue Jun 26 15:09:38 2012 - [info]   127.0.0.1(127.0.0.1:5003)  Version=5.5.23-log (oldest major version between slaves) log-bin:enabled
Tue Jun 26 15:09:38 2012 - [info]     Replicating from 127.0.0.1(127.0.0.1:5002)
Tue Jun 26 15:09:38 2012 - [info]
Tue Jun 26 15:09:38 2012 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Jun 26 15:09:38 2012 - [info]
Tue Jun 26 15:09:38 2012 - [info] Fetching dead master's binary logs..
Tue Jun 26 15:09:38 2012 - [info] Executing command on the dead master 127.0.0.1(127.0.0.1:5002): save_binary_logs --command=save --start_file=mysql-bin.000020  --start_pos=107 --binlog_dir=/var/lib/mysql_s2/data --output_file=/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000020 pos 107 to mysql-bin.000020 EOF into /var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog ..
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /var/lib/mysql_s2/data/mysql-bin.000020 position 107 to tail(126).. ok.
 Concat succeeded.
saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog                                                                                                                  100%  126     0.1KB/s   00:00
Tue Jun 26 15:09:38 2012 - [info] scp from root@127.0.0.1:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog succeeded.
Tue Jun 26 15:09:38 2012 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Jun 26 15:09:39 2012 - [info] HealthCheck: SSH to 127.0.0.1 is reachable.
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] * Phase 3.3: Determining New Master Phase..
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Jun 26 15:09:39 2012 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Jun 26 15:09:39 2012 - [info] Searching new master from slaves..
Tue Jun 26 15:09:39 2012 - [info]  Candidate masters from the configuration file:
Tue Jun 26 15:09:39 2012 - [info]  Non-candidate masters:
Tue Jun 26 15:09:39 2012 - [info] New master is 127.0.0.1(127.0.0.1:3306)
Tue Jun 26 15:09:39 2012 - [info] Starting master failover..
Tue Jun 26 15:09:39 2012 - [info]
From:
127.0.0.1 (current master)
 +--127.0.0.1
 +--127.0.0.1
To:
127.0.0.1 (new master)
 +--127.0.0.1
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 26 15:09:39 2012 - [info] Sending binlog..
saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog                                                                                                                  100%  126     0.1KB/s   00:00
Tue Jun 26 15:09:39 2012 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog to root@127.0.0.1:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog succeeded.
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Jun 26 15:09:39 2012 - [info] Starting recovery on 127.0.0.1(127.0.0.1:3306)..
Tue Jun 26 15:09:39 2012 - [info]  Generating diffs succeeded.
Tue Jun 26 15:09:39 2012 - [info] Waiting until all relay logs are applied.
Tue Jun 26 15:09:39 2012 - [info]  done.
Tue Jun 26 15:09:39 2012 - [info] Getting slave status..
Tue Jun 26 15:09:39 2012 - [info] This slave(127.0.0.1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000020:107). No need to recover from Exec_Master_Log_Pos.
Tue Jun 26 15:09:39 2012 - [info] Connecting to the target slave host 127.0.0.1, running recover script..
Tue Jun 26 15:09:39 2012 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=repl --slave_host=127.0.0.1 --slave_ip=127.0.0.1  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.23-log --timestamp=20120626150937 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Tue Jun 26 15:09:39 2012 - [info]
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog on 127.0.0.1:3306. This may take long time...
Applying log files succeeded.
Tue Jun 26 15:09:39 2012 - [info]  All relay logs were successfully applied.
Tue Jun 26 15:09:39 2012 - [info] Getting new master's binlog name and position..
Tue Jun 26 15:09:39 2012 - [info]  mysql-bin.000021:107
Tue Jun 26 15:09:39 2012 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000021', MASTER_LOG_POS=107, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Jun 26 15:09:39 2012 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Tue Jun 26 15:09:39 2012 - [info] ** Finished master recovery successfully.
Tue Jun 26 15:09:39 2012 - [info] * Phase 3: Master Recovery Phase completed.
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] * Phase 4: Slaves Recovery Phase..
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] -- Slave diff file generation on host 127.0.0.1(127.0.0.1:5003) started, pid: 14113. Check tmp log /var/log/masterha/app1/127.0.0.1_5003_20120626150937.log if it takes time..
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] Log messages from 127.0.0.1 ...
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 26 15:09:39 2012 - [info] End of log messages from 127.0.0.1.
Tue Jun 26 15:09:39 2012 - [info] -- 127.0.0.1(127.0.0.1:5003) has the latest relay log events.
Tue Jun 26 15:09:39 2012 - [info] Generating relay diff files from the latest slave succeeded.
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Jun 26 15:09:39 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] -- Slave recovery on host 127.0.0.1(127.0.0.1:5003) started, pid: 14115. Check tmp log /var/log/masterha/app1/127.0.0.1_5003_20120626150937.log if it takes time..
saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog                                                                                                                  100%  126     0.1KB/s   00:00
Tue Jun 26 15:09:40 2012 - [info]
Tue Jun 26 15:09:40 2012 - [info] Log messages from 127.0.0.1 ...
Tue Jun 26 15:09:40 2012 - [info]
Tue Jun 26 15:09:39 2012 - [info] Sending binlog..
Tue Jun 26 15:09:40 2012 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog to root@127.0.0.1:/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog succeeded.
Tue Jun 26 15:09:40 2012 - [info] Starting recovery on 127.0.0.1(127.0.0.1:5003)..
Tue Jun 26 15:09:40 2012 - [info]  Generating diffs succeeded.
Tue Jun 26 15:09:40 2012 - [info] Waiting until all relay logs are applied.
Tue Jun 26 15:09:40 2012 - [info]  done.
Tue Jun 26 15:09:40 2012 - [info] Getting slave status..
Tue Jun 26 15:09:40 2012 - [info] This slave(127.0.0.1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000020:107). No need to recover from Exec_Master_Log_Pos.
Tue Jun 26 15:09:40 2012 - [info] Connecting to the target slave host 127.0.0.1, running recover script..
Tue Jun 26 15:09:40 2012 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=repl --slave_host=127.0.0.1 --slave_ip=127.0.0.1  --slave_port=5003 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.23-log --timestamp=20120626150937 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Tue Jun 26 15:09:40 2012 - [info]
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_127.0.0.1_5002_20120626150937.binlog on 127.0.0.1:5003. This may take long time...
Applying log files succeeded.
Tue Jun 26 15:09:40 2012 - [info]  All relay logs were successfully applied.
Tue Jun 26 15:09:40 2012 - [info]  Resetting slave 127.0.0.1(127.0.0.1:5003) and starting replication from the new master 127.0.0.1(127.0.0.1:3306)..
Tue Jun 26 15:09:40 2012 - [info]  Executed CHANGE MASTER.
Tue Jun 26 15:09:40 2012 - [info]  Slave started.
Tue Jun 26 15:09:40 2012 - [info] End of log messages from 127.0.0.1.
Tue Jun 26 15:09:40 2012 - [info] -- Slave recovery on host 127.0.0.1(127.0.0.1:5003) succeeded.
Tue Jun 26 15:09:40 2012 - [info] All new slave servers recovered successfully.
Tue Jun 26 15:09:40 2012 - [info]
Tue Jun 26 15:09:40 2012 - [info] * Phase 5: New master cleanup phease..
Tue Jun 26 15:09:40 2012 - [info]
Tue Jun 26 15:09:40 2012 - [info] Resetting slave info on the new master..
Tue Jun 26 15:09:40 2012 - [info]  127.0.0.1: Resetting slave info succeeded.
Tue Jun 26 15:09:40 2012 - [info] Master failover to 127.0.0.1(127.0.0.1:3306) completed successfully.
Tue Jun 26 15:09:40 2012 - [info]
----- Failover Report -----
app1: MySQL Master failover 127.0.0.1 to 127.0.0.1 succeeded
Master 127.0.0.1 is down!
Check MHA Manager logs at MyMasterHost for details.
Started automated(non-interactive) failover.
The latest slave 127.0.0.1(127.0.0.1:3306) has all relay logs for recovery.
Selected 127.0.0.1 as a new master.
127.0.0.1: OK: Applying all logs succeeded.
127.0.0.1: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
127.0.0.1: OK: Applying all logs succeeded. Slave started, replicating from 127.0.0.1.
127.0.0.1: Resetting slave info succeeded.
Master failover to 127.0.0.1(127.0.0.1:3306) completed successfully.
■あ、上手くいったようなメッセージ。
  確認してみよー。

[root@MyMasterHost ~]# mysql -uroot -p -h127.0.0.1 -P5003
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000021
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000021
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 409
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)
■すれーぶさんはおっけー!3306に向いてるね。

[root@MyMasterHost ~]# mysql -uroot -p -h127.0.0.1
mysql> show slave status\G
Empty set (0.00 sec)
mysql> show master status\G
*************************** 1. row ***************************
            File: mysql-bin.000021
        Position: 198
    Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)
1 row in set (0.00 sec)
■ふむ。すれーぶ機能は停止してるし、いいのではないだろうか。
  適当に書き込んでレプられてる事も見たので大丈夫そう。

何事もなく5002ポートさんをslave1に返り咲かせる

mysql> CHANGE MASTER TO MASTER_HOST = 'localhost',
MASTER_USER = 'repl',
MASTER_PASSWORD = 'repl',
MASTER_PORT = 3306,
MASTER_LOG_FILE = 'mysql-bin.000021',
MASTER_LOG_POS = 198;
mysql> slave start;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: localhost
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000021
          Read_Master_Log_Pos: 198
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000021
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 198
              Relay_Log_Space: 409
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)
■おっけーれす。

という事で纏め。
復旧作業時に /var/log/masterha/app1/app1.failover.complete を消そう。
/etc/app.cnfをちゃんと書き換えよう。
上記を忘れると二回目以降のfailoverが失敗する

という夢だったとさ。

== 第壱部 完 ==

0 件のコメント:

コメントを投稿