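PostgreSQL primary/standby switchover: the trigger-file method in detail

Posted on 2020-12-30 by WalkonNet

The tests in this article follow the book PostgreSQL實戰.

Test environment for this article:

Primary IP: 192.168.40.130, hostname: postgres, port: 5442

Standby IP: 192.168.40.131, hostname: postgreshot, port: 5442

In PostgreSQL 9.0, a streaming-replication switchover can only be performed by creating a trigger file. This section walks through that method. The test environment is one primary and one standby using asynchronous streaming replication: the database on host postgres is the primary and the database on host postgreshot is the standby. The main steps of a manual trigger-file switchover are:

1) Set the trigger_file parameter in the standby's recovery.conf to the path and name of the file that will activate the standby.

2) Shut down the primary; the -m fast mode is recommended.

3) Create the trigger file on the standby to activate it. If recovery.conf is renamed to recovery.done, the standby has become the primary.

4) Turn the old primary into a standby. Create recovery.conf in the old primary's $PGDATA directory (if no recovery.conf exists there, copy one from the template $PGHOME/share/recovery.conf.sample; if a recovery.done exists there, rename it to recovery.conf). Configure it the same way as on the old standby, except that the IP in primary_conninfo points to the peer host.

5) Start the old primary and check that the primary and standby processes look normal. If they do, the switchover succeeded.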

1. First, configure recovery.conf on the standby:

[postgres@postgreshot pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.130 port=5442 user=replica application_name=pg1'      # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgreshot pg11]$

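trigger_file may point to a regular file or a hidden file. After changing the parameters above, restart the standby for them to take effect.

2. Shut down the primary: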

[postgres@postgres pg11]$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@postgres pg11]$

3. Create the trigger file on the standby to activate it:

[postgres@postgreshot pg11]$ ll recovery.conf 
-rwx------ 1 postgres postgres 5.9K Mar 26 18:47 recovery.conf
[postgres@postgreshot pg11]$ 
[postgres@postgreshot pg11]$ touch /home/postgres/pg11/trigger
[postgres@postgreshot pg11]$ ll recovery*
-rwx------ 1 postgres postgres 5.9K Mar 26 18:47 recovery.done
[postgres@postgreshot pg11]$

The trigger file's name and path must match the trigger_file setting in recovery.conf. Listing the recovery files again shows that the suffix has changed from .conf to .done.
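The rename from recovery.conf to recovery.done can also be polled from a script instead of checked by hand. The following is a minimal sketch, assuming a PostgreSQL release that still uses recovery.conf (as in this walkthrough); the function name wait_for_recovery_done and its timeout argument are hypothetical.

```shell
# Hypothetical helper: wait until recovery.conf has been renamed to
# recovery.done in the given data directory, or give up after a timeout.
# Usage: wait_for_recovery_done DATADIR [TIMEOUT_SECONDS]
wait_for_recovery_done() {
    datadir="$1"
    timeout="${2:-30}"
    elapsed=0
    while :; do
        # Promotion is considered finished once only recovery.done remains.
        if [ -f "$datadir/recovery.done" ] && [ ! -f "$datadir/recovery.conf" ]; then
            return 0
        fi
        # Stop polling once the timeout has been used up.
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

For example, `touch /home/postgres/pg11/trigger && wait_for_recovery_done /home/postgres/pg11 60` returns success only if the rename is seen within 60 seconds.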

The standby's database log shows the following:

2019-03-26 23:30:19.399 EDT [93162] LOG: replication terminated by primary server
2019-03-26 23:30:19.399 EDT [93162] DETAIL: End of WAL reached on timeline 3 at 0/50003D0.
2019-03-26 23:30:19.399 EDT [93162] FATAL: could not send end-of-streaming message to primary: no COPY in progress
2019-03-26 23:30:19.399 EDT [93158] LOG: invalid record length at 0/50003D0: wanted 24, got 0
2019-03-26 23:30:19.405 EDT [125172] FATAL: could not connect to the primary server: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
2019-03-26 23:30:24.410 EDT [125179] FATAL: could not connect to the primary server: could not connect to server: Connection refused
        Is the server running on host "192.168.40.130" and accepting
        TCP/IP connections on port 5442?
2019-03-26 23:31:49.505 EDT [93158] LOG: trigger file found: /home/postgres/pg11/trigger
2019-03-26 23:31:49.506 EDT [93158] LOG: redo done at 0/5000360
2019-03-26 23:31:49.506 EDT [93158] LOG: last completed transaction was at log time 2019-03-26 19:03:11.202845-04
2019-03-26 23:31:49.516 EDT [93158] LOG: selected new timeline ID: 4
2019-03-26 23:31:50.063 EDT [93158] LOG: archive recovery complete
2019-03-26 23:31:50.083 EDT [93157] LOG: database system is ready to accept connections

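From the standby's log: because the primary was shut down, the log first shows that the primary cannot be reached, then that the trigger file was found, and finally that recovery completed and the database switched to read-write mode.

Verify with pg_controldata: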

[postgres@postgreshot ~]$ pg_controldata | grep cluster
Database cluster state:        in production
[postgres@postgreshot ~]$

The output above shows that the database now has the primary role. On postgreshot, create a table named test_alived2 and insert a row:

postgres=# CREATE TABLE test_alived2(id int4);
CREATE TABLE
postgres=# INSERT INTO test_alived2 VALUES(1);
INSERT 0 1
postgres=#

4. Prepare to switch the old primary to the standby role by configuring recovery.conf on the old primary:

[postgres@postgres pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.131 port=5442 user=replica application_name=pg2'      # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgres pg11]$

This configuration is essentially the same as recovery.done on postgreshot, except that the host option in primary_conninfo is set to the peer host's IP.
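Because the two files differ only in the connection details, the file can be generated rather than edited by hand. The following is a sketch only: the function name render_recovery_conf and its argument order are hypothetical, and it emits just the four settings shown in this article.

```shell
# Hypothetical helper: print a recovery.conf with the four settings used in
# this walkthrough, pointing primary_conninfo at the peer host.
# Usage: render_recovery_conf PEER_HOST PORT USER APP_NAME TRIGGER_PATH
render_recovery_conf() {
    peer_host="$1"
    port="$2"
    user="$3"
    app_name="$4"
    trigger="$5"
    cat <<EOF
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=${peer_host} port=${port} user=${user} application_name=${app_name}'
trigger_file = '${trigger}'
EOF
}
```

On the old primary, `render_recovery_conf 192.168.40.131 5442 replica pg2 /home/postgres/pg11/trigger > "$PGDATA/recovery.conf"` would reproduce the configuration listed above.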

Next, create a ~/.pgpass file in the postgres user's home directory on host postgres:

[postgres@postgres ~]$ touch ~/.pgpass
[postgres@postgres ~]$ chmod 600 ~/.pgpass

Then add the following entries to ~/.pgpass:

[postgres@postgres ~]$ cat .pgpass
192.168.40.130:5442:replication:replica:replica
192.168.40.131:5442:replication:replica:replica
[postgres@postgres ~]$
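The .pgpass setup can be scripted so that the file mode and the entries stay consistent on both hosts. This is a minimal sketch; the function name add_pgpass_entry is hypothetical, and the fields follow the host:port:database:username:password layout of the entries shown above.

```shell
# Hypothetical helper: add one entry to a .pgpass-style file and restrict
# its permissions. Usage: add_pgpass_entry FILE HOST PORT DB USER PASSWORD
add_pgpass_entry() {
    file="$1"
    line="$2:$3:$4:$5:$6"
    touch "$file"
    # The password file must not be readable by group or others.
    chmod 600 "$file"
    # Append the entry only if the exact line is not already present.
    grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, `add_pgpass_entry ~/.pgpass 192.168.40.131 5442 replication replica replica` adds the second entry from the listing above and can be run repeatedly without creating duplicates.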

Then start the database on postgres:

[postgres@postgres ~]$ pg_ctl start
waiting for server to start....2019-03-26 23:38:50.424 EDT [55380] LOG: listening on IPv4 address "0.0.0.0", port 5442
2019-03-26 23:38:50.424 EDT [55380] LOG: listening on IPv6 address "::", port 5442
2019-03-26 23:38:50.443 EDT [55380] LOG: listening on Unix socket "/tmp/.s.PGSQL.5442"
2019-03-26 23:38:50.465 EDT [55381] LOG: database system was shut down in recovery at 2019-03-26 23:38:20 EDT
2019-03-26 23:38:50.465 EDT [55381] LOG: entering standby mode
2019-03-26 23:38:50.483 EDT [55381] LOG: consistent recovery state reached at 0/50003D0
2019-03-26 23:38:50.483 EDT [55381] LOG: invalid record length at 0/50003D0: wanted 24, got 0
2019-03-26 23:38:50.483 EDT [55380] LOG: database system is ready to accept read only connections
 done
server started
[postgres@postgres ~]$ 2019-03-26 23:38:50.565 EDT [55385] LOG: fetching timeline history file for timeline 4 from primary server
2019-03-26 23:38:50.588 EDT [55385] LOG: started streaming WAL from primary at 0/5000000 on timeline 3
2019-03-26 23:38:50.589 EDT [55385] LOG: replication terminated by primary server
2019-03-26 23:38:50.589 EDT [55385] DETAIL: End of WAL reached on timeline 3 at 0/50003D0.
2019-03-26 23:38:50.592 EDT [55381] LOG: new target timeline is 4
2019-03-26 23:38:50.594 EDT [55385] LOG: restarted WAL streaming at 0/5000000 on timeline 4
2019-03-26 23:38:50.717 EDT [55381] LOG: redo starts at 0/50003D0
 
[postgres@postgres ~]$ pg_controldata | grep cluster
Database cluster state:        in archive recovery
[postgres@postgres ~]$ 
 
postgres=# select * from test_alived2;
 id 
----
 1
(1 row)
 
postgres=#

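Meanwhile, a WAL receiver process is now running on postgres and a WAL sender process on postgreshot, which shows that the old primary has been switched to a standby. Those are all the steps of the switchover.

Why must the primary be shut down cleanly in step 2? On shutdown the database first performs a checkpoint, then notifies the WAL sender process that it is about to stop. The WAL sender sends the WAL stream up to that checkpoint to the standby's WAL receiver. The standby applies this last WAL after receiving it and so reaches a state consistent with the primary.

Another point to note: if the primary host crashes and the standby is then activated, is the standby's data fully identical to the primary's? This environment uses asynchronous streaming replication, so the standby lags the primary. WAL for transactions already committed on the primary may not have been sent to the standby before the host went down, so an asynchronous standby carries a risk of losing transactions.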
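One way to reason about how far apart two WAL positions are is to convert the LSNs that appear in the logs into byte offsets and subtract them. The sketch below assumes the usual LSN text form of two hexadecimal numbers separated by a slash (upper and lower 32 bits); the function names lsn_to_bytes and lsn_diff are hypothetical. On a live PostgreSQL 10 or later server, the built-in pg_wal_lsn_diff() function performs the same subtraction.

```shell
# Hypothetical helper: convert an LSN such as 0/50003D0 to a byte position.
lsn_to_bytes() {
    hi=${1%%/*}    # hex digits before the slash (upper 32 bits)
    lo=${1##*/}    # hex digits after the slash (lower 32 bits)
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Hypothetical helper: bytes of WAL between two LSNs (first minus second).
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

With two positions taken from the standby log earlier in this article as sample input, `lsn_diff 0/50003D0 0/5000360` prints 112.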

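Switchover with pg_ctl promote

The previous section covered the trigger-file method. Starting with PostgreSQL 9.1, pg_ctl promote is also supported, which is more convenient than the trigger file. The syntax is:

pg_ctl promote [-D datadir]

-D is the data directory; if it is omitted, the value of the $PGDATA environment variable is used. After the promote command is issued, the running standby leaves recovery mode and becomes a read-write primary.

The pg_ctl promote steps are largely the same as the trigger-file steps, except that the trigger_file parameter in recovery.conf does not need to be configured (step 1) and the standby is activated with pg_ctl promote (step 3):

1) Shut down the primary; the -m fast mode is recommended.

2) Run pg_ctl promote on the standby to activate it. If recovery.conf is renamed to recovery.done, the standby has become the primary.

3) Turn the old primary into a standby. Create recovery.conf in the old primary's $PGDATA directory (if no recovery.conf exists there, copy one from the template $PGHOME/share/recovery.conf.sample; if a recovery.done exists there, rename it to recovery.conf). Configure it the same way as on the old standby, except that the IP in primary_conninfo points to the peer host.

4) Start the old primary and check that the primary and standby processes look normal. If they do, the switchover succeeded.

Those are the main steps of a pg_ctl promote switchover. This section does not demonstrate them; the next section, on pg_rewind, includes an example of switching over with pg_ctl promote.

pg_rewind

pg_rewind is a very useful data synchronization tool for maintaining streaming replication. The previous section listed five main steps for a switchover, of which step 2 shuts down the primary before the standby is activated. What happens if step 2 is skipped? The example below demonstrates this. The test environment is again one primary and one standby using asynchronous streaming replication: the database on postgres is the primary and the database on postgreshot is the standby.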

Switchover

– Standby recovery.conf configuration: run on postgreshot

The standby's recovery.conf is configured as follows:

[postgres@postgreshot pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.130 port=5442 user=replica application_name=pg1'      # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgreshot pg11]$

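– Activate the standby: run on postgreshot

Check the streaming replication status and, once it is confirmed healthy, run the following on the standby host to activate the standby: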

[postgres@postgreshot pg11]$ pg_ctl promote -D $PGDATA
waiting for server to promote.... done
server promoted
[postgres@postgreshot pg11]$ 
[postgres@postgreshot pg11]$

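The standby's database log shows the database opening normally and accepting connections, which indicates that the activation succeeded. Check the database role on postgreshot: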

[postgres@postgreshot pg11]$ pg_controldata | grep cluster
Database cluster state:        in production
[postgres@postgreshot pg11]$

The pg_controldata output also shows that the database on postgreshot has become a primary. At this point the old primary (the database on postgres) is still running. The plan is to convert postgres to a standby; first check its current role:

[postgres@postgres pg11]$ pg_controldata | grep cluster
Database cluster state:        in production
[postgres@postgres pg11]$
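Both hosts now report "in production", which is the situation pg_rewind is meant to repair. The role checks used throughout this walkthrough can be wrapped in a small helper. This is a sketch that reads pg_controldata output on standard input; the function name cluster_role and the labels it prints are hypothetical, and it only recognizes the two states that appear in this article.

```shell
# Hypothetical helper: classify a server from pg_controldata output read on
# standard input. Usage: pg_controldata "$PGDATA" | cluster_role
cluster_role() {
    # Extract the value of the "Database cluster state" line.
    state=$(sed -n 's/^Database cluster state:[[:space:]]*//p')
    case "$state" in
        "in production")       echo "primary" ;;
        "in archive recovery") echo "standby" ;;
        *)                     echo "unknown: $state" ;;
    esac
}
```

Running `pg_controldata "$PGDATA" | cluster_role` on each host makes it easy to spot the case where both print primary. Other states, such as a cleanly shut down server, fall through to the unknown branch.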

– After the standby is activated, create a test table on it and insert data

postgres=# create table test_1(id int4);
CREATE TABLE
postgres=# insert into test_1(id) select n from generate_series(1,10) n;
INSERT 0 10
postgres=#

– Stop the original primary: run on postgres

[postgres@postgres pg11]$ pg_controldata | grep cluster
Database cluster state:        in production
[postgres@postgres pg11]$ 
[postgres@postgres pg11]$ pg_ctl stop -m fast -D $PGDATA
2019-03-27 01:10:46.714 EDT [64858] LOG: received fast shutdown request
waiting for server to shut down....2019-03-27 01:10:46.716 EDT [64858] LOG: aborting any active transactions
2019-03-27 01:10:46.717 EDT [64858] LOG: background worker "logical replication launcher" (PID 64865) exited with exit code 1
2019-03-27 01:10:46.718 EDT [64860] LOG: shutting down
2019-03-27 01:10:46.731 EDT [64858] LOG: database system is shut down
 done
server stopped
[postgres@postgres pg11]$

– pg_rewind: run on postgres

[postgres@postgres pg11]$ pg_rewind --target-pgdata $PGDATA --source-server='host=192.168.40.131 port=5442 user=replica password=replica'

target server needs to use either data checksums or "wal_log_hints = on"
Failure, exiting
[postgres@postgres pg11]$

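Note: pg_rewind requires either that data checksums were enabled when the cluster was created with initdb, or that "wal_log_hints = on" is set. Set the wal_log_hints parameter on both the primary and the standby and restart the databases before running pg_rewind again.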
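The prerequisite can be checked before calling pg_rewind by looking at two lines of pg_controldata output on the target. The following is a sketch under the assumption that pg_controldata prints the lines "Data page checksum version:" and "wal_log_hints setting:" as it does in recent releases; the function name rewind_prereq_ok is hypothetical.

```shell
# Hypothetical helper: succeed if pg_controldata output (on standard input)
# shows data checksums enabled or wal_log_hints set to on.
# Usage: pg_controldata "$PGDATA" | rewind_prereq_ok
rewind_prereq_ok() {
    awk -F': *' '
        # A non-zero checksum version means data checksums are enabled.
        /^Data page checksum version:/ { if ($2 + 0 != 0) ok = 1 }
        /^wal_log_hints setting:/      { if ($2 == "on") ok = 1 }
        END { exit (ok ? 0 : 1) }
    '
}
```

For example, `pg_controldata "$PGDATA" | rewind_prereq_ok || echo "enable wal_log_hints first"` prints the reminder when neither condition holds.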

[postgres@postgres pg11]$ pg_rewind --target-pgdata $PGDATA --source-server='host=192.168.40.131 port=5442 user=replica password=replica'
servers diverged at WAL location 0/70001E8 on timeline 5
rewinding from last common checkpoint at 0/6000098 on timeline 5
Done!
[postgres@postgres pg11]$ 
[postgres@postgres pg11]$

Note: pg_rewind succeeded.

– Adjust recovery.conf: run on postgres

[postgres@postgres pg11]$ mv recovery.done recovery.conf
[postgres@postgres pg11]$ 
[postgres@postgres pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.131 port=5442 user=replica application_name=pg2'      # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgres pg11]$

– Start the original primary: run on postgres

[postgres@postgres pg11]$ pg_ctl start -D $PGDATA
waiting for server to start....2019-03-27 01:14:48.028 EDT [66323] LOG: listening on IPv4 address "0.0.0.0", port 5442
2019-03-27 01:14:48.028 EDT [66323] LOG: listening on IPv6 address "::", port 5442
2019-03-27 01:14:48.031 EDT [66323] LOG: listening on Unix socket "/tmp/.s.PGSQL.5442"
2019-03-27 01:14:48.045 EDT [66324] LOG: database system was interrupted while in recovery at log time 2019-03-27 01:08:08 EDT
2019-03-27 01:14:48.045 EDT [66324] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2019-03-27 01:14:48.084 EDT [66324] LOG: entering standby mode
2019-03-27 01:14:48.089 EDT [66324] LOG: redo starts at 0/6000060
2019-03-27 01:14:48.091 EDT [66324] LOG: invalid record length at 0/7024C98: wanted 24, got 0
2019-03-27 01:14:48.096 EDT [66331] LOG: started streaming WAL from primary at 0/7000000 on timeline 6
2019-03-27 01:14:48.109 EDT [66324] LOG: consistent recovery state reached at 0/7024CD0
2019-03-27 01:14:48.110 EDT [66323] LOG: database system is ready to accept read only connections
 done
server started
[postgres@postgres pg11]$ 
[postgres@postgres pg11]$ pg_controldata | grep cluster
Database cluster state:        in archive recovery
[postgres@postgres pg11]$

– Verify the data: run on postgres

[postgres@postgres pg11]$ p
psql (11.1)
Type "help" for help.
 
postgres=# select count(*) from test_1;
 count 
-------
  10
(1 row)
 
postgres=#

Note: pg_rewind succeeded. The original primary now runs as a standby, and the table test_1 has been synchronized to it.

How pg_rewind works

The basic idea is to copy everything from the new cluster to the old cluster, except for the blocks that we know to be the same.

1) Scan the WAL log of the old cluster, starting from the last checkpoint before the point where the new cluster’s timeline history forked off from the old cluster. For each WAL record, make a note of the data blocks that were touched. This yields a list of all the data blocks that were changed in the old cluster, after the new cluster forked off.

2) Copy all those changed blocks from the new cluster to the old cluster.

3) Copy all other files like clog, conf files etc. from the new cluster to old cluster. Everything except the relation files.

4) Apply the WAL from the new cluster, starting from the checkpoint created at failover. (Strictly speaking, pg_rewind doesn’t apply the WAL, it just creates a backup label file indicating that when PostgreSQL is started, it will start replay from that checkpoint and apply all the required WAL.)

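Addendum: pitfalls when setting up PostgreSQL primary/standby

A collection of pitfalls hit while setting up PostgreSQL primary/standby streaming replication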

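1: Socket path error

Fix: in postgresql.conf, change the Unix socket path to the path shown in the error message, then restart. The original note names the parameter unix_socket_permissions = '*', but that parameter takes a file mode; the setting that takes a path is unix_socket_directories.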

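2: When setting up primary/standby, the standby's data directory must be created from a base backup of the primary. The base backup can be taken with pg_basebackup, or by packing and unpacking the data directory with tar.

If the standby has already been initialized by mistake, delete everything under its data directory and start it again from the primary's base backup.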

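3: The standby fails at startup with an error such as FATAL: no pg_hba.conf entry for replication connection from host "172.20.0.16", user "repl"

For example: master IP *.1, standby IP *.2, replication account repl.

In the author's experience, listing only host replication repl *.2 md5 in pg_hba.conf was not enough.

A host all all *.2 md5 entry also had to be added before that record.

The reasoning given is that the standby must first be able to access the primary before it can use the repl account for synchronization. Note that in pg_hba.conf the database keyword all does not match replication connections, so the replication entry is still the one that authorizes streaming; the extra entry matters for ordinary connections from the standby, such as psql or pg_rewind --source-server.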

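The above is personal experience, offered as a reference; your continued support for WalkonNet is appreciated. If anything is wrong or incomplete, corrections are welcome.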
