Postgresql Streaming Replication

From CoCCA Registry Services (NZ) Limited

Jump to: navigation, search



We will discuss Implementing streaming replication as a Backup and recovery strategy For ccTLD using postgresql 9.X based on CoCCA installationm you can see the other Backup strategies Disaster_Recovery_using_Offsite_backup_and_Data_escrow.

Why Streaming Replication

There are several options to backup a PostgreSQL database.


  The easy and standard pg_dump command it can take a long time to backup and restore for large database with pg_dump.


  Create a file system level backup by directly copying the files that PostgreSQL uses to store the data in the database,
  A file system level backup is fast, but it does have one major restriction: the database server must be shut down in order to get       
  an usable backup "tar czf backup.tar.gz /opt/cocca-8/postgresql/data/".


  Use PostgreSQL's write ahead log (WAL) to continuously archive the database.Continuous Archiving is a backup approach    
  that takes advantage of PostgreSQL's write ahead log (WAL). The log records every change made to the database's data files and forms the 
  basis of PostgreSQL's streaming replication. There are several advantages to continuous archiving backups discussed in PostgreSQL's    
  documentation on the subject, and you should notice The slave server will be overwritten.  Any data stored on it will be gone.
  We need to setup a simple PostgreSQL master slave configuration with file-based continuous archiving, for implement the replication 
  options,These steps assume we installed CoCCA and PostgreSQL using CoCCA installer Last Patches and CoCCA installer.

Configuration Master DB Server

Step 1

Configure SSH Keys for the PostgreSQL User When the WAL logs get switched out, we typically will want to be able to copy them over to our slave so that we have less risk of losing data in the event of a disaster. Rather than manually copying these files, we can automate the task. However, to automate the copying of these files, we need to create an SSH key on the master server that we will copy over to the slave so that later the master can copy its WAL files over after each switch.

        #su - postgres
        #ssh-keygen -b 2048
        #ssh-copy-id -i ~/.ssh/ postgres@slave.backup.ip

if you are running selinux on your OS maybe you need to run this command as postgres user:

        #restorecon -Rv ~/.ssh

Step 2

Create a PostgreSQL Replication User Create a user that the slave servers can use to contact PostgreSQL on the Master server

         #psql -U postgres
         #SELECT pg_switch_xlog();

Step 3 we should also edit the file pg_hba.conf to allow slaves to access the replication postgresql database, using the replicator user as the following:

   host    replication     replicator  slave.backup.ip/32      md5 

you can use the option “trust” instead the md5 then you should restrict the access to postgresql DB using iptables.

Step 4 Some changes to the Master Server’s PostgreSQL.conf file, using the following options:

wal_keep_segments value In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the log (known as the redo record) from which it should start the REDO operation, must be high enough ensure that old WAL segments are not recycled too early, for example, setting up such high value for "wal_keep_segments", no problem if you have enough space (example: 5000 * 16 = 80 G).

wal_level and max_wal_senders can't be changed without a server restart

max_wal_senders value The maximum number of simultaneously running WAL sender processes, maximum number of concurrent connections from standby servers, it must be less than max_connections.

wal_sender_delay value Specifies the delay between activity rounds for WAL sender processes, sleep milliseconds is interrupted by transaction commit.

wal_level 'hot_standby' Standby nodes can then be used for read-only query access. checkpoint_segments Value : default 3, Increasing this parameter can increase the amount of time needed for crash recovery, In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the log.

checkpoint_timeout Value Known as the redo record from which it should start the REDO operation, increase the checkpoint_segments parameters to speed up your updates, the primary might remove a WAL segment still needed by the standby, in which case the replication connection will be terminated.

archive_mode on When archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command.

archive_command = 'rsync -a %p postgres@slave.backup.ip:/opt/cocca-8/postgresql/wals/%f </dev/null'

listen_addresses ‘*’ to be the slave server able to connect the master server at the port 5432, it should be listen at public ip and not just localhost “”.

Notice : At this step we should restart the Master server: /opt/cocca-8/postgresql/scripts/ctl stop/start

Step 5

Make the Base Backup from Master server to slave server We will use the rsync command to begin the base backup as an initial copy of the database onto the slave machine,as the following:

 su - postgres
 cd /opt/cocca-8/postgresql/bin/
 psql -c "SELECT pg_start_backup('backup', true)"
 rsync -av --exclude --exclude pg_xlog /opt/cocca-8/postgresql/data/ postgres@slave.backup.ip:/opt/cocca-8/postgresql/data/
 psql -c "SELECT pg_stop_backup()"

Configuration Slave DB Server

Step 1

At this step we need to create a folder to store the WAL files at the slave sever to a void lost any data, however, the standby server can recover by fetching the segment from the archive, if WAL archiving is in use completed WAL segments are sent to archive storage (wals folder) by setting archive_command, for this reason we will create the folder as the following: Create the WAL folder to save completed WAL, at the slave server

                su - postgres
                mkdir /opt/cocca-8/postgresql/wals/

Step 2

At the slave server we need to edit /opt/cocca-8/postgresql/data/postgresql.conf and enable hot_standby. hot_standby on Specifies whether run read-only queries while the server is in archive recovery or standby mode, useful both for replication purposes restoring a backup to a desired state with great precision, set to true on a standby server.


Configure the recovery at the slave server, by create a recovery command file " /opt/cocca-8/postgresql/data/recovery.conf" in the standby server, this file should contain the following lines:

       standby_mode = 'on'
      	primary_conninfo = 'host=master.backup.ip port=5432 user=replicator password=password';trigger_file = '/tmp/pgsql.trigger'
       restore_command = 'cp -f /opt/cocca-8/postgresql/wals/%f %p </dev/null'
       archive_cleanup_command = '/opt/cocca-8/postgresql/bin/pg_archivecleanup /opt/cocca-8/postgresql/wals/ %r'

At this step we should restart the postgresql server at the slave server. /opt/cocca-8/postgresql/scripts/ctl stop/start

Common Error on Slave Side

The most error at slave DB postgresql engine is :

  PANIC: could not locate a valid checkpoint record

to solve this issues we need to reset the pg_xlog folder at the slave server side using the following command:

  /opt/cocca-8/postgresql/bin/pg_resetxlog -f /opt/cocca-8/postgresql/data

Monitoring Streaming Replication between Postgresql instances

To monitor the Replication between Master and Slave, we can use number of SQL query, for example we can calculate the replication lag by comparing the current WAL write location on the primary with the last WAL location received/replayed by the standby. which can be retrieved using pg_current_xlog_location on the primary and the pg_last_xlog_receive_location/pg_last_xlog_replay_location on the standby. there are a lot of script monitor the Replication based on the previous SQL statements. Example:

$ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host)


(1 row)

$ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby host)


(1 row)

you can use the following script to monitor Replication between Master and Slave, you need to configure you Master IP and Port for postgresql, Slave IP and Port postgresql and registry database name.

Download the following file: File:Check

Promote Slave postgresql to Master Postgresql server using one command

In case of Master server failure, we can convert the slave server to be the Master, this can be done easly in postgresql, to convert slave to master:

- kill master server
- promote the slave server by excute the following command:
  /opt/cocca-8/postgresql/bin/pg_ctl -D /opt/cocca-8/postgresql/data promote 

we should notice when we convert the salve to master with promote command , the file /opt/cocca-8/postgresql/data/recovery.conf will renamed to:


Then we should change the option : hot_standby on the file /opt/cocca-8/postgresql/data/postgresq.conf on New Master server, from hot_standby on to hot_standby off , this just to enable the postgresql to be read/write.

Convert Old Master server to Slave

To convert old master to slave, just copy the recover.done from new master server to new slave server, then rename it to recovery.conf in old data directory in old master, then add the trigger file info and change the connection info to new Master server, then save.

Personal tools