PSA: I haven’t seen a solution to this issue online, so posting it here for search engines to index.
PostgreSQL is an ACID-compliant database, but filesystem corruption can still prevent it from starting.
In my case, a Linux test instance running in a VirtualBox VM was not shut down cleanly because of a power failure.
As a result, data/pg_stat_tmp/ was corrupted and an invalid replication checkpoint value was written, so PostgreSQL refused to start.
If you see the following error in data/log/:
2018-05-19 06:18:38.437 UTC [2647] PANIC: replication checkpoint has wrong magic 1767992667 instead of 307747550
2018-05-19 06:18:38.446 UTC [2595] LOG: startup process (PID 2647) was terminated by signal 6: Aborted
2018-05-19 06:18:38.446 UTC [2595] LOG: aborting startup due to startup process failure
2018-05-19 06:18:38.448 UTC [2595] LOG: database system is shut down
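A quick way to check for this error is to search the log directory for the PANIC message. This is a minimal sketch: the function name find_wrong_magic is made up for illustration, and it assumes a grep that supports -r (GNU and BSD grep both do).

```shell
#!/bin/sh
# Print every log line under the given directory that contains the
# "replication checkpoint has wrong magic" PANIC, prefixed by its file name.
# Exit status is 0 if at least one line matched, non-zero otherwise.
# find_wrong_magic is a hypothetical helper name, not a PostgreSQL tool.
find_wrong_magic() {
    log_dir=$1
    # -r searches the directory recursively.
    grep -r 'replication checkpoint has wrong magic' "$log_dir"
}

# Example (relative path as used in this post):
#   find_wrong_magic data/log
```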
Once you have fixed any filesystem problems, the solution (if you don’t have a slave) is:
- back up your config: cp -p data/postgresql.conf data/postgresql.conf.bkp
- in data/postgresql.conf set max_logical_replication_workers = 0
- start PostgreSQL to automatically recover, then stop it
- restore your config: cp -p data/postgresql.conf.bkp data/postgresql.conf
- restart PostgreSQL
- read the logs in data/log/ to check for other problems.
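The steps above could be scripted roughly as follows. This is only a sketch under a few assumptions: the helper names are made up, pg_ctl is on your PATH, and your service manager is not also managing this instance. The setting is appended rather than edited in place, because when a parameter appears more than once in postgresql.conf the last occurrence wins.

```shell
#!/bin/sh
# Hypothetical helpers for the config steps; both take the data directory.

# Back up postgresql.conf, then append the override.
disable_logical_workers() {
    conf="$1/postgresql.conf"
    cp -p "$conf" "$conf.bkp" || return 1
    # The leading newline guards against a file that lacks a final newline.
    printf '\n%s\n' 'max_logical_replication_workers = 0' >> "$conf"
}

# Put the original postgresql.conf back.
restore_config() {
    conf="$1/postgresql.conf"
    cp -p "$conf.bkp" "$conf"
}

# Intended order, run from the directory that contains data/:
#   disable_logical_workers data
#   pg_ctl -D data start      # let PostgreSQL recover
#   pg_ctl -D data stop
#   restore_config data
#   pg_ctl -D data start
```

Check the logs after the first start before restoring the config, so you know recovery actually completed.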
Although you can fix indexes that report errors like the following with REINDEX INDEX idx_name, you may just want to restore PostgreSQL from backup at this point:
2018-05-19 17:30:18.052 UTC [11795] ERROR: index "idx_16573_primary" contains unexpected zero page at block 1733
2018-05-19 17:30:18.052 UTC [11795] HINT: Please REINDEX it.
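If several indexes are affected, you can pull their names out of the logs and generate the REINDEX statements. This is a sketch: reindex_statements is a made-up helper, it assumes the log line format shown above, and it assumes the index names resolve without a schema prefix, since the log does not include one.

```shell
#!/bin/sh
# Print one REINDEX statement for each distinct index that the given log
# files report with "contains unexpected zero page".
# reindex_statements is a hypothetical helper name.
reindex_statements() {
    # Capture the quoted index name; " *" tolerates extra spaces after "ERROR:".
    sed -n 's/.*ERROR: *index "\([^"]*\)" contains unexpected zero page.*/REINDEX INDEX "\1";/p' "$@" |
        sort -u
}

# Example: review the statements first, then feed them to psql.
#   reindex_statements data/log/*.log
#   reindex_statements data/log/*.log | psql -d mydb   # mydb is a placeholder
```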
If you have a slave, you will likely have to:
- stop the slave first
- follow the steps above on the master
- make a new backup from the master and rebuild your slave
- start the new slave
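One way to rebuild the slave is pg_basebackup. This is a sketch, not necessarily what your setup needs: rebuild_standby is a made-up helper, and the host name and replication user in the example are placeholders. It assumes the master already allows replication connections for that user.

```shell
#!/bin/sh
# Re-seed a slave's data directory from the master.
# Arguments: master host, replication user, target data directory.
# rebuild_standby is a hypothetical helper name.
rebuild_standby() {
    primary_host=$1
    repl_user=$2
    standby_data=$3

    # pg_basebackup needs an empty or missing target directory, so stop
    # early rather than touch an existing data directory.
    if [ -d "$standby_data" ] && [ -n "$(ls -A "$standby_data")" ]; then
        echo "refusing: $standby_data is not empty; move it aside first" >&2
        return 1
    fi

    # -R writes the recovery settings for a standby, -P shows progress.
    pg_basebackup -h "$primary_host" -U "$repl_user" -D "$standby_data" -R -P
}

# Example (placeholder host and user), run on the slave while it is stopped:
#   rebuild_standby master.example.com replicator data
#   pg_ctl -D data start
```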
PostgreSQL Manual: 19.6. Replication