Database backups, data loss, and steps we've made to lessen future impacts
Posted: Wed Aug 05, 2015 2:09 pm
Hey all,
As many of you know, our hosting provider's database server crashed on Monday. This affected all of their customers who used that database server. After the crash, they first restored their latest full server backup, which was 7 days old. Eventually they restored the daily backups, and we ended up on the daily backup taken Monday morning around 6am (ET).
After recovering from that morning's backup, we lost roughly 14-16 hours' worth of posts, topics, etc. Those hours happened to include all of Monday's daytime posts, which is our busiest/peak time of day.
On Monday, we were relying on 2 different daily backups: one from the server host (which usually ran around 6am ET and retains 30 backups) and one from a forums plugin (which usually ran around 6-7pm ET and retains 10 backups). I had planned for webserver failures, with backups of source code, files, etc., but a database server failure is a much, much rarer occurrence in the IT world.
So the site's admin & moderator team has implemented our own 3rd backup routine: hourly database backups. The backups are scheduled to run on the hour, and we retain the last 48 hourly backups (2 days' worth) on the server, in addition to the daily backups already retained. We also upload the hourly backups to a Dropbox account, in case of Armageddon, Ragnarok, the T-virus escaping Raccoon City, or whatever.
I just wanted to inform everyone of what happened and tell you what steps we've taken to prevent another disaster like we experienced on Wormhole Monday.
thanks,
Shurijo