On 08/12/2019, between 08:09 am and 08:16 am, there was a hard outage across multiple components of Statuspage. This event was triggered by an errant database migration.
As part of an application change to authentication, we needed to drop some legacy attributes from our primary database, which required an ad-hoc database migration. We expected this migration to be a transparent change. Unfortunately, it led to a roughly 7-minute outage across our services.
Database migrations that involve actions such as "dropping columns" introduce a table-level lock. While this is not the first time we have run database migrations of this nature, on this particular day increased activity from some of our worker jobs was causing intermittent "locks" on the affected tables, which queued the database migration behind them. This increased the time needed to process the migration script itself, and the waiting migration in turn blocked subsequent calls to those specific database tables. This inadvertent block resulted in the 7-minute outage across our services.
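The queuing effect described above can be illustrated with a small simulation. This is a simplified sketch, not a model of our actual database: it assumes a strict first-in, first-out lock queue with only two lock modes ("shared" and "exclusive"), and the request names, arrival times, and durations are hypothetical values in arbitrary time units.

```python
def simulate_lock_queue(requests):
    """Return how long each request waits for its table lock.

    requests: list of (name, mode, arrival, duration) tuples sorted by arrival,
    where mode is "shared" or "exclusive". Time units are arbitrary.
    """
    granted = []          # (mode, end_time) of every request granted so far
    previous_start = 0    # strict FIFO: nobody is granted before an earlier arrival
    waits = {}
    for name, mode, arrival, duration in requests:
        start = max(arrival, previous_start)
        for other_mode, other_end in granted:
            # Shared locks coexist; anything involving an exclusive lock conflicts.
            if mode == "exclusive" or other_mode == "exclusive":
                start = max(start, other_end)
        granted.append((mode, start + duration))
        previous_start = start
        waits[name] = start - arrival
    return waits


# Hypothetical workload: a long-running worker job, the migration, a normal request.
worker = ("worker_job", "shared", 0, 60)
migration = ("drop_column_migration", "exclusive", 5, 1)
web = ("web_request", "shared", 6, 1)

with_migration = simulate_lock_queue([worker, migration, web])
without_migration = simulate_lock_queue([worker, web])
print(with_migration)
print(without_migration)
```

In this model the migration itself needs only one time unit, yet it waits for the worker job to finish, and the ordinary request that arrives afterwards waits behind the migration even though it would not have conflicted with the worker job on its own.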
Reliability and uptime for our services remain our top priority. We will be hardening our ad-hoc database migration process so that it takes increased database activity into account and avoids blocking subsequent operations on our services.
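As an illustration only, one common way to make a migration sensitive to database activity is to have each attempt give up quickly when it cannot obtain its lock, then retry after a pause, so that other requests are not left queued behind it. The sketch below is an assumption and not a description of our process: `LockTimeout`, `attempt_migration`, and the retry parameters are hypothetical names and values.

```python
import time


class LockTimeout(Exception):
    """Hypothetical error raised when a migration attempt cannot get its lock in time."""


def run_migration_with_retries(attempt_migration, max_attempts=5,
                               base_delay=1.0, sleep=time.sleep):
    """Call attempt_migration until it succeeds or max_attempts is reached.

    attempt_migration is assumed to raise LockTimeout instead of waiting
    indefinitely for a table lock. Returns the attempt number that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            attempt_migration()
            return attempt
        except LockTimeout:
            if attempt == max_attempts:
                raise  # give up and let an operator reschedule the migration
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The trade-off in this design is that the migration may take several attempts, or may need to be rescheduled for a quieter period, in exchange for never holding up regular traffic for long.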
We apologize for the disruption to our service caused by this incident, and we thank you for trusting us with your incident communication. Please use this form to contact us in case you have any further questions regarding this outage.