Errors when visiting pages, logging in to Manage portal
Incident Report for Atlassian Statuspage
Postmortem

We kicked off a deployment to production at 1746 UTC that was then discovered to contain bad code that resulted in 500 errors across the Statuspage.io platform. This issue affected status pages, the Manage portal, and API calls.

At 1800 UTC the misbehaving version of deployed code hit live servers and all status pages began 500ing shortly thereafter. We detected the issue at around 1803 UTC and made the call to rollback at 1805 UTC.

Unfortunately, it turned out our CI-automated rollback process was in a broken state. It was not until 1812 UTC that this process hard failed and we then opted to kick off a manual rollback, which brought the site back online at 1814 UTC.

Since this incident, we have optimized our rollback policy and process to ensure that we respond to situations like this more immediately, and the rollback itself is more swift. Furthermore, we also have a regularly scheduled event to test our rollback process and ensure it's still working the way we expect. Finally, we have recently shipped additional quality checks that are applied to our production rollout process to ensure broken deployments can't make it to live servers in the first place.

Posted Oct 09, 2017 - 09:58 PDT

Resolved
Pages and Manage portal logins remain in a reliable state. This Incident is resolved.
Posted Sep 20, 2017 - 11:43 PDT
Monitoring
We have rolled back a recent deployment, and pages are being successfully delivered. We will continue to monitor services to ensure reliable access.
Posted Sep 20, 2017 - 11:18 PDT
Investigating
We are investigating errors that are occurring when visiting some pages, and when attempting to login to the Manage portal
Posted Sep 20, 2017 - 11:09 PDT
This incident affected: Hosted Pages (HTTP Pages, HTTPS Pages, Status Embed Widget, Public API, Shortlinks) and Management (Web Portal, Authenticated API, DNS Validation, SSL Provisioning, Billing).