Multiple sites showing down/under maintenance

Incident Report for Atlassian Statuspage

Postmortem

Earlier this month, several hundred Atlassian customers were impacted by a site outage. We have published a Post-Incident Review which includes a technical deep dive on what happened, details on how we restored customers' sites, and the immediate actions we’ve taken to improve our operations and approach to incident management.

https://www.atlassian.com/engineering/post-incident-review-april-2022-outage

Posted Apr 29, 2022 - 13:34 PDT

Resolved

We have restored impacted Statuspage customer sites and the service is operating normally.

If you need assistance, please reply to your support ticket so that our engineers can work with you. If you have any trouble accessing your support ticket, contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

Our teams will be working on a detailed Post Incident Report to share publicly by the end of April.

Posted Apr 17, 2022 - 15:06 PDT

Monitoring

Posted Apr 17, 2022 - 12:12 PDT

Update

We have now restored 99% of users impacted by the outage and have reached out to all affected customers.
Our teams are available to help customers with any concerns. If you need assistance, please reply to your support ticket so that our engineers can work with you.
If you have any trouble accessing your support ticket, contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

Posted Apr 16, 2022 - 21:19 PDT

Update

We have now restored 85% of users impacted by the outage and will continue to get sites back to customers for validation over the weekend.
As we hand your restored site over to you for validation, please reach out to our teams should you find any issues so that our support engineers can work to get you fully operational.

You can contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

Posted Apr 16, 2022 - 13:12 PDT

Update

We have now restored 78% of users impacted by the outage as we continue to move with more speed and accuracy. Our teams will continue to restore sites through the weekend, and we expect to have all sites restored no later than end of day Tuesday, April 19th PT. As we restore your site and hand it over to you for validation, please reach out to our teams should you find any issues so that our support engineers can work to get you fully restored.
You can contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

Posted Apr 15, 2022 - 18:42 PDT

Update

We have made significant progress over the last 24 hours and have now restored functionality for 62% of users impacted by the outage.

We have also doubled the size of the batches we are pushing through the restoration process, which was a result of optimizing automated processes as well as accelerating our restoration speed. Our global engineering teams continue to work 24/7, and we expect to progress quickly through technical restoration of remaining customer sites over the weekend.

If you do not have access to your open ticket, please contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

Posted Apr 15, 2022 - 13:13 PDT

Update

We have now restored functionality for 55% of users impacted by the outage.
With automation in full effect, we have significantly increased the pace at which we are conducting technical restoration of affected customer sites, and we have reduced the time required for the validation of restored sites by half.
If you are still experiencing an outage and do not have access to your open ticket, please contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

Posted Apr 14, 2022 - 20:06 PDT

Update

We have now restored functionality for 53% of users impacted by the outage.

As outlined in yesterday’s update, we are restoring affected customers using a three-step process:

1. Technical restoration of affected sites
2. Internal validation of restored sites
3. Validating with affected customers before enabling their users

By automating some of our validation steps, we have now reduced time for internal validation of restored sites by half, which allows our support engineers to more quickly engage restored customers for validation and full site handover.

If you are still experiencing an outage and do not have access to your open ticket, please contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).

Posted Apr 14, 2022 - 13:14 PDT

Update

We have restored functionality for 49% of users impacted by the outage.
We are taking a batch-based approach to restoring customers, and to date this process has been semi-automated. We are beginning to shift towards a more automated process to restore sites. That said, there are still a number of steps required before we hand a site to customers for review and acceptance. We are restoring affected customers in groups of up to 60 at a time, with each group identified by a mix of variables including site size, complexity, edition, tenure, and several other factors.
The full restoration process involves our engineering teams, our customer support teams, and our customers, and has three steps:
1. Technical restoration involving meta-data recovery, data restores across a number of services, and ensuring the data across the different systems is working correctly for product and ecosystem apps
2. Verification of site functionality to ensure the technical restoration has worked as expected
3. Lastly, working directly with the affected customer to enable them to verify their data and functionality before enabling for their users
We have also contacted all customers who are *up next* for step 3 in the site restoration process described above. These customers are aware that they are next in queue through their support ticket and/or via a support engineer.
We have proactively reached out to technical contacts and system admins at all impacted customers, and opened support tickets for each of them. However, we learned that some customers have not yet heard from us or engaged with our support team. If you are experiencing an outage and do not have access to your open ticket, please contact us at https://support.atlassian.com/contact/#/ (choose the Billing, Payments, & Pricing options from the drop down menu).
For more information from our engineering team, please read our update from our CTO, Sri Viswanath: https://www.atlassian.com/engineering/april-2022-outage-update
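
For readers who want a concrete picture of the batch-based approach in this update, the following is a minimal sketch only, not Atlassian's actual tooling. The `Site` type, the `priority` score, the stage names, and both functions are hypothetical. The update says groups are identified by several variables (site size, complexity, edition, tenure, and others) but does not publish how they are combined, so a single numeric score stands in for them here. Only the group size of 60 and the order of the three steps come from the update.

```python
from dataclasses import dataclass
from typing import Iterator, List

# "groups of up to 60 at a time" is taken from the update above.
MAX_BATCH_SIZE = 60

# Hypothetical stage names mirroring the three steps in the update.
STAGES = ("technically_restored", "internally_verified", "customer_validated")


@dataclass
class Site:
    name: str
    priority: float          # hypothetical stand-in for size, edition, tenure, ...
    stage: str = "pending"


def batches(sites: List[Site], size: int = MAX_BATCH_SIZE) -> Iterator[List[Site]]:
    """Yield groups of at most `size` sites, highest assumed priority first."""
    ordered = sorted(sites, key=lambda s: s.priority, reverse=True)
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]


def restore_batch(batch: List[Site]) -> None:
    """Walk every site in a batch through the three steps in order."""
    for site in batch:
        for stage in STAGES:
            # Placeholder: real work (data restores, checks, customer
            # sign-off) would happen here before the stage is recorded.
            site.stage = stage


if __name__ == "__main__":
    sites = [Site(f"site-{i}", priority=float(i)) for i in range(130)]
    for group in batches(sites):
        restore_batch(group)
        print(len(group), "sites processed")
```

Keeping the grouping separate from the per-site steps means the batch size can be changed without touching the restoration logic.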

Posted Apr 14, 2022 - 08:53 PDT

Update

The team is moving through the restoration process this week and is accelerating toward recovery. Functionality for 40% of impacted users has been restored.

Posted Apr 12, 2022 - 06:48 PDT

Update

A small number of Atlassian customers continue to experience service outages and are unable to access their sites. Our global engineering teams are working 24/7 to make progress on this incident. At this time, we have rebuilt functionality for over 35% of the users who are impacted by the service outage, with no reported data loss. The rebuild stage is particularly complex due to several steps that are required to validate sites and verify data. These steps require extra time, but are critical to ensuring the integrity of rebuilt sites. We apologize for the length and severity of this incident and have taken steps to avoid a recurrence in the future.

Posted Apr 11, 2022 - 09:04 PDT

Update

Posted Apr 11, 2022 - 05:11 PDT

Update

Posted Apr 11, 2022 - 02:01 PDT

Update

A dedicated team continues to work 24/7 to expedite service recovery. Restoration of all customers remains our top priority. We hear and appreciate all the feedback from our valued customers and are taking every necessary step to both restore full service and ensure site integrity as soon as possible.

Posted Apr 10, 2022 - 17:50 PDT

Update

We are still working 24/7 to restore service to affected customers. We have restored partial access for some customers and will be continuing to restore access into next week.

Posted Apr 10, 2022 - 12:45 PDT

Update

We continue to work 24/7 to restore service to affected customers. We have restored partial access for some customers and will be continuing to restore access into next week.

Posted Apr 10, 2022 - 06:14 PDT

Update

Our teams are committed to restoring each customer’s service as soon as possible and are working through the weekend toward recovery.

Posted Apr 10, 2022 - 02:35 PDT

Update

Our teams are committed to restoring each customer’s service as soon as possible and are working through the weekend toward recovery.

Posted Apr 09, 2022 - 21:05 PDT

Update

The restoration process is underway. At this time we have no new significant updates, but the team continues to work around the clock to bring our customers back online.

Posted Apr 09, 2022 - 15:23 PDT

Update

The restoration process is underway. At this time we have no new significant updates, but the team continues to work around the clock to bring our customers back online.

Posted Apr 09, 2022 - 11:25 PDT

Update

Our team is working 24/7 to progress through site restoration work. Core functionality has been restored across a number of sites. We are continuously improving the process with the aim of accelerating the restoration process from here.

Posted Apr 09, 2022 - 07:33 PDT

Update

Posted Apr 09, 2022 - 03:52 PDT

Update

The team is continuing the restoration process through the weekend and working toward recovery. We are continuously improving the process based on customer feedback and applying those learnings as we bring more customers online.

Posted Apr 08, 2022 - 18:36 PDT

Update

Work to restore sites is underway and will continue into the weekend. We are taking a controlled and hands-on approach as we gather feedback from customers to ensure the integrity of these site restorations.

Posted Apr 08, 2022 - 13:28 PDT

Update

Posted Apr 08, 2022 - 10:11 PDT

Update

We have started successfully restoring sites and continue to work on restoration to a wider cohort of customers. We are taking a controlled and hands-on approach as we gather feedback from customers to ensure the integrity of these site restorations.

Posted Apr 08, 2022 - 08:30 PDT

Update

Posted Apr 08, 2022 - 03:59 PDT

Update

Posted Apr 08, 2022 - 00:53 PDT

Update

We continue to work on partial restoration to a cohort of customers. The plan to take a controlled and hands-on approach as we gather feedback from customers to ensure the integrity of this first round of restorations remains the same from our last update.

Posted Apr 07, 2022 - 18:27 PDT

Update

We continue to work on partial restoration to the first cohort of customers. The plan to take a controlled and hands-on approach as we gather feedback from customers to ensure the integrity of this first round of restorations remains the same from our last update.

Posted Apr 07, 2022 - 14:51 PDT

Update

We are beginning partial restoration to a cohort of customers. The early stages of this process will be controlled and hands-on, as we work with customers live to get feedback and ensure that restoration is working correctly before we accelerate the process for the next cohort. We will continue to post updates here as we move the process along.

Posted Apr 07, 2022 - 11:43 PDT

Update

We are continuing work in the verification stage on a subset of instances. Once instances are re-enabled, support will update accounts via the open incident tickets. Restoration of customer sites remains our first priority and we are coordinating with teams globally to ensure that work continues 24/7 until all instances are restored.

Posted Apr 07, 2022 - 05:26 PDT

Update

Posted Apr 07, 2022 - 02:34 PDT

Update

Posted Apr 06, 2022 - 21:19 PDT

Update

We are continuing to work on the resolution of the incidents for some Statuspage, Jira Work Management, Jira Service Management, Confluence, Jira Software, Atlassian Access, Jira Product Discovery, and Opsgenie Cloud customers.

Posted Apr 06, 2022 - 14:26 PDT

Update

We have partially reactivated the Statuspages of affected customers. The hosted pages should be up, and the API capabilities have been restored so affected customers can use the API to manage their pages while work is done to restore access to the manage portal. We have defined two processes for resolving the issues impacting some customers. These processes each involve multiple stages of work. We are currently working on one of the processes and we will provide more detail as we progress through resolution.
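
As an illustration of managing a page through the API while the manage portal is unavailable, here is a hedged sketch of how a request to create an incident might be put together. The base URL, the `OAuth` authorization scheme, and the `incident` payload shape are assumptions based on the public Statuspage REST API and should be checked against the current API documentation. The page ID, API key, and function name are hypothetical placeholders. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed base URL for the Statuspage REST API; confirm before use.
API_BASE = "https://api.statuspage.io/v1"


def build_incident_request(page_id: str, api_key: str,
                           name: str, status: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that would create an incident."""
    payload = {"incident": {"name": name, "status": status, "body": body}}
    return urllib.request.Request(
        url=f"{API_BASE}/pages/{page_id}/incidents",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed auth scheme: the API key passed as an OAuth token.
            "Authorization": f"OAuth {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_incident_request(
        page_id="YOUR_PAGE_ID",      # hypothetical placeholder
        api_key="YOUR_API_KEY",      # hypothetical placeholder
        name="Example incident",
        status="investigating",
        body="We are looking into reports of errors.",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```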

Posted Apr 05, 2022 - 14:11 PDT

Identified

The issue has been identified and a fix is being implemented.

Posted Apr 05, 2022 - 06:45 PDT

This incident affected: Hosted Pages (HTTP Pages, HTTPS Pages) and Management (Web Portal, Authenticated API).