On Feb 21st, between 3:00am PST and 9:11am PST, our notifications pipeline experienced degraded performance. This delayed both email and SMS notifications.
The issue was caused by an unexpected influx of jobs being submitted to our notification processing queue. We identified these jobs as having been generated due to a malicious traffic pattern against an endpoint which had not been properly rate limited.
Although the impact was mitigated by 6:00am PST, performance of our notifications pipeline was fully restored only at 9:11am PST, when a code update discarded these jobs from our background worker queues.
We run several types of background worker queues to process our various asynchronous workloads. One of these queues is dedicated to the notifications pipeline. Because notification delivery is time-sensitive, this queue has a distinct set of priority settings that ensure notification jobs are processed as fast as possible.
After some investigation, we found that the jobs being enqueued were generated by a cleanup job that had been erroneously prioritized. In addition, these cleanup jobs were generated in response to requests to one of our endpoints that was not subject to any rate limits. This allowed malicious actors to generate these jobs unimpeded.
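To illustrate the effect described above, here is a minimal sketch of how a mis-prioritized job class can hold back notification jobs in a priority-ordered queue. It is not our actual worker code: the priority values, the job names, and the `jobs_ahead_of_notification` helper are all hypothetical and chosen only for the example.

```python
import heapq
from itertools import count

# Hypothetical priority values: a lower number is dequeued first.
NOTIFICATION_PRIORITY = 10
CLEANUP_PRIORITY_WRONG = 0    # assumed misconfiguration: ahead of notifications
CLEANUP_PRIORITY_FIXED = 50   # assumed correction: behind notifications


def jobs_ahead_of_notification(cleanup_priority, cleanup_count):
    """Return how many jobs a worker processes before the one notification job."""
    queue = []
    seq = count()  # tie-breaker so jobs with equal priority stay in FIFO order
    for _ in range(cleanup_count):
        heapq.heappush(queue, (cleanup_priority, next(seq), "cleanup"))
    # The notification job arrives after the flood of cleanup jobs.
    heapq.heappush(queue, (NOTIFICATION_PRIORITY, next(seq), "notification"))

    processed = 0
    while queue:
        _, _, kind = heapq.heappop(queue)
        if kind == "notification":
            return processed
        processed += 1
    return processed


print(jobs_ahead_of_notification(CLEANUP_PRIORITY_WRONG, 1000))  # 1000
print(jobs_ahead_of_notification(CLEANUP_PRIORITY_FIXED, 1000))  # 0
```

In this simplified model, every cleanup job with the wrong priority is processed before the waiting notification, so an unbounded source of cleanup jobs translates directly into notification delay.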
While malicious traffic is a natural part of running any web service, being able to withstand it without causing service degradation for our customers is crucial. We’re constantly working to improve the resiliency characteristics of Statuspage. For this particular incident, these are some of the steps we’re taking:
• Audit our background jobs and ensure that there are no additional jobs which would unduly take priority over our notification jobs
• Add rate limiting around the previously identified endpoint
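As a rough illustration of the second step, the sketch below shows one common way to rate limit an endpoint, a token bucket. This is an assumption made for the example, not a description of the mechanism we are deploying; the `TokenBucket` class, its parameters, and the chosen limits are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the limiter can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical usage: one bucket per client, checked before a job is enqueued.
buckets = {}


def should_enqueue(client_id, rate=1.0, capacity=5):
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(rate, capacity)
    return buckets[client_id].allow()
```

With a check like `should_enqueue` in front of the endpoint, a single client can trigger only a bounded number of background jobs per unit of time, regardless of how many requests it sends.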
We apologize for the disruption in service as a result of this incident and thank you for trusting us with your incident communication. If you have any questions relating to this incident, please do not hesitate to contact us at hi@statuspage.io.