Resolved
The team has been monitoring the delivery pods and is no longer seeing any errors or stuck deliveries.
All previously stuck notifications have been successfully retried.
Monitoring
We have identified a small number of impacted notifications whose delivery jobs failed. The team is working on retrying those deliveries.
The incident was caused by one of our notifications API instances becoming unhealthy, which left 27 jobs stuck in a terminating state and unable to retry.
The incident status is being updated to monitoring, and any new deliveries going out should succeed.
Investigating
The investigation is still ongoing. So far we know of 27 stuck delivery jobs, but we are still determining the scope of the delays.
Investigating
We are still investigating the issue. Around 9am this morning, we noticed an increase in lag on email deliveries. That lag has since progressed to delivery jobs that are stuck and not processing.
The team is investigating a way to get the delivery jobs to continue. Until then, expect possible delays, and you may need to resend emails.
We will provide another update in 30 minutes.