Increased HTTP 5xx error rate
Incident Report for OneSignal
This incident has been resolved.
Posted May 26, 2020 - 11:43 PDT
The backlog is shrinking steadily, and we expect it to clear within 10 minutes.
Posted May 26, 2020 - 11:28 PDT
We expect the delivery job backlog to be resolved in ~35 minutes. Player updates and deletes will also be slow during this interval.
Posted May 26, 2020 - 10:53 PDT
A fix has been implemented and we are monitoring the results.
Posted May 26, 2020 - 10:45 PDT
Deliveries are significantly delayed. We are processing the backlog of delivery tasks now.
Posted May 26, 2020 - 10:25 PDT
We have identified and fixed the connection pooling issue. Rollout will take 5 minutes.
Posted May 26, 2020 - 10:23 PDT
We are still having problems restoring connectivity to the databases for customers with app ids starting with 20 through 3f and 80 through 8f.
Posted May 26, 2020 - 10:05 PDT
We are experiencing connection-pool-related problems processing delivery jobs. The dashboard and API are operational.
Posted May 26, 2020 - 09:37 PDT
We are continuing to work on a fix for this issue.
Posted May 26, 2020 - 09:26 PDT
We have a problem with the configuration of the promoted database replica server that serves customers with app ids starting with 20 through 3f. We have identified the configuration problem, fixed it, and are restarting the webservers to refresh connection pools.
Posted May 26, 2020 - 09:25 PDT
We have promoted the database shards and restarted the site.
Posted May 26, 2020 - 09:13 PDT
We are promoting a replica for some database shards, which requires a rolling restart of all webservers. All apps may see some 500 errors on the dashboard while the restart proceeds.
Posted May 26, 2020 - 08:41 PDT
We are currently investigating this issue.
Posted May 26, 2020 - 07:55 PDT
This incident affected: Dashboard and API, Offline Job Processing, Delivery, and Analytics & Update Processing.