Residual Errors From Incident
Incident Report for Eleos Technologies
Postmortem

From 2022-11-18 22:12 UTC until 2022-11-18 22:42 UTC (30 minutes total), the Eleos Mobile Platform experienced a partial outage caused by a sudden increase in latency and error rates when performing writes to one of our primary data stores. During the incident period, some drivers experienced intermittent failures when using the Eleos mobile app, similar to the behavior seen when the app is offline. Similarly, some Platform Dashboard users experienced slowness and failures when attempting to view or change app settings and content. API clients, such as integrations attempting to send outbound messages to drivers, would have experienced higher-than-normal error rates. Because of the nature of the underlying issue and our high-availability architecture, not all users would have experienced or noticed errors during the 30-minute period.

Although our monitoring immediately detected the issue and the on-call engineer responded quickly, 27 minutes passed before the first customer-facing update was posted to the status page. This delay undermines the value of the status page, and we’re revising our incident handling procedures to emphasize earlier communication.

This initial incident resulted in an additional data consistency issue affecting a small number of users, which persisted over the weekend until a server fix was deployed at 2022-11-21 17:44 UTC.

Drivers affected by this additional data consistency issue were unable to receive updated app data after they modified (e.g., viewed or deleted) a subset of messages that were sent during the incident on the 18th. The server fix resolved this error without additional driver or customer action.

A more detailed narrative and root cause analysis are available from your account executive upon request.

Posted Dec 06, 2022 - 14:39 UTC

Resolved
We’ve deployed a fix for this issue to production and confirmed that affected users are now seeing successful syncs. The fix resolves the issue server-side. Drivers do not need to log out and back in to see resolution.
Posted Nov 21, 2022 - 18:01 UTC
Identified
We are aware of an issue affecting a small number of Eleos Platform mobile app users. This issue prevents the app from synchronizing new or updated data, such as messages, from the device to our servers. Logging out and back in will briefly work around this issue, but the issue will recur shortly afterward.

We have identified the cause of this issue, and we are working to put a fix in place in production as soon as possible. We expect to have this fix deployed in the next 30 minutes and will follow up at that time.
Posted Nov 21, 2022 - 16:51 UTC
This incident affected: Eleos Platform (Mobile Apps).