Degraded Performance of Phrase Analytics (TMS) and Phrase Data between May 4, 2026 and May 26, 2026
Postmortem
### Introduction
We would like to share details about an incident that affected **Phrase Analytics** and Phrase Data between May 4, 2026 and May 26, 2026. During this period, segment-based analytics metrics - including character and word counts (completed, confirmed and locked) and words processed with a CJK source language - displayed as zero for data created or updated within the affected window. This post-mortem outlines what happened, when it was resolved, and the steps we are taking to prevent a recurrence.
### Timeline
May 4, 2026 - A configuration change to our analytics data ingestion pipeline unintentionally caused it to re-process all historical data rather than only new changes. This placed significant additional load on our message streaming infrastructure, where new changes competed with the replayed historical changes and were queued behind them at the end of the processing queue.
May 7, 2026 - The increased load led to storage pressure within the streaming cluster, causing analytics processing to slow down.
May 14, 2026 - We continuously adjusted capacity and work assignments in the streaming cluster to mitigate the increased load and allow the accumulated lag to be processed.
May 21, 2026 at 1:30 PM CEST - We identified why the lag was not being processed in a timely manner and took corrective action: the ingestion pipeline was stopped and restarted in incremental mode, and additional streaming infrastructure capacity was provisioned.
May 22, 2026 at 4:00 AM CEST - New segment data began flowing correctly into the analytics system.
May 22, 2026 - A secondary defect was identified and fixed: the pipeline incorrectly handled a specific sequence of segment deletion events within the replay window, which could cause a small number of records to be missed.
May 26, 2026 at 10:20 AM CEST - Customer-facing impact ended. Analytics was processing new data correctly and all current-period data was visible.
May 27, 2026–Jun 1, 2026 - Historical data from May 4, 2026–May 21, 2026 was backfilled across all processing nodes. During verification, we found that data was still missing for some segments on May 10, 2026, May 14, 2026 and May 15, 2026.
Jun 1, 2026–Jun 9, 2026 - With the revised and fixed data pipeline, we re-ran the backfill to restore the missing data.
Jun 9, 2026 - We validated and confirmed that all data had been backfilled.
### Root Cause
The incident was caused by a configuration change on May 4, 2026 that accidentally triggered a full re-synchronization of all historical analytics data instead of the intended incremental update. The resulting data volume significantly exceeded the capacity of our message streaming infrastructure. As the system fell behind processing the backlog, analytics statistics stopped being written, causing zeros to appear in customer-facing reports.
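To illustrate why a full replay delays new data, the following is a minimal sketch of a first-in, first-out queue with constant rates. The function names and the rate model are assumptions made for illustration only; they do not describe the actual pipeline or its real volumes.

```python
def backlog_after(seconds: float, initial_backlog: float,
                  arrival_rate: float, processing_rate: float) -> float:
    """Events still queued after `seconds`, assuming constant rates (events/s)."""
    remaining = initial_backlog + (arrival_rate - processing_rate) * seconds
    # A queue cannot hold a negative number of events.
    return max(0.0, remaining)


def wait_for_new_event(backlog_events: float, processing_rate: float) -> float:
    """Seconds a newly arrived event waits behind the backlog in a FIFO queue."""
    if processing_rate <= 0:
        raise ValueError("processing_rate must be positive")
    return backlog_events / processing_rate
```

With purely hypothetical numbers, `wait_for_new_event(1_000_000, 100)` gives 10,000 seconds: a new change queued behind one million replayed events at 100 events per second would wait close to three hours before being processed, and longer if the backlog keeps growing.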
### Actions to Prevent Recurrence
1. **Data backfilled and verified** - Historical analytics data for the May 4, 2026–May 21, 2026 window has been restored across all processing nodes and verified against source systems.
2. **Streaming infrastructure scaled** - Capacity of the message streaming cluster has been increased to provide headroom for high-load scenarios and prevent disk pressure from recurring.
3. **Pipeline defects fixed** - A secondary bug affecting a small number of deleted segments in the replay window has been resolved.
4. **Performance improvement underway** - A fix is in progress to dramatically reduce processing time for jobs with unusually large segment volumes, reducing the risk of future processing delays.
5. **Analytics data freshness monitoring** - We are adding internal alerting so that any future delays in the analytics pipeline are detected and acted on before customers are impacted.
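
The freshness monitoring in item 5 could, in its simplest form, compare the time of the most recent analytics write against an allowed lag. The sketch below is an assumption about how such a check might look, not the alerting we are deploying; `check_freshness`, its parameters and the threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_freshness(last_written_at: datetime, now: datetime,
                    max_lag: timedelta) -> Optional[str]:
    """Return an alert message if the newest analytics record is too old.

    Hypothetical helper: `last_written_at` is the timestamp of the most
    recent record written by the pipeline; `max_lag` is the allowed delay.
    """
    lag = now - last_written_at
    if lag > max_lag:
        return f"analytics data is stale: last write {lag} ago (limit {max_lag})"
    return None  # data is fresh enough, no alert


# Example with illustrative timestamps and an illustrative 1-hour limit.
now = datetime(2026, 5, 7, 12, 0, tzinfo=timezone.utc)
alert = check_freshness(now - timedelta(hours=3), now, timedelta(hours=1))
```

A check like this would run on a schedule, so that a growing delay is flagged internally before it shows up as zeros in reports.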
May 21, 9:42 AM · Resolved Jun 10, 1:01 PM · Minor · Analytics