Intermittent API errors in US2 cluster
This incident has been resolved.
Service
Realtime API for building live dashboards, notifications, and chat.
Source
auto
Category
Development
Adapter
STATUSPAGE IO
Verified
Pending review
Current state
Operational
Checked 18m ago
11
Components
0
Active incidents
0
Maintenance
33.33%
90d uptime
Intermittent API errors in US2 cluster
Apr 6, 1:52 AM
Normalized official status-page data for incidents, maintenance, components, and history.
33.33%
Known uptime
3 known history days
11
Components tracked
0 outage, 0 degraded
50
Incidents indexed
0 active right now
33
Maintenance windows
0 active or scheduled
Components with the most recent status-page events.
Beams
Operational
Beams dashboard
Operational
Channels Dashboard
Operational
Channels Pusher.js CDN
Operational
Channels REST API
Operational
Component changes, incidents, and maintenance windows grouped by day.
1
operational days
0
degraded days
2
outage days
0
maintenance days
87
unknown days
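The 33.33% known-uptime figure is consistent with the day counts above if uptime is taken as operational days divided by days with any recorded status. The sketch below shows that arithmetic. The formula and the function name `known_uptime_percent` are assumptions for illustration, not Uptimus's documented method.

```python
def known_uptime_percent(operational: int, degraded: int,
                         outage: int, maintenance: int) -> float:
    """Share of days with a recorded status that were fully operational.

    Assumption: days with unknown status are excluded from the denominator.
    """
    known_days = operational + degraded + outage + maintenance
    if known_days == 0:
        raise ValueError("no known history days")
    return round(100.0 * operational / known_days, 2)


# Day counts from the page: 1 operational, 0 degraded, 2 outage, 0 maintenance.
uptime = known_uptime_percent(operational=1, degraded=0, outage=2, maintenance=0)
print(f"{uptime:.2f}%")  # prints "33.33%"
```

With 87 of the 90 days unknown, the percentage rests on only three days of history, so it says little about long-run availability.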
Latest outages and degradations detected from the official status page.
This incident has been resolved.
All affected accounts have been restored to their correct subscription levels. We sent an email yesterday to all impacted customers. If you notice any issues with your subscription, please contact us.
## **Root Cause Analysis: Elevated API Errors and Outage in AP4 Cluster**

**Incident Date:** October 20, 2025

**Status:** Resolved

### **Summary**

Between **October 20 and October 21, 2025**, customers using the **AP4 cluster** experienced elevated API errors, latency, and message publishing failures. The issue primarily affected the **Channels API**, preventing customers from publishing new messages and leading to degraded real-time functionality for end-users.

During system recovery and while implementing mitigations from a previous incident on Oct 18th, a **misconfigured Redis container** in the AP4 cluster failed to start correctly, preventing caching operations needed for API requests. This misconfiguration went undetected by proactive monitoring, delaying full recovery until October 21 at 17:43 UTC.

### **Impact**

Throughout the incident period, customers in the **AP4 cluster** experienced:

* **High API error rates** when attempting to publish messages through the Channels API
* **Failed or delayed message delivery** for connected clients
* **Temporary downtime** for end-customer applications relying on real-time messages

Other clusters remained operational, though some minor latency was observed in isolated regions due to dependencies on shared services.

### **Root Cause**

This incident resulted from **a chain of events** involving both external and internal factors:

1. **Major AWS Outage (October 20):** A large-scale **AWS outage in the US-East region** disrupted multiple dependent systems, impacting several Pusher clusters.
2. **Misconfigured Redis Container (October 21):** As systems in the AP4 cluster attempted to scale during recovery, one of the backend **Redis cache containers** failed to start due to a **misconfigured environment variable**. This prevented Redis from initializing properly, resulting in API operations failing or timing out.
3. **Monitoring Gap:** Existing monitoring did not capture the **Redis startup failure** because the specific failure mode occurred after initialization checks had passed. This delayed internal detection until API error rates increased and customer impact was observed.
4. **Delayed Customer Communication:** Initial updates to customers were delayed while the team triaged the issue and verified the failure pattern, prolonging the time before external notification.

### **Detection and Response**

The issue was detected through a combination of **monitoring alerts** showing elevated error rates and **customer reports** of publishing failures.

**Timeline of Events:**

* **October 20** – AWS outage began, affecting multiple Pusher clusters leading to increased delays and errors.
* **October 20, evening UTC** – Pusher clusters began recovery as AWS services were restored.
* **October 21, 15:06 UTC** – Internal monitoring detected elevated API errors in AP4; engineers began investigation. Incident was unrelated to prior AWS outage.
* **October 21, 15:09 UTC** – Root cause identified as a failed Redis caching container.
* **October 21, 15:27 UTC** – Restoration of Redis connections underway.
* **October 21, 15:29 UTC** – Fix implemented; cluster began gradual recovery.
* **October 21, 17:39 UTC** – Full stabilization confirmed across AP4 nodes.
* **October 21, 17:43 UTC** – Incident marked resolved after sustained recovery.

### **Resolution**

To restore full functionality, the engineering team:

* Corrected the **Redis container configuration** preventing startup
* Restarted and validated cache services across all AP4 nodes
* Confirmed API endpoints were fully operational and message publishing resumed
* Monitored latency and error metrics to confirm sustained stability

### **Preventative Actions**

To reduce recurrence risk and improve detection and response, Pusher is implementing the following:

* **Enhanced Redis Monitoring:** Extending monitoring coverage to detect Redis startup and post-init failures.
* **Customer Communication Enhancements:** Improving internal escalation and communication processes to ensure faster external updates.

### **Next Steps and Commitment**

We recognize the importance of reliable API performance for our customers. Our teams are conducting a full review of caching dependencies and configuration management across all clusters to prevent similar incidents.

We sincerely apologize for the disruption caused by this event and appreciate your patience as we worked through a complex multi-day recovery scenario. Pusher remains committed to transparency, reliability, and continuous improvement in service resilience.
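The monitoring gap in the root cause analysis above (a failure that occurred after initialization checks had passed) can be illustrated with a probe that keeps checking for a short period after startup instead of checking once. This is a minimal sketch under stated assumptions, not Pusher's monitoring. `stays_healthy`, `make_flaky_check`, and the simulated check are hypothetical names, and a real check might issue a Redis `PING` instead.

```python
import time
from typing import Callable


def stays_healthy(check: Callable[[], bool], probes: int = 5,
                  interval_s: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if `check` succeeds on every probe, not just the first."""
    for attempt in range(probes):
        if not check():
            return False
        if attempt < probes - 1:
            sleep(interval_s)  # wait before re-probing the service
    return True


def make_flaky_check(fail_from: int) -> Callable[[], bool]:
    """Simulated service that is healthy until its `fail_from`-th check."""
    calls = {"n": 0}

    def check() -> bool:
        calls["n"] += 1
        return calls["n"] < fail_from

    return check


def no_sleep(_seconds: float) -> None:
    """Skip real waiting so the demonstration runs instantly."""


# A single startup check passes even though the service fails right after.
print(stays_healthy(make_flaky_check(fail_from=2), probes=1, sleep=no_sleep))  # True
# Repeated post-init probes catch the failure on the second check.
print(stays_healthy(make_flaky_check(fail_from=2), probes=5, sleep=no_sleep))  # False
```

The probe count and interval are placeholders. Suitable values depend on how long a service can take to fail after it reports ready.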
Scheduled and completed maintenance windows are separated from incidents.
The scheduled maintenance has been completed.
Uptimus tracks the official Pusher status page, normalizes upstream events, and separates incidents from scheduled maintenance.
Official source
https://status.pusher.com
Adapter
STATUSPAGE IO
Alert streams
Incidents, component changes, and maintenance windows.
Public SEO page
Indexable status history for users searching outage information.
Regional reports can be layered on top of official provider status when user signals are available.
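As a rough illustration of what normalizing a Statuspage-hosted page can involve, the sketch below reduces a summary payload to the headline counts shown on this page (components, active incidents, maintenance). It assumes the payload follows the public Statuspage v2 summary layout, with `components`, `incidents`, and `scheduled_maintenances` lists. The `summarize` function and the sample data are hypothetical, and Uptimus's actual adapter is not described in the source.

```python
from collections import Counter
from typing import Any


def summarize(summary: dict[str, Any]) -> dict[str, Any]:
    """Reduce a Statuspage-style summary payload to headline counts."""
    components = summary.get("components", [])
    by_status = Counter(c.get("status", "unknown") for c in components)
    return {
        "components": len(components),
        "by_status": dict(by_status),
        "active_incidents": len(summary.get("incidents", [])),
        "maintenance": len(summary.get("scheduled_maintenances", [])),
    }


# Made-up sample payload; a real one would come from the provider's status page.
sample = {
    "components": [
        {"name": "Channels REST API", "status": "operational"},
        {"name": "Beams", "status": "operational"},
    ],
    "incidents": [],
    "scheduled_maintenances": [],
}

print(summarize(sample))
```

Keeping incidents and scheduled maintenance as separate counts mirrors the separation described above.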
Showing 1 to 11 of 11 tracked components.
| Component | Status | Type | Last changed |
|---|---|---|---|
| Channels REST API: The API your servers use to publish messages and query channel state, described at https://pusher.com/docs/rest_api | Operational | Component | Not recorded |
| Channels WebSocket client API: The API that Pusher Channels end users connect to, generally for subscribing to events | Operational | Component | Not recorded |
| Channels Dashboard: dashboard.pusher.com, where you can view statistics and manage your apps | Operational | Component | Not recorded |
| Channels Stats Integrations: Integration of Pusher Channels with Datadog and other third-party providers | Operational | Component | Not recorded |
| Channels presence channels | Operational | Component | Not recorded |
| Channels Webhooks | Operational | Component | Not recorded |
| Channels Pusher.js CDN: The CDN backing js.pusher.com, serving the JavaScript client library to end users | Operational | Component | Not recorded |
| Beams | Operational | Component | Not recorded |
| Beams dashboard | Operational | Component | Not recorded |
| Marketing Website: The website at www.pusher.com | Operational | Component | Not recorded |
| Payment API: The API that handles charging, plan upgrades, downgrades, and cancellations | Operational | Component | Not recorded |
Follow outages, degraded components, and maintenance updates in your Uptimus workspace with email, push, and webhook alerts.
Official provider components
Incident and maintenance separation
Workspace alerts and webhooks
Related status pages based on category, adapter type, and operational history.
Pusher is currently marked as Operational in Uptimus based on the latest official status page check.
Supported status page providers are checked continuously by our scraper scheduler. The public page is cached briefly for SEO and performance.
No. Uptimus stores incidents and maintenance windows separately when the upstream provider exposes enough detail.
Yes. Create an Uptimus workspace, follow this provider, and choose email, push, or webhook notifications.