Monitoring Systems — False Alarms
This incident has been resolved.
Service
Managed RabbitMQ servers hosted in the cloud.
Source
auto
Category
Cloud
Adapter
STATUSPAGE IO
Verified
Pending review
Current state
Operational
Checked 12m ago
39
Components
0
Active incidents
1
Maintenance
7.14%
90d uptime
Monitoring Systems — False Alarms
May 30, 10:25 PM
Normalized official status-page data for incidents, maintenance, components, and history.
7.14%
Known uptime
14 known history days
39
Components tracked
0 outage, 0 degraded
50
Incidents indexed
0 active right now
50
Maintenance windows
1 active or scheduled
Components with the most recent status-page events.
AWS
Operational
AWS ec2-af-south-1
Operational
AWS ec2-ap-east-1
Operational
AWS ec2-ap-northeast-1
Operational
AWS ec2-ap-northeast-2
Operational
Component changes, incidents, and maintenance windows grouped by day.
1
operational days
0
degraded days
12
outage days
1
maintenance days
76
unknown days
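The 7.14% uptime figure is consistent with dividing the single operational day by the 14 days that have a known status (90 days in the window minus 76 unknown days). The page does not document this formula, so the sketch below is an inference from the numbers shown; `DayCounts` and `known_uptime_percent` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DayCounts:
    """Per-status day counts as shown in the history summary."""
    operational: int
    degraded: int
    outage: int
    maintenance: int
    unknown: int

    @property
    def known(self) -> int:
        # Days for which a status was actually recorded.
        return self.operational + self.degraded + self.outage + self.maintenance

    @property
    def total(self) -> int:
        return self.known + self.unknown


def known_uptime_percent(counts: DayCounts) -> float:
    """Operational days as a percentage of known days (assumed formula)."""
    if counts.known == 0:
        raise ValueError("no known history days")
    return round(100 * counts.operational / counts.known, 2)


counts = DayCounts(operational=1, degraded=0, outage=12, maintenance=1, unknown=76)
print(counts.total, counts.known, known_uptime_percent(counts))
```

Under this reading, the low percentage reflects how little history is known rather than 90 days of measured downtime.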
Latest outages and degradations detected from the official status page.
This incident has been resolved.
Resolved — Microsoft has confirmed that the Azure West Europe Virtual Machines incident is resolved. The impact window was 14:09–14:13 UTC on 23 May 2026, during which a limited number of customers may have experienced connection failures or unexpected Virtual Machine restarts. The Azure environment self-healed and the service has been confirmed restored. Affected CloudAMQP instances should now be operating normally.
# Legacy host metrics integrations degraded

Incident window: 2026-05-21 18:37 UTC – 2026-05-22 04:05 UTC (9h 28m)

Affected service: Legacy metric integrations (CloudWatch, Datadog, Librato, New Relic, Splunk, Stackdriver). Only host metrics were affected; broker metrics, internal monitoring, and console graphs were not.

## Summary

For ~9.5 hours, the service that collects host-level metrics (CPU, memory, disk, network) for legacy third-party integrations was stuck in a crash loop and stopped shipping data.

## Root Cause

An internal credential-signing service was migrated to a new container runtime the afternoon before. Due to a bug with environment variables, the collector could not renew its credentials. The collector's existing credential remained valid for several hours, so the failure only surfaced when it tried to renew.

## Resolution

On-call staff detected the failure in the early morning. Developers helped restore the service, and the collector recovered at 04:05 UTC.

## Prevention

* All services, both signing and collector, report failed authentication attempts earlier and with higher severity.
* Metrics pipeline alarm thresholds were tightened so that a similar drop in data triggers alarms faster.
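The root cause describes a credential that stayed valid for hours, which hid the broken renewal path until expiry. One common way to surface such failures sooner is to attempt renewal well before expiry and alert on the first failed attempt. The sketch below illustrates that idea only; it is not CloudAMQP's implementation, and `renewal_due`, `try_renew`, the 50% renewal point, and the one-hour severity cutoff are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def renewal_due(issued_at: datetime, expires_at: datetime, now: datetime,
                fraction: float = 0.5) -> bool:
    """True once `fraction` of the credential lifetime has elapsed (assumed policy)."""
    lifetime = expires_at - issued_at
    return now >= issued_at + lifetime * fraction


def try_renew(renew: Callable[[], None],
              alert: Callable[[str, str], None],
              remaining: timedelta) -> bool:
    """Attempt a renewal; report any failure immediately instead of waiting for expiry."""
    try:
        renew()
        return True
    except Exception as exc:
        # Escalate when little validity is left (the cutoff is a hypothetical value).
        severity = "critical" if remaining < timedelta(hours=1) else "warning"
        alert(severity, f"credential renewal failed with {remaining} left: {exc}")
        return False


# Hypothetical timestamps for illustration; they are not taken from the incident.
issued = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=8)
```

Renewing at the halfway point leaves the remaining validity as a buffer in which a failed renewal can be noticed and fixed before data stops flowing.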
This incident has been resolved.
This incident has been resolved.
This incident has been resolved.
An abnormal increase in traffic caused timeouts and slow responses for customer.cloudamqp.com. We increased capacity, and service is back to normal. This did not affect any running clusters, only provisioning and configuration updates.
This incident has been resolved.
This incident has been resolved.
# Broker metrics and metrics integrations degradation

Window: 2026-04-23 03:25 – 05:35 UTC (2h 10m)

## Impact

Broker metrics (connections, channels, queues, consumers, message rates, node and netsplit state) were degraded, progressively dropping until recovery at 05:35 UTC.

Downstream effects:

* Queue, Consumer, Connection, Channel, Connection flow, and Netsplit alarms: threshold breaches during the window may not have triggered, or triggered late. Alarms already tripped before 03:25 UTC kept their state.
* Metrics Integrations (Datadog, CloudWatch, New Relic, Splunk, Dynatrace, etc.): delivery of broker metrics was reduced for the duration of the window. Alarms configured on the receiving side may likewise have missed or lagged.

Not affected: server metrics (CPU, memory, disk) and their corresponding alarms, Notice alarms, broker availability, and the metric graphs shown in the CloudAMQP console (these use a separate data path and continued to update normally).

## Timeline (UTC)

* 03:25 - broker-metrics collection begins degrading
* 03:35 - ~30% of broker-metric samples missing
* 03:50 - ~50% plateau
* 05:25 - collector workers cycled and re-initialised
* 05:35 - full throughput restored

## Root cause

Broker metrics are polled from each cluster's management HTTP API by a pool of collector workers and published to an internal message bus that both the Alarms service and Metrics Integrations consume from. At 03:25 UTC this service became unresponsive without raising an error. Affected workers silently stopped polling while remaining alive to the platform, so automatic restart did not trigger. On-call staff began investigating around 05:00 UTC; force-restarting the service restored metric flow.

The exact trigger has not been identified. Our focus is on ensuring the condition is detected and handled promptly if it recurs.

## What we are changing

* Added metrics and alarms specific to this service, with high-urgency paging for on-call.
* Reliability improvements in the service itself to detect and restart stalled work.

## What you can do

CloudAMQP offers a new generation of metrics integrations based on Prometheus, with a re-engineered pipeline that does not rely on these centralised services: each server forwards data directly to your endpoint. These have been running in production for some time and have proved very reliable.

* [https://www.cloudamqp.com/blog/prometheus-metrics-integrations.html](https://www.cloudamqp.com/blog/prometheus-metrics-integrations.html)
* [https://www.cloudamqp.com/docs/monitoring_metrics_datadog_v3.html](https://www.cloudamqp.com/docs/monitoring_metrics_datadog_v3.html)
* Background: [https://www.cloudamqp.com/blog/decentralized-observability-with-open-telemetry-part-1.html](https://www.cloudamqp.com/blog/decentralized-observability-with-open-telemetry-part-1.html)
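The broker-metrics root cause describes workers that stopped polling while still appearing alive, so a process-level liveness check never fired. A progress-based watchdog is one general way to catch that condition: each worker records when it last completed real work, and a supervisor flags workers whose last progress is too old. The following is a minimal sketch of that pattern, not the change CloudAMQP made; `StallWatchdog`, its method names, and the 60-second threshold are hypothetical.

```python
import time
from typing import Callable, Dict, List


class StallWatchdog:
    """Tracks the last time each worker made progress, not merely whether it is alive."""

    def __init__(self, stall_after_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._stall_after_s = stall_after_s
        self._clock = clock
        self._last_progress: Dict[str, float] = {}

    def record_progress(self, worker_id: str) -> None:
        # Call this after a successful poll-and-publish cycle, not on a timer.
        self._last_progress[worker_id] = self._clock()

    def stalled_workers(self) -> List[str]:
        """Workers whose last recorded progress is older than the threshold."""
        now = self._clock()
        return sorted(
            worker_id
            for worker_id, last in self._last_progress.items()
            if now - last > self._stall_after_s
        )


watchdog = StallWatchdog(stall_after_s=60.0)
watchdog.record_progress("collector-1")  # hypothetical worker name
print(watchdog.stalled_workers())
```

A supervisor could call `stalled_workers()` periodically and restart or page on any result, which covers the case where a worker is alive but idle.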
Scheduled and completed maintenance windows are separated from incidents.
The scheduled maintenance has been completed.
The scheduled maintenance has been completed.
The scheduled maintenance has been completed.
Maintenance will begin as scheduled in 60 minutes.
The scheduled maintenance has been completed.
Update has completed.
The scheduled maintenance has been completed.
The scheduled maintenance has been completed.
The scheduled maintenance has been completed.
All clusters are now using "logstream2" as the active queue+shovel to send logs, and "logstream" has been removed. We have monitored the new system and everything works as expected.
Uptimus tracks the official CloudAMQP status page, normalizes upstream events, and separates incidents from scheduled maintenance.
Official source
https://status.cloudamqp.com
Adapter
STATUSPAGE IO
Alert streams
Incidents, component changes, and maintenance windows.
Public SEO page
Indexable status history for users searching outage information.
Regional reports can be layered on top of official provider status when user signals are available.
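Pages hosted on Atlassian Statuspage generally expose a public JSON summary at `/api/v2/summary.json`. Assuming the official source above follows that convention (this page does not confirm it), the component, incident, and maintenance counts could be derived roughly as sketched below. `SUMMARY_URL`, `fetch_summary`, and `summarize` are illustrative names, and the sample payload is invented.

```python
import json
from collections import Counter
from typing import Any, Dict
from urllib.request import urlopen

# Assumption: the standard Statuspage v2 summary endpoint is available here.
SUMMARY_URL = "https://status.cloudamqp.com/api/v2/summary.json"


def fetch_summary(url: str = SUMMARY_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """Download and decode the summary document (performs a network request)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(summary: Dict[str, Any]) -> Dict[str, Any]:
    """Count components per status plus listed incidents and maintenances."""
    components = summary.get("components", [])
    by_status = Counter(c.get("status", "unknown") for c in components)
    return {
        "components": len(components),
        "by_status": dict(by_status),
        "incidents": len(summary.get("incidents", [])),
        "scheduled_maintenances": len(summary.get("scheduled_maintenances", [])),
    }


# Invented sample payload in the general shape of a Statuspage summary.
sample = {
    "components": [
        {"name": "AWS ec2-us-east-1", "status": "operational"},
        {"name": "Metrics", "status": "operational"},
        {"name": "Backend", "status": "under_maintenance"},
    ],
    "incidents": [],
    "scheduled_maintenances": [{"name": "Example maintenance"}],
}
print(summarize(sample))
```

Calling `summarize(fetch_summary())` would apply the same counting to live data, provided the endpoint exists as assumed.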
Showing 1 to 25 of 39 tracked components.
| Component | Status | Type | Last changed |
|---|---|---|---|
| AWS | Operational | Group | Not recorded |
| DigitalOcean | Operational | Group | Not recorded |
| AWS ec2-ap-northeast-1 | Operational | Component | Not recorded |
| DigitalOcean AMS2 | Operational | Component | Not recorded |
| Shared servers | Operational | Component | Not recorded |
| AWS ec2-us-west-2 | Operational | Component | Not recorded |
| Dedicated servers Issues affecting multiple customers with the plan Big Bunny or larger | Operational | Component | Not recorded |
| AWS ec2-us-west-1 | Operational | Component | Not recorded |
| Backend Account provisioning, server monitoring etc. | Operational | Component | Not recorded |
| Metrics | Operational | Component | Not recorded |
| AWS ec2-us-east-2 | Operational | Component | Not recorded |
| Heroku Frontend application and API servers | Operational | Component | Not recorded |
| AWS ec2-us-east-1 | Operational | Component | Not recorded |
| DNS AWS route53 | Operational | Component | Not recorded |
| AWS ec2-sa-east-1 | Operational | Component | Not recorded |
| Help Scout Email Processing | Operational | Component | Not recorded |
| AWS ec2-eu-west-3 | Operational | Component | Not recorded |
| AWS ec2-eu-west-2 | Operational | Component | Not recorded |
| Google Cloud Platform Google Compute Engine | Operational | Component | Not recorded |
| AWS ec2-eu-west-1 | Operational | Component | Not recorded |
| DigitalOcean AMS3 | Operational | Component | Not recorded |
| AWS ec2-eu-central-1 | Operational | Component | Not recorded |
| DigitalOcean FRA1 | Operational | Component | Not recorded |
| AWS ec2-ca-central-1 | Operational | Component | Not recorded |
| DigitalOcean LON1 | Operational | Component | Not recorded |
Follow outages, degraded components, and maintenance updates in your Uptimus workspace with email, push, and webhook alerts.
Official provider components
Incident and maintenance separation
Workspace alerts and webhooks
Related status pages based on category, adapter type, and operational history.
CloudAMQP is currently marked as Operational in Uptimus based on the latest official status page check.
Supported status page providers are checked continuously by our scraper scheduler. The public page is cached briefly for SEO and performance.
No. Uptimus stores incidents and maintenance windows separately when the upstream provider exposes enough detail.
Yes. Create an Uptimus workspace, follow this provider, and choose email, push, or webhook notifications.