By now you have probably heard about the S3 outage (officially, a "Service Disruption") that happened on Feb 28th.
During the outage the AWS Service Health Dashboard was not operational. Although the page itself was up, AWS was unable to update the service health to reflect the outage, because the dashboard depended on S3 to operate properly. AWS has recently resolved this operational dependency.
The Blumetech AWS Managed Service uses the AWS Service Health Dashboard to report outages to customers. Our service automatically "scrapes" the webpage, looks for outages on AWS, and then e-mails customers about the failure. This has become a very useful service for customers because it quickly notifies them of service disruptions. However, during the S3 event the dashboard was not being updated, so our service was not able to e-mail customers about the problem.
Last week we resolved this issue. Our managed service now confirms service status through API calls to AWS, which is much less brittle than the old screen "scraping" process. We are now confident that future unplanned outages will be detected accurately and that customers will be notified properly and quickly.
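The post does not say which API calls the service makes. As a rough sketch of what an API-based status check could look like, the Python below assumes the AWS Health API (`describe_events` via boto3) as the data source. The function names `open_aws_issues`, `format_alert`, and `make_health_client` are hypothetical and are not taken from the Blumetech service. Note that the AWS Health API is only available to accounts on a Business or Enterprise support plan.

```python
from typing import Any, Dict, List


def open_aws_issues(health_client: Any) -> List[Dict[str, Any]]:
    """Collect every open 'issue' event from an AWS Health client.

    health_client is assumed to behave like boto3.client("health");
    passing it in keeps this function easy to exercise with a stub.
    """
    events: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {
        "filter": {
            "eventTypeCategories": ["issue"],  # operational problems only
            "eventStatusCodes": ["open"],      # still ongoing
        }
    }
    while True:
        page = health_client.describe_events(**kwargs)
        events.extend(page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return events
        kwargs["nextToken"] = token  # request the next page of results


def format_alert(events: List[Dict[str, Any]]) -> str:
    """Build a plain-text e-mail body, one line per open event (hypothetical format)."""
    return "\n".join(
        f"{e.get('service', 'UNKNOWN')} in {e.get('region', 'global')}: "
        f"{e.get('eventTypeCode', 'n/a')}"
        for e in events
    )


def make_health_client() -> Any:
    """Create a real client. Assumes boto3 is installed and credentials are configured."""
    import boto3  # imported lazily so the helpers above work without boto3

    # The AWS Health API is served from the us-east-1 endpoint.
    return boto3.client("health", region_name="us-east-1")
```

A scheduled job could call `open_aws_issues(make_health_client())` and, when the list is non-empty, pass `format_alert(...)` to whatever e-mail mechanism is already in place. Unlike parsing HTML, this relies on a structured response, so changes to the dashboard's page layout would not break the check.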