mike | Shared With: Everyone - Aug 11 2008 | Gmail, reliability
Ouch. Gmail down. "Don't worry, be happy".
Quoted: We’re sorry, but your Gmail account is currently experiencing errors. You won’t be able to use your account while these errors last, but don’t worry, your account data and messages are safe. Our engineers are working to resolve this issue.
...
Please try accessing your account again in a few minutes.
mike | Shared With: Everyone - Jul 08 2008 | google, docs, saas, reliability
I was also going to comment on the irony of a Google reputation survey, the same morning that Google Docs has been down for about 1 hour! People are twittering about it at the rate of about 2 tweets per minute:
http://summize.com/search?q=google+docs+down
It is now back up after being down for about 1 hour.
mike | Shared With: Everyone - Jul 08 2008 | google, reliability, docs, saas
Google docs is down! This really sucks. I was happily editing in offline mode (using Gears), but didn't notice that the service was down until I tried to create a NEW document (which you can only do online!).
I turned off online access, and POOF, I can't see anything but the service unavailable page. AND, there is no way to GO BACK to editing offline again.
I just hope that when it comes back, I haven't lost my offline updates to the docs I was working on.
mike | Shared With: Everyone - Feb 16 2008 | amazon, s3, reliability, web development
Amazon gave this explanation of Friday's 2-hour S3 outage in this forum post. While their overall response was fairly quick, their communication was lacking. Note that the AWS blog STILL has no post about this event - forcing developers to scan the forums looking for a thread about a site-wide service outage!
Quoted: Here’s some additional detail about the problem we experienced earlier today.
...
Early this morning, at 3:30am PST, we started seeing elevated levels of authenticated requests from multiple users in one of our locations. While we carefully monitor our overall request volumes and these remained within normal ranges, we had not been monitoring the proportion of authenticated requests. Importantly, these cryptographic requests consume more resources per call than other request types.
...
Shortly before 4:00am PST, we began to see several other users significantly increase their volume of authenticated calls. The last of these pushed the authentication service over its maximum capacity before we could complete putting new capacity in place. In addition to processing authenticated requests, the authentication service also performs account validation on every request Amazon S3 handles. This caused Amazon S3 to be unable to process any requests in that location, beginning at 4:31am PST. By 6:48am PST, we had moved enough capacity online to resolve the issue.
...
As we said earlier today, though we're proud of our uptime track record over the past two years with this service, any amount of downtime is unacceptable. As part of the post mortem for this event, we have identified a set of short-term actions as well as longer term improvements. We are taking immediate action on the following: (a) improving our monitoring of the proportion of authenticated requests; (b) further increasing our authentication service capacity; and (c) adding additional defensive measures around the authenticated calls. Additionally, we’ve begun work on a service health dashboard, and expect to release that shortly.
...
Sincerely,
The Amazon Web Services Team
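Amazon's first action item, monitoring the proportion of authenticated requests rather than only total volume, can be illustrated with a small sketch. This is not Amazon's implementation: the class name `AuthRatioMonitor`, the window size, and the alert threshold are all made up for illustration.

```python
from collections import deque


class AuthRatioMonitor:
    """Hypothetical sliding-window monitor for the share of authenticated requests."""

    def __init__(self, window_size=1000, threshold=0.5):
        # window_size and threshold are illustrative values, not Amazon's.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold
        self.authenticated = 0  # running count of authenticated requests in the window

    def record(self, is_authenticated):
        """Record one request, dropping the oldest one once the window is full."""
        if len(self.window) == self.window.maxlen:
            # The oldest entry is about to be evicted by append(); uncount it first.
            self.authenticated -= self.window[0]
        value = 1 if is_authenticated else 0
        self.window.append(value)
        self.authenticated += value

    def ratio(self):
        """Fraction of requests in the window that were authenticated."""
        if not self.window:
            return 0.0
        return self.authenticated / len(self.window)

    def should_alert(self):
        """True when the authenticated share exceeds the threshold."""
        return self.ratio() > self.threshold
```

The point of tracking a ratio is the one made in the quote: total request volume can stay within normal ranges while the mix shifts toward the more expensive authenticated calls.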
mike | Shared With: Everyone - Feb 16 2008 | amazon, s3, web development, storage, reliability
Is S3 a single point of failure for Web 2.0 companies? One of the 3 S3 data centers went down for 2 hours on Friday morning. Given that people noticed a complete outage, requests seem NOT to have failed over to the other centers.
Amazon seems serious about responding to this - but it seems like they have a fundamental system problem.
Quoted: Bits is a blog about technology, innovation and society from The New York Times.
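For what "failing over to the other centers" could look like from the caller's side, here is a minimal sketch. It is an assumption-laden illustration, not how S3 or its clients actually route requests: `fetch_with_failover`, the `fetch` callable, and the endpoint names are all hypothetical.

```python
def fetch_with_failover(endpoints, fetch):
    """Try each endpoint in order and return the first successful result.

    `endpoints` is a list of hypothetical location names and `fetch` is any
    callable that takes one endpoint and either returns data or raises.
    """
    errors = []
    for endpoint in endpoints:
        try:
            return fetch(endpoint)
        except Exception as exc:
            # Remember the failure and move on to the next location.
            errors.append((endpoint, exc))
    raise RuntimeError(f"all endpoints failed: {errors}")


def demo_fetch(endpoint):
    """Stand-in for a real request: the first location is simulated as down."""
    if endpoint == "location-a":
        raise ConnectionError("service unavailable")
    return f"data from {endpoint}"


result = fetch_with_failover(["location-a", "location-b", "location-c"], demo_fetch)
```

Here the request to the simulated failed location raises, and the call falls through to the next one. This only helps if the other locations hold the same data, which the post does not establish.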
mike | Shared With: Everyone - Feb 12 2008 | blackberry, rim, service, reliability, phone
mike | Shared With: Everyone - Aug 19 2007 | space, news, reliability
I hadn't heard of the Scaled Composites accident this summer that killed three people. Here's an article comparing the reliability and response of private vs. government safety procedures.
mike | Shared With: Everyone - Aug 16 2007 | skype, reliability, outage, voip
mike | Shared With: Everyone - Jul 30 2007 | blogs, google, analytics, reliability, uptime
mike | Shared With: Everyone - Jul 30 2007 | web services, reliability, shelfari, books


