mike | Shared With: Everyone - Jan 30 2009 | bookmarks, social, data center, reliability
Extensive outage at ma.gnolia.com.
Quoted: Dear Ma.gnolia Community Members or Visitor,
...
Early on the West-coast morning of Friday, January 30th, Ma.gnolia experienced every web service's worst nightmare: data corruption and loss. For Ma.gnolia, this means that the service is offline and members' bookmarks are unavailable, both through the website itself and the API. As I evaluate recovery options, I can't provide a certain timeline or prognosis as to when or to what degree Ma.gnolia or your bookmarks will return; only that this process will take days, not hours.
...
I will of course keep you apprised here and in our Twitter account.
...
Most importantly, I apologize to all of you who have made Ma.gnolia a home for your bookmarks and community. I know that many of you rely on Ma.gnolia in your day-to-day work and play flow to safely host your bookmarks, keeping them available around the clock, and that this is a difficult disruption.
...
Sincerely,
Larry
mike | Shared With: Everyone - Aug 11 2008 | Gmail, reliability
Ouch. Gmail down. "Don't worry, be happy".
Quoted: We’re sorry, but your Gmail account is currently experiencing errors. You won’t be able to use your account while these errors last, but don’t worry, your account data and messages are safe. Our engineers are working to resolve this issue.
...
Please try accessing your account again in a few minutes.
mike | Shared With: Everyone - Jul 08 2008 | google, docs, saas, reliability
I was also going to comment on the irony of a Google reputation survey, the same morning that Google Docs has been down for about 1 hour! People are twittering about it at the rate of about 2 tweets per minute:
http://summize.com/search?q=google+docs+down
But now back up after being down for 1 hour.
mike | Shared With: Everyone - Jul 08 2008 | google, reliability, docs, saas
Google Docs is down! This really sucks. I was happily editing in offline mode (using Gears), but didn't notice that the service was down until I tried to create a NEW document (which you can only do online!).
I turned off online access, and POOF, I can't see anything but the service unavailable page. AND, there is no way to GO BACK to editing offline again.
I just hope when it comes back, that I haven't lost my offline updates to docs I was working on.
mike | Shared With: Everyone - Feb 16 2008 | amazon, s3, reliability, web development
Amazon gave this explanation of Friday's 2-hour S3 outage in this forum post. While their overall response was fairly quick, their communication was lacking. Note that the AWS blog STILL has no post about this event, forcing developers to scan the forums looking for a thread about a site-wide service outage!
Quoted: Here’s some additional detail about the problem we experienced earlier today.
...
Early this morning, at 3:30am PST, we started seeing elevated levels of authenticated requests from multiple users in one of our locations. While we carefully monitor our overall request volumes and these remained within normal ranges, we had not been monitoring the proportion of authenticated requests. Importantly, these cryptographic requests consume more resources per call than other request types.
...
Shortly before 4:00am PST, we began to see several other users significantly increase their volume of authenticated calls. The last of these pushed the authentication service over its maximum capacity before we could complete putting new capacity in place. In addition to processing authenticated requests, the authentication service also performs account validation on every request Amazon S3 handles. This caused Amazon S3 to be unable to process any requests in that location, beginning at 4:31am PST. By 6:48am PST, we had moved enough capacity online to resolve the issue.
...
As we said earlier today, though we're proud of our uptime track record over the past two years with this service, any amount of downtime is unacceptable. As part of the post mortem for this event, we have identified a set of short-term actions as well as longer term improvements. We are taking immediate action on the following: (a) improving our monitoring of the proportion of authenticated requests; (b) further increasing our authentication service capacity; and (c) adding additional defensive measures around the authenticated calls. Additionally, we’ve begun work on a service health dashboard, and expect to release that shortly.
...
Sincerely,
The Amazon Web Services Team
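Action (a) in Amazon's list, monitoring the proportion of authenticated requests rather than only the total volume, can be illustrated with a small sketch. This is not Amazon's implementation; the class name, the window size, and the alert threshold are all hypothetical values chosen for illustration.

```python
from collections import deque


class AuthRatioMonitor:
    """Track the share of authenticated requests among the most recent requests.

    Hypothetical sketch: `window` and `threshold` are illustrative values,
    not figures taken from the Amazon post.
    """

    def __init__(self, window=1000, threshold=0.5):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # True = authenticated request
        self.auth_count = 0

    def record(self, authenticated):
        # When the window is full, appending evicts the oldest entry,
        # so remove its contribution from the running count first.
        if len(self.recent) == self.window and self.recent[0]:
            self.auth_count -= 1
        self.recent.append(bool(authenticated))
        if authenticated:
            self.auth_count += 1

    def ratio(self):
        # Fraction of authenticated requests in the current window.
        return self.auth_count / len(self.recent) if self.recent else 0.0

    def should_alert(self):
        # Alert on the mix of request types, even if total volume is normal.
        return self.ratio() > self.threshold
```

The point of tracking a ratio is the one the post makes: total request volume can stay within normal ranges while the more expensive request type quietly grows.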
mike | Shared With: Everyone - Feb 16 2008 | amazon, s3, web development, storage, reliability
Is S3 a single point of failure for Web 2.0 companies? One of the 3 S3 data centers went down for 2 hours on Friday morning. Given that people noticed a complete outage, requests seem NOT to have failed over to the other centers.
Amazon seems serious about responding to this, but it seems like they have a fundamental system problem.
Quoted: Bits is a blog about technology, innovation and society from The New York Times.
mike | Shared With: Everyone - Feb 12 2008 | blackberry, rim, service, reliability, phone
mike | Shared With: Everyone - Aug 19 2007 | space, news, reliability
I hadn't heard of the Scaled Composites accident this summer that killed three people. Here's an article comparing the reliability and response of private vs. government safety procedures.
mike | Shared With: Everyone - Aug 16 2007 | skype, reliability, outage, voip
mike | Shared With: Everyone - Jul 30 2007 | blogs, google, analytics, reliability, uptime
Related Content from Around Faves
reliability
We inadvertently provisioned a few database machines with MyISAM instead of InnoDB, and it has been a nightmare. I strongly advise against using MyISAM -- ever.
With MyISAM, we periodically get these incorrect "duplicate key" errors that don't go away until you run the lengthy "repair table" command that somehow fixes everything.
1 Faver | Viewed: 3 Times
Quoted: Use MyISAM when: The data isn't too critical ( "unreliable and slow, related to table size, table repair process" )
...
Use InnoDB for tables when: The table will be big (100Mb+ - "For reliability and performance, we use InnoDB for almost everything at Wikipedia - we just can't afford the downtime implied by MyISAM use and check table for 400GB of data when we get a crash." )
- mike - Aug 11 2008 | 1 Faver | Viewed: 1 Time
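For anyone in the same situation of accidentally provisioned MyISAM tables, here is a rough sketch of how the conversion statements could be generated. It assumes you have already fetched (table name, engine) pairs, for example with the query in FIND_TABLES_SQL; the function and constant names are hypothetical, and nothing here connects to a database or runs the statements for you.

```python
# Query that lists each table in the current schema with its storage engine.
FIND_TABLES_SQL = (
    "SELECT table_name, engine FROM information_schema.tables "
    "WHERE table_schema = DATABASE();"
)


def conversion_statements(tables):
    """Build ALTER TABLE statements for every table that uses MyISAM.

    `tables` is an iterable of (table_name, engine) pairs, e.g. the rows
    returned by FIND_TABLES_SQL. Hypothetical helper for illustration.
    """
    statements = []
    for name, engine in tables:
        # Views report no engine (NULL/None), so skip them.
        if engine and engine.lower() == "myisam":
            # Escape backticks so the identifier can be quoted safely.
            safe = name.replace("`", "``")
            statements.append(f"ALTER TABLE `{safe}` ENGINE=InnoDB;")
    return statements


if __name__ == "__main__":
    rows = [("users", "MyISAM"), ("orders", "InnoDB"), ("logs", "myisam")]
    for stmt in conversion_statements(rows):
        print(stmt)
```

Review the generated statements before running them: ALTER TABLE ... ENGINE=InnoDB rebuilds the table, which can take a long time on large tables, and older MySQL versions (before 5.6) do not support FULLTEXT indexes on InnoDB.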
- mike - Apr 14 2007 | 3 Favers | Viewed: 6 Times


