Google Docs Outage Explained by Google

On Wednesday of this week, the Google Docs service went offline. The outage lasted about 30 minutes and left many users unable to access their online documents.

Google has now explained the cause. An update introduced a memory management bug that stopped servers from releasing RAM. As each server ran out of memory, its requests moved on to another server, which then filled up in turn, and so on.

The bug came from a change Google made to improve collaboration features. Unfortunately, once the change was deployed, it caused the memory problems.

Every time a Google Doc is modified, a machine looks up the servers that need to be updated. Due to the memory management bug, the lookup machines didn’t recycle their memory properly after each lookup, causing them to eventually run out of memory and restart. While they restarted, their load was picked up by the remaining lookup machines – making them run out of memory even faster. This meant that eventually the servers couldn’t properly process a large fraction of the requests to access document lists, documents, drawings, and scripts which led to the outage you saw on Wednesday.
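
To make the cascade easier to picture, here is a toy Python simulation. It is a sketch under invented assumptions, not Google's actual system. The machine count, memory capacity, leak size, request rate and restart time are all hypothetical parameters, and requests are simply spread round-robin across whichever lookup machines are currently up.

```python
from dataclasses import dataclass


@dataclass
class LookupMachine:
    capacity: int          # hypothetical memory budget, in abstract units
    used: int = 0          # memory still held from earlier lookups (the leak)
    down_until: int = -1   # tick at which a restarting machine is back online

    def is_up(self, tick: int) -> bool:
        return tick >= self.down_until


def simulate(num_machines=4, capacity=100, leak_per_lookup=1,
             requests_per_tick=8, restart_ticks=5, ticks=60):
    """Toy model (all parameters hypothetical): return lost requests per tick."""
    machines = [LookupMachine(capacity) for _ in range(num_machines)]
    lost_per_tick = []
    rr = 0  # round-robin counter over the machines that are currently up
    for tick in range(ticks):
        lost = 0
        for _ in range(requests_per_tick):
            up = [m for m in machines if m.is_up(tick)]
            if not up:
                lost += 1  # every lookup machine is restarting
                continue
            m = up[rr % len(up)]
            rr += 1
            m.used += leak_per_lookup  # the bug: memory is never recycled
            if m.used > m.capacity:
                # Out of memory: the machine restarts (its memory is freed),
                # and its share of the load shifts to the survivors.
                m.used = 0
                m.down_until = tick + restart_ticks
                lost += 1
        lost_per_tick.append(lost)
    return lost_per_tick


if __name__ == "__main__":
    print("with leak:   ", simulate())
    print("without leak:", simulate(leak_per_lookup=0))
```

With `leak_per_lookup=0` no requests are ever lost. With the leak switched on, the first machine to run out of memory drops offline, and each surviving machine receives a larger share of the lookups per tick, so it fills up faster.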

In total, the outage lasted 29 minutes: 24 minutes to roll back the update and 5 minutes for things to settle down.

You may recall that Microsoft also had problems this week, with its Office 365, SkyDrive and Hotmail services all failing for several hours.

For most people, outages like these are not a major problem, but for business and power users, losing access just when you need it can be extremely frustrating. Most people use the free Google Docs service, but there is also a paid version available through Google Apps.

Full details can be found at Google.

