Google has explained the cause of the massive outage that hit its services last week. Recall that on December 14 of this year, users around the world were unable to access Gmail, YouTube, and other Google services for 47 minutes.
According to the company, the failure was caused by an error in the automated quota management system that allocates resources to the Google User ID service. This service maintains a unique identifier for each account and handles authentication credentials for OAuth tokens and cookies. The account data is stored in a distributed database that uses the Paxos protocol to coordinate updates. For security reasons, the service is designed to reject requests when it detects that its data is stale.
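The "reject requests if the data is stale" rule can be illustrated with a minimal sketch. Google has not published its implementation, so everything here is an assumption made for illustration: the names `AccountRecord`, `StaleDataError`, and `authenticate` are hypothetical, as is the 60-second threshold.

```python
from dataclasses import dataclass


class StaleDataError(Exception):
    """Raised when the local copy of account data is too old to trust."""


@dataclass
class AccountRecord:
    user_id: str
    last_synced: float  # time of the last successful update, in seconds


# Hypothetical threshold; the article does not give the real value.
MAX_STALENESS_SECONDS = 60.0


def authenticate(record: AccountRecord, now: float) -> str:
    """Return the user ID if the record is fresh, otherwise refuse the request."""
    age = now - record.last_synced
    if age > MAX_STALENESS_SECONDS:
        # Failing closed: an outdated record is treated as unusable
        # rather than served with possibly revoked credentials.
        raise StaleDataError(f"account data is {age:.0f}s old")
    return record.user_id
```

With a design like this, a service that cannot refresh its data stops answering authentication requests altogether instead of answering them from an outdated copy.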
One of the automated tools that manage quotas for the resources allocated to services contained a bug. It led to errors in authentication results, which in turn caused the services to fail.
“As part of the migration of the User ID service to the new quota system, a change was made in October to register the User ID service with the new quota system, but parts of the previous quota system remained in place, which caused the usage of the User ID service to be reported as 0. The grace period for applying quota limits delayed the impact, but it eventually expired, which triggered automatic quota systems to reduce the quota for the User ID service and triggered this incident,” Google said in a notice.
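The sequence Google describes (usage wrongly reported as 0, a grace period that hides the problem, then an automatic quota cut) can be sketched as follows. This is only an illustration under stated assumptions, not Google's actual quota system: `QuotaState`, `enforce`, and the `headroom` factor are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class QuotaState:
    limit: int             # quota currently granted to the service
    grace_deadline: float  # enforcement is postponed until this time


def enforce(state: QuotaState, reported_usage: int, now: float,
            headroom: float = 1.5) -> int:
    """Run one enforcement pass and return the resulting quota limit."""
    if now < state.grace_deadline:
        # Grace period: the old limit stays in place, so a wrong
        # usage report has no visible effect yet.
        return state.limit
    # After the grace period, the quota is sized to the reported usage.
    # A usage wrongly reported as 0 therefore shrinks the quota to 0.
    state.limit = int(reported_usage * headroom)
    return state.limit


state = QuotaState(limit=1000, grace_deadline=100.0)
before = enforce(state, reported_usage=0, now=50.0)   # still inside the grace period
after = enforce(state, reported_usage=0, now=150.0)   # grace period has expired
```

In this sketch `before` is 1000 and `after` is 0: the incorrect report is harmless while the grace period lasts and takes effect only once it expires, which matches the delay between the October change and the December outage.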