Facebook Explains the Massive Failure of Its Services
As a reminder, Facebook, Instagram, WhatsApp, Messenger and Oculus VR went offline yesterday and remained unavailable for six hours. It was rumoured that a failed Border Gateway Protocol (BGP) update was the cause of the service outage, and an official statement from the company confirms this.
Santosh Janardhan, Facebook's VP of Engineering and Infrastructure, apologized to users for the "inconvenience" and explained that changing the router configuration settings caused the connection between Facebook's data centres to fail.
"Our engineering teams have learned that changes to the configuration settings of the backbone routers that coordinate network traffic between our data centres have resulted in communication disruptions. This network traffic failure has caused a cascading effect in our data centres and the shutdown of our services," said Janardhan.
This explanation confirms information previously provided by Cloudflare, which traced the issue back to withdrawn BGP routes that disrupted traffic routing. At the time, some speculated that a common DNS configuration error was the cause of the failure, but this theory was soon dismissed: Facebook's DNS servers were still running, but they could no longer be reached and therefore did not answer queries.
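The difference between a DNS fault and a routing fault can be illustrated with a toy model. The sketch below is only an illustration, not Facebook's or Cloudflare's tooling: the `ToyNetwork` class, the `lookup` function, the hostname and the addresses (taken from the reserved documentation ranges) are all hypothetical. It shows how a name server with perfectly intact records stops answering once the route to it is withdrawn.

```python
import ipaddress


class ToyNetwork:
    """Tracks which IP prefixes are currently announced (a stand-in for BGP)."""

    def __init__(self):
        self.announced = set()

    def announce(self, prefix):
        self.announced.add(ipaddress.ip_network(prefix))

    def withdraw(self, prefix):
        self.announced.discard(ipaddress.ip_network(prefix))

    def is_reachable(self, address):
        # An address is reachable only if some announced prefix covers it.
        ip = ipaddress.ip_address(address)
        return any(ip in network for network in self.announced)


def lookup(network, nameserver_ip, zone, hostname):
    """Resolve a hostname, but only if a route to the name server exists."""
    if not network.is_reachable(nameserver_ip):
        return "timeout"  # the query never arrives at the server
    return zone.get(hostname, "NXDOMAIN")


# Hypothetical values from reserved documentation ranges, not real addresses.
NAMESERVER = "192.0.2.53"
ZONE = {"www.example.test": "198.51.100.10"}

net = ToyNetwork()
net.announce("192.0.2.0/24")
before = lookup(net, NAMESERVER, ZONE, "www.example.test")

# Withdrawing the route leaves the zone data untouched but unreachable.
net.withdraw("192.0.2.0/24")
after = lookup(net, NAMESERVER, ZONE, "www.example.test")

print(before)
print(after)
```

In this model the zone data never changes. Only the route disappears, which is why a failure of this kind can look like a DNS error from the outside.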
"Our services are now back online and we’re actively working to fully return them to regular operations. We want to make clear at this time we believe the root cause of this outage was a faulty configuration change. We also have no evidence that user data was compromised as a result of this downtime," Janardhan added.
The outage also affected Facebook's internal tools, Janardhan said, making it difficult to diagnose and fix the problem. Facebook has dispelled rumours of a hacker attack, stressing that the outage was caused by a faulty configuration change and that there is no evidence user data was compromised.