Facebook disappeared – A bolt from the blue

Read the blog to learn why Facebook disappeared in October 2021.

For a brief moment, everyone wondered, “Can Facebook really disappear?” Well, yes! In today’s digital age, many of us are hooked on social networking sites like Facebook and Instagram, so the moment Facebook stopped working, life seemed to come to a halt. What is this “Facebook disappearing” story? Let’s take a closer look in this blog.
On October 4th, Facebook faced massive disruptions due to a technical glitch. The outage began at 11:40 a.m. ET and took down the app and all Facebook services, leaving millions of users hunting for their ‘interaction dopamine rush’ elsewhere. The sites only came back online around 6 p.m. the same day. The outage lasted about six hours, and Facebook’s market value reportedly fell by around $50 billion.

At 15:51 UTC on October 4th, Cloudflare engineers opened an internal incident titled "Facebook DNS lookup returning SERVFAIL", worried that their own DNS resolver, 1.1.1.1, was malfunctioning.
However, as Cloudflare’s engineers prepared to post an update on their public status page, they realized something far more serious was going on.

Many social media users turned to Twitter to air their grievances. Social media instantly erupted, reporting what Cloudflare’s engineers had quickly confirmed: Facebook and its subsidiaries WhatsApp and Instagram were all down! Facebook’s DNS names were no longer resolving, and its infrastructure IP addresses were unreachable.
It was as if somebody had suddenly “pulled the cables” out of Facebook’s data centers, cutting them off from the internet. It wasn’t a DNS problem in and of itself, but failing DNS was the first visible symptom of a much broader Facebook outage. In all, Facebook was unavailable for roughly six hours.
So, what went wrong?
Facebook has since published details about what happened internally. The outside world saw BGP and DNS problems, but the trouble started with a configuration change that affected Facebook’s internal backbone network. That change took Facebook and its other sites offline and made it hard for Facebook’s own engineers to restore connectivity.
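Why would a backbone change make the DNS servers vanish too? Facebook’s own follow-up write-up explains that its edge DNS servers stop announcing their BGP routes when they cannot reach its data centers, a sensible safeguard when one site is unhealthy that backfired when the whole backbone went down at once. The sketch below models that idea under loud assumptions: the class and its fields are hypothetical, the health check is reduced to a boolean, and 192.0.2.0/24 is a documentation prefix, not one of Facebook’s.

```python
from dataclasses import dataclass

@dataclass
class EdgeDnsNode:
    """Hypothetical edge DNS server that announces its prefix only while healthy."""
    name: str
    prefix: str
    backbone_reachable: bool = True  # stand-in for a real probe of the data centers
    announced: bool = False

    def reconcile(self) -> str:
        """Announce or withdraw the prefix to match the latest health check."""
        healthy = self.backbone_reachable
        if healthy and not self.announced:
            self.announced = True
            return f"ANNOUNCE {self.prefix}"
        if not healthy and self.announced:
            self.announced = False
            return f"WITHDRAW {self.prefix}"
        return "NO CHANGE"

def prefix_reachable(nodes) -> bool:
    """The prefix stays routable as long as at least one node still announces it."""
    return any(node.announced for node in nodes)

# Hypothetical fleet of three edge nodes sharing one documentation prefix.
nodes = [EdgeDnsNode(f"edge-{i}", "192.0.2.0/24") for i in range(3)]
for node in nodes:
    node.reconcile()
print(prefix_reachable(nodes))  # True

# A faulty backbone change cuts every node off from the data centers at once.
for node in nodes:
    node.backbone_reachable = False
    node.reconcile()
print(prefix_reachable(nodes))  # False
```

Each node behaves correctly on its own; the failure comes from all of them reacting to the same broken backbone simultaneously, so no announcement survives anywhere.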
BGP
BGP, or Border Gateway Protocol, is the protocol that autonomous systems (ASes) on the internet use to exchange routing information. The enormous routers that keep the internet running hold massive, continuously updated lists of possible routes for delivering network packets to their final destinations. Without BGP, these routers would not know where to send traffic, and the internet as we know it would stop working.
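To make the routing idea concrete, here is a toy routing table. It is a simplified sketch, not how real BGP routers are built: each announced prefix maps to an origin AS number, a lookup picks the most specific (longest) matching prefix, and withdrawing every covering prefix leaves an address with no route at all. The prefixes come from the 203.0.113.0/24 documentation range and AS 64500 is a documentation AS number, both chosen purely for illustration.

```python
import ipaddress

class RoutingTable:
    """Toy routing table: announced prefix -> origin AS, with longest-prefix match."""

    def __init__(self):
        self.routes = {}  # ipaddress network -> origin AS number

    def announce(self, prefix, origin_as):
        self.routes[ipaddress.ip_network(prefix)] = origin_as

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the most specific announced prefix covering address, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        return max(matches, key=lambda net: net.prefixlen, default=None)

table = RoutingTable()
table.announce("203.0.113.0/24", 64500)  # hypothetical covering prefix
table.announce("203.0.113.0/25", 64500)  # hypothetical more-specific prefix
print(table.lookup("203.0.113.53"))      # 203.0.113.0/25
table.withdraw("203.0.113.0/25")
print(table.lookup("203.0.113.53"))      # 203.0.113.0/24
table.withdraw("203.0.113.0/24")
print(table.lookup("203.0.113.53"))      # None: no route, so packets cannot be delivered
```

The last lookup is the situation the rest of the internet faced: once the routes were withdrawn, routers simply had nowhere to send packets for those addresses.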
According to Facebook’s engineering team, the issue was caused by configuration changes on the backbone routers that coordinate network traffic between its data centers. That disruption had a cascading effect on the way Facebook’s data centers communicated, bringing its services to a halt.
How quickly could this be fixed? Given the length of the outage, the answer is clearly “not quickly.” Facebook had to make sure it was announcing the correct routes and that the rest of the internet had picked them up. In other words, it had to make sure its maps were correct and that everyone could see them.
Cloudflare’s engineers found that, by 15:58 UTC, Facebook had stopped announcing the routes to its DNS prefixes, which meant that, at the very least, Facebook’s DNS servers were unreachable. As a result, Cloudflare’s 1.1.1.1 DNS resolver could no longer answer queries for facebook.com’s IP address. Cloudflare keeps a record of all the BGP updates and announcements it sees across its global network, and that data gives it a picture of how the internet is connected and where traffic flows.
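Watching a stream of BGP updates, as Cloudflare does, can be illustrated with a small sketch. It assumes a simplified, hypothetical update format (minute, announce or withdraw, prefix), counts updates per minute to expose bursts of routing changes, and replays the stream to find watched prefixes that ended up with no route, which is what it looks like from the outside when an operator stops announcing its DNS prefixes. The sample feed is invented and uses documentation prefixes; it is not real outage data.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class BgpUpdate:
    """Simplified update record; a hypothetical format, not a real feed schema."""
    minute: int  # minutes since the start of the observation window
    kind: str    # "announce" or "withdraw"
    prefix: str

def updates_per_minute(updates):
    """Count updates per minute so a sudden burst of routing changes stands out."""
    return Counter(u.minute for u in updates)

def unreachable(updates, watched):
    """Replay updates in order and return watched prefixes left with no route."""
    routed = set(watched)  # assume every watched prefix starts out announced
    for u in updates:
        if u.prefix not in watched:
            continue
        if u.kind == "withdraw":
            routed.discard(u.prefix)
        elif u.kind == "announce":
            routed.add(u.prefix)
    return set(watched) - routed

# Made-up sample feed using documentation prefixes.
watched = {"192.0.2.0/24", "198.51.100.0/24"}
feed = [
    BgpUpdate(0, "announce", "203.0.113.0/24"),
    BgpUpdate(1, "withdraw", "192.0.2.0/24"),
    BgpUpdate(1, "withdraw", "198.51.100.0/24"),
    BgpUpdate(1, "withdraw", "203.0.113.0/24"),
]
print(updates_per_minute(feed))            # Counter({1: 3, 0: 1})
print(sorted(unreachable(feed, watched)))  # ['192.0.2.0/24', '198.51.100.0/24']
```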
Facebook’s DNS servers went offline, and a minute later Cloudflare engineers were in a meeting, puzzled as to why 1.1.1.1 couldn’t resolve facebook.com and worried that the problem was on their own servers.
The impact on DNS
As a result, DNS resolvers all across the world stopped resolving Facebook’s domain names. This happened because DNS, like many other internet systems, has its own routing mechanism. Whenever someone types https://facebook.com into their browser, the DNS resolver, which is in charge of translating names into IP addresses, first checks whether it already has the answer in its cache and, if so, uses it.
If not, it tries to get the answer from the authoritative nameservers run by the organization that controls the domain. If those nameservers are unreachable or fail to reply for any other reason, the resolver returns a SERVFAIL, and the browser shows the user an error.
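That lookup path (cache first, then the authoritative nameservers, and SERVFAIL if they cannot be reached) can be sketched as a toy resolver. This is a simplified model, not a real DNS implementation: `query_authoritative` is a hypothetical callback standing in for the actual network query, and the injectable clock only exists to make TTL expiry easy to demonstrate.

```python
import time
from typing import Callable, Dict, Optional, Tuple

SERVFAIL = "SERVFAIL"

class CachingResolver:
    """Toy resolver: answer from cache, else ask the domain's nameservers."""

    def __init__(self, query_authoritative: Callable[[str], Optional[Tuple[str, int]]],
                 clock: Callable[[], float] = time.monotonic):
        # query_authoritative(name) returns (ip, ttl_seconds), or None if unreachable.
        self.query_authoritative = query_authoritative
        self.clock = clock
        self.cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expiry time)

    def resolve(self, name: str) -> str:
        entry = self.cache.get(name)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                      # fresh cache hit
        answer = self.query_authoritative(name)  # cache miss or expired entry
        if answer is None:
            return SERVFAIL                      # nameservers could not be reached
        ip, ttl = answer
        self.cache[name] = (ip, self.clock() + ttl)
        return ip

# Hypothetical setup: a fake clock and a switch for nameserver reachability.
now = 0.0
nameservers_up = True

def fake_authoritative(name):
    return ("203.0.113.10", 60) if nameservers_up else None

resolver = CachingResolver(fake_authoritative, clock=lambda: now)
print(resolver.resolve("www.example.com"))  # 203.0.113.10 (from the nameservers)
nameservers_up, now = False, 30.0
print(resolver.resolve("www.example.com"))  # 203.0.113.10 (cached, TTL still valid)
now = 61.0
print(resolver.resolve("www.example.com"))  # SERVFAIL (cache expired, nameservers gone)
```

The middle lookup shows why some users kept working briefly after the routes disappeared: cached answers survive only until their TTL runs out.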
Because Facebook stopped announcing its DNS prefix routes via BGP, DNS resolvers could not reach its nameservers. As a result, major public DNS resolvers such as 1.1.1.1, 8.8.8.8, and others began returning (and caching) SERVFAIL responses. Then human behavior and application logic kicked in, causing a second, enormous effect: a flood of extra DNS traffic.
This happened partly because apps would not accept an error as an answer and began retrying aggressively, and partly because end users would not accept an error either, so they kept reloading pages or force-closing and restarting their apps.
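A rough back-of-the-envelope model shows why aggressive retries matter. The sketch counts how many queries a single client sends in one minute when every lookup fails immediately, comparing a client that retries every second with one that doubles its wait after each failure (exponential backoff, capped at 30 seconds). The function and all parameter values are illustrative assumptions, not measurements from the outage.

```python
def queries_sent(window_s, first_delay_s, backoff, max_delay_s):
    """Queries one client sends in window_s seconds if every lookup fails at once.

    After each failure the client waits `delay` seconds before retrying. The
    delay starts at first_delay_s, is multiplied by `backoff` after each retry
    (backoff=1.0 means a fixed interval), and never exceeds max_delay_s.
    """
    if first_delay_s <= 0 or max_delay_s <= 0:
        raise ValueError("delays must be positive")
    t, delay, sent = 0.0, first_delay_s, 0
    while t < window_s:
        sent += 1
        t += delay
        delay = min(delay * backoff, max_delay_s)
    return sent

eager = queries_sent(60, first_delay_s=1.0, backoff=1.0, max_delay_s=1.0)
patient = queries_sent(60, first_delay_s=1.0, backoff=2.0, max_delay_s=30.0)
print(eager, patient)  # 60 6
```

Multiplied across millions of clients, the gap between 60 and 6 queries per client per minute is the gap between a flood and a manageable bump. Well-behaved clients usually also add random jitter so their retries do not line up.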
Because Facebook and its sites are so large, DNS resolvers worldwide were suddenly handling many times more queries than usual, potentially causing latency and timeout problems for other platforms. However, because 1.1.1.1 was designed to be free, private, fast, and scalable, it kept serving users with minimal disruption.
People started looking for alternatives and wanted to learn more about what was happening. When Facebook went down, DNS queries for Twitter, Signal, and other messaging and social media services rose noticeably.
These unexpected events are a reminder that the internet is a vast, interconnected system made up of billions of systems and devices. It works for approximately five billion active users worldwide because of trust, standardization, and cooperation among the organizations that run it.
Bottom Line

After several hours, at 21:00 UTC, Facebook’s network showed renewed BGP activity, which peaked at 21:17 UTC. Resolution of facebook.com on Cloudflare’s 1.1.1.1 DNS resolver had stopped at 15:50 UTC and resumed at 21:20 UTC. Facebook was finally back on the global internet, and its DNS started working again.
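The timeline can be checked with a little date arithmetic. This sketch assumes the outage day was 4 October 2021 and that US Eastern time was on daylight saving time (UTC-4) at the time; it computes how long 1.1.1.1 could not resolve facebook.com and converts two UTC times to Eastern time.

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), "EDT")  # fixed UTC-4 offset for US Eastern daylight time

def at_utc(hour: int, minute: int) -> datetime:
    """Timestamp on the outage day (4 October 2021) at the given UTC time."""
    return datetime(2021, 10, 4, hour, minute, tzinfo=timezone.utc)

dns_stopped = at_utc(15, 50)
dns_resumed = at_utc(21, 20)

print(dns_resumed - dns_stopped)                            # 5:30:00
print(at_utc(15, 40).astimezone(EDT).strftime("%H:%M %Z"))  # 11:40 EDT
print(dns_resumed.astimezone(EDT).strftime("%H:%M %Z"))     # 17:20 EDT
```

The 15:40 UTC start lines up with the 11:40 a.m. ET figure, and DNS resolution on 1.1.1.1 was down for five and a half hours.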