beneater

Why was Facebook down for five hours?

Hey everyone,

I'm still working on some tweaks for the hardware timer video, so I still haven't published that on YouTube yet. But last week, as you may know, Facebook's entire service was offline for about five hours.

The explanations that came out talk about DNS and BGP as somehow related to the outage, but I found a lot of people don't really know how these protocols work. So I thought it might be interesting to make a video about it with a lot more detail. Well here it is.

Since this is somewhat of a current event, I'm looking to get the video out soon, so I don't expect to make too many big changes. But as always, I appreciate any feedback before I publish it.

Thanks again for your support, and let me know what you think about networking topics. The short-term plan is to return to hardware videos—I2C is likely next. But there's definitely a lot of networking stuff I could cover if there's interest.

Thanks,

-Ben

Comments

I really miss your content, hope you are not done with it

tremors08

This guy fell off the grid

Eric jacobs

Are you coming back? Come back!!

Evan Thayer

I'm interested in knowing what router simulator you are using. I'm slowly losing patience with GNS3 :(

David Edelman

My point was rather minor. I don't think anything you said about the root servers was technically wrong, but the implication that there are 13 servers was there. You only mention assigning multiple servers to a single IP later on. As for the end nodes: it's not of particular importance for the video. It sounds like you were not aware of that fact. At least for the Akamai network (which, it appears, Facebook is no longer using), the routing to the "close router" is done at the DNS level. Once you resolve the CNAME, you're down to plain ol' IP routing. That's also the reason for the very short TTL, so the network can re-route you if a region (the Akamai name for a hosted server cabinet) goes down.

Shachar Shemesh

I actually thought it would be greater than 50%. The fact that it is less than 40% is mind-blowing. After I commented, I dropped my car off to be inspected, and while walking home I was thinking: how many 'devices' does the US Government have that were designed years if not decades ago and were not designed for IPv6? They sit somewhere and 'just work', and over time they have mostly been forgotten about. I do think a video on the evolution of IPv6 and how it could be fully implemented would be interesting, along with the effect NAT (network address translation) had on alleviating the need for IPv6.

Michael McDonnell

IPv6 has come a long way in the last 10 years: https://www.google.com/intl/en/ipv6/statistics.html It's funny to think of the reaction if I had done this entire video using only IPv6. It would have been completely doable and virtually identical except for the addresses.

Ben Eater

Yeah, it looks like the route was still there. I almost mentioned it in the video, but it was one of many side tangents I ended up cutting since I would have been speculating about what would happen. My guess though is that you're right. From everything we've heard, Facebook's issues at the time were bigger and withdrawing routes to the DNS servers was just a single symptom.

Ben Eater

Aren't we saying the same thing? There are 13 root server anycast addresses. That's what I was trying to say, anyway.

Ben Eater

Wow, getting a video on this topic was a really pleasant surprise! I guess I should go check out your older videos :) One thing I'm really wondering about and was so far unable to get a plausible answer on: what would happen if someone put the actual IP for "www.facebook.com" into their /etc/hosts during that outage? As far as I can tell from your table, that IP was still routable, right? I guess most likely you'd get an HTTP 500 or something similar because some backend component would be unable to resolve IPs for some of its dependencies, but it would be fun to try.

George Miroshnykov
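
For anyone unsure what the /etc/hosts trick in the comment above does: on most systems the resolver checks a static table of names before it asks DNS at all, so an entry there keeps working even when DNS lookups fail. Here is a minimal sketch of that lookup order. The function names (`parse_hosts`, `resolve`, `broken_dns`) are made up for illustration, and the address is a reserved documentation address, not a real Facebook IP.

```python
def parse_hosts(text):
    """Parse hosts-file style text into a {hostname: ip} mapping."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # first entry for a name wins
    return table


def resolve(name, hosts, dns_lookup):
    """Return the static entry if there is one, otherwise fall back to DNS."""
    ip = hosts.get(name.lower())
    if ip is not None:
        return ip
    return dns_lookup(name)


def broken_dns(name):
    """Stand-in for a DNS lookup during an outage (hypothetical helper)."""
    raise LookupError(f"DNS lookup failed for {name}")


# Illustrative hosts-file content; 192.0.2.10 is a documentation address.
HOSTS_TEXT = """
127.0.0.1    localhost
192.0.2.10   www.example.com   # pinned by hand
"""

hosts = parse_hosts(HOSTS_TEXT)
print(resolve("www.example.com", hosts, broken_dns))  # 192.0.2.10
```

The pinned name resolves without touching DNS, while any other name would still hit the failing lookup. Whether the server at that address then answers usefully is a separate question, which is the point the comment is raising.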

I'll just mention that there are 13 root *addresses*, but more servers than that. And when you say that the same organizations manage the COM servers: there is some overlap, but the list is not identical. "Each of those addresses is probably served by different servers in different locations" (7:05) Actually, no. The protocol to do that is called "Anycast", is BGP based, and is only used for UDP protocols. In particular, DNS. The geographical load balancing was done by directing you to the c10r subdomain to begin with. That's why you get very short TTL on the node IP address.

Shachar Shemesh

For me, a few more sentences would be helpful in the transition from DNS to BGP, to prepare the viewer for understanding the great, mysterious Internet. It seems to me from this video that the Internet is basically built out of bridged-together networks like the ones we are used to in our homes that connect to our ISP. They use something called BGP to manage routes through the system. My reaction was, "Eek! I don't understand Internet routing!" :)

Ryan Helinski
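
As a rough picture of "managing routes": BGP is how networks tell each other which address prefixes they can reach, and each router then forwards a packet according to the most specific prefix in its table that contains the destination. The sketch below shows only that table lookup (longest-prefix match), not BGP itself. The prefixes are reserved documentation ranges and the peer names are invented for illustration.

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hop for the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


# Hypothetical routing table: prefix -> neighbor that announced it.
routes = {
    "198.51.100.0/24": "peer-A",
    "198.51.100.128/25": "peer-B",  # more specific, wins for .128 to .255
    "203.0.113.0/24": "peer-C",
}

print(lookup(routes, "198.51.100.200"))  # peer-B
print(lookup(routes, "198.51.100.5"))    # peer-A

# If the only prefix covering an address is withdrawn, there is no match left.
del routes["203.0.113.0/24"]
print(lookup(routes, "203.0.113.53"))    # None
```

The last lookup is the situation described for the outage: once the routes covering the DNS servers were withdrawn, other networks had no table entry telling them where to send those packets.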

I like how you mentioned IPv6 and then just moved on back to IPv4, just like almost everyone else has done since 1995. I remember reading about IPv6 in the mid to late 90's; I still have the book. I am amazed that it has not been fully implemented. But I assume that it would really be a 'Ralph breaks the internet' type thing. If IPv4 was phased out, legacy software would stop working. Considering the internet has been pieced together for decades, I would think it would be a major refit to do it. When the US phased out analog TV, anyone who got over-the-air broadcasts with an analog TV had to get a converter, but eventually it stuck. But that took the US government (FCC) to mandate it, and no one owns the internet. And great video. I have set up a few local caching DNS servers over the decades, but always just cached off another DNS server. Even if it was an authoritative DNS server for a domain name, I would cache it off another server and override the entries for the specific DNS. I never knew (or really looked into) the hierarchy of DNS / NS.

Michael McDonnell

Nice video. It might be fun to dive a bit more into the details, perhaps in the spirit of the paper division algorithm for the 6502 to convert to ASCII.

Rob Nichols

Networking topics are good. Yes more networking content 👌

Eric jacobs

This video was great and helped me understand BGP much better. Thanks a lot!

Thomas Nyberg

I'd watch any of your content, but *by far* my favourite are those that create something (in hardware or software). So network explanations would engage me more if they were layered on hardware you'd built :-) Not easy, but you asked about what interests us. As others say, keep up the amazing work, whatever you decide to do.

David Dawkins

Based on Ben's work in the past, I'm not that surprised that he managed to get access. Now what I wanna know is: did he have it already on hand, or did he ask a friend?

Michael Timbrook

Nothing I could add that would be interesting. Would just add drama to an already spicy situation

Michael Timbrook

It's not that bonkers, I had casual access to... wilder things XD

ikiris

The most remarkable part of this is that you casually have access to a live ATT router, and nobody is talking about how absolutely bonkers that is. Keep up the amazing work.

Warren Garabrandt

Lol, try not to violate any NDAs complaining about what was wrong 😂

ikiris

Love deep dives on networking. I think there’s a lot to talk about on how these protocols came to being. As someone who works in infrastructure… at Facebook. I’m looking forward to watching this.

Michael Timbrook

