3blue1brown

New video: Bayes' theorem, and making probability intuitive

What I initially thought would be a simple massaging of an old unpublished video quickly turned into doing a completely new project.  I think it's for the best, too, as there were enough things I didn't quite like about the previous attempt at a Bayes video.

I'm hesitant to make promises, because I've been down the road before, but I have a lot of thoughts for probability content which I'm fairly excited to start executing on.  Maybe instead of specific promises, I'll just say that the probability of some kind of series on probability manifesting is at an all-time high.

----

For those who have been keeping track, you may recall the other main project which was underway is on e and the general exponential function.  That will still happen for sure; at the moment there exists a comically large amount of writing in my files for what will ultimately get pared down to a ~25-minute video.  After spending too much time on it, though, the whole thing started to feel a bit overworked, like dough that's been kneaded too long (is that a thing?).  Any of you who do some writing will recognize the feeling of looking over some words, and completely losing empathy with what it's like to read/hear them for the first time.

I decided what's best here is to put it in a drawer for a bit, then edit it with clean eyes before putting in the time to animate it.  In the meantime, I hope you enjoy probability!

-Grant

Comments

In 15 minutes this video did what the last 4 months of class has failed to do. It's why I became a patron. Please keep up this amazing work.

The best lesson that I learned from your videos is that math is really visual (at least for me). I keep closing my eyes and seeing your Bayes square of four areas. It's amazing that my conceptual understanding is a single visual.

Could you please point us to some good reference material which covers various types of statistical distributions in detail? Of course I am looking forward to the upcoming videos on probability.

Hi Grant!

I love your videos, but I'm going to nitpick your visualization of the librarian/farmer problem for a second. The question asks: "Which of the following do you find more likely: Steve is a librarian, or Steve is a farmer?" Your visuals then show a probability of Steve being a librarian vs. a farmer with those two probabilities adding up to 100%. But assuming Steve was randomly selected from the general population (say, in the USA), there's absolutely no way those probabilities will ever add up to 100%! It would be something like a 1% chance of him being a farmer and a 0.0005% chance of him being a librarian. The unspoken third option, "He's neither," should have priors of 98-99%. At the end of the video, you addressed the other elephant in the room, the fact that you don't know which population Steve is drawn from. If the researchers know for a fact that he's either a librarian or a farmer, as your visualization depicts, then this becomes extremely important, because you don't know the researchers' methodology for picking Steve. For all we know, they could have recruited 50 librarians and 50 farmers and given us a description of a random one of them. My point is that there's a pretty big difference between, "Here's Steve. Which is the higher probability, that he's a librarian, or that he's a farmer?" vs. "We know for a fact that Steve is either a librarian or a farmer, but we're not going to tell you how we know that. Which one is more likely, given his personality?" The former is closer to what Kahneman and Tversky actually asked in their research, as far as I can tell, but your video depicts the latter. That said: the whole bit about thinking about probability as geometry, and likelihoods as areas, is great, and I'll probably be using that a lot when trying to explain Bayes' theorem to people.

Kronopath
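
To make the distinction in the comment above concrete, here is a minimal Python sketch. It assumes "farmer" and "librarian" are disjoint groups and uses the commenter's rough figures (about 1% farmers, 0.0005% librarians), which are illustrative guesses rather than census data. The function name condition_on_either is hypothetical.

```python
def condition_on_either(p_farmer, p_librarian):
    """Renormalize two disjoint priors, given that exactly one of them holds."""
    total = p_farmer + p_librarian
    return p_farmer / total, p_librarian / total


# Rough figures taken from the comment; assumptions, not measured data.
p_farmer = 0.01          # about 1% of the population
p_librarian = 0.000005   # about 0.0005% of the population
p_neither = 1 - p_farmer - p_librarian

farmer_given_either, librarian_given_either = condition_on_either(
    p_farmer, p_librarian
)

print(f"P(neither)                           = {p_neither:.6f}")
print(f"P(farmer | farmer or librarian)      = {farmer_given_either:.6f}")
print(f"P(librarian | farmer or librarian)   = {librarian_given_either:.6f}")
```

In the general population nearly all of the prior mass sits on "neither"; only after conditioning on "Steve is one of the two" do the two priors sum to 1, which is the situation the unit-square picture depicts.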

Question, is there some pedagogical reason/theory reversing the order of those 3 questions helps create a deeper understanding of the concept?

And here: https://youtu.be/monh8vliBHA?t=688 "it has to be smaller" should maybe be "it can't be bigger" (since the subset could be the same size as the superset)

Alex Loftus

At this point in time: https://youtu.be/monh8vliBHA?t=613 It might have been useful for you to say something like "and the summed area of both bars is the probability of the evidence".

Alex Loftus

Wonderful! Perhaps a bit repetitive.

Akiva Weinberger

Hi Sean, It's not a bad suggestion, but the convention for many is to just write P(H), so students will see that elsewhere, and there is some meaningful cost in screen real estate. That might sound silly, but as you put together videos it actually comes to matter a lot. I'll keep this in mind, though, for times when it might be unclear in the future.

3blue1brown

Hey, thanks for saying that. Like I said, I'm feeling pretty good about what's coming down the pipeline.

3blue1brown

i forgot to comment here: it's very clear comparing this video with the previous unreleased one exactly what the magic ingredient was that you felt you were missing at the time, and it's equally clear that you have since found it. :)

issa

I'm really excited for the content of this series. Would really love some discussion of the measure theory underneath and things like conjugate priors and why they matter. Really appreciate this series.

Excellent video, I'm excited to see that you'll also be including Bayesian Networks and the generalizations of H and E beyond boolean states. I found "Risk Assessment and Decision Analysis with Bayesian Networks" a fantastically comprehensive treatment on the subject, if you need extra material. I think your video is fine as is, but I have one small suggestion about notation. During my studies, I recall it being clearer when values are binary to explicitly assign values to variables in the probability expression. For example, in John K. Kruschke's work "Doing Bayesian Data Analysis," he writes boolean states as such: P(H=true | E=true). This, to me, is clearer, as P(H | E) ignores the "selection" of a value from a set of states. Additionally, it helps to make explicit what NOT(H) means when you show the expanded evidence in the denominator. Your notation is clear for continuous and large state spaces, but for toy examples to audiences new to the subject, it might be helpful. My two cents. Thank you for making this, it's one of my favorite subjects! Great work!

I think it's easier to visualise the formula as P(H|E)*P(E) = P(E|H)*P(H). The left side is probability of H if we know that E holds times probability of E, which is essentially probability that both H and E hold at the same time, but so is the right side

Anton Novikov

Grant, great video! This is a nit, but at 11:24 the subset doesn't have to be smaller than the set. It can't be larger of course, but without more information it doesn't seem that equality is ruled out. In case you're looking for suggestions as you go deeper into statistics, I use Bayesian inference for work and there's one aspect I've never been able to understand intuitively. In the testing we do, I have to make use of the beta distribution as the conjugate prior to the binomial distribution. I get how this works mathematically, but the concept of a conjugate prior distribution has never really clicked for me.
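
One common way to read the beta/binomial pairing mentioned above is as bookkeeping on counts: a Beta(alpha, beta) prior behaves like alpha prior "successes" and beta prior "failures", and observing binomial data just adds to those counts. The sketch below is a minimal illustration under that standard result; the function names and the example data (7 successes, 3 failures, uniform prior) are made up for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after binomial data, given a Beta(alpha, beta) prior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: uniform prior Beta(1, 1), then 7 successes and 3 failures.
prior = (1.0, 1.0)
posterior = update_beta(*prior, successes=7, failures=3)

print("prior mean:    ", beta_mean(*prior))
print("posterior:      Beta", posterior)
print("posterior mean:", beta_mean(*posterior))
```

"Conjugate" here means the posterior stays in the same family as the prior, so each update is a parameter change rather than a fresh integral.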

Actually, Steve is a data scientist, but he has done a bit of farming and has his own library.

Steve

Are you going to be mentioning odds ratios like P(A)/P(B) (and the wonderfully symmetric form of Bayes in their case) in the final video? I feel that, especially in the farmer/librarian example, they might be really useful, since theoretically, the two cases of "farmer" and "librarian" don't fill up the entire probability space (or aren't necessarily disjoint, for that matter), so reasoning about absolute P(H), P(E) and P(H|E) might cause some problems, at least with pedantic people like myself :P

Alien Valkyrie
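
For readers unfamiliar with the odds form mentioned above: posterior odds equal prior odds times the likelihood ratio, and P(E) cancels, so the two hypotheses do not need to cover the whole probability space. The sketch below is a minimal illustration; the function name and all numbers (a 1:20 librarian-to-farmer prior ratio, likelihoods of 0.4 and 0.1) are assumptions chosen for the example.

```python
def posterior_odds(prior_a, prior_b, likelihood_a, likelihood_b):
    """Odds form of Bayes' rule: P(A|E)/P(B|E) = [P(A)/P(B)] * [P(E|A)/P(E|B)]."""
    return (prior_a / prior_b) * (likelihood_a / likelihood_b)


# Illustrative assumptions, not data:
# priors only need the right ratio (1 librarian per 20 farmers),
# likelihoods are P(description | librarian) and P(description | farmer).
odds_librarian_vs_farmer = posterior_odds(
    prior_a=1, prior_b=20, likelihood_a=0.4, likelihood_b=0.1
)

# Converting odds to a probability is only valid if we additionally
# assume Steve is one of the two.
prob_librarian_if_one_of_two = odds_librarian_vs_farmer / (
    1 + odds_librarian_vs_farmer
)

print("posterior odds (librarian : farmer):", odds_librarian_vs_farmer)
print("P(librarian | E, one of the two):   ", prob_librarian_if_one_of_two)
```

Note that the priors are passed as unnormalized counts: only their ratio matters, which is what lets the odds form sidestep the "neither" case.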

Over-working something? Oh yeah ... that's why my papers take too long to come out. That's something I'm working on ... learning how to stop and let things go as they are, instead of honing them to perfection.

Magnasium

Bakers don't really care whether you have trouble digesting their gluten, but yes, even bakers have the concept of an over-kneaded dough. Dough starts out messy and sticky, then turns smooth and supple, almost glossy, as the gluten develops. Push it too far--over-develop the gluten--and instead of a stretchy matrix of gluten strands you end up with tight little balls of gluten-snarls that start exuding, crumb-like, from your dough. Fortunately, it's hard to knead a dough that far, at least by hand.

jason black

I've been waiting for this, and am not disappointed. Thank you! I like how you immediately point out that in practice you always are using the law of total probability (without having to say it), too often that's left for a follow up topic when it is essential for practical applications.

The book that really tattooed Bayes' theorem for me was The Signal and the Noise, by Nate Silver (which I read after reading The Drunkard's Walk, another great book about randomness). I'd add that there seems to be a repeated scene when you give the three examples, which looks like an editing issue.

Leo Barlach

At about 8:18, the π-figure appears in a weird way in the bottom right corner - as if it was about 90% transparent or sth.

Sascha Baer

Whole-heartedly agree. Jargon has a role for experts, sure, but it should be heavily resisted in early explanations.

3blue1brown

Awesome to hear! I'm sure I'll only scratch the surface of what you'll be getting into, but feel free to share notes as you get into topics which excite you.

3blue1brown

Oh, whoa, thanks for the catch! I'm not sure what happened there, but I'll be sure to fix it...

3blue1brown

This is just a minor issue, but isn't it the case that in the list of patrons at the end of the video you completely skip letters with diacritical signs? I saw a name "Adam Dnek" there but I am not even sure it is my name :D I understand that maybe showing the diacritical signs wouldn't fit in the style of the video, but it would be better to remove just the signs than the whole letters. Otherwise cool video as always! It's nice to see that you didn't quit the old projects completely.

Adam Dřínek

Side note on the relationship of natural language and math - check out this MIT Tech Review article "...their approach essentially treats mathematics as a natural language" https://www.technologyreview.com/s/614929/facebook-has-a-neural-network-that-can-do-advanced-math/?utm_medium=tr_social&utm_campaign=site_visitor.unpaid.engagement&utm_source=Twitter#Echobox=1576595855

Dough that has been kneaded too long has lots of gluten. Therefore it is hard to digest. Hence, your analogy is spot on.

The deeper I get into my PhD the more and more I realize how crucial a solid understanding of probability and statistics is. I've never felt strong in it, so I'm about to dig a lot deeper and get some visual understanding which I know works best for me. This series is coming at a perfect time for me. Next term I'm taking statistical inference, woo! Thank you Grant!!

YES!!!!! I'm about to start my master's in a field involving heavy probability and statistics, so this post comes at the absolute perfect time. THANK YOU GRANT!!!

Alex Loftus

Great video. very helpful. Looking forward to the next one too! Btw, here’s a succinct non-psychologist’s criticism of the steve-linda example (one that helped me appreciate the difference between natural language and math)...“Natural-language sentences aren't logical propositions. "Is a bank teller" simply has different meanings in the two sentences.” Source: https://twitter.com/DavidDeutschOxf/status/1109101738039144450?s=20

YES! Probability videos!! <3

By far the clearest and most compelling explanation of Bayes theorem I have ever seen. Thank you!

Another consideration for visualizing why Linda isn't as likely to be both a bank teller and active in the feminist movement: The overlapping area of the probability rectangles has to be smaller than either of the rectangles. You show this at ~11:22 as areas of ellipses, but it could be worth considering the overlapping rectangles. Fantastic Video!!! =D

Thanks for all the great content Grant! I appreciate everything you do for the math community.

I think slowly I'm building up the importance of intuition in maths, even very rigorous formal maths. The content I understand best has always been content you (or other fantastic content creators) have covered, and it's because you focus on the intuition, while I find university lecturers always focus on the boring proofs, while giving no reason why they should be true. One thing I always advocate for is making things simpler and explaining words - what sounds scarier, a homeomorphism or a "same mapping"? It's small things like that that make a difference, and you always manage to get that across fantastically. For that, I thank you very much Grant!

_ericBG

Grant, I know you lost the spark during the last probability series but this is great :) I think you may enjoy covering some of the practical aspects of Bayesian approaches - MCMC is interesting in and of itself, but the mechanics of HMC, NUTS, and the shape of the posterior etc. all seem so visual and right up your alley! Looking forward to more content in 2020 <3

This is how I wish I had learnt Bayesian statistics. I learnt the formula but not why it was useful. I never understood why it was useful, and never knew I could think of it geometrically as I did introductory statistics with coin flips and a tree. Thank you so much for this.

At 8:38 in the video the formula at the top shifts slightly. Is that an ever so slight animation error?

Perhaps indicate that you sum over all possible hypotheses in the denominator, in this case just two. Also it might be interesting to see why people's priors might cause them to get more and more extreme opinions: https://blogs.cornell.edu/info2040/2015/11/09/extreme-opinions-and-bayes-theorem/ Last, maybe a stochastic process like a Dirichlet Process and, linked to that, a Chinese Restaurant Process through De Finetti's exchangeability theorem would be a nice one in this series.

I've done some writing, and yes, I know the kneaded-too-long feeling all too well. One thing that helps me in those situations is to split the project in half, and try to "finish" just the first half. Often I'll find that I was closer to having both halves completed than I realized. When I knead too long, it's often because the project has grown out of bounds, or I'm trying to dive more deeply than I originally planned. Refocusing on trying to finish a part, instead of creating the whole, makes me sharpen boundary lines that might have gotten blurry over time. Your videos are fantastic, by the way. I've been a practicing engineer for 25 years now, and thought that the math tactility and intuition I cultivated as a student was gone forever. Your work has reawakened and inspired me to learn it all over again.

Here's my attempt at explaining Bayes' to myself a while back https://heliosphan.org/bayes-derivation.html I really like your 'areas within a unit square' approach though, and I suspect that same approach will prove fruitful in other probability videos if you choose to do more (hope so). Here's a possible direction for a future video - estimating the bias of a coin... https://heliosphan.org/estimating-biased-coin.html ... in this post I also used the unit square approach, and a sprinkling of Bayes' too!

of course this comes at the *end* of the semester with the statistics class XD ...I jest, keep up the great work!

RedAgent14

It really is incredibly intuitive when you show it visually like that. My favorite application of Bayes rule is John Ioannidis' paper on why most published research findings are probably wrong. One thing that always trips me up is that some Bayes explainers (Arbital for example) like to talk in terms of odds ratios instead of probability - I wonder if you could briefly go into what the advantages are to thinking in those terms?

Lauren Steely

Who knew you could visualize Bayes' theorem? Brilliant.

Such a great video. I've been wanting to grok Bayes' theorem for a while now and this was the ticket. I'm now going to practice on a problem that is useful to me: if someone raises in poker in a certain situation, what is the chance they have a good hand?

Great video! Bayes' theorem is always so hard to make sense of. The area visualization is a great tip. Thanks!

This is great! Going to look up how those people used it for treasure hunting now.

Yes! I have a little script for a footnote video to this effect, since it's probably the quickest explanation of the fact. The one downside is that while it does a wonderful job showing why the formula is true, it gives little insight about when it will be useful. That's why I chose to frame this particular video around evidence/hypothesis style examples.

3blue1brown

I have considered it, and hope to get to it one day. As you imply, though, there are indeed a lot of requests and a lot of potential good topics.

3blue1brown

This is a great explanation of Bayes' Theorem, thanks! I find it easier to remember the theorem in a different way. We often think of the formula P(A and B)= P(A)P(B|A) as meaning: first find the probability that A happened, and then find the probability of B once A has already happened. But that's misleading, because the formula is also true if the two things aren't happening in a particular order, like being a librarian/farmer and having/not-having a personality trait. This means the formula has to be completely symmetrical: P(A and B) = P(A)P(B|A) = P(B)P(A|B) Solving for P(B|A) gives Bayes' Formula. I personally never even memorize the formula itself—I just calculate each of the probabilities in my formula above (one of which, of course, requires finding the sum in your video), and divide to find the one I want.

Jason Taff
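
The symmetry argument in the comment above can be written out as a short derivation. This is a sketch in the comment's own A/B notation, assuming P(A) > 0; the last line is the law of total probability, which corresponds to the sum of the two bar areas in the video's picture.

```latex
\begin{align*}
P(A \cap B) &= P(A)\,P(B \mid A) = P(B)\,P(A \mid B) \\
\Longrightarrow\quad
P(B \mid A) &= \frac{P(B)\,P(A \mid B)}{P(A)} \\
\text{where}\quad
P(A) &= P(B)\,P(A \mid B) + P(\neg B)\,P(A \mid \neg B)
\end{align*}
```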

Yes, you can knead dough too long, if you do not want it to rise that much (kneading makes the gluten molecules stick to each other and thus trap CO₂ from the yeast consuming carbohydrates better). Waiting too long to bake - because of kneading excessively - will also make the bread flat. Dough gets sticky and more flour must be added when kneading - the bread will get heavier. Etc.

Gregor Shapiro

Sorry, I'll be the guy making a suggestion for another subject... but have you considered making a video on the SVD? Your linear algebra series is really insightful: at my suggestion, we've watched the whole series as a training session at work, and I'm sure it opened my coworkers' eyes. And you were so close, up to the eigenvalues and eigenvectors... And the SVD is such a useful tool that's not often taught in engineering classes. We had to find another video on the SVD for the training, but it was longer, and not on the same intuitive level at all. So... If you ever need even more projects, I hope you will consider it!

Vincent Zalzal

Awesome video - I think I now understand Bayes' Theorem. Thanks, Grant!

Probability is something I could never wrap my head around. If this series doesn't beat it into me I don't know what will.

Probability squared must give a value albeit a lower one.

Gregor Shapiro

