Priors / Bayesian Reasoning / Conditional Probabilities Mental Model (Be A Filter, Not A Sponge)

If this is your first time reading, please check out the overview for Poor Ash’s Almanack, a free, vertically-integrated resource including a latticework of mental models, reviews/notes/analysis on books, guided learning journeys, and more.

If you only have three minutes, this introductory section will get you up to speed on the Priors / Bayesian Reasoning / Conditional Probabilities mental model.

The concept in one quote:

You have to evaluate each hypothesis in the ‘light of the evidence’ of what you already know about it. - R. A. Fisher

(via Jordan Ellenberg’s “How Not To Be Wrong” – HNW review + notes)

The concept in one sentence: the probability of X given Y is not the same as the probability of Y given X; this is mathematically obvious but profoundly unintuitive, leading us to dramatically overweight incremental data points the world gives us.

Key takeaways/applications: having the right beliefs about the world – “priors” – allows us to make more accurate assessments of incoming information.

Three brief examples of priors / Bayesian Reasoning / conditional probabilities:

Be a filter, not a sponge.  In the beloved teen flick “The Perks of Being a Wallflower,” Mr. Anderson – the English teacher of impressionable high school freshman Charlie Kelmeckis – hands Charlie a copy of Ayn Rand’s “The Fountainhead” with exactly the right advice: be a filter, not a sponge.

(Rand is dose-dependent: some Rand is very good; too much is very bad.  I say that as the winner of the 2013 Atlas Shrugged essay contest.  I was 19; don’t judge me too hard.)

Mr. Anderson was implicitly talking about the idea of priors and Bayesian reasoning – being open-minded and avoiding strong ideology is important, but we also can’t walk around sponging up every idea we hear, else we’ll get a bunch of nonsense in our heads.  Priors are our filters.

The weather is the source of all our problems.  It is a beloved trope among investors that retailers and restaurants love to blame the weather for their woes – it is not uncommon to hear executives complain it’s too cold and rainy one quarter, and too unseasonably hot the next. Given the base rate of weather events occurring every year, and management teams complaining about them, Bayesian reasoning and conditional probabilities help us figure out not to weight those protests too highly.

The thieves don’t deserve a seat at the security-system table.  It’s important to consider all sides of an issue – but not when some sides have no merit.  How do we figure this out?  Base rates and priors – if someone tells you they’re a rain god, claps their hands, and rain falls out of the sky, don’t believe them.  Even if they do it for a week in a row.  They just got lucky.  As I discuss in the sample size model, improbable things are probable given the number of events in our world.  There’s no causality mechanism, so you’d be mistaking correlation for causation.

Dr. Paul Offit makes this point in “Deadly Choices” ( VAX review + notes), observing that scientific data on the safety of vaccines – perhaps the safest and most cost-effective public health intervention in medical history – is so strong and unequivocal that anti-vaccine conspiracy theorists shouldn’t be given a platform by the media or the government to spread their views.

As Offit puts it,

“[Seth] Mnookin argued that the anti-vaccine movement’s desire to have a seat at the table in discussions about vaccines is analogous to the Ku Klux Klan wanting to have a seat at the table in discussions about race relations.”

If this sounds interesting/applicable in your life, keep reading for unexpected applications and a deeper understanding of how this interacts with other mental models in the latticework.

However, if this doesn’t sound like something you need to learn right now, no worries!  There’s plenty of other content on Poor Ash’s Almanack that might suit your needs. Instead, consider checking out our learning journeys, our discussion of the schema, sleep, or social connection mental models, or our reviews of great books like “Uncontainable” ( UCT review + notes), “Internal Time” ( IntTm review + notes), or “The Genius of Birds” ( Bird review + notes).

Conditional Probabilities, Priors, and Bayesian Reasoning: A Deeper Look

“You’d like to say that your beliefs are based on evidence alone, not on some prior preconceptions you walked in the door with.  

But let’s face it – no one actually forms their beliefs this way. If an experiment […] slowed the growth of […] cancer […] by putting patients inside a plastic replica of Stonehenge, would you grudgingly accept that [vibrational earth energy was curative?]  

You would not, because that’s nutty. You’d think Stonehenge probably got lucky.

You have different priors about those two theories, and as a result you interpret the evidence differently, despite it being numerically the same.”

Ellenberg, throughout his wonderful book “How Not To Be Wrong” ( HNW review + notes), discusses the limits of the p-value-based statistical model.  Nate Silver, in “The Signal and the Noise” ( SigN review + notes), goes a step further, exploring how and why the Bayesian approach is superior to the frequentist approach.  Silver’s book is more thorough on Bayesian reasoning, while Ellenberg’s provides more context on the traditional “frequentist,” p-value-based statistical approach.

Ellenberg and Silver both provide great, thorough examples of the mathematics and logic of Bayesian reasoning; I encourage you to buy (and read) both books. 

Additionally, Philip Tetlock’s “Superforecasting” ( SF review + notes) explores how ordinary individuals like you and me can use the Bayesian reasoning process – in addition to disaggregation and probabilistic thinking – to make more accurate predictions than experts in their own fields.

This model is going to be structured a little differently from many of the others because it’s somewhat more conceptually dense.  I apologize in advance, but it’s important to understand the theory before we can think about the important conclusions (i.e., the ones Ellenberg references above).

It’s important to define three terms:

Priors.  Priors are your existing beliefs about the world that should, in many cases, conform to statistical base rates.  For example, one prior that many of us already hold might be “if it is raining, there is a nearly 100% chance that there are clouds overhead.”

Conditional probabilities.  A conditional probability is the probability of X, given that we already know Y.  It is important to note that conditional probabilities cannot simply be reversed.  For example, the prior above is an example of a conditional probability.  Given that it is raining, it is then nearly 100% probable that there are clouds in the sky.  But this obviously does not work in reverse: given that there are clouds in the sky, the probability of rain is much lower.  Conditional probabilities are often expressed in the form P (X | Y) – the probability of X, given Y.

Bayesian reasoning.  Bayesian reasoning is a mathematical process of responding to new data points by assessing conditional probabilities, given your priors.
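Bayes’ theorem is the formula that ties these three terms together – it’s what lets you get from P (Y | X) back to P (X | Y), provided you know the prior.  In the notation above:

```latex
P(X \mid Y) = \frac{P(Y \mid X)\, P(X)}{P(Y)}
```

In words: the probability of X given Y is the probability of Y given X, weighted by how likely X was to begin with (your prior) and scaled by how likely Y is overall.  We’ll see this at work in the project example below.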

Ellenberg, Tetlock, and Silver all provide their own examples of Bayesian reasoning and conditional probabilities – Ellenberg’s example about terrorists and Silver’s example about panties are both hilarious, by the way.

Here, I’ll provide a somewhat different one that is related to my own field – investing.  One of the most notorious mental models in investing is the planning fallacy.

If you want to understand the planning fallacy, look no further than Don Norman, who postulated it as Norman’s Law in “The Design of Everyday Things” ( DOET review + notes):

The day a product development process starts, it is behind schedule and above budget. - Don Norman

Thanks to complexity and nonlinearity, the bigger the project, the more likely it is to fall prey to Norman’s Law.  However, for mathematical simplicity, to explore the idea of conditional probabilities, let’s assume that 90% of all projects are likely to be late and over budget, while only 10% are completed as promised.  In other words, if T = timely and L = late, then P (T) = 10%, and P (L) = 90%.

So, if we were to draw a little table with a sample size of 10,000 projects, it would look something like this:

| On Time, On Budget | Late, Over Budget | Total Projects |
| --- | --- | --- |
| 1,000 | 9,000 | 10,000 |

Well, that’s not very helpful.  Yet.  It will be, once we add a row.

Say that you’re, like me, an investor – and the management team of a publicly traded company states on a conference call that a big, important project is on time and on budget.  (If you prefer, you can reframe this as – say you’re a manager inside a company, and a team that you’re overseeing comes to you and reports that a big, important project is on time and on budget.)

The obvious question now is: should you believe them?  How likely is it that the project is actually on time and on budget?

Now we have to expand that table we started creating above.  We need a little more information, though: this is where conditional probabilities come in.

As dedicated mental models learners, we know that overconfidence exists – it’s why the planning fallacy exists in the first place!

So let’s say that if a project is actually on time and on budget, there are few false alarms – meaning if a project really is on time, then let’s assume that 95% of the time, the management team would communicate to investors (or their bosses) that it’s on time, and 5% of the time, they would say that it’s not on time.

In other words, if we treat “C” as cheerful report and “D” as doleful report, P (C | T) is 95% – the probability of a cheerful report, given the project is actually on time.

And P (D | T) is 5% – the probability of a doleful report, given that the project is actually on time.

So we can fill in our table a little bit more.  Notice that we’re just taking the conditional probabilities and splitting the 1,000 projects previously in column T into two rows that add up to 1,000.

|  | On Time, On Budget (T) | Late, Over Budget (L) | Total Projects |
| --- | --- | --- | --- |
| Cheerful Report (C) | 950 | ? | ? |
| Doleful Report (D) | 50 | ? | ? |
| Total Projects | 1,000 | 9,000 | 10,000 |

Now let’s turn to the other scenario.  Say that the project is, in fact, running late and over budget.  Again, managers tend to be overconfident, so let’s assume that 70% of managers will believe they can “make up time” as they go along, while only 30% will acknowledge that the project is, in fact, challenged.

In other words, the probability of a cheerful report, given that the project is actually late – P (C | L) – is 70%.  The probability of a doleful report, given that the project is late – P (D | L) – is 30%.  So 70% of our 9,000 total late projects will have an associated cheerful report, and the other 30% will have a doleful report.

So now we can fill in our table completely, to get back to our 10,000 total projects (of which, remember, 9,000 are late and 1,000 are on time.)

|  | On Time, On Budget (T) | Late, Over Budget (L) | Total Projects |
| --- | --- | --- | --- |
| Cheerful Report (C) | 950 | 6,300 | 7,250 |
| Doleful Report (D) | 50 | 2,700 | 2,750 |
| Total Projects | 1,000 | 9,000 | 10,000 |

Here’s where the super, super important insight comes into play: now that we have this table set up, we can answer the question we set out to originally explore.  Given that the management team told us that the project’s going swimmingly, what’s the actual probability that it’s on time?

Well, that’s not hard to do.  We focus on the row that says “management cheerful.”  Basic probability dictates that we put the number of desired outcomes in the numerator (950) and the number of total outcomes in the denominator (7,250).

In other words, P (T | C) is merely ~13% – which is a hell of a ways off from P (C | T) (95%), but actually pretty gosh-darn close to our original base rate likelihood of timely completion, P (T) without any additional information, which was 10%.

Let that sink in for a minute.  A favorable report from management, given the numbers above, should only boost our confidence in timely project completion by about three percentage points – from the 10% base rate to roughly 13%.

If you’re anything like me, that’s probably a hugely unintuitive conclusion.

Bayesian reasoning is the process of constantly updating our priors by running calculations like the above.
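If you’d rather check the arithmetic than trust my table, here’s a minimal sketch in Python.  All of the inputs are the made-up illustration numbers from above (the 10%/90% base rate and the 95%/70% cheerful-report rates), not real project data:

```python
# Hypothetical inputs from the worked example above (illustration only).
p_on_time = 0.10                  # P(T): base rate of on-time, on-budget projects
p_late = 0.90                     # P(L)
p_cheerful_given_on_time = 0.95   # P(C | T)
p_cheerful_given_late = 0.70      # P(C | L)

# Total probability of hearing a cheerful report, P(C).
p_cheerful = (p_cheerful_given_on_time * p_on_time
              + p_cheerful_given_late * p_late)

# Bayes' theorem: P(T | C) = P(C | T) * P(T) / P(C).
p_on_time_given_cheerful = p_cheerful_given_on_time * p_on_time / p_cheerful

print(f"P(C)     = {p_cheerful:.3f}")                # ~0.725 (7,250 of 10,000 projects)
print(f"P(T | C) = {p_on_time_given_cheerful:.3f}")  # ~0.131, i.e. ~13%
```

The 10,000-project table is just this same calculation carried out with counts instead of probabilities.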

Takeaways from Bayesian Reasoning: Overconfidence, Ideology, Margin of Safety, Correlation vs. Causation, Causality

There are three clear takeaways from this.  One obvious one, which all smart investors apply, is margin of safety – model all projects taking longer, costing more, and yielding less of a payoff than the management team tells you.

The second is, as Nate Silver goes into extensively in “The Signal and the Noise” ( SigN review + notes), the importance of understanding causality and using that knowledge to set our priors:

“Frequentist methods – in striving for immaculate statistical procedures that can’t be contaminated by the researcher’s bias – keep him hermetically sealed off from the real world.  

These methods discourage the researcher from considering the underlying context or plausibility of his hypothesis, something that the Bayesian method demands in the form of a prior probability.  

Thus, you will see apparently serious papers published on how toads can predict earthquakes, or how big-box stores like Target beget racial hate groups, which apply frequentist tests to produce ‘statistically significant’ research findings.”

Silver cites the famous Ioannidis paper – “Why Most Published Research Findings Are False” – and gives the analysis a Bayesian spin.  (Ellenberg discusses this as well in “How Not To Be Wrong” ( HNW review + notes), and I tackle it in a little more depth in the notes to HNW.  Ellenberg also goes deeper into p-hacking and other interesting statistical concepts.)

The idea here is to not mistake correlation for causation.  As Silver laments earlier in the book,

“numbers have no way of speaking for themselves.  We speak for them. We imbue them with meaning.

[…] It is when we deny our role in the process that the odds of failure rise.  Before we demand more of our data, we need to demand more of ourselves.”

Silver provides the most thoughtful analysis of this topic that I’ve ever seen, and I can’t recommend “The Signal and the Noise” ( SigN review + notes) highly enough.

The important point is having the right priors at the right confidence level – if important priors (like “vaccines are safe and effective”) are given too little weight, then we’re apt to be whipsawed by completely irrelevant data points.

Like Charlie’s teacher Mr. Anderson says: be a filter, not a sponge.

Silver cites the example of mammograms, where false positives are so prevalent among younger women that they’re generally not recommended.  There’s an example of this in Dr. Jerome Groopman’s “How Doctors Think” ( HDT review + notes) – which we’ll explore a bit more in the next section.

One doctor interviewed by Groopman:

“emphasizes to his interns and residents in the ER that they should not order a test unless they know how that test performs in a patient with the condition they assume he has.  

That way, they can properly weight the result in their assessment.”
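To make the doctor’s point concrete, here’s a minimal sketch of the same Bayes calculation applied to a generic screening test.  The sensitivity, false-positive rate, and prevalence below are hypothetical round numbers chosen for illustration – they are not Silver’s mammogram figures or anything from Groopman:

```python
def prob_condition_given_positive(sensitivity, false_positive_rate, prevalence):
    """P(condition | positive test), via Bayes' theorem.

    sensitivity:         P(positive | condition)
    false_positive_rate: P(positive | no condition)
    prevalence:          P(condition) -- the prior / base rate
    """
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1 - prevalence))
    return sensitivity * prevalence / p_positive

# Hypothetical numbers: a fairly accurate test for a rare condition.
print(prob_condition_given_positive(sensitivity=0.90,
                                    false_positive_rate=0.07,
                                    prevalence=0.01))
# ~0.115 -- most positives are false positives, because the base rate is so low.
```

Which is exactly why knowing how the test performs – and what the base rate is – matters more than the raw result.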

A final takeaway is that we should be much slower to respond to incoming data points than our naturally overconfident, storytelling selves tend to be.  Going back to “The Signal and the Noise” ( SigN review + notes), Silver notes that:

Usually we focus on the newest or most immediately available information, and the bigger picture gets lost. - Nate Silver

Examples of this pop up all over the place.

Richard Thaler points out in “Misbehaving” ( M review + notes) that trading volume in the stock market is far too high for all trades to make rational sense – and anyone who’s spent time around investors knows that many in the financial world have a tendency to dramatically overreact to incremental data points.

In a less financial example, the base rate of my experience with people – over the course of my lifetime – is that they’re ultimately self-interested and are (in most cases) friends of convenience rather than friends of commitment.

Yet for a long time, I still went into relationships with a “fast friends” approach – I’d quickly grow to like and trust people, and assume that their behavior early on would be representative of their behavior later – which, it turns out, it almost always wasn’t.

Application / impact: Bayesian reasoning – focusing on our priors, and understanding conditional probabilities – can prevent us from being whipsawed by incremental data points that don’t actually give us much useful information.

The Dose-Dependency of Bayesian Reasoning

Sharp readers might have noticed a couple flaws, or sticking points, in what we’ve been talking about.

The first, and most obvious: if bad priors are weighted too highly, then we run into the ideology problem.  And this, in fact, represents how many people reason to begin with: Tavris/Aronson quip in “Mistakes were Made (but not by me)” ( MwM review + notes) that if you tell people a policy proposal comes from the opposite political party, you might as well ask people if they will favor a policy proposed by Osama bin Laden.

However, if you’re reading this site and you’ve made it this far into one of the more advanced models, then I think you’re probably open-minded enough that that’s less of a problem.

The bigger challenge is that most of us aren’t actually going to walk around doing this sort of math in our heads.  That’s for two reasons: one, in many situations computing these types of conditional probabilities is too slow to be of use.  If we see what looks like brake lights up ahead, we’re not gonna think about the base rates of whether we should slow down or not.  We’re just gonna start slowing down.

Second, and more importantly in analytical situations: you will notice that I, more or less, pulled those priors out of my butt.  I am not alone.  Ellenberg and Silver do the same thing for their examples in “How Not To Be Wrong” ( HNW review + notes) and “The Signal and the Noise” ( SigN review + notes).

Their calculations are both amusing, and great in theory – but it is difficult to determine with any quantitative certainty exactly how many terrorists are in America, or exactly how likely it is that your cheating spouse’s secret lover would accidentally leave their underwear behind in your bedroom.

Those are what the hilarious Matt Levine at Bloomberg would call “dumb fake numbers.”  They don’t really exist.  Gathering data to figure out what they are would take so much time that the opportunity cost of precision would be way too high: we don’t really need to know what the exact base rate is.  We just need to have a conversation with our significant other about where that underwear came from.

Thankfully, it turns out that we don’t have to have these numbers – or be quantitatively precise – to get mileage out of Bayesian reasoning.  I get mileage out of it, and building that conditional probability table for you is probably only the second time I’ve done such an exercise (the first being when I originally read Silver’s book).

Indeed, in the aforementioned “Superforecasting” ( SF review + notes), Philip Tetlock observes:

“[Despite being numerate, superforecasters] rarely crunch the numbers so explicitly.  

What matters far more to the superforecasters than Bayes’ theorem is Bayes’ core insight of gradually getting closer to the truth by constantly updating in proportion to the weight of the evidence.”
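If you want to see what “updating in proportion to the weight of the evidence” looks like without building a full table, here’s a minimal sketch of the odds form of Bayes’ rule – the informal move that this kind of updating approximates.  The numbers reuse the hypothetical project example from earlier:

```python
def update(prior_probability, likelihood_ratio):
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio.

    likelihood_ratio = P(evidence | hypothesis) / P(evidence | not hypothesis).
    A ratio near 1 means the evidence is weak and should barely move you.
    """
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Start at the hypothetical 10% base rate of on-time projects, then fold in a
# cheerful management report: it's only 0.95 / 0.70 ~ 1.36x more likely if the
# project really is on time, so it barely moves the needle.
belief = update(0.10, 0.95 / 0.70)
print(round(belief, 3))   # ~0.131 -- the same ~13% we got from the table
```

Each new data point multiplies your odds by its likelihood ratio: weak evidence (a ratio near 1) should barely budge you, and strong evidence should move you a lot.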

Finally, it’s worth noting that Bayesian reasoning doesn’t apply in every situation.  One of Tetlock’s other observations in “Superforecasting” is that:

“when it comes to things like terrorist attacks, people are far more concerned about misses than false alarms.”

I previously referenced Dr. Jerome Groopman’s “How Doctors Think” ( HDT review + notes), one of my all-around favorite books on dealing with uncertainty and cognitive biases.  Groopman’s book contains many examples of doctors using Bayesian-esque reasoning, but Groopman is generally critical of it.

On page 151, for example, Groopman notes the sample size challenge I mentioned above:

“The [Bayesian] calculation… has the doctor choose the path with the highest number emerging from the formula.  

Of course, much of what doctors like Lock deal with is unique; there is no set of published studies from which decision-analysis researchers can derive a probability.”

In other words, for many medical conditions, a base rate is nonexistent – as I discuss in the sample size mental model, one problem is that averages are not always useful if there are heterogeneous clusters within the data (think about averaging the wealth of Bill Gates and a homeless camp).
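As a toy illustration of that averaging problem (with made-up wealth figures, not real data):

```python
# Made-up wealth figures: one enormous outlier plus 99 people with nothing.
wealth = [100_000_000_000] + [0] * 99

mean = sum(wealth) / len(wealth)
median = sorted(wealth)[len(wealth) // 2]

print(f"mean:   ${mean:,.0f}")     # $1,000,000,000 -- describes nobody in the group
print(f"median: ${median:,.0f}")   # $0 -- describes almost everybody
```

The “average” member of that group is worth a billion dollars, which describes exactly no one in it – the same problem a doctor faces when the “average” patient doesn’t resemble the one in front of them.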

"How Doctors Think" by Jerome GroopmanGroopman notes throughout “ How Doctors Think” that humans aren’t statistics, and that some of the worst misdiagnoses occur in complicated, niche cases when doctors too easily assume that the seemingly most likely diagnosis is the right one..

Groopman goes on to cite some work by Donald Schon of MIT.  Real physicians face:

“Divergent situations where… relying on a large database to assign probabilities to a certain diagnosis, or the outcome of a certain treatment, completely breaks down.”

This is analogous to many of the problems we face in our own fields.  Henry Petroski, for example, discusses in “To Engineer is Human” ( TEIH review + notes) how there’s usually little helpful data on the performance of boundary-pushing designs – which, of course, given the nature of arms races, is most engineering designs.  As we keep demanding our devices get ever more powerful yet smaller or lighter-weight, old base rates may not apply.

Phil Rosenzweig comes to similar conclusions about businesses in “The Halo Effect” ( Halo review + notes).  While it seems like there are a lot of businesses out there, in reality there aren’t – about 3,600 domestic companies were listed on U.S. stock exchanges as of the end of 2017 (per Bloomberg), which would put public companies well into “orphan disease” territory if they were a medical condition.

Once you think about all the confounding factors, there’s not really a good base rate for many business decisions – for example, a sizable merger – simply because there just haven’t been enough done, under similar conditions, from which to draw quantitatively precise conclusions.

A final point – which I discuss in more depth in the cognition / intuition / habit / stress model, and elsewhere – is that pure, deliberate analysis isn’t the solution to every problem.  Groopman notes throughout “How Doctors Think” ( HDT review + notes) that doctors who use both cognition and intuition tend to make better diagnoses than those who rely on one or the other alone.

Charles Duhigg notes in “The Power of Habit” ( PoH review + notes) that football players make worse decisions when they think rather than react – because, as Laurence Gonzales explains in “Deep Survival” ( DpSv review + notes), cognition is often too slow in dire circumstances.

The idea here isn’t to give up.  I’m fond of this one Peter Thiel quote from “Zero to One” ( Z21 review + notes):

If you expect an indefinite future ruled by randomness, you’ll give up on trying to master it. - Peter Thiel

The point isn’t to throw up our hands and say that we can never use Bayesian reasoning.  The point is, as with anything else, to not be a man with a hammer – it’s a very useful tool, and I’ve found it to be a wonderful practice of thinking that has enhanced my cognition, but it also has its practical limitations.

Application / impact: be aware that base rates don’t exist for everything, and in some situations, Bayesian reasoning can actually lead you astray.