Causation / Multicausality / Disaggregation (Incl. Correlation, Swiss Cheese, Peter Thiel X | Y, Lollapaloozas)

If this is your first time reading, please check out the overview for Poor Ash’s Almanack, a free, vertically-integrated resource including a latticework of mental models, reviews/notes/analysis on books, guided learning journeys, and more.

Causation / Multicausality / Disaggregation Mental Model: Executive Summary

If you only have three minutes, this introductory section will get you up to speed on the causation / multicausality / disaggregation mental model.

Causation / multicausality / disaggregation in one quote:

I never solve the problem I am asked to solve, [because it's] not the real, fundamental, root problem. It's usually a symptom... it's amazing how often people solve the problem... without bothering to question it. - Don Norman

A quick overview of causation / multicausality / disaggregation: our understanding of causation is often overly swayed by “storytelling” as well as by correlations that aren’t causation (ex. hard work is correlated with success, but does not on its own cause success independent of other factors).  Consultants and other structured thinkers use a “MECE” framework (mutually exclusive, collectively exhaustive) to break problems into separate pieces that can each be analyzed.

While there are limits to this approach, it’s a starting point for evaluating situations.  By inversion, aligning multiple causes can cause multiplicative “Lollapalooza” effects.

Two brief examples of causation / multicausality / disaggregation:

How do you understand heredity?  The same way you eat an elephant: one bite (of peas) at a time.  

Sam Kean’s “The Violinist’s Thumb” (TVT review + notes) provides an excellent overview of how Gregor Mendel used disaggregation to determine causation:

“this focus on separate, independent traits allowed Mendel to succeed where other heredity-minded horticulturists had failed.  

Had Mendel tried to describe, all at once, the overall resemblance of a plant to its parents, he would have had too many traits to consider […] but by narrowing his scope to one trait at a time, Mendel could see that each trait must be controlled by a separate factor.”

Is it cheaper to give apartments to the homeless?  Some evidence reviewed by Megan McArdle in “The Up Side of Down” (UpD review + notes) suggests that might, counterintuitively, be the case.

The costs of the homeless to society are multicausal, and by disaggregation, the biggest line item is hospital visits – which, as David Oshinsky’s “Bellevue” (BV review + notes) illustrates, are often more for basic personal care like meals and showers than for actual medical problems.

McArdle notes that there’s thus a high opportunity cost to letting the homeless remain on the street; it might be cheaper to get them apartments.  Naturally, there would be n-order impacts to such a policy – McArdle explores the concept of “moral hazard” – but this sort of analysis is at least an interesting starting point.

If this sounds interesting/applicable in your life, keep reading for unexpected applications and a deeper understanding of how this interacts with other mental models in the latticework.

However, if this doesn’t sound like something you need to learn right now, no worries!  There’s plenty of other content on Poor Ash’s Almanack that might suit your needs. Instead, consider checking out our learning journeys, our discussion of the inversion, margin of safety, or Bayesian reasoning mental models, or our reviews of great books like “Polio: An American Story” (PaaS review + notes), “The Autobiography of Benjamin Franklin” (ABF review + notes), or “Deep Survival” by Laurence Gonzales (DpSv review + notes).

Correlation vs. Causation: Why Hard Work Doesn’t Cause Success, and Vaccines Don’t Cause Autism

Ice cream sales and forest fires are correlated because both occur more often in the summer heat. But there is no causation; you don’t light a patch of the Montana brush on fire when you buy a pint of Haagen-Dazs. - Nate Silver

Nate Silver’s “The Signal and the Noise” (SigN review + notes) does a great job of breaking apart correlation and causation.  It’s an interesting topic, but you’re welcome to skip this section if you’re familiar with it (it’s fairly basic) and move on to some of the more intriguing parts of the model below.

Silver notes that in the Big Data era, there’s somewhat of a trend toward the idea that correlations are everything: i.e., the notion that we don’t need to know why the black box works… it just needs to work.

Silver is critical of this approach, exploring its drawbacks and the reasons to focus on causality (see also the Bayesian reasoning mental model).  So is Michael Mauboussin, who notes in “The Success Equation” (TSE review + notes):

Statisticians who are serious about their craft are acutely aware of the limitations of analysis. Knowing what you can know and knowing what you can’t know are both essential ingredients of deciding well. - Michael Mauboussin

I think most people are familiar with the idea of correlation vs. causation, so I won’t waste too many words on it: I’ll just point out a few important tenets and examples, and you can pick the rest up on your own.

To start with, a correlation between one thing and the other doesn’t tell you anything about the direction of the causal relationship.  You have to ascertain that by other means.  Yet people often mistake correlations for causal relationships.  For example, many people hold the nonsensical view that one can do something such as “sleep too much” – that beyond a certain point, more sleep is worse for your health, productivity, and happiness.

This isn’t even remotely true.  Dr. Matthew Walker explains why in his brilliant “Why We Sleep” (Sleep review + notes) – my candidate for most important book of the century (no hyperbole).  Walker notes that sleep is not really dose-dependent:

“No biological mechanisms that show sleep to be in any way harmful have been discovered.”

Nine hours is much better than eight, which is much better than seven, and below seven you’re in the high danger zone.  Given that, why can you find a correlation between very high sleep and mortality?  Walker notes that there’s a correlation vs. causation problem in mortality data associated with people who sleep a lot: many of the people who are sleeping a lot are seriously or terminally ill, and obviously we sleep more when we’re ill for a variety of reasons.  That doesn’t mean that the sleep is the cause of the negative health outcomes; in fact, it’s the other way around.

Anecdotally, for example, one of my friends’ family members who is struggling with advanced multiple sclerosis (MS) and another undiagnosed condition is currently “sleeping about 12 hours a day.”  Assuming she is representative of those with her specific condition, certainly a study that included subjects like her would find a correlation between length of sleep and higher mortality, but it’s very difficult to tell a plausible story in which the sleep is the cause of her health issues – rather, it’s the other way around.
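A toy calculation can make the reverse-causation point concrete. Every number below is invented purely for illustration (none of it comes from Walker's book or from any study): a small population is split into healthy and seriously ill people, illness drives both long sleep and mortality, and sleep itself has no effect on mortality within either group.

```python
# Hypothetical population: all figures are made up for illustration only.
groups = {
    # name: (people, share who sleep very long hours, annual mortality rate)
    "healthy": (900, 0.10, 0.01),
    "seriously_ill": (100, 0.80, 0.30),
}

def mortality_by_sleep(groups):
    """Return (mortality among long sleepers, mortality among everyone else).

    Within each group, mortality is identical regardless of sleep length,
    so sleep has zero causal effect by construction.
    """
    long_people = long_deaths = other_people = other_deaths = 0.0
    for people, long_share, death_rate in groups.values():
        long_n = people * long_share
        other_n = people - long_n
        long_people += long_n
        other_people += other_n
        long_deaths += long_n * death_rate
        other_deaths += other_n * death_rate
    return long_deaths / long_people, other_deaths / other_people

long_rate, other_rate = mortality_by_sleep(groups)
print(f"long sleepers: {long_rate:.1%}, everyone else: {other_rate:.1%}")
```

With these made-up inputs, long sleepers show a far higher death rate than everyone else, even though the sketch gives sleep no causal role at all. The gap comes entirely from who ends up in the long-sleep bucket.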

Another important example relates to what’s called a “post hoc” fallacy – short for Latin “post hoc, ergo propter hoc,” which is fancy verbiage for “after this, therefore because of this.”

Humans are very good at storytelling.  As Don Norman points out in “The Design of Everyday Things” (DOET review + notes):

“[… humans’ conceptual models] are often constructed from fragmentary evidence, with only a poor understanding of what is happening, and with a kind of naive psychology that postulates causes, mechanisms, and relationships even where there are none.”

The quote was in relation to how consumer products work, but it works equally well relative to other lines of human thinking.

One of the best examples of this is the vaccines-cause-autism conspiracy theory, which has somewhat faded from the limelight but still has a following, even though the only research supporting it was retracted for being completely fraudulent.

Dr. Paul Offit’s “Deadly Choices” (VAX review + notes) – which we’ll return to later – explores how this sort of post-hoc, correlation vs. causation mistake led many parents to (erroneously) believe that their children’s vaccines were the cause of subsequently diagnosed neurological disorders.

Offit cites, for example, a data review by professor Gerald Golden in 1990 that highlights the faulty thinking:

“Analysis of the recent literature, however, does not support the existence of [a pertussis-caused encephalopathy] syndrome, and suggests that neurological events after immunization are chance temporal associates.”

Given the number of children who are vaccinated (i.e. close to all of them), and the number of vaccines they receive, and the timing of onset and diagnosis of various neurological issues, it is a statistical certainty that some number of children will begin displaying symptoms of – and thus be diagnosed with – a neurological condition shortly after receiving a vaccine.  But it is, as Golden notes, a “chance temporal associate.”

One judge, ruling on an MMR case, quoted Samuel Johnson:

It is incident I am afraid, in physicians above all men, to mistake subsequences for consequences. - Samuel Johnson

Indeed, epidemiological studies with huge sample sizes – as well as natural experiments that served as A/B tests – demonstrated that there was absolutely no causal link.

In fact, this issue was actually addressed decades prior, with the Francis Field Trial for the Salk polio vaccine.  Oshinsky’s “Polio: An American Story” (PaaS review + notes) discusses how some kids who got the polio shot died of polio, but it had to be determined whether they had already contracted polio prior to receiving the vaccine – which in most cases it seems they had.

We will come back to this topic later.

Here’s the second example, which is vastly counterintuitive for many people, especially thanks to the inexplicable popularity of nonsense like the “grit” mentality.  Hard work doesn’t cause success.

“But wait,” you say.  “Everyone I know who is successful works hard.  You can’t be successful without hard work.”

Yes and no.  Success is certainly correlated with hard work; hard work is also required for many of the potential factors that cause success – like being good at a tangible skill.

But as Jordan Ellenberg points out in “How Not To Be Wrong” (HNW review + notes), correlation isn’t transitive.  That is to say: if A is correlated to B, and B is correlated to C, A isn’t necessarily correlated to C.
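The non-transitivity claim is easy to check on a tiny example. The sketch below uses invented toy data (not anything from Ellenberg's book): A and C are built to be unrelated, and B is simply their sum, so B is correlated with each of them while A and C are not correlated with each other. The `pearson` helper is a hand-rolled illustration, not a library function.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy data (invented): A and C are unrelated, and B is simply A + C.
a = [1, -1, 1, -1]
c = [1, 1, -1, -1]
b = [x + y for x, y in zip(a, c)]  # [2, 0, 0, -2]

print(round(pearson(a, b), 3))  # A vs B -> 0.707
print(round(pearson(b, c), 3))  # B vs C -> 0.707
print(round(pearson(a, c), 3))  # A vs C -> 0.0
```

A and B move together, B and C move together, yet A tells you nothing about C.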

One of the challenges with correlations is that they don’t give you a control group.  For example, if we’re trying to determine whether or not hard work causes success, we’d need to not just look at successful people and see if they all work hard, but also look at all people who work hard and see if they’re successful.

And the answer is: they’re not.  A lot of low-wage jobs (construction, retail, janitorial) involve a lot of hard work.  They would certainly take a lot more effort than my job, which mostly involves standing in an air conditioned room and reading things that interest me.

The answer here is that it’s doing things that society values highly, and doing them well, rather than hard work, that causes success.  I discuss this more in the busyness vs. productivity model, but this classic quote from Stephen Covey’s “The 7 Habits of Highly Effective People” (7H review + notes) illustrates the topic:

“envision a group cutting their way through the jungle with machetes […]

the leader is the one who climbs the tallest tree, surveys the entire situation, and yells, “wrong jungle!”  

[…] as individuals, groups, and businesses, we’re often so busy cutting through the undergrowth we don’t even realize we’re in the wrong jungle.”

So how do we determine if a correlation has a causal mechanism?  We’ll investigate that.

Application / impact: just because two events are temporally associated does not mean that one caused the other.

Disaggregation / MECE: How Many Pairs of Briefs Are Sold Annually In America, How Many Vaccines Can You Give A Baby, And Why Don’t Teachers Get Paid A Lot?

How many pairs of briefs are sold annually in America?

Yes, it’s a frivolous question, and no, you’re not allowed to go searching for answers on Google or in Hanesbrands’ annual report.  It’s the type of question where the answer is less important than how you get to the answer.  

How should you get to the answer?  Well, let’s break it down into component pieces.

First, briefs are a form of underwear, but one only worn by males, so we should start with the number of boys and men in America – or roughly half the population, excluding a few diaper-wearing infants.

Second, different guys have different preferences, and these preferences also tend to shift over time, from early childhood to adolescence to adulthood, so a reasonable starting point would be to segment the American male population into various demographics and assign percentages to determine how many males in each age group wear briefs instead of boxers or boxer-briefs, and thus how many total males are likely to purchase briefs.  

(Careful not to rely too much on the availability bias of your own preference, or that of the guys in your life.  I’ll also let you make your own decision about whether or not to include “commando” as an acceptable potential category.  I’ll just silently judge you if you do.)

Third, the purchase of new underwear is likely a function of how many pairs of underwear the average guy needs, and how frequently they need to be replaced.

So we’ve broken the problem down into three constituent components – the total population, the percentage of that population that comprises our demographic, and the purchase frequency among that demographic – and if we come up with reasonable inputs for those – whether by guessing or by sourcing actual data – then we should land on a reasonable estimate.  A good disaggregation will be “MECE” – mutually exclusive and collectively exhaustive.
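The three-part breakdown above can be sketched as a simple multiplication. Every input below is a placeholder guess, not sourced data, and the function and parameter names are hypothetical; the point is the structure of the estimate, not the final figure.

```python
def annual_pairs_sold(population, share_male, share_wearing_briefs,
                      pairs_bought_per_year):
    """Multiply the components of the estimate together."""
    buyers = population * share_male * share_wearing_briefs
    return buyers * pairs_bought_per_year

# Placeholder guesses, not sourced data.
estimate = annual_pairs_sold(
    population=330_000_000,    # rough U.S. population
    share_male=0.5,            # "roughly half the population"
    share_wearing_briefs=0.3,  # guess; ideally a weighted average by age group
    pairs_bought_per_year=4,   # guess at how often pairs get replaced
)
print(f"{estimate:,.0f} pairs per year")
```

Because each input is isolated, you can swap in a better number for any one component (say, real survey data on preferences by age group) without touching the rest of the estimate.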

Questions like that one, and the following, are often asked in interviews for major consulting firms like BCG, Bain, and McKinsey:

– How many gas stations are in America?

– How many golf balls could you fit on a 747?

– How many piano tuners are in Chicago?

The proper mental approach for answering these – and more complicated business “case interview” questions – is explored in Marc Cosentino’s “Case in Point.”  I found it interesting long ago when I was considering being a consultant, and while at the time I chafed at having to answer dumb questions – “why the hell would anyone want to stuff a 747 with golf balls?” – it turns out that the line of thinking is valuable.

In fact, that last question about piano tuners is sourced from Philip Tetlock’s “Superforecasting” (SF review + notes), a wonderful book about forecasting.  It turns out that the eponymous “superforecasters” use a very similar MECE-type disaggregation process to make accurate predictions.

For example, in response to one question about how likely it was that a certain agency would detect polonium on a victim’s clothes, one “superforecaster” broke it down with various probabilistic assumptions like: would polonium be detectable today?  If it is, would the murderers have had access to it?

Combining disaggregation with probabilistic thinking, Tetlock’s superforecasters wildly outperformed experts at making predictions in their own fields.
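Here is a minimal sketch of how a chain of sub-questions like the polonium one can be combined. The probabilities are invented for illustration (they are not Tetlock's or any forecaster's actual numbers), the third sub-question is a hypothetical addition, and the sketch assumes each probability is conditional on the previous steps holding.

```python
from math import prod

# Hypothetical sub-questions and probabilities, invented for illustration.
steps = [
    ("polonium would still be detectable today", 0.8),
    ("the killers could have had access to polonium", 0.6),
    ("polonium was actually used, given access", 0.5),  # hypothetical extra step
]

# Treat each probability as conditional on the steps before it,
# so the chance that every link in the chain holds is the product.
p_detected = prod(p for _, p in steps)
print(f"{p_detected:.2f}")  # 0.24
```

Breaking the question up this way also shows which link contributes most uncertainty, which is where extra research is likely to pay off.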


Disaggregation seems to be the style of thinking that Sam Hinkie – an ex-Bain guy – took with his famous “process” for the Philadelphia 76ers.  Hinkie broke success down into its components:

– high draft picks

– lots of cap space

… then set about getting both of the above.

Disaggregation is the process Don Norman is talking about when he advocates exploring for root causes and fixing those.  A similar approach was used in creating the polio vaccine: at one point, as discussed in David Oshinsky’s “Polio: An American Story” (PaaS review + notes), scientists realized that they hadn’t found polio in the blood because it didn’t manage to survive there after antibodies were generated.  That didn’t mean it had never been there to begin with.

See the parallels to the polonium example?  This proved to be a breakthrough: polio traveled via the blood before hitting the nervous system.  If you could put antibodies in the blood before polio got to the nervous system… polio would be prevented.  But nobody had thought to look there.

Similar disaggregation was used by Leonard Hayflick in creating the WI-38 cell line that proved vital to future vaccine production, as discussed in Meredith Wadman’s “The Vaccine Race” (TVR review + notes).

Another vaccine-related example of disaggregation comes from the aforementioned “Deadly Choices” (VAX review + notes).

Offit uses a MECE-type framework to carefully examine the potential causal mechanisms for neurological damage from vaccines, for example, and finds that there are none.  Formaldehyde and aluminum don’t pose a risk because of the dose-dependency of poisons; total childhood immunizations contain de minimis amounts compared to total circulating blood levels.

Here’s one example of his disaggregation:

“The notion that a single viral protein could [cause strokes, blood clots, heart attacks, paralysis, seizures, and chronic fatigue syndrome] when the whole natural replicating virus can’t do any of this was illogical.”

Offit applies a similar technique to ask, in light of some parents’ concerns that babies are receiving too many vaccines, how many vaccines a baby could safely receive at one time.  

His answer focuses on identifying the agent that creates a response, and comparing vaccine dosages of such agents to those babies encounter in their natural environment from bacteria and viruses like the common cold:

“It’s not the number of vaccines that counts; it’s the number of immunological components contained in vaccines […]

the total number of immunological components in today’s fourteen vaccines is about a hundred and sixty, fewer than the two hundred components in the only vaccine given more than a hundred years ago […]

arguably, a single infection with a common cold virus poses a much greater immunological challenge than all current vaccines combined […]

babies could theoretically respond to about a hundred thousand vaccines at one time […] the notion… shouldn’t be surprising.  In a sense, babies are doing that every day.”

The final example of disaggregation that I find tremendously helpful – albeit deceptively simple – is Peter Thiel’s “X | Y” framework for value creation.  Thiel notes in his (super-interesting, definitely worth watching) “How to Start a Startup” lecture that:

I want to suggest there’s basically a very simple formula, that if you have a valuable company two things are true.

Number one, that it creates “X” dollars of value for the world.

Number two, that you capture “Y” percent of “X.”

And the critical thing that I think people always miss in this sort of analysis is that “X” and “Y” are completely independent variables.

There’s plenty more such thinking where that came from in Thiel’s “Zero to One” (Z21 review + notes). Examples of this abound in business and life: there are some industries, like multilevel marketing, where X may be very low or zero or negative, and the amount captured may exceed X, such that the companies are sucking value out of the system.  There are other examples, like teaching and flying airplanes, where tremendous value is created, and yet very little is captured.

So if you want to make a lot of money, you not only have to create value, but also participate in a part of the ecosystem where it can be captured.
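Thiel's framing boils down to one multiplication: value captured equals X times Y. The sketch below uses invented figures and a hypothetical helper name to show why a business that creates a lot of value can still capture less than one that creates far less.

```python
def value_captured(x_created, y_share):
    """Dollars captured when a company creates x_created dollars of value
    and keeps the fraction y_share of it."""
    return x_created * y_share

# Invented figures: a big-X / small-Y business vs. a small-X / big-Y one.
big_x_small_y = value_captured(1_000_000_000, 0.01)
small_x_big_y = value_captured(100_000_000, 0.50)
print(f"{big_x_small_y:,.0f} vs {small_x_big_y:,.0f}")
```

Since X and Y are independent, improving either one alone only goes so far. A Y near zero wipes out even an enormous X.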

Application / impact: large problems are much easier to solve when broken down into small chunks that, in total, solve the problem.  

Multicausality & The Swiss Cheese Model

One challenge when assessing causality is that given any chosen group of outputs or studied variables, there’s likely not just one cause.  

Going back to my previous reference to “The Signal and the Noise” (SigN review + notes), several parts of Dr. Jerome Groopman’s “How Doctors Think” (HDT review + notes) explore some of the challenges of making diagnoses.

For example, Groopman quotes a Dr. Light on the challenge of false positives: while MRIs are obviously wonderful and useful, the truth is that natural variation combined with aging means that if you look hard enough, you’ll find something wonky about everyone – but that doesn’t mean it’s necessarily something wrong that needs treatment or is responsible for a problem.  

Groopman’s book is a wonderful exploration of multicausality, among other topics – he provides many examples of cases in which doctors tried to explain all of a patient’s symptoms with merely one diagnosis, when in fact a combination of different conditions contributed to the symptoms.  Groopman provides helpful advice on how doctors – and their patients – can avoid this trap.

One of my favorite examples of this comes from chronobiologist Till Roenneberg’s “Internal Time” (IntTm review + notes).  Toward the end of the book, Roenneberg discusses children suffering from a particular genetic disorder (Smith-Magenis syndrome) that was thought to cause (among other symptoms) “severe behavioral pathologies.”

The genetic disorder does cause a lot of challenges, but the way Roenneberg tells it, the behavior actually isn’t one of them (directly, anyway).  The kids are actually just sleep-deprived: in some cases, the syndrome can lead to inverted melatonin production, i.e. during the day instead of at night.

Treating this and allowing them to sleep properly led to both them and their caretakers having a higher quality of life – but it was long overlooked because it was simply assumed that the incurable genetic disorder caused all of the associated symptoms.

This is a particularly important phenomenon in science.  One of my other favorite examples of multicausality is explained in Jennifer Ackerman’s delightfully reader-friendly “The Genius of Birds” (Bird review + notes), which is a book that is as beautiful and enjoyable as it is educational.  Exploring studies trying to determine “bird IQ,” Ackerman displays not only intellectual humility and scientific thinking, but also a thorough understanding of multicausality:

“It’s tricky, however.  In these kinds of lab tests, all sorts of variables may affect a bird’s failure or success.  The boldness or fear of an individual bird may affect its problem-solving performance.

Birds that are faster at solving tasks may not be smarter; they may just be less hesitant to engage in a new task.  So a test designed to measure cognitive ability may really be measuring fearlessness.

“Unfortunately it is extremely difficult to get a ‘pure’ measure of cognitive performance that is not affected by myriad other factors,” says Neeltje Boogert […] a bird cognition researcher at the University of St. Andrews.”

Similar analysis is usually helpful when it comes to understanding human psychology – Charlie Munger always talks about how many people, for example, fail to notice all the cognitive biases at play in experiments like Milgram, and this is definitely a phenomenon I’ve noticed myself.  

Many authors and books and analysts take a “man with a hammer” approach and try to explain everything in terms of their preferred model.  For example, Olds/Schwartz attribute to social connection, in “The Lonely American” (TLA review + notes), behavior that I think clearly has at least some component of learned helplessness.

Not all analysts do this, of course – one reason I particularly love Christopher Browning’s “Ordinary Men” (OrdM review + notes) is that he displays a thorough understanding of multicausality, carefully evaluating the contributions of different potential factors to a group of ordinary men’s willing participation in Holocaust genocide.

One angle on multicausality that I find particularly fascinating is the “Swiss Cheese Model” of causality.  What’s a Swiss Cheese Model? It’s Miss America wrapped in a cheese dress. (Let’s be honest: that wouldn’t be the weirdest fashion choice a celebrity has ever made.)

I’m just kidding.  The Swiss Cheese Model of causality, discussed in books like Don Norman’s “The Design of Everyday Things” (DOET review + notes) and Megan McArdle’s “The Up Side of Down” (UpD review + notes), posits that for anything to happen, a number of stars have to align… or the holes in a bunch of slices of Swiss cheese.  It’s often used in the context of mistakes or accidents: for example, for a specific car collision to occur, perhaps:

– the driver has to be tired or distracted, AND

– the car’s brakes have to be old, AND

– the road has to be slippery thanks to a recent oil spill

So, which of the three “caused” the car accident?  Well, all of them and none of them.

Similar analyses could be conducted for other questions: why do people end up in bad relationships?  Why did we not do as well as we wanted on that presentation?

It’s an interesting model and there are obvious interactions with margin of safety: add a layer of Swiss Cheese, and you’re less likely to have a bad outcome.
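The car-collision example above, and the margin-of-safety point, can be sketched numerically. The probabilities are invented, the extra "automatic braking" layer is hypothetical, and the sketch makes the simplifying assumption that the layers fail independently (in real life, holes often line up for related reasons).

```python
from math import prod

def accident_probability(hole_probabilities):
    """Chance that every layer fails at once, assuming the layers fail
    independently of one another (a simplifying assumption)."""
    return prod(hole_probabilities)

# Invented odds that each layer has a "hole" on a given trip.
layers = {
    "driver tired or distracted": 0.10,
    "brakes old": 0.05,
    "road slippery": 0.02,
}
base = accident_probability(layers.values())

# Margin of safety: add one more slice of cheese (hypothetical automatic
# braking that fails 10% of the time).
with_extra_layer = accident_probability([*layers.values(), 0.10])
print(f"{base:.6f} -> {with_extra_layer:.6f}")
```

Remove any single hole (set one probability to zero) and the product is zero, which is one way to read "all of them and none of them" caused the accident.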

Application / impact: be aware that searching for a single “cause” is often not likely to yield helpful answers, and can lead you to be a man with a hammer.  An understanding of multicausality can be helpful in making more rational decisions.

Disaggregation x Dose-Dependency x Precision vs. Accuracy (x Emergence / Complexity x Inversion x Overconfidence): The Limits Of Analysis, Why Quarks Are Irrelevant, and Lollapaloozas

Time to undo everything I’ve just told you: MECE is a lie.

I know, I do this from time to time.  (I’m sorry.) But the truth is that there’s no such thing as a “collectively exhaustive” analysis – given the scale of the world, however extensive your analysis is, there’s more to do, and at some point you have to stop.

There are two main reasons for this.  The first is that, even if we assume a relatively linear world, everything can be said to cause everything via the Swiss Cheese model mentioned above.  John Lewis Gaddis presents a very thoughtful analysis of this in “The Landscape of History” (LandH review + notes), an astonishingly good book about how historians do (or should) approach their craft.

Gaddis explains that while you could go all the way back to the start of agriculture to explain Stalin’s Soviet Union (disaggregation), since in some sense the Soviet Union never could have occurred without agriculture, that doesn’t make much sense.  Similarly, more emphasis is placed on Truman’s decision to drop the bomb than on the men who carried it out.

So at some point, analysis fails to yield further results – precision vs. accuracy – which is why, as Gaddis notes, it’s important to analyze the right bits of data, rather than all of them.

Another problem lies in the idea of complexity / emergence: I don’t have a full mental model on it yet because it’s a mental model that I’m still learning about, but the idea is that “complex adaptive systems” like human beings, cities, or ant colonies display behaviors at one level that cannot be analyzed at another.

A classic example of this is Danish physicist Per Bak’s metaphor about a sand pile, told well by Laurence Gonzales in “Deep Survival” (DpSv review + notes).  At some point, a sand pile will begin to collapse upon itself, and Gonzales observes that there’s:

“nothing in the physics of silicon dioxide that could predict the behavior of the sand pile.”  

Similar problems underlie important phenomena like turbulence in pipelines and traffic on highways: given that they are highly feedback-driven processes, linear analysis like disaggregation will fail to yield a lot of useful results for modeling purposes.

One example of this comes from the aforementioned “How Doctors Think” (HDT review + notes), wherein Groopman cites a brilliant cardiologist – Dr. James Lock – who discusses a few instances in which “impeccable logic” failed to reach the right conclusions.

“My mistake was that I reasoned from first principles when there was no prior experience.  I turned out to be wrong because there are variables that you can’t factor in until you actually do it.”  

Lock goes on to point out that the complexity of human biology means that you can’t predict everything and often need to do real-world experimentation rather than just theoretical logic.

The takeaway here is that you have to analyze a system on a relevant scale with the relevant approach: for example, a lot of people like to throw the word “quantum” around, but it usually doesn’t mean anything.  Quantum effects matter a lot if you’re making small-scale semiconductors. But they don’t do a very good job of explaining why I like chocolate, or why my shoulder hurts when I throw a football too hard.  

By inversion, I believe that a lot of these phenomena come together to form what Charlie Munger calls “lollapalooza” effects in “Poor Charlie’s Almanack” (PCA review + notes).  Basically, Munger’s proposition is that when you get a lot of mental models acting together, they act nonlinearly – so combining, say, social proof with incentives with salient feedback is not just 3 + 3 + 3, but perhaps more like 3 x 3 x 3.
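The additive-versus-multiplicative contrast is simple enough to spell out. The sketch below just reuses the illustrative "3" for each effect from the paragraph above; the numbers carry no real-world meaning.

```python
from math import prod

# The illustrative effect sizes from the text: three effects of size 3 each.
effects = {"social proof": 3, "incentives": 3, "salient feedback": 3}

additive = sum(effects.values())         # 3 + 3 + 3
multiplicative = prod(effects.values())  # 3 x 3 x 3
print(additive, multiplicative)  # 9 27
```

Add a fourth effect of the same size and the additive total rises to 12 while the multiplicative one jumps to 81, so the gap widens with every additional model in play.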

It is difficult to prove mathematically, but at least qualitatively, it seems to occur.  Many of the referenced books – “Ordinary Men” (OrdM review + notes) on mass murder, “The Up Side of Down” (UpD review + notes) on technical medical errors, or “How Doctors Think” (HDT review + notes) on cognitive medical errors – are good to read while looking for Lollapalooza-type effects.

Application / impact: we’re back to the Mauboussin quote; one damn relatedness after another: knowing the limits of analysis and understanding of causality makes us more effective at it.