Jordan Ellenberg’s “How Not To Be Wrong: The Power of Mathematical Thinking” – Book Review, Notes + Analysis


Overall Rating: ★★★★★★ (6/7) (standout for its category)

Per-Hour Learning Potential / Utility: ★★★★★★ (6/7)

Readability: ★★★★★★ (6/7)

Challenge Level: 2/5 (Easy) | ~440 pages ex-notes (456 official)

Blurb/Description: Novelist and mathematician Jordan Ellenberg combines his love for math and flair for snappy writing into an engaging, accessible book providing an overview of a number of important mathematical concepts for a lay audience.

Summary: For a long time, I’ve been looking for a “math for dummies” book that translated important mathematical concepts for lay readers.  The Joy of X was too basic – almost childish – and just didn’t do anything for me, even though I read it cover to cover three times.

How Not To Be Wrong is everything I wanted The Joy of X to be.  The same person who recommended Geoffrey West’s wonderful Scale (Scale review + notes) suggested I also check out How Not To Be Wrong, and it didn’t disappoint.

Ellenberg establishes right up front that his goal is to talk about practical math, and with occasional exceptions on uninteresting/impractical/abstract topics like twin primes, set paradoxes, and Pascal’s Wager, he mostly sticks to that goal, providing great discussion of a number of mental models like sample size, correlation vs. causation, contrast bias, and so on.  Ellenberg provides a lot of nuance on a number of important concepts.  The book lives up to its title and, relative to my broad reading, is the best book I’ve seen summing up major mathematical concepts for a lay audience.

Highlights: In addition to the solid content, Ellenberg is a great science/math writer, one of the most engaging this side of Jonathan Waldman – this book is even better written than Nate Silver’s The Signal and the Noise (SigN review + notes – which I still prefer anyway).  Ellenberg is almost Richard Thaler-like in his witty irreverence.  The footnotes are fantastic.  He can occasionally get a bit ADD and go on quasi-tangential digressions, but like Sam Kean’s, they make up for their merely-tangential utility with amusingness/cuteness.

As someone who thinks a lot about writing and how to get better at it, I can be (very) critical of awful writers like Daniel Kahneman, David Foster Wallace, or James Gleick – but I can also be respectful/admiring of phenomenal writers like Jordan Ellenberg or Jonathan Waldman ( Rust review + notes) who inspire me to step up my own game.   How Not To Be Wrong gets a brownie point or two because I think I’m a better writer for having read it.

Lowlights: Like many books, this one is much too long and could/should have been condensed by at least a third of its length; many of the examples, like equidistant letter sequences and the lottery ticket numbers, dragged on and grew uninteresting.

Also, it was sort of frustrating that such a large portion of the book seemed to have the punchline, in one way or another, of the “Baltimore Stockbroker” problem (i.e. analyzing only a limited subsample of the data); each additional return to the topic provided additional nuance, but that could have been accomplished in a much less verbose fashion, leaving more room to delve deeper into other fascinating topics, like “exhaustion” or the “sandwich theorem” that are only touched on briefly but would seem to have tremendous utility.

Second, Ellenberg doesn’t always stick to his promise to avoid pointless theoretical math – twin primes, etc.  The recurring “paradox” bit about ouroboric sets and 0.999… = 1 is the sort of annoying semantic Zeno’s paradox stuff that has no practical application in day-to-day life (with the caveat I point out below) and is a total waste of brain cells to contemplate or read about.  Like Richard Nisbett’s wholly ineffective attempt at explaining and advocating dialectical reasoning in Mindware, these parts of Ellenberg’s book just made me roll my eyes and go into skimming mode.

That’s a shame – from having read James Gleick’s (terrible, please don’t read it) The Information (TI review), I get the feeling that this concept is actually critically important to, like, a lot of technology like computers, but it’s not something that the average reader is going to learn anything from, at least the way Ellenberg presents it.  These discussions should have either been excluded entirely or better explained; as is, they were a complete waste of my time to read.

That’s not a reason to not read the book, though – just a reason to love the rest of it (I do love the rest of it!) and skim the sections that come across, like David Foster Wallace’s writing (who Ellenberg cites several times), as censored-atory wordplay masquerading as something intellectual that has Infinite (ly low)  return on time invested.  For what it’s worth, self-contradiction is a very hard topic to cover thoughtfully; in fact, I’ve only seen it done well in a single place (though, like the Baltimore stockbroker’s mailing list, I haven’t seen all the explanations everywhere!)

Where?  Well, a much better explained, more applicable discussion of the self-contradictions implied by formal deductive logic is available in Peter Godfrey-Smith’s Theory and Reality: An Introduction to the Philosophy of Science (TR review).  It’s not really a book worth reading soon or necessarily ever, but it’s one of the few that allowed me to understand some of the nuances on that topic.

Mental Model / ART Thinking Points: utility, tradeoffs, inversion, sample size, nonlinearity, probabilistic thinking, precision vs. accuracy, reductio ad absurdum, incentives, a/b test, conditional probabilities / priors, product vs. packaging, schema, one to many, zero-sum games, n-order impacts, margin of safety, contrast bias, growth mindset

You should buy a copy of How Not To Be Wrong: The Power of Mathematical Thinking if: you want a well-written, engaging read that covers a lot of important ground, with occasional detours into uninteresting topics that can easily be skimmed or skipped.

Reading Tips: Whenever you get to a section that involves a paradox, skim heavily or skip entirely, because there’s not much of use to learn in these sections.  If you feel like a section is getting too long (as I did with the Torah ELS, the “does God exist,” the twin primes, etc) feel free to skim/skip it, because the book repeats most of its important concepts enough that you’re not going to miss much.

Pairs Well With:

 The Signal and the Noise by Nate Silver ( SigN review + notes).  I personally prefer this book to How Not to be Wrong, because it’s a little more focused on a topic that’s deeply of interest to me, but it’s a toss-up which book most readers will get more out of.  Read them both!

 How Doctors Think by Jerome Groopman (HDT review + notes).  Frequent PAA readers know that this is one of my favorite books; it covers a lot of ground, ranging from cognitive biases to how doctors do (and don’t) use Bayesian reasoning out there in the real world.  There’s also some good discussion of the research-findings / “ incentives” angle.

Scale by Geoffrey West (Scale review + notes).  Wonderfully written (which one would not expect from a physicist), the first half of Scale merits 7 stars as a concise yet thorough explanation of complex adaptive systems and scale effects that makes the math easy for us lay folk – the second half of the book isn’t as good and sort of goes out with a whimper, but it’s still well worth reading.

“To Engineer is Human” (TEIH review + notes) by Henry Petroski.  A great, practical application of margin of safety and critical thresholds that pairs well with some of the concepts in this book.

“Rust: The Longest War” (Rust review + notes) by Jonathan Waldman.  Completely unrelated but even better-written than this book.

“Superforecasting” by Philip Tetlock (SF review + notes).  Another book that hits probabilistic thinking.

Reread Value: 3/5 (Medium)

More Detailed Notes + Analysis (SPOILERS BELOW):

IMPORTANT: the below commentary DOES NOT SUBSTITUTE for READING THE BOOK.  Full stop. This commentary is NOT a comprehensive summary of the lessons of the book, or intended to be comprehensive.  It was primarily created for my own personal reference.

Much of the below will be utterly incomprehensible if you have not read the book, or if you do not have the book on hand to reference.  Even if it was comprehensive, you would be depriving yourself of the vast majority of the learning opportunity by only reading the “Cliff Notes.”  Do so at your own peril.

I provide these notes and analysis for five use cases.  First, they may help you decide which books you should put on your shelf, based on a quick review of some of the ideas discussed.  

Second, as I discuss in the memory mental model, time-delayed re-encoding strengthens memory, and notes can also serve as a “cue” to enhance recall.  However, taking notes is a time consuming process that many busy students and professionals opt out of, so hopefully these notes can serve as a starting point to which you can append your own thoughts, marginalia, insights, etc.

Third, perhaps most importantly of all, I contextualize authors’ points with points from other books that either serve to strengthen, or weaken, the arguments made.  I also point out how specific examples tie in to specific mental models, which you are encouraged to read, thereby enriching your understanding and accelerating your learning.  Combining two and three, I recommend that you read these notes while the book’s still fresh in your mind – after a few days, perhaps.

Fourth, they will hopefully serve as a “discovery mechanism” for further related reading.

Fifth and finally, they will hopefully serve as an index for you to return to at a future point in time, to identify sections of the book worth rereading to help you better address current challenges and opportunities in your life – or to reinterpret and reimagine elements of the book in a light you didn’t see previously because you weren’t familiar with all the other models or books discussed in the third use case.

Page 1: Ellenberg makes himself likable from the first page.  Starting with the age-old grumble of math students – when am I gonna use this crap? – Ellenberg notes that the standard math teacher’s answer is seldom satisfying because, quote, “it’s a lie.” 

On  utility.

Page 2: Ellenberg notes that even if you don’t go into mathematics as a career, math helps you understand the world.

Pages 5 – 8: He wastes no time in getting into a useful example.  He notes the scenario of having to armor a plane, which is a tradeoff – too much armor and the planes will be heavier and less nimble; too little armor and, well, bad things will happen.  Where do you put the armor?

Mathematician Abraham Wald, using inversion, came to the opposite answer from WWII army officers.  Reviewing data on the density of bullet holes in various areas of the plane, the officers wanted to put more armor where there were the most bullet holes – but Wald recognized that the data was only for the planes that came back.  There was no reason to expect that certain parts of the plane would be hit disproportionately often relative to other parts of the plane; if you start with the presumption that bullet holes should be evenly distributed, then the areas where you find the least bullet holes need the most armor… because the planes with more holes there didn’t come back.

Ellenberg comes back to this “sample too small” problem repeatedly; in this context it’s survivorship bias ( sample size) but there are a number of other broader applications.
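To make the survivorship-bias mechanic concrete, here’s a quick simulation sketch (the hit zones, probabilities, and survival rule are all made up by me, not from the book): hits land evenly across the plane, but engine hits are deadlier, so they end up underrepresented among the planes you actually get to inspect.

```python
import random

# Made-up assumptions: hits land uniformly on three zones, but engine hits
# are far more likely to bring the plane down.
ZONES = ["fuselage", "wings", "engine"]
KILL_PROB = {"fuselage": 0.05, "wings": 0.05, "engine": 0.6}

random.seed(0)
all_hits = {z: 0 for z in ZONES}
returned_hits = {z: 0 for z in ZONES}

for _ in range(10_000):  # 10,000 sorties
    hits = [random.choice(ZONES) for _ in range(random.randint(0, 5))]
    survived = all(random.random() > KILL_PROB[z] for z in hits)
    for z in hits:
        all_hits[z] += 1
        if survived:
            returned_hits[z] += 1

print("hits on all planes:      ", all_hits)       # roughly even across zones
print("hits on returning planes:", returned_hits)  # engine looks 'clean' -> armor the engine
```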

Pages 12 – 13: Ellenberg notes that mathematics is the study of things that must be, as opposed to a set of arbitrary rules that we define.  This is an oversimplification that isn’t technically true, as he gets to later with regard to defined quantities, and of course the set theory nonsense.

Pages 15 – 17: Ellenberg creates a four-quadrant matrix that is sort of directionally like the Covey quadrant, except for math.  He lays out his thesis for the book: to stick to “simple and profound” mathematical ideas that “can be engaged with directly and profitably.”  He mostly sticks to this, although as stated, some stuff (like the sets and number theory stuff) has no business being included.

Pages 23 – 24: Here’s an irreverent take on dose-dependency / nonlinearity: how much Swedishness is too much?  The answer depends whether you’re in Denmark or Sweden.

… okay, that’s not actually what Ellenberg says about Swedish economics, but I used to be good friends with someone from Denmark, and Danes and Swedes, it turns out, are a bit like the Hatfields and McCoys and various other territorial neighbors: they’re basically the same to everyone outside their borders, but they don’t get along at all and like beating each other up (sometimes literally).  What Ellenberg actually says here is that while we like to see much of the world as straight lines, much of it is actually not a straight line.

To use a practical example from outside of the book, consider the data on productivity and working hours presented in Alex Soojung-Kim Pang’s Rest (Rest review + notes) (similar data can be found elsewhere).  If you’re currently at zero, the right direction to move on the x-axis is right… working more hours will help you get more done.  But beyond a certain point, working even more hours will not help you get more done, and in fact in many cases might make you get less done.

Pages 34, 37 – 38, 40 – 41: The “method of exhaustion” was one of the few things in calculus that I just intuitively got; Ellenberg discusses it here, more briefly than I’d like.  The idea is that if you can’t figure out X – where X is some function or area or whatever – but you can figure out some Y and Z that are bigger and smaller than X, you can eventually “sandwich” X between Y and Z and know, to a good approximation, what it actually is.

The corollary to this is the derivative; if you zoom in far enough, every curve is a straight line.  (Literally, it turns out, thanks to quantum mechanics.)
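Here’s a quick sketch of the exhaustion/“sandwich” idea in action (my example, not Ellenberg’s notation): squeeze the area of a unit circle, i.e. pi, between inscribed and circumscribed regular polygons and watch the bounds close in.

```python
import math

# The area of a unit circle (pi) gets squeezed between the areas of
# inscribed and circumscribed regular n-gons as n grows.
for n in (6, 12, 96, 1000):
    inscribed = (n / 2) * math.sin(2 * math.pi / n)   # lower bound
    circumscribed = n * math.tan(math.pi / n)         # upper bound
    print(f"n = {n:4d}:  {inscribed:.6f} < pi < {circumscribed:.6f}")
```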

Pages 42 – 48: Here, Ellenberg discusses the “0.999 (repeating) = 1” paradox.  He sort of violates his dictum to be practical here and basically comes to the conclusion that some numbers and series are what we define them to be (kind of).

Pages 51 – 55: Here, Ellenberg brings up regression, and offers the same caution as Nate Silver in The Signal and the Noise ( SigN review + notes): in Ellenberg’s words:

You can do linear regression without thinking about whether the phenomenon you’re modeling is actually close to linear. But you shouldn’t. - Jordan Ellenberg

He provides the example of a missile being fired; over a small enough increment of time, you can approximate it linearly, but at some point the missile will come back down and hit its target.

Nonlinearity.
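A minimal sketch of the missile point, with invented numbers: fit a straight line to a projectile’s height and it looks fine over a short window, then extrapolates into nonsense.

```python
import numpy as np

# Invented projectile data (not from the book): height over time for something
# launched upward; it comes back down, but a fitted line doesn't know that.
t = np.linspace(0, 10, 101)
h = 50 * t - 4.9 * t**2

# Fit a straight line to the first second only, then to the whole flight.
early = t <= 1.0
slope_early, intercept_early = np.polyfit(t[early], h[early], 1)
slope_all, intercept_all = np.polyfit(t, h, 1)

print(f"fit on first second : h ~ {slope_early:.1f}*t + {intercept_early:.1f}  (fine locally)")
print(f"fit on whole flight : h ~ {slope_all:.1f}*t + {intercept_all:.1f}")
print("linear extrapolation at t=20s:", round(slope_all * 20 + intercept_all, 1),
      "m, vs. true height:", 50 * 20 - 4.9 * 20**2, "m")
```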

Pages 56 – 58: Ellenberg provides a pretty thoughtful discussion of the various approaches to math pedagogy; he tries to take a middle ground.  I personally never found much use in geometric proofs, and avoided doing them entirely, preferring Mathcounts style practical problem solving (which I loved), but that’s a discussion for another time.

Page 60, Page 61: Another good example of nonlinearity, not to mention an internal consistency problem: obviously not more than 100% of Americans can be obese, and obviously if all Americans are obese, then it is logically required for all subgroups of Americans to be obese.  And yet you can run a model whose conclusions say that some black male Americans will still be fit and trim while all Americans are obese.

Page 63: I’m not sure this is actually relevant, but it made me laugh:

“When there are two [Tuvaluan] men left in the bar at closing time, and one of them coldcocks the other, it is not equivalent in context to 150 million Americans getting simultaneously punched in the face.”  

This book is a joy to read, good job Ellenberg.

Pages 65 – 67: Ellenberg brings up the Law of Large Numbers as it relates to sample size; smaller samples tend to display more volatility/noise in the data, which is often meaningless.

Pages 71 – 72: Ellenberg raises the idea of standard deviation; the spread of a sample average around the true value shrinks in proportion to the square root of the sample size.
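A quick sketch of both points using fair-coin flips (my setup, not Ellenberg’s): the sample average settles toward 50%, and its spread shrinks roughly in proportion to the square root of the sample size.

```python
import random
import statistics

random.seed(1)

def sample_mean(n):
    """Average of n fair coin flips (1 = heads, 0 = tails)."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

for n in (10, 100, 1000, 10000):
    means = [sample_mean(n) for _ in range(1000)]
    print(f"n={n:6d}: spread of the sample average ~ {statistics.stdev(means):.4f}"
          f"  (theory: 0.5/sqrt(n) = {0.5 / n ** 0.5:.4f})")
```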

Pages 73 – 74: He doesn’t explicitly mention it here, but the idea of conditional probabilities is sort of brought up here.  Flipping a coin four heads in a row has a probability of 1/16, but if you’re already at three heads, getting a head the fourth time is still 50-50… you’re not due for a tail.

Pages 80 – 83: Ellenberg provides some useful commentary here on percentages and how gross vs. net numbers can be misleading.  This comes up in investment analysis sometimes; segment profitability, for example.

Page 98: Here is the Baltimore Stockbroker problem; the punchline is that “improbable things happen a lot,” so if you have a big enough sample size, you’ll have plenty of improbable things.  In other words, if you give enough monkeys enough keyboards, one will eventually type Hamlet.

The broader point (that Ellenberg gets to later, in other ways) is that, as with the airplanes, you have to be careful to make sure you’re not analyzing a cherry-picked dataset and using that to draw conclusions about things that are outside of that data set.
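Here’s a toy version of the scam (my numbers), which also shows why the “surviving” recipients are so impressed: you only ever hear from the slice of the mailing list for which the calls happened to work out.

```python
# A toy Baltimore Stockbroker scam: mail ten rounds of "the market will go up"
# vs. "the market will go down" predictions, telling half the remaining list
# one thing and half the other. Whatever the market actually does, half the
# list sees a correct call each round.
recipients = 10_240
impressed = recipients

for week in range(1, 11):
    impressed //= 2   # only the half that got the (randomly) correct call stays
    print(f"week {week:2d}: {impressed:5d} recipients have seen {week}/{week} correct calls")

# After ten weeks, 10 people have watched you go 10-for-10, purely an artifact
# of looking only at the surviving subsample of the mailing list.
```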

Pages 102 – 103: Via an anecdote about a mind-reading dead fish, Ellenberg here provides some criticism of fMRI studies, illustrating the opposite side of the “improbable things are probable” coin – use a large enough data set, and you’ll find some noise that looks interesting but means nothing.

Page 107: a brief note on precision vs. accuracy in context of missiles: does it really matter if the missile hits in 40.4 seconds or 40.6?  You’d better start running.

He returns to this concept later.

Page 110: Ellenberg notes that probability is much less intuitive than, say, arithmetic and geometry… which is one of the reasons it’s so important to emphasize, in my view.

Pages 111 – 114: Some heavier stuff: the “frequentist” approach to probability relates to the fact that, given enough iterations, observed frequencies are highly likely to converge on the underlying probability.

However, for certain events, there aren’t infinite iterations; I would use the example of predicting who will win the Super Bowl, or the next election (in the vein of Silver).  

Ellenberg sort of touches on priors and Bayesianism but doesn’t get all the way there (yet); for now, he focuses on significance testing and the null hypothesis.  The null hypothesis, simply, says that nothing interesting is happening: your new drug has no effect on the patient’s condition; your new leadership strategy has no effect on company performance.

The idea of statistical significance is that, if your options are “the drug doesn’t work” or “the drug does work,” just because you see an improvement doesn’t prove anything… it could just be luck.

The p-value is the probability of seeing results at least as extreme as the observed ones if the null hypothesis were true (not the probability that the null hypothesis is true); 0.05 (5%) is generally used as the threshold for statistical significance.
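A minimal p-value sketch, with my own example rather than Ellenberg’s: flip a coin 100 times, see 60 heads, and ask how often a fair coin would produce a result at least that lopsided.

```python
from math import comb

# Null hypothesis: the coin is fair. We observed 60 heads in 100 flips.
n, observed = 100, 60
p_at_least_60 = sum(comb(n, k) for k in range(observed, n + 1)) / 2**n
p_value = 2 * p_at_least_60   # two-sided: 60+ heads or 60+ tails

print(f"p-value ~ {p_value:.3f}")   # ~0.057, just missing the 0.05 cutoff
```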

Pages 117 – 121!: Ellenberg immediately moves on to point out that “significance” is not always significant; he also points out, relative to health studies finding a “significant” correlation between one thing and another, the importance of looking at the denominator of ratios: twice a really small number is still a really small number.

This sort of outside-in, utility-focused calculation is actually pretty valuable; recall the part of How Doctors Think (pages 89 – 90 – HDT review + notes) about why 90-year-old women with clean mammogram histories don’t need another one.  Also, to some degree, this Pearls Before Swine comic.

Anyway, Ellenberg concludes by noting that “statistically significant” means something more like “statistically detectable.”

Page 125: Studies can be overpowered or underpowered: underpowered studies aren’t capable of detecting small phenomena, while overpowered studies might detect a really small phenomenon that isn’t all that important.

Page 128: Commenting on Tversky and colleagues’ findings on the “hot hand” phenomenon, Ellenberg notes that the hot hand probably doesn’t really exist, but that doesn’t mean the underlying statistical methodology used to determine that was appropriate or correct.

Pages 131 – 132: Ellenberg here brings up reductio ad absurdum and internal consistency; he doesn’t square up (to my satisfaction) why we can rely on this while the stupid ouroboric-set nonsense, Gödel’s theorem, etc. show self-contradiction lurking elsewhere.

Pages 136 – 137: The problem is that “reductio ad unlikely,” as Ellenberg calls it, is basically the foundation of statistical significance – if something is statistically significant, that basically means results like yours would show up less than 5% of the time if nothing real were going on, and since that’s pretty unlikely, the thing under measurement is probably real.  Ellenberg provides a useful example of where this logic breaks down: given the rarity of albinism, you can conduct a statistical test that will tell you it’s unlikely that a group of 50 people containing 1 albino is a group of humans.

But here we come back to the Baltimore Stockbroker problem: Ellenberg notes that the probability of the same lottery sequence coming up twice in a row is the exact same as the probability of that sequence coming up next to some other randomly-picked sequence… if you have a large enough sample size, lots of improbable things will happen.  Sample size and probabilistic thinking.

Page 138: Yitang Zhang is mentioned here.

Page 140: Ellenberg comes up with a term called the “flogarithm,” which is the number of digits something has (the fake logarithm).  It actually works perfectly for base 10 if you round up; he’s using base-e.
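A tiny sketch with numbers of my choosing: the “flogarithm” (digit count) tracks the base-10 logarithm, and the natural log is just a constant multiple away.

```python
import math

# "Flogarithm" = the number of digits, Ellenberg's fake logarithm.
# For base 10 it's floor(log10(n)) + 1.
for n in (7, 404, 1_000_000, 8_675_309):
    digits = len(str(n))
    print(f"n = {n:>9,}   digits = {digits}   log10 = {math.log10(n):.2f}   ln = {math.log(n):.2f}")
```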

Pages 147 – 155: Ellenberg here discusses John Ioannidis’s paper “Why Most Published Research Findings Are False.”  It’s a short and relatively easy-to-understand paper, even if you skim over the math, so I’d recommend reading it.  (You might consider the Backtest Overfitting paper while you’re at it, as well as Ioannidis’s later 2016 paper, “Why Most Clinical Research Is Not Useful.”)  Ioannidis seems like a firecracker with a lot of interesting views; in a more recent (2016) interview about evidence-based medicine, he noted dryly:

Many public funding agencies are accustomed to funding only research that clearly has no direct relevance to important, real-life questions, so perhaps they didn’t know where to place my application.

Ahahaha okay.  Utility.  Reminds me a bit of Richard Thaler in Misbehaving:

“Finally, for some reason[,] the study of “applied” problems in psychology has traditionally been considered a low-status activity.”

Anyway. I’m integrating points from Ioannidis’s paper with Ellenberg’s discussion.

Ellenberg later cites XKCD comic 882, “Significant,” which gets at (one of) the roots of the problems here: the Baltimore Stockbroker problem, i.e. sample size.  If you look at a small enough part of a large enough data set, random noise / statistical variation all but ensures you’ll find something of interest (see the previously-referenced section on dead fish functional MRI).  In the paper, Ioannidis notes that the smaller the “effect size” of something being studied, the harder (obviously) it is to distinguish it from noise. For large effect sizes, he cites the example of smoking on lung cancer; smaller effect sizes would be something like the effect of intake of one nutrient on tumor development.

Ellenberg provides a nice matrix showing true positives, false positives, true negatives, and false negatives.  The significance test does what it’s supposed to do, but that doesn’t necessarily tell you anything useful.  He provides a great, memorable example of the peril of small sample sizes (do ovulating women prefer Romney or Obama?) on page 149, as well as a genomic example for complex polygenic traits on page 150.
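A back-of-the-envelope sketch of why a “significant” finding can still, more often than not, be noise; the base rate, power, and alpha here are my assumptions, not Ioannidis’s numbers.

```python
# Out of 1,000 hypotheses tested, suppose only 50 describe real effects.
true_effects, null_effects = 50, 950
power, alpha = 0.8, 0.05      # chance of detecting a real effect; significance cutoff

true_positives = true_effects * power       # 40 real effects detected
false_positives = null_effects * alpha      # 47.5 flukes that clear p < 0.05

ppv = true_positives / (true_positives + false_positives)
print(f"share of 'significant' findings that are real: {ppv:.0%}")   # ~46%
```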

The XKCD cartoon, among other things, ends up supporting the view (also expressed by Nate Silver) that we shouldn’t just take the Renaissance big-data correlation approach to science, but rather restrict our searches to factors with likely causality.

Ellenberg goes on to discuss the where-are-the-holes-in-the-airplane problem with regard to published scientific studies; there’s an example of incentives here.  If you take at face value Peter Godfrey-Smith’s discussion in Theory and Reality that recognition (in one form or another) is the chief motivator of scientists, or if you agree with Ioannidis that some areas of science have the potential for remuneration (see discussions in, for example, Meredith Wadman’s The Vaccine Race [ TVR review + notes] or Jerome Groopman’s How Doctors Think [ HDT review + notes]), then there’s obviously much more incentive to have publishable results than to not have publishable results.  This results in perhaps less verification of existing work than should be done, and…

“P-hacking,” as Ellenberg puts it; Ioannidis refers to “data dredging.”  Ellenberg also calls it “torturing the data until it confesses” (apparently a term of art).  Basically, Ellenberg notes that an altogether-suspicious number of papers (in aggregate) just barely meet the threshold for statistical significance, suggesting that the data is being manipulated.

Ellenberg doesn’t (here) really go into the angle of what should be done differently from a research standpoint, but Ioannidis does.  I think it’s worthwhile to consider even though I’ll never conduct academic research, because in a certain sense, I’m also trying to come up with relevant patterns among broad data (even if it’s not all literally quantified).  The same factors that cause researchers to get bad results are likely, directionally, to cause me to get bad results.

Small effects combined with small samples.  I mentioned this earlier.  For things with big impacts (ex. “What happens to my mood if I skip a meal or don’t get enough sleep,”) not a very large sample size is required to come to the correct conclusion (“my mood sucks.”)  For things with small impacts (ex. “What impact does a single small share sale by a company executive have on the stock price”), you need a very large sample size to come up with the right conclusions.

Incentive or desire bias.  Obviously if there’s some utility in arriving at a certain conclusion, you’re more likely to arrive at a certain conclusion.

Replication/corroboration.  An effect observed by one team on one small aspect of a large problem with many relationships is more likely to have error than an effect observed by many teams on many aspects of the problem.

“Wiggle room” destroys intellectual honesty in terms of both study design and measured outcomes.  Ioannidis notes that “the greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.”  In terms of study endpoints: “True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes).”

Lots of useful takeaways there.

Pages 156 – 159: The flip side of this coin, argues Ellenberg, is that it’s not just the false positives (i.e. the things that are statistically significant but practically insignificant) that are the problem.  It’s also the false negatives (i.e. the things that aren’t statistically significant, but may be practically significant).  Discussing a hypothetical A/B test, Ellenberg notes that “confidence intervals,” the inverse of significance tests (in a way), can be more intellectually helpful than a simple, binary yes/no.

I tend to agree; going back to one of the book’s whole themes about attempting to reduce a non-linear world to linear models, I think one challenge is that people often focus on a midpoint, which ignores probabilistic thinking.

There’s also a utility / inversion (80/20 / directionality) play here.  Say, for example, that you don’t know the exact payoff of some decision.  If the likely utility cost is small and the likely utility gain is larger than the cost, then over all the decisions you make, it makes more sense to take the ups than the downs.

Finally, see also pages 249 onward of Nate Silver’s The Signal and the Noise (SigN review + notes), where he also discusses the Ioannidis paper and relates it to Bayesian reasoning / priors.

Page 161: Ellenberg here mentions the incentives problem for replicating studies: it’s not sexy.  We don’t remember everyone who agreed that Darwin’s theory made sense… we remember Darwin for coming up with it.

Page 163: Ellenberg mentions the famous Target story here…

Page 165: Count Ellenberg a quasi-big-data skeptic; a little bit like Geoffrey West in Scale (Scale review + notes) discussing complexity theory, Ellenberg notes that modeling complex systems is hard.

Pages 168 – 171: important pages on asking the right questions.  The probability of X given Y is not the same as the probability of Y given X.   Conditional probabilities and  priors /  Bayesian reasoning.

Ultimately, the challenge with the p-value is that it ignores the base rate of a given condition in the total population, as opposed to the sample (back to the albinism problem).  Ellenberg cites R. A. Fisher:

You have to evaluate each hypothesis in the ‘light of the evidence’ of what you already know about it. - Jordan Ellenberg (from R. A. Fisher)
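To put the P(X given Y) vs. P(Y given X) point in numbers (a made-up screening-test example of mine, not Ellenberg’s): the test rarely misses sick people, yet most positives are still healthy, because the condition itself is rare.

```python
# P(evidence | hypothesis) is not P(hypothesis | evidence).
prevalence = 0.001            # 1 in 1,000 people actually has the condition
sensitivity = 0.99            # P(positive | sick)
false_positive_rate = 0.05    # P(positive | healthy)

p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_sick_given_positive = sensitivity * prevalence / p_positive

print(f"P(positive | sick)      = {sensitivity:.0%}")
print(f"P(sick | positive test) = {p_sick_given_positive:.1%}")   # ~1.9%
```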

Page 173: Back to the randomness challenge; humans don’t think numbers ending in 0 or 5 are random… 7s are popular.

Pages 178 – 180: Here, Ellenberg discusses Bayesian reasoning, in somewhat more qualitative a way than Nate Silver in The Signal and the Noise.  Addressing readers’ concerns about whether or not it’s appropriate to apply prior beliefs to evaluating new evidence, he implicitly invokes the “be a filter, not a sponge” model.  Ellenberg’s example – what if you found the same statistical result for a study investigating a new version of an old, proven cancer drug, and a ridiculous alternative? – is worth quoting at length:

“You’d like to say that your beliefs are based on evidence alone, not on some prior preconceptions you walked in the door with.  

But let’s face it – no one actually forms their beliefs this way. If an experiment […] slowed the growth of […] cancer […] by putting patients inside a plastic replica of Stonehenge, would you grudgingly accept that [vibrational earth energy was curative?]  

You would not, because that’s nutty. You’d think Stonehenge probably got lucky.

You have different priors about those two theories, and as a result you interpret the evidence differently, despite it being numerically the same.”

Great example of  priors /   Bayesian reasoning.  This is also explored in Nate Silver’s “ The Signal and the Noise ( SigN review + notes).  Silver:

numbers have no way of speaking for themselves.  We speak for them. We imbue them with meaning. 

[…] It is when we deny our role in the process that the odds of failure rise.  Before we demand more of our data, we need to demand more of ourselves.
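Here’s the Stonehenge point reduced to arithmetic, with invented numbers: run the same evidence (say, a likelihood ratio of 20 in favor of “the treatment works”) through Bayes’ rule starting from two very different priors.

```python
# Invented priors and likelihood ratio, purely to illustrate the mechanics.
def posterior(prior, likelihood_ratio):
    odds = (prior / (1 - prior)) * likelihood_ratio
    return odds / (1 + odds)

for name, prior in [("tweaked version of a proven cancer drug", 0.30),
                    ("plastic replica of Stonehenge", 0.00001)]:
    print(f"{name:42s} prior {prior:.5f} -> posterior {posterior(prior, 20):.5f}")
```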

Page 182: Richard Feynman sighting!  Feynman notes that improbable things happen all the time by pointing out the amazingness of him happening to see a specific, completely random license plate in the parking lot (a one in a million chance!)

Page 184: Ellenberg notes, sort of circumspectly, that theories that can’t be invalidated are bad theories, using the example of conspiracy theories.  This is also a bit like the “Linda the bank teller who may or may not also be a feminist” problem.

Page 185: A good example of, among other things, product vs. packaging  and schema: Ellenberg relays an anecdote about a friend from college who had a box of t-shirts made up to sell, and never sold them all, and so started wearing them (a clean one each day) for the rest of the semester rather than doing laundry.  Of course, everyone else on campus thought he was that guy wearing the same shirt for two months in a row.

One of the points here, touched on as well by Godfrey-Smith in Theory and Reality, is our general failure to consider alternative hypotheses.  “The Halo Effect” by Rosenzweig (Halo review + notes) also cross-references here.

Page 197: Ellenberg introduces expected value here, with the somewhat unintuitive note that if you already own a lottery ticket, buying another one is a (very modestly) less bad bet than buying the first one was.

Pages 198 – 199: A core concept related to expected value is that it isn’t actually what you expect – in many cases, it’s impossible to actually get the expected value.  The lottery-ticket example is helpful: oversimplifying, either you win the lottery (a big positive payoff) or you win nothing (and you’re out the ticket price).  On average, buying a $1 lottery ticket might lose you $0.60 or whatever, but you’re never actually going to lose exactly $0.60 on a single ticket – but as with the earlier Law of Large Numbers, the expected value is what actual observable results would trend toward over time if given enough iterations (a large enough sample size).
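The arithmetic, with lottery numbers I’ve invented for illustration:

```python
# Invented lottery: $1 ticket, $10M jackpot, 1-in-25M odds.
ticket_price = 1.00
jackpot = 10_000_000
p_win = 1 / 25_000_000

expected_value = p_win * jackpot - ticket_price
print(f"expected value of a ticket: ${expected_value:.2f}")   # -$0.60

# No single ticket ever loses exactly $0.60 (you lose $1 or win big), but over
# enough tickets the average result per ticket converges toward it.
```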

Page 200: cross-reference the Flynn effect and one-to-many here: discussing Edmond Halley’s math on annuities (yes, same Halley as the comet), Ellenberg notes that most modern readers will be aware that an annuity should cost a young person more than an old one, because, y’know, the young person will be collecting payments for longer.  But this is not a fact that was always obvious to everyone! The British government, it turns out, would let anyone buy an annuity for the same price.

So, things that seem obvious now weren’t always, and a lot of things that may be “obvious” to me or to you may not be obvious to the rest of the world.

Pages 212 – 213: Expected value is additive – meaning, if E(A) is 1 and E(B) is 2, then E(A+B) = 3.

Pages 217 – 222: Here is one of the few examples of the lovely sandwich theorem, among other things.  Trying to find the probability that a needle, dropped on a wooden floor, will cross one of the slats is sort of hard to do.  However, for various reasons, it’s not so hard to find the area of a circle… and as we learned earlier, a circle is functionally equivalent to a polygon with very, very many sides.  Or a series of many, many needles.

I need to think a little more about how to apply this in everyday contexts.
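In the meantime, here’s a Monte Carlo sketch of the needle drop itself (my simulation, not Ellenberg’s derivation): with needles as long as the slats are wide, the crossing probability is 2/pi, so counting crossings backs out an estimate of pi.

```python
import math
import random

# Buffon's needle: needles of length 1 dropped on slats spaced 1 apart
# cross a crack with probability 2/pi.
random.seed(3)
drops, crossings = 200_000, 0
for _ in range(drops):
    center_to_line = random.uniform(0, 0.5)     # distance to nearest crack
    angle = random.uniform(0, math.pi / 2)      # needle angle vs. the cracks
    if 0.5 * math.sin(angle) >= center_to_line:
        crossings += 1

print("estimated crossing probability:", crossings / drops)   # ~2/pi ~ 0.6366
print("implied estimate of pi:        ", 2 * drops / crossings)
```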

Page 231: A great example of zero-sum games, arms races, and n-order impacts: the way the specific lottery analyzed by Ellenberg was set up, the “excess returns” captured by (naturally) math-savvy players were competed down to a lower and lower level.  At some point, the players had to start considering not just their own actions, but what would happen after everyone else was playing too.

Page 233: I used to have a friend who was absolutely neurotic about missing flights/trains/etc, for reasons that I… honestly, nope, never quite understood.  If you’re missing no planes, says George Stigler, you’re missing too few!

Pages 234 – 235: Utility, yay utility, everyone knows that’s my favorite mental model.  (At least as of right now; like Marks has a lot of Most Important Things, I have a lot of Favorite Mental Models.)

Ellenberg explains the economics concept of utility… my explanation is broader than the economics one, of course, but the general idea is that everything (experiences, etc) has a “value” to us, and the rational goal of our lives should be to maximize total value, and when we make decisions that don’t maximize our total value, we’re being bad.

So for example, if the value of $10,000 is five “utils,” and the value of getting to see your family every night is seven “utils,” then you shouldn’t take a new job that will pay you $10K/yr more but won’t let you be home for dinner, because the benefit of taking the job is five utils while the opportunity cost is seven utils.

Of course, it’s questionable whether or not we’re even capable of quantifying, let alone rationally assessing, what maximizes our utility, in precise terms (see Thaler’s Misbehaving – M review + notes).  It is nonetheless a very useful concept.

Importantly (Ellenberg gets there later), utility shouldn’t be viewed as static or fixed.  For example, various studies generally find that income – obviously – boosts happiness up to a certain point: it’s a bit hard to be happy when you can’t put food on the table, or when you’re constantly worried about where next month’s rent comes from.  So for a person in this position, an incremental $1,000 is likely to be very meaningful indeed.

On the other hand, the marginal utility of an incremental $1,000 to, say, Warren Buffett, is close to zero… because there is nothing Warren Buffett could do with $1,000 more than he can do with his current net worth.  Marginal utility =  utility x  dose-dependency (the “Swedishness” and missile discussions earlier – lines don’t go up forever).

Of course, individual preferences are different, these are just averages, etc etc.  The concept remains the same.  Another example would be the basis of capitalism: if a rancher has 1,000 cows, the marginal utility of another steak to him is pretty low; and if a farmer has 1,000 barrels of flour, the marginal utility of another barrel of flour to him is pretty low – but the marginal utility to each of what the other has is very high, because now both can have a nice dinner of steak and bread.

Pages 243 – 248: a good discussion here of the marginal utility of money.
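One conventional way to sketch diminishing marginal utility is to assume utility grows with the logarithm of wealth; the log form and the wealth levels below are my assumptions, not figures from the book.

```python
import math

def utility(wealth):
    return math.log(wealth)   # assumed log utility: each extra dollar adds less than the last

for wealth in (20_000, 200_000, 2_000_000, 100_000_000_000):
    gain = utility(wealth + 1_000) - utility(wealth)
    print(f"wealth ${wealth:>15,}: utility gain from an extra $1,000 = {gain:.6f}")
```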

Pages 250 – 251: Ellenberg here discusses the somewhat unintuitive idea of complementary probabilities… I’ve presented a simpler graphic below demonstrating why, in some cases, if you are allowed to choose two rather than one, you don’t want to choose the two individual highest payoffs, but rather the two that get you closest to a payoff in all scenarios.

Imagine the total box as the total range of possibilities, and the shaded portions (black, maroon, lime green) as the potential bets you can make.  If you want to win under all circumstances, and you get to make two bets, and the outcome is random, you don’t want to make black and maroon your bets, because even though they’re the two biggest, they overlap… you want to make black and green the bets.  (Assuming there’s not a double payoff for both being right on both black and maroon.)

An example of  inversion, and also kind of  margin of safety.

Pages 254 – 257: Here, Ellenberg talks about variance and diversification.

While I don’t subscribe to the traditional financial thinking on volatility-as-risk (which, as discussed extensively by Howard Marks among others, is just plain wrong), I do actually prefer low-variance in another context for reasons that are vaguely related to Bayesian reasoning.

Summarily, let’s choose two kinds of investments, neither of which I usually invest in, but they work for discussion purposes.  Let’s say one option is a consumer packaged goods company that sells toothbrushes and cereal – products for which demand tends to be incredibly stable and predictable.  Let’s say the other option is the complete opposite: an early-stage biotech company whose drug will either work, with some small probability, and be worth tons of money, or not work, with some large probability, and be worth no money at all.

Let’s say reasonable analysis concludes that both stocks are worth $100 per share, and both stocks are trading at $80 per share.  Which would I prefer?

The answer is, easily, the first one: not only for the first-level reason (as Ellenberg discusses) that it’s a sure thing, but also for the reason that it’s much easier to track the quality of my decisions over time and update my priors.  In inherently high-spread situations, it’s difficult to know what to do with feedback: for example, if you live in Texas, you know that the weather right now is not a good predictor of the weather this evening, let alone tomorrow or next week.  So during certain seasons, it’s hard to know whether to pack an umbrella or deodorant… or both.  It’s hard to learn anything from experience.  But in lower-spread situations, where fewer things can happen, if you’re wrong, it’s easier / more important to take notice and adjust your process.

That’s just my view, of course, and I’m sure there are plenty of other investors who quite profitably invest in high-spread situations.  I’ve personally found lower-spread ones to be much easier.

Page 261: Ellenberg mentions the “combinatorial explosion” here – thanks to exponential growth, one modeling challenge is that actual scenario analysis of everything that could happen can quickly get out of hand.  The specific example is the “traveling salesman” problem of finding the best route through a set of stops… I’m not sure why Ellenberg didn’t mention something like UPS’s ORION.  Worth reading about.

Page 263: The projective geometry stuff here is fascinating, and explains why I can’t draw.

Pages 270 – 272: The concept of redundancy (a form of margin of safety), and the opportunity cost and utility of it, is wonderfully clear in information theory.

Without getting into the specific math, the idea here is that messages can in fact be transmitted quite efficiently (ppl wh cn stl rd ths sntnc r smrt!), but the risk of efficiency is that if something gets lost or transmuted along the way, the content of  the message gets changed. So language is naturally somewhat inefficient and this is, on balance, good rather than bad.
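The crudest possible version of this tradeoff, as a sketch of my own (not a scheme from the book): send every bit three times and take a majority vote on the other end.  It triples the length of the message, but a single flipped bit per group no longer corrupts it.

```python
import random

# Triple-repetition code: inefficient, but robust to occasional flipped bits.
def encode(bits):
    return [b for b in bits for _ in range(3)]

def decode(bits):
    return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]

random.seed(4)
message = [1, 0, 1, 1, 0, 0, 1, 0]
sent = encode(message)
received = [b ^ (random.random() < 0.10) for b in sent]   # noise: flip ~10% of bits

print("message :", message)
print("decoded :", decode(received))   # usually identical despite the noise
```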

The flip side is, you know, opportunity cost: there can be too much of a good thing.  In an engineering context, as Henry Petroski notes on page 6 of “ To Engineer is Human ( TEiH review + notes):

“all bridges and buildings could be built ten times as strong as they presently are, but at a tremendous increase in cost […] since so few bridges and buildings collapse now, surely ten times stronger would be structural overkill.”

Similarly, in The 7 Habits of Highly Effective People ( 7H review + notes), Stephen Covey notes that when it comes to human interaction, effectiveness is often more important than efficiency.  You can break up with someone over text… but you shouldn’t.

Page 275: Geometry applied to error-correcting codes: it turns out, for reasons I can’t explain as well as Ellenberg does, that the fact that lines are defined by points is kinda helpful for these.

Page 278: Good examples of the tradeoffs with respect to language efficiency.

Pages 288 – 289: Kahneman/Tversky/behavioral economics briefly mentioned here.  Richard Thaler’s Misbehaving ( M review + notes) is my favorite book.  (I say that a lot, too.)

Page 291: Ellenberg uses utility theory to explain why people start businesses, even given the odds against them.  One point he doesn’t mention, that he should, is that it’s often not about a grand dream… it’s simply about wanting to be your own boss.  Autonomy is an important human motivation.

Pages 301 – 304: Francis Galton comes up here (as he does, seemingly in a lot of places).  As discussed in many other places, he discovered regression to the mean.

Pages 312 – 315: yay scatterplots.

Pages 317 – 319: Fun fact: apparently Paramore was a boat before it was a band.  Anywho, contour maps and ellipses are discussed here.  

Pages 323 – 325: I feel old now; conic sections apparently aren’t called conic sections anymore.  

Pages 329 – 330: Ellenberg doesn’t highlight it, but there’s a super super important point here, which is that more data isn’t always better data unless it tells you something new.  Ellenberg uses (sort of) the example of certain body features being correlated to one another: if you’re taller, you probably have longer arms too, so knowing that you have long arms, given that we already know that you’re tall, doesn’t really tell us very much new about you.

Yet a ridiculous number of people in a ridiculous number of situations fall prey to this: gathering more data about the same subject is not always going to give you more insight.  This is due to various factors, ranging from the 80/20 rule to others.

Pages 337 – 340: Ellenberg here explains vectors, which I never much loved.

Pages 341 – 343: The important takeaway from vectors is that if X is correlated to Y, and Y is correlated to Z, that doesn’t necessarily imply anything about X’s correlation to Z.  It’s not transitive.
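A quick construction of my own showing this: make Y the sum of two independent variables X and Z, so Y is correlated with each of them while X and Z have nothing to do with each other.

```python
import numpy as np

# Correlation isn't transitive: Y correlates with both X and Z, but X and Z don't.
rng = np.random.default_rng(5)
x = rng.normal(size=100_000)
z = rng.normal(size=100_000)
y = x + z

print("corr(X, Y) ~", round(np.corrcoef(x, y)[0, 1], 3))   # ~0.707
print("corr(Y, Z) ~", round(np.corrcoef(y, z)[0, 1], 3))   # ~0.707
print("corr(X, Z) ~", round(np.corrcoef(x, z)[0, 1], 3))   # ~0
```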

Page 346: Again, here, the takeaway is that “no correlation” doesn’t mean “no relationship.”

Pages 347 – 351: correlation vs. causation – just because two things are correlated, doesn’t mean one caused the other.

Page 353: twin studies!

Pages 355 – 356: here, Ellenberg makes the directionality / expected utility argument for making decisions under uncertainty: there are lots of times we’re going to be wrong, but we should still make decisions anyway.  We just need to make the ones that have the smallest cost if we’re wrong and the biggest payoff if we’re right.

Pages 361 – 362!: these are some of the best pages in the book.  Ellenberg provides a witty, concise explanation of the Baltimore Stockbroker problem as it relates to what I’d classify as an example of the availability heuristic – a situation in which our schema is overly narrow.

Basically, Ellenberg proposes an alternative explanation for why people tend to see an inverse correlation between “niceness” and “attractiveness” in their prospective dating partners.  If you presume these traits are random and uncorrelated, and both have some level of utility in a relationship context, then most people have a critical threshold for some weighted combination of attractiveness and niceness.  If you’re really nice and just okay-looking, you make the cut; if you’re really good-looking and tolerable, you make the cut too.

What that leads to is a general inverse correlation, among your prospective dating pool, between niceness and attractiveness – that may actually not exist more broadly.

As Ellenberg puts it: “Be honest – the mean uglies are the ones you never even consider.”  
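Here’s a simulation sketch of that selection effect (my thresholds and distributions, not Ellenberg’s): niceness and attractiveness are independent in the population at large, but once you only look at people who clear your combined bar, a negative correlation appears.

```python
import numpy as np

# Niceness and attractiveness are independent overall; conditioning on
# "good enough to date" induces a negative correlation in the selected pool.
rng = np.random.default_rng(6)
nice = rng.normal(size=100_000)
hot = rng.normal(size=100_000)

dateable = (nice + hot) > 1.0    # the threshold here is an arbitrary assumption

print("correlation, everyone:        ", round(np.corrcoef(nice, hot)[0, 1], 3))
print("correlation, your dating pool:",
      round(np.corrcoef(nice[dateable], hot[dateable])[0, 1], 3))   # clearly negative
```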

A similar availability heuristic relating to the supposed inverse correlation of IQ and social skills is discussed in Stuart Ritchie’s Intelligence, specifically on pages 56 – 57.  I offer some further thoughts in the notes there which are similar to the ones above.

Pages 366 – 367: Ellenberg discusses, fairly extensively, some of the problems with averages: they can generate nonsensical results.  A nice extension of this is Gladwell’s famous parable about Diet Pepsi and tomato sauce; the punchline is that there’s no perfect tomato sauce – some like it chunky, some like it garlicky, some like it spicy – and if you try to create a singular “best” sauce that averages everyone’s preferences, pretty much nobody will like it.

Page 382: Ellenberg brings up the “asymmetric domination effect,” which is, more or less, just contrast bias.  In this case, if you give someone a choice A, then add an option that is clearly worse than A, they like A better.

Be careful of this when you’re buying a car – there’s some implicit thinking of “well, compared to the decked-out version, these useless upgrades aren’t that expensive…” that dealers undoubtedly try to encourage!

Pages 384 – 386: I enjoyed Ellenberg’s discussion of various methods of voting, which is a surprisingly hard problem to solve.  What is clear, at least, is that the system currently used in the U.S. isn’t optimal!

Page 408: Here’s the pointless semantic nonsense: I literally could not care less about ouroboric sets.

Page 410: The point here, not very well explained, is that a lot of stuff is just what we define it to be… I think.

Page 419: Again, the paradox stuff is actually important (I think) in an information-theory context, but Ellenberg doesn’t do a good job of making it understandable for a lay reader.  Peter Godfrey-Smith, of Other Minds fame, actually does a better job (to my ears/eyes, anyway) in Theory and Reality.

Page 425: Ellenberg doesn’t have much use for Roosevelt’s “Daring Greatly” speech (not clear if he’s familiar with Brene Brown).  He notes, like Nate Silver, the importance of dealing with uncertainty and acknowledging uncertainty.

Page 427: Of course, Ellenberg eventually references Nate Silver: Ellenberg relays a story about a boss of his who wanted a “number” for people who would have TB in 2050 (which was impossible).  Defending Silver against critics, Ellenberg notes that critics looking for a non-probabilistic election call “make [me] want to stab [myself] in the hand with a fork.  What Silver offers isn’t hedging, it’s honesty.”

This is an important point to dwell on for a bit, because it’s a mash-up of product vs. packaging and local vs. global optimization and a few other mental models that I’m probably forgetting.  I’ve written a paper about how this plays out in the investing world; I believed then (and still believe today) that firm and industry structure incentivizes precision over accuracy; that is to say, analysts are not paid big bucks to say “well, their expansion initiative COULD be successful, in which case the stock could be worth a lot, or it could fail, and the stock would be pretty overvalued, and honestly it’s a hard call and I have no particular insight into the matter despite conducting a lot of research and won’t get any more insight by conducting more.”

That’s the sort of stuff that gets you passed up for promotions, if not outright fired.  And yet it’s often what good investing work looks like.  See Philip Tetlock’s “ Superforecasting ( SF review + notes).

Page 429: Here’s one paradox I can actually get behind, in fairness to Ellenberg: he quotes philosopher W. V. O. Quine (that has to be a pseudonym, right?)

“To believe something is to believe that it is true; therefore a reasonable person believes each of his beliefs to be true; yet experience has taught him to expect that some of his beliefs, he knows not which, will turn out to be false.  A reasonable person believes, in short, that each of his beliefs is true and that some of them are false.”

Yes.  That. Although not quite exactly; I’d rephrase it a little bit in directionally-Bayesian terms: your “beliefs” are a set of priors that you obviously believe, but you update those priors based on evidence.

So, for example, in an investing context, I believe every stock I buy is an attractive investment opportunity.  But I also know that incremental data might change that belief, and at such time as I’ve seen sufficient incremental data to justify changing the prior (i.e. “this stock is a good investment”), then I no longer believe it.

Paradox resolved.

Page 430: Ellenberg’s one criticism of Silver is that he’s actually “too” precise.  (Ellenberg points to decimal points.) This is one of the reasons that in my investment letters, I disclose returns to a whole number rather than to the decimal point, even if doing so requires me to round down my returns on occasion – first of all, the numbers I report at quarter end are based on my tracking spreadsheet and thus typically not as precise as the official statement / audit numbers, so it makes no sense to report decimals that I’m not confident in.  Second, I also think that in general, long-term investing is not a sport in which decimal points are that meaningful.

Pages 432 – 434: Ellenberg starts with another pointless, annoying paradox that’s completely irrelevant, but gets to a few important things: the growth mindset and inversion.  I.e. treating failure as a step rather than an endpoint, as well as actively trying to disprove things you’re trying to prove.

First Read: spring 2018

Last Read: spring 2018

Number of Times Read: 1

 

Review Date: spring 2018

Notes Date: spring 2018