[<< | Prev | Index | Next | >>]

Sunday, December 20, 2015

Biased Confirmation



It has been a pet project of mine for nearly two decades to unravel exactly how and why highly intelligent people can reach such opposing conclusions from the same evidence. I realized early on that it boiled down to different priors, which is usually where this endeavor stops, but that doesn't answer how and why people's priors come to differ in the first place, or, more importantly, why they don't update and converge as more evidence comes in. That is, why, exactly, do even smart, introspective people exhibit such confirmation bias? Sure, people are emotional, arrogant, afraid of having been wrong, and all of that, but I am concerned with the most objective, most rational, most honest individuals I have ever found, and why those people still reach diametrically opposed conclusions from each other. It has always felt like a mild form of insanity to me. Herein I offer an explanation.

Let's call the two sides the skeptics and the (conspiracy) theorists, remembering that we're talking about highly intelligent people on both sides. The difference I want to call attention to lies precisely in how each approaches evidence. The skeptic asks "What are the odds of that hypothesis given this evidence?" while the theorist asks "What are the odds of this evidence given that hypothesis?"

Which one is the right approach?

It turns out that not only does one approach lead to different conclusions than the other, but one side converges on a common conclusion (which is what we would hope for) while the other side fragments into many mutually opposing conclusions (aka insanity). Sadly, the latter is far and away the majority side. But which side is which? Let's consider an example:

A news source, which all agree is reputable, says they have acquired secret documents showing that a certain nefarious public figure has done something nefarious, but they cannot release the documents due to national security. Said accused figure goes on record with the claim that the news agency is conspiring against him and has fabricated everything.

The guy at the bar says "Pshhh! Now we know that guy's a scoundrel!"

The skeptic says "Well, to be fair, they haven't actually shown us the documents, so this is only weak evidence against him."

The theorist says "Against? Clearly this evidence supports a conspiracy!"

"Crazy talk!" the skeptic chimes.

Who is right?

The skeptic goes on: "We agree the news source is reputable, so the odds they would lie are very very low. We agree the man is nefarious, so the odds he would do such a thing are very very high, as are the odds he would lie about it. Because they can't show us the documents, it's not hard proof, but clearly the odds favor their side, so I score this evidence against him. To do otherwise would be to ignore overwhelming odds!"

The theorist disagrees but sees no point in arguing, so they move on.

A year later they are back at the same bar. The nefarious man has gone to jail. A news spot comes up saying that a witness who had claimed he would vindicate the man died of a drug overdose before he could testify. Our skeptic shakes his head and says "Do you remember you once questioned that man's nefariousness? Clearly his associates are no better."

"First off," our theorist corrects, "at the time I didn't doubt his nefariousness--I held the same bias as you! What I contested was who the evidence supported. Secondly, I am now convinced he is not nefarious at all, and has been set up!"

Our skeptic rolls his eyes. "We've seen a hundred points of evidence just like the first by now, all against him. You, my friend, are truly nuts."

How did they so diverge?

Recall the skeptic's approach: "What are the odds of that hypothesis given this evidence?" As each bit of evidence comes in, he weighs the hypotheses and goes with the one that seems more likely.

Then there is the theorist's approach: "What are the odds of this evidence given that hypothesis?" As each bit of evidence comes in, he treats each hypothesis as true, and weighs the evidence each way.

Mathematically, we can write these as P(H|E) vs P(E|H). Bayes' rule relates them: P(H|E) = P(H)P(E|H)/P(E). Because we're comparing two H's over a single E, the shared denominator P(E) drops out, and we can write: P(H|E) ∝ P(H)P(E|H).

That is, the two approaches are equivalent except that the skeptic's approach includes the priors (their existing model/beliefs), while the theorist's approach does not (they are willing to "entertain" any hypothesis without bias).

When it comes to drawing a conclusion between the hypotheses in the moment, the skeptic's approach is correct, which is why it seems wholly reasonable. But is it the right thing to do when evaluating the evidence, for purposes of learning and updating one's model?

From our example, our two hypotheses are Nefarious or Conspiracy, which I'll call N and C. The skeptic weighs all the evidence and says P(N|E) greatly outweighs P(C|E). He admits the evidence is weak, but remembers it nonetheless as favoring N. A hundred examples later, N is more supported than ever.

The theorist says sure, P(N|E) ∝ P(N)P(E|N) outweighs P(C|E) ∝ P(C)P(E|C), but that's because P(N) hugely outweighs P(C). But how do the pure P(E|N) and P(E|C) compare? That is, if we entertain each hypothesis without bias, how well does each explain the evidence?
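
To make the two questions concrete, here is a minimal sketch in Python. The numbers are invented purely for illustration; nothing in the story pins down real values:

    # Invented, purely illustrative numbers:
    p_N, p_C = 0.99, 0.01  # priors: Nefarious vastly outweighs Conspiracy
    p_E_given_N = 0.2      # the unverifiable accusation, if he's merely nefarious
    p_E_given_C = 0.6      # the same evidence, if a conspiracy is framing him

    # The skeptic's question: which hypothesis is more likely given E?
    # (posteriors, up to the shared normalizer P(E))
    print(p_N * p_E_given_N)  # 0.198: N still wins by a wide margin
    print(p_C * p_E_given_C)  # 0.006

    # The theorist's question: which hypothesis better explains E?
    print(p_E_given_C / p_E_given_N)  # 3.0: the evidence itself favors C

Both computations are correct; they simply answer different questions. That difference is the whole dispute in miniature.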

The thing most people miss is why this matters. If you have to apply the priors (P(N) and P(C)) to actually draw a conclusion, what's the point in separating them out? In fact, skeptics often take offense at the very idea of entertaining what they see as a ludicrous hypothesis: they might call it an "arbitrary proposition" and decree it unworthy of consideration, a waste of mental effort to entertain seriously.

The reason it matters is because evaluation of the evidence is what updates the model itself--this is how we learn about the world. And if we include our priors in that evaluation, then they get redundantly applied to the Nth power.

This means that for three bits of evidence coming in over time, E1 to E3, the theorist draws a conclusion at the end by:

P(H)P(E1|H)P(E2|H)P(E3|H)

while the skeptic uses:

P(H|E1)P(H|E2)P(H|E3) ∝ P(H)P(H)P(H)P(E1|H)P(E2|H)P(E3|H)

What we see here is that the skeptic's prior need only overpower any one piece of evidence, and not only will he never learn otherwise, but he will actually reinforce his prior even in light of contrary evidence.
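
A small simulation makes the divergence explicit. Assume, purely for illustration, ten pieces of weak evidence, each twice as likely under hypothesis C as under hypothesis N, against a strong prior for N:

    p_N, p_C = 0.999, 0.001       # strong prior for Nefarious over Conspiracy
    evidence = [(0.1, 0.2)] * 10  # (P(Ei|N), P(Ei|C)): each item favors C 2:1

    # Theorist: apply the prior once, then accumulate pure likelihoods.
    t_N, t_C = p_N, p_C
    for e_N, e_C in evidence:
        t_N *= e_N
        t_C *= e_C

    # Skeptic: scores each item by its posterior, which is equivalent to
    # multiplying the prior back in with every single observation.
    s_N, s_C = 1.0, 1.0
    for e_N, e_C in evidence:
        s_N *= p_N * e_N
        s_C *= p_C * e_C

    print(t_C > t_N)  # True:  the accumulated 2^10 beats the 999:1 prior
    print(s_C > s_N)  # False: the prior, applied ten times over, swamps it

The accumulated likelihood ratio of 2^10 is enough to flip the theorist's conclusion, while the skeptic's bookkeeping buries it under ten redundant copies of the prior.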

The skeptic here would argue that with all the evidence laid out before him at once, he would do the same computation as the theorist. While this may be true, evidence rarely comes so conveniently packaged, and even on the occasion it does, his priors will already be tainted by previous over-application, so the end result is still grossly overwhelmed by those over-applied priors.

Coming back to our first example, how does it support the conspiracy side as opposed to just being weak evidence against? Let's re-frame the example slightly:

World News (a disreputable tabloid) says they have acquired secret documents showing that Elon Musk is an Islamic terrorist mastermind, but they cannot release the documents due to national security.

Would you say this is weak evidence in favor of Musk being a terrorist, or would you say this is evidence that World News makes stuff up? Compare this to your assessment of the first version above.

Most people would draw opposite conclusions from these two versions of the same example, when the only true difference between the examples is in their priors. Critically, I'm talking about the implications of the evidence, not just the final conclusion. (The latter reasonably must include priors, but the former is what shapes people's future priors. Again: people should separate these but they usually don't.)

In order to objectively evaluate the evidence, we need to abandon our priors and fully embrace each hypothesis: In one hypothesis, the news agency is lying, in the other, the accused is lying. Without any priors, we have little information with which to weigh those. Here in fact the only asymmetry is that the news agency cannot show their proof--a circumstance which is certainly more likely if they are lying than if they are not! Ergo, absent priors, the evidence given actually favors the news agency lying--in either version of the example--and our conspiracy theorist was right from the start. By itself, it weighs little against the priors, but properly accumulated over time (absent priors!) such evidence could someday outweigh the priors and justify a change in conclusion.
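
To put rough numbers on that last claim (invented, again, just for illustration): say the prior odds are 1000:1 in favor of Nefarious, and each unverifiable-accusation-style item is three times as likely under Conspiracy as under Nefarious. Then:

    import math

    prior_odds_N = 1000.0  # illustrative: N starts 1000x more likely than C
    lr_for_C = 3.0         # each item is 3x more likely under C than under N

    # Posterior odds for N after k items: prior_odds_N / lr_for_C**k.
    # How many items before the accumulated evidence flips the conclusion?
    k = math.ceil(math.log(prior_odds_N) / math.log(lr_for_C))
    print(k)  # 7: seven such items, properly accumulated, outweigh a 1000:1 prior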

The impact of this common error is not limited to conspiracy theories. As a foundation of confirmation bias, it is applicable to just about everything from political to scientific beliefs.

The way around it is to start asking the right question: How likely is this evidence given each hypothesis? (Rather than the default, wrong question: How likely is each hypothesis given this evidence?) This means frequently admitting when a piece of evidence is better explained by the other side's hypothesis (because without priors, the evidence is a lot less one-sided). And maybe if you admit this frequently enough, you'll finally understand the other side.
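
In bookkeeping terms, the fix is to keep the evidence ledger separate from the prior, folding the prior in only when a conclusion is actually needed. Here is a minimal sketch of that discipline (the EvidenceLedger class and its numbers are hypothetical, just to show the shape of it):

    import math

    class EvidenceLedger:
        """Accumulates evidence as log-likelihood ratios; priors stay out."""

        def __init__(self, prior_log_odds):
            self.prior_log_odds = prior_log_odds  # applied once, at decision time
            self.evidence_log_odds = 0.0          # updated per item, prior-free

        def observe(self, p_e_given_h1, p_e_given_h2):
            # The right question: how likely is this evidence under each hypothesis?
            self.evidence_log_odds += math.log(p_e_given_h1 / p_e_given_h2)

        def conclude(self):
            # Priors belong here, and only here.
            return self.prior_log_odds + self.evidence_log_odds

    # Usage: prior odds 1000:1 for H1, then seven items each 3x likelier under H2.
    ledger = EvidenceLedger(prior_log_odds=math.log(1000))
    for _ in range(7):
        ledger.observe(0.2, 0.6)
    print(ledger.conclude() < 0)  # True: the conclusion has flipped to H2

Two people who keep honest, prior-free ledgers like this can still disagree in conclude(), but their evidence totals will converge over time, which is exactly the sanity we were hoping for.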



[<< | Prev | Index | Next | >>]


Simon Funk / simonfunk@gmail.com