By Professor Doom

There are three
easy ways to lie with statistics, ways that anybody can use and that will pass
peer review. It is important for readers to know this, for the following reason:

It’s no secret
that there’s a big problem with science nowadays. While many view “science” as
a trump card for “true, no argument,” the evidence just keeps getting stronger
that something is terribly wrong in modern scientific research.

Groundbreaking
research, after years of being taken as gospel, has been found to violate a key
scientific principle: replication. We can’t repeat the experiments to get the
same result.

We’re told Jesus
turned water into wine but…that’s not science, because there’s no way to repeat
the experiment under controlled conditions. No problem, such a thing is filed
under “religion” and we move on.

Every day we’re
told about a new scientific wonder, particularly in the medical and psychological
fields, as well as in the social sciences. We’re told of wonders in other fields,
and those wonders are also drawing increasing suspicion, but I want to focus on
those medical, psychological, and social wonders, because these rely almost
exclusively upon statistical analysis.

Using statistics
in these fields is perfectly understandable. We all realize each human being is
unique, and even each ailment is to some extent unique. Even a simple headache
can vary quite a bit, and the response of that headache to, say, aspirin, will
vary.

For every statistical
study, the key concept is the “p-value”: the probability of seeing a result at
least as striking as ours through pure luck, when in fact there’s nothing going
on. Naturally you want your p-value to be very small. The results reported on
Sunday morning coffee shows have p-values around 0.05, which is usually called
“significant.” Done honestly, a p-value of 0.05 or less means that dumb luck
alone would produce a result like this at most 5% of the time. If statistics
were done honestly, about 1 in 20 studies of a treatment that does nothing would
still produce a “new study shows” story on a Sunday morning show (rubbish, of
course, is much more common than that on TV).

For p-values
below 0.01, we’re talking a 1% chance, 1 time in 100, of pure luck giving us
a result like this. This is a pretty significant result, the kind of thing
that’s considered reliable, and it is, if statistics are done honestly (I’ll
stop with the “if statistics were done honestly” qualifier past this point,
just assume I’m writing it nearly every sentence, although it seldom seems to
apply to the real world).
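
For readers who like to see such claims checked, here is a minimal sketch in Python of what those thresholds mean when a drug does nothing. Everything in it is an assumption made for illustration: the 50 people per study, the 2,000 repeated studies, the pain changes drawn as pure noise with a known spread of 1, and the simple z-test. It is not how any particular real study was analyzed.

```python
import random
from math import sqrt
from statistics import NormalDist


def study_p_value(rng, n_people=50):
    """Run one study of a drug that does nothing and return its p-value.

    Each person's change in pain is pure noise (mean 0, known spread 1),
    so any "effect" that shows up is luck by construction.
    """
    changes = [rng.gauss(0, 1) for _ in range(n_people)]
    # z-score of the average change; the spread is known to be 1 here.
    z = (sum(changes) / n_people) * sqrt(n_people)
    # Two-sided p-value: chance of an average at least this far from zero.
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(0)  # fixed seed so the run is repeatable
p_values = [study_p_value(rng) for _ in range(2000)]

share_05 = sum(p < 0.05 for p in p_values) / len(p_values)
share_01 = sum(p < 0.01 for p in p_values) / len(p_values)
print("share of useless-drug studies with p < 0.05:", share_05)
print("share of useless-drug studies with p < 0.01:", share_01)
```

Under these assumptions the two shares should land near 5% and 1%: honest statistics still lets a steady trickle of nothing through.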

Before I get to
the three ways I can manipulate the statistics to get my results, I really feel
the need to point out: I’ve yet to see a single statistics textbook that goes
over these illicit methods. I taught my first statistics course over 20 years
ago…it’s never discussed, and yet, “somehow” these methods are the foundation
of results in many fields, for many studies.

So suppose I have
a new headache drug, and I want to show the drug works via statistical
analysis. Being a statistician, I’m virtually guaranteed to do so, and I’d like
to go over the three trivial ways I can show my drug works (and I won’t even
need to use the placebo effect if I don’t want to!).

I’ll find people
with headaches, and ask them to rate their pain on a 1 to 10 scale, give them
the drug (a glass of water), wait a while, and ask them to again rate their
pain on a 1 to 10 scale. I won’t bother describing the study in detail, but the
previous sentence describes the basic way statistical studies are done.

(There is
something called a “control” you really should use, but in many medical
studies, it’s hard to establish a control—you can’t exactly give people
headaches, or cancer, and not treat them, the better to compare to the people
you’re treating, for example.)

I don’t care if
the drug is “a glass of water,” I’m going to produce a study with a significant
result and a p-value below 0.01. All I have to do is manipulate the data in the
way it’s done every day now.

There are three
common ways to manipulate statistics; allow me to start with the first,
easiest, method to lie with statistics:

*Method One: Data Mining*

I trust the gentle
reader has filled out a survey before, and I assure you that such are quite
common in serious experimental tests. Such a survey might not include a name, but
ethnicity, age, birthplace, religion, birthday, political leanings, income,
number of siblings (and type, problematic nowadays with all the transgenderism
brouhaha), gender (self assigned or otherwise), home ownership, car ownership,
education level, education level of parents, and blood type might all be on it.
All sorts of questions can appear on a survey.

Let’s for the sake
of argument assume the survey for my experiment has a mere 12 questions that
might be relevant to headaches.

Now let’s get our
significant result!

So, first I do
the honest thing: take all the people in my study, and compare their
pain ratings before and after they drink water. Again, I spare the calculation
methods, but if, say, pain levels for the whole group drop from an average of
“7” to an average of “1,” then I got lucky, my p-value is already below 0.01.
Realize, I could just get lucky, which is why you never use the phrase “statistics
prove.” Those two words should be close to each other as often as Trump and
Clinton (either or both) share a shower. While statistics can prove nothing, we
sure like to say we’ve proved something, so we use the bogus phrase,
“clinically proven,” even though *nothing* has been proven, or can be proven,
with statistics.

So I run the
study the honest way, hoping to get lucky.

No luck? No
problem. I now go along each variable. I check the males’ pain level (one
variable), I check the “under the age of 18” pain level (another variable), and
I keep going, with all 12 variables. Twelve more chances to get lucky! I have a
1% chance of getting lucky on each try, and I’ve now tried 13 things.

No luck? No
problem. I now combine two of my variables, say, gender and age. Again, avoiding
the math here, there are 66 ways to pick two variables out of 12. Keep in mind,
I’ve now identified 79 ways to get lucky and arrive at a significant result.
Maybe I’ll get my 1% chance in that (treating each try as a separate 1% shot, I
have a better than 50% chance of getting lucky at this point).
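
For anyone who does want the math I keep avoiding, here is a rough sketch of it in Python. It assumes every test is an independent 1% shot, which is a simplification (the subgroups overlap, so the tests are not truly independent), and the function name is mine, made up for illustration.

```python
from math import comb

ALPHA = 0.01       # the "significant" threshold I am aiming for
N_VARIABLES = 12   # questions on the survey


def chance_of_at_least_one_hit(n_tests, alpha=ALPHA):
    """Chance that at least one of n_tests independent tests comes up lucky."""
    return 1 - (1 - alpha) ** n_tests


whole_group = 1                        # the one honest test
single_variables = comb(N_VARIABLES, 1)  # 12 ways to pick one variable
variable_pairs = comb(N_VARIABLES, 2)    # 66 ways to pick two variables

after_singles = whole_group + single_variables   # 13 tests
after_pairs = after_singles + variable_pairs     # 79 tests

print(after_singles, round(chance_of_at_least_one_hit(after_singles), 3))  # 13 0.122
print(after_pairs, round(chance_of_at_least_one_hit(after_pairs), 3))      # 79 0.548
```

So 13 tries give roughly a 12% chance of a "significant" result from a glass of water, and 79 tries push it past a coin flip.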

No luck? No
problem! I now look at three of my variables, and compare to pain levels. Still
no luck? I’ll go to four variables. Skipping over the math, there are thousands
of ways of checking for a result using data mining on my very small survey. I’m
going to hit that 1% chance at some point…and now I’m on a Sunday morning
show.
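
The whole procedure can be acted out on made-up data. The following is a sketch only, under assumptions I chose for illustration: 400 imaginary people, 12 survey answers reduced to random yes/no values, pain changes that are pure noise (the drug is water), and a simple normal-approximation z-test, which is a bit too generous on small subgroups. It slices the group by every combination of up to four variables and counts how many slices look "significant."

```python
import random
from itertools import combinations
from math import sqrt
from statistics import NormalDist, mean, stdev

N_PEOPLE = 400     # hypothetical study size
N_VARIABLES = 12   # survey questions, each reduced to yes/no here
MAX_DEPTH = 4      # slice by up to four variables at once
ALPHA = 0.01


def p_value(changes):
    """Two-sided z-test that the average pain change is zero."""
    if len(changes) < 2:
        return 1.0  # too few people to test; count it as a miss
    spread = stdev(changes)
    if spread == 0:
        return 1.0
    z = mean(changes) / (spread / sqrt(len(changes)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def mine(seed):
    """Test the whole group, then every subgroup defined by 1 to 4 'yes' answers."""
    rng = random.Random(seed)
    # Each person: 12 random yes/no answers and a pain change that is pure noise.
    people = [
        ([rng.random() < 0.5 for _ in range(N_VARIABLES)], rng.gauss(0, 2))
        for _ in range(N_PEOPLE)
    ]
    results = []
    for depth in range(MAX_DEPTH + 1):  # depth 0 is the whole group
        for chosen in combinations(range(N_VARIABLES), depth):
            subgroup = [change for answers, change in people
                        if all(answers[i] for i in chosen)]
            results.append((p_value(subgroup), chosen))
    return results


results = mine(seed=1)
hits = [(p, chosen) for p, chosen in results if p < ALPHA]
print(len(results), "tests run,", len(hits), "came up significant at", ALPHA)
```

Going only as deep as four variables already means 794 tests, and allowing every combination of the 12 answers would mean 4,096. Change the seed and a different set of subgroups "responds" to the water, which is exactly the problem.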

If this seems
farfetched, I ask the gentle reader to simply watch TV, and wait until you hear
a line like “for women under 40 who smoke this drug may…” and realize you’re
listening to a result that came from data mining, simply running test after
test among various variables (in this case, three: gender, age, and smoking
status) until something significant and reasonable-sounding came up.

“Reasonable
sounding” is the problem with this method.

Using this method
I might get “Republican males under 30 who drink a glass of water will have their
headache pain reduced,” but by the time my study makes it to the Sunday show,
the talking heads will simply say “A study clinically proves some males can use
water as a pain reliever.”

I won’t have much
motivation to clarify what the media says, because it’s totally not in my best
interest to hurt my own publicity, and they’re technically correct anyway. I’ll
stand by my results, the data will pass peer review (I’ve seen the like enough
times, and many doctoral theses in education/administration pass the doctoral
review committee doing this). Yes, the result is rubbish, but here’s the
kicker:

Nobody will check
my work.

There’s no money
in research for verifying my results the proper way (that is, by creating a new
study just with Republican males and seeing what happens when they drink water).
My groundbreaking study will last for decades, probably, and I can make a new
career out of using water as a pain reliever. I might even set up a web page
selling “Professor Doom’s Miracle Water” that, I promise you, “may” be even
more effective (as shown by my statistical studies) in pain reduction for “some
people.” Anyone will buy my water, given enough pain.

This is much of
what passes as science in many fields today: huge data mining efforts, silly
results that are just the result of dumb luck and repeated effort, and
absolutely no attempt to verify because our scientific system wants results,
not honesty.

Jesus turned
water into wine, and this is not science because we can’t replicate the event.
Every day we get the results of another study, one that is not replicated (and
many studies simply cannot be replicated).

Why are these
studies called science?

This trivial way
to get a statistical result is why we have so
many drugs that do nothing (as the
CEO of a drug company admits), and why we waste so much time buying things
“medical studies have shown to work” that do no such thing.

Data mining is
the most common, most trivial way to get a result, but I promised my readers
two more ways to lie with statistics, ways that never get mentioned in statistics
textbooks even though they are (illegitimately) used often today.

I’ll cover those
less popular ways next time.