…It isn't so.
There's a popular press release making the rounds right now and it is summed up in this headline from The Register: "Most brain science papers are neurotrash"
– I've also seen similar interpretations in "Science Codex": "Reliability of neuroscience research questioned"
– and The Guardian: "Unreliable neuroscience? Why power matters"
The original research, entitled "Power failure: why small sample size undermines the reliability of neuroscience," by Katherine S. Button et al., appeared in the April 10, 2013 issue of Nature Reviews Neuroscience, a quite respectable journal featuring reviews in the field of neuroscience. This report by researchers from the University of Bristol, Stanford University School of Medicine, the University of Virginia, and Oxford University set out to review whether research reports from the field of Neuroscience are analyzing data in a manner that is statistically reliable.
What they conclude is that in the data they sampled, the "statistical power" of the analyses is low. In fact, the statistics suggest that the studies are very likely either to accept a hypothesis as true when it is not, or to miss confirming a true hypothesis. However, and this is very important, the report does not apply this conclusion to the field of neuroscience as a whole. In short, headlines implying that the study condemns an entire field of science are false.
It is important to understand from the start that in the field of scholarly scientific publication, we classify research articles as either "primary publications" or "review articles." A primary publication consists of a report of data that (ideally) has never been published before, while a review consists of references to primary articles for the purpose of summarizing previous findings or comparing results from different studies. In recent years, a new form of review article has arisen: the meta-analysis. A meta-analysis looks at the data gathered and reported by other labs and applies complex mathematical and/or statistical analyses to find new information: effects that would either be too small to detect in a single study, or not evident until one collects data from multiple sites.
As a good scientist, rather than rely on the press releases
and reviews, I went to the original publication. Button et
al. started with a literature search for meta-analyses in Neuroscience published in 2011. Anyone who wants to find out what has already
been published in a field can perform an internet search for published
articles. They used Web of Science, but
that requires a subscription; I prefer the National Library of Medicine's
Medline service, accessible through PubMed (http://www.ncbi.nlm.nih.gov/pubmed).
Their search for keyword "neuroscience" + keyword
"meta-analysis" + publication year "2011" yielded 246
articles. They then had to sort through
those articles for ones that included meta-analyses
and provided enough information on the data used to allow calculation of "statistical power." I'll talk more about statistical power later, but first, let's put this in
perspective:
Their search returned just 246 articles – yet what can we get from a Medline search?
· First let's look at publication year 2011: 1,002,135 articles.
· Articles with the keyword "neuroscience" in 2011: 14,941 articles.
· Articles with keyword "meta-analysis" (and variations) published in 2011: 9,099.
· Articles with both "neuroscience" and "meta-analysis" keywords, published in 2011: 128.
So, my search returned fewer articles than Button et al.'s did. In many ways that is good, because it means that my numbers are conservative. It also means that their analysis started from 246 articles out of about 15,000 Neuroscience articles published in 2011!
Before one condemns an entire field of science, one should consider that the same criticism regarding lack of statistical power can be leveled at the condemnation itself: the authors started with only 1.6% of all Neuroscience papers published in a single year! From that starting point, they still rejected 4 out of 5 papers, applying their analysis to only 0.3% of the possible Neuroscience papers from 2011, which is a sampling problem of its own. It is also important to point out that of the 14,941 Neuroscience articles listed for 2011, there were 2,670 review articles, leaving 12,271 primary publications to which the statistics of meta-analysis are totally irrelevant!
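The percentages above are simple ratios and can be checked in a few lines. What follows is only a sketch in Python: the counts are the ones quoted in this post, the variable names are illustrative, and treating "rejected 4 out of 5" as exactly one fifth retained is an assumption made for the arithmetic.

```python
# Counts quoted in the post (Medline search, publication year 2011).
NEURO_2011 = 14_941     # articles with keyword "neuroscience"
BUTTON_HITS = 246       # articles returned by Button et al.'s search
REVIEWS_2011 = 2_670    # review articles among the neuroscience set


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of 2011 neuroscience articles the search started from.
start_share = percent(BUTTON_HITS, NEURO_2011)

# Assumption: "rejected 4 out of 5" means exactly one fifth was retained.
retained = BUTTON_HITS / 5
retained_share = percent(retained, NEURO_2011)

# Primary publications = all neuroscience articles minus reviews.
primary = NEURO_2011 - REVIEWS_2011

print(f"starting share: {start_share:.1f}%")
print(f"retained share: {retained_share:.1f}%")
print(f"primary publications: {primary:,}")
```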
But what does "statistical power" really mean?
When designing experiments, scientists need to determine ahead of time how many samples they need to perform valid statistical tests. The Power Function (D = fP * σ / √n) relates the size of the effect to be measured to the population standard deviation and the number of subjects. In the equation above: D = the difference in means that we want to consider to be a "real" effect; fP = a constant from the Power Function table (found in most statistics textbooks) that is selected for a particular power level; σ = the anticipated standard deviation (a measure of variability) of the measurements that I am making; and √n = the square root of the number of subjects I will study (or measurements I will make). For animal behavior, I like to work with a Power = 90%. The value of fP rises steeply, becoming very large as Power approaches 100%, so 90% is quite reasonable. fP for 90% Power = 3.6.
A real-world example of the calculation of statistical power:
Given n = 10 animals, σ = 0.5 Hz standard deviation in neuron firing rate, and fP = 3.6, the minimum difference (D) that I can reliably detect as significant in firing rate is 3.6 * 0.5 / √10, or about 0.57 Hz.
Put another way, if my analysis says that two groups of 10 neurons each have significantly different mean firing rates, and the difference between those means is at least 0.57 Hz, then I can be confident that 90% of the time I have reached the correct conclusion, but that there is still a 10% chance that I am wrong. However, if I increase my n, decrease my σ, or increase D, the statistical power increases and I can be much more confident in my results.
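The worked example can be reproduced directly from the Power Function as written in this post. The sketch below assumes only that formula (D = fP * σ / √n) and the example's values; the function name is hypothetical, and fP = 3.6 is taken from the text rather than derived.

```python
import math


def min_detectable_difference(f_p: float, sigma: float, n: int) -> float:
    """Smallest difference in means D given by D = fP * sigma / sqrt(n)."""
    return f_p * sigma / math.sqrt(n)


# Values from the worked example: fP = 3.6 (90% Power), sigma = 0.5 Hz, n = 10.
d = min_detectable_difference(f_p=3.6, sigma=0.5, n=10)
print(f"n = 10: D = {d:.3f} Hz")

# Larger groups shrink the detectable difference by a factor of sqrt(n).
for n in (20, 40, 80):
    print(f"n = {n}: D = {min_detectable_difference(3.6, 0.5, n):.3f} Hz")
```

Because n sits under a square root, quadrupling the group size only halves the detectable difference, which is why small effects get expensive quickly.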
Power functions are also very useful in deciding how many subjects to test or measurements to make. Fixing D at 0.5 Hz and keeping σ = 0.5 Hz and fP = 3.6, the same formula solved for n gives n = (fP * σ / D)² = 12.96, so at least 13 neurons must be included in each group to detect a 0.5 Hz difference at 90% Power.
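Solving the same Power Function for n gives a quick sample-size calculator. This is a minimal sketch, assuming the formula as written in this post and reusing σ = 0.5 Hz and fP = 3.6 from the earlier example; the function name is hypothetical, and the answer changes with whatever σ and fP are plugged in.

```python
import math


def required_n(f_p: float, sigma: float, d: float) -> int:
    """Smallest whole n satisfying D >= fP * sigma / sqrt(n).

    Rearranged from D = fP * sigma / sqrt(n): n = (fP * sigma / D) ** 2,
    rounded up because a fraction of a subject cannot be tested.
    """
    return math.ceil((f_p * sigma / d) ** 2)


# Assumed inputs: fP = 3.6 (90% Power), sigma = 0.5 Hz, target D = 0.5 Hz.
n_needed = required_n(f_p=3.6, sigma=0.5, d=0.5)
print(f"subjects per group: {n_needed}")
```

Rounding up matters here: rounding down would leave the design just short of the chosen Power level.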
The Power Function is the foundation of experimental design, and is the
basis for justifying how many subjects to test, and what is considered a "statistically significant result."
While I do not dispute the results of Button et al. with respect to meta-analyses, their results cannot be applied to primary publications without additional consideration. In fact, I feel that the authors raise quite valid concerns... about meta-analyses. Good experimental design is a standard part of the research ethics that every author confirms when submitting an article for publication. In addition, most primary publications look for rather large effects (mean differences) and can do so with relatively small groups (group sizes of 6-10 are not uncommon). Meta-analyses, by their very nature, are looking for small effects that are missed in small groups. If they were not, it would not be necessary to combine data sets to perform the meta-analysis.
There are other factors at work that show both the good and the bad in scientific research, but this need not be one of them. Put in perspective, this article in Nature Reviews Neuroscience sounds a cautionary note regarding the need for better statistical planning in meta-analysis. What the article does not do is state that all or even many Neuroscience articles have the same flaw. In particular, given that this caution is about drawing unwarranted conclusions, it behooves headline writers and journalists to avoid making the same type of mistake in the course of reporting!