…It isn't so.
There's a popular press release making the rounds right now and it is summed up in this headline from The Register: "Most brain science papers are neurotrash"
– I've also seen similar interpretations in "Science Codex": "Reliability of neuroscience research questioned"
– and The Guardian: "Unreliable neuroscience? Why power matters"
The original research, entitled "Power failure: why small sample size undermines the reliability of neuroscience," by Katherine S. Button et al., appeared in the April 10, 2013 issue of Nature Reviews Neuroscience, a quite respectable journal featuring reviews in the field of neuroscience. This report, by researchers from the University of Bristol, Stanford University School of Medicine, the University of Virginia, and Oxford University, set out to review whether research reports from the field of Neuroscience analyze data in a statistically reliable manner.
What they conclude is that in the data they sampled, the "statistical power" of the analyses is low. In fact, the statistics suggest that the sampled studies are very likely either to accept a hypothesis as true when it is not, or to miss confirming a true hypothesis. However, and this is very important, the paper does not apply this conclusion to the field of neuroscience as a whole. In short, headlines implying that the study condemns an entire field of science are false.
It is important to understand from the start that in the field of scholarly scientific publication, we classify research articles as either "primary publications" or "review articles." A primary publication consists of a report of data that (ideally) has never been published before, while a review consists of references to primary articles for the purpose of summarizing previous findings or comparing results from different studies. In recent years, a new form of review article has arisen: the meta-analysis. A meta-analysis looks at the data gathered and reported by other labs and attempts to find new information by applying complex mathematical and/or statistical analyses, revealing effects that would either be too small to detect in any single study, or not evident until one collects data from multiple sites.
As a good scientist, rather than rely on the press releases and reviews, I went to the original publication. Button et al. started with a literature search for meta-analyses in Neuroscience published in 2011. Anyone who wants to find out what has already been published in a field can perform an internet search for published articles. They used Web of Science, but that requires a subscription; I prefer the National Library of Medicine's Medline service, accessible through PubMed (http://www.ncbi.nlm.nih.gov/pubmed).
Their search for keyword "neuroscience" + keyword "meta-analysis" + publication year "2011" yielded 246 articles. They then had to sort through those articles for ones that included meta-analyses and provided enough information on the data used to allow calculation of "statistical power." I'll talk more about statistical power later, but first, let's put this in perspective:
Their search returned just 246 articles – yet what can we get from a Medline search?
· First let's look at publication year 2011: 1,002,135 articles.
· Articles with the keyword "neuroscience" in 2011: 14,941 articles.
· Articles with keyword "meta-analysis" (and variations) published in 2011: 9,099.
· Articles with both "neuroscience" and "meta-analysis" keywords, published in 2011: 128.
So my search returned fewer articles than Button et al.'s. In many ways that is good, because it means my numbers are conservative. Either way, their analysis applies to at most 246 articles out of about 15,000 Neuroscience articles published in 2011!
Before one condemns an entire field of science, one should consider that the same criticism regarding lack of statistical power can be leveled at the condemnation itself: the authors started with only 1.6% of all Neuroscience papers published in a single year! From that starting point, they still rejected 4 out of 5 papers, applying their analysis to only 0.3% of the possible Neuroscience papers from 2011, a statistical mess of a totally different type. It is also important to point out that of the 14,941 Neuroscience articles listed for 2011, 2,670 were review articles, leaving 12,271 primary publications to which the statistics of meta-analysis are totally irrelevant!
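The arithmetic behind those percentages is simple to verify. A quick sketch, where the count of roughly 49 retained papers is my own inference from "rejected 4 out of 5," not a figure reported in the press coverage:

```python
# Counts from my Medline search and from Button et al.'s starting sample.
total_neuro_2011 = 14941       # Neuroscience articles indexed for 2011
button_start = 246             # articles their keyword search returned
retained = button_start // 5   # "rejected 4 out of 5" -> roughly 49 kept (my inference)

start_pct = 100 * button_start / total_neuro_2011
final_pct = 100 * retained / total_neuro_2011

# 1.6% and 0.3%, matching the figures above
print(f"starting sample: {start_pct:.1f}% of 2011 Neuroscience papers")
print(f"analyzed sample: {final_pct:.1f}%")
```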
But what does "statistical power" really mean?
When designing experiments, scientists need to determine ahead of time how many samples they need to perform valid statistical tests. The Power Function (D = fP * σ / √n) relates the size of the effect to be measured to the population standard deviation and the number of subjects. In the equation above: D = the difference in means that we want to consider a "real" effect; fP = a constant from the Power Function table (found in most statistics textbooks), selected for a particular power level; σ = the anticipated standard deviation (a measure of randomness) of the measurements that I am making; and n = the number of subjects I will study (or measurements I will make), whose square root appears in the denominator. For animal behavior, I like to work with a Power of 90%. The fP value grows very rapidly as Power approaches 100%, so 90% is quite reasonable; fP for 90% Power = 3.6.
A real-world example of the calculation of statistical power:
Given n = 10 animals, σ = 0.5 Hz standard deviation in neuron firing rate, and fP = 3.6, the minimum difference (D) that I can reliably detect as significant in firing rate is 0.57 Hz.
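That calculation is easy to script. A minimal sketch using the formula above, D = fP * σ / √n, with the fP = 3.6 table value quoted for 90% Power (the function name is mine, chosen for illustration):

```python
import math

def min_detectable_difference(f_p: float, sigma: float, n: int) -> float:
    """Smallest mean difference D reliably detectable at the chosen power,
    using D = fP * sigma / sqrt(n)."""
    return f_p * sigma / math.sqrt(n)

D = min_detectable_difference(f_p=3.6, sigma=0.5, n=10)
print(f"D = {D:.2f} Hz")  # about 0.57 Hz with these inputs
```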
Put another way, if my analysis says that two groups of 10 neurons each have significantly different mean firing rates, and the difference between those means is at least 0.57 Hz, then I can be confident that 90% of the time I have reached the correct conclusion, but that there is still a 10% chance that I am wrong. However, if I increase my n, decrease my σ, or look for a larger D, the statistical power increases and I can be much more confident in my results.
Power functions are also very useful in deciding how many subjects to test or measurements to make: fixing D at 0.5 Hz, I can determine that at least 13 neurons must be included in each group to detect a 0.5 Hz difference at 90% Power. The Power Function is the foundation of experimental design, and is the basis for justifying how many subjects to test and what counts as a "statistically significant result."
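Rearranging the same formula gives n = (fP * σ / D)², rounded up to the next whole subject. A short sketch, again using the fP = 3.6 table value from above (the function name is mine):

```python
import math

def subjects_needed(f_p: float, sigma: float, d: float) -> int:
    """Minimum group size n to detect a mean difference d at the chosen power,
    from rearranging D = fP * sigma / sqrt(n) and rounding up."""
    return math.ceil((f_p * sigma / d) ** 2)

n = subjects_needed(f_p=3.6, sigma=0.5, d=0.5)
print(f"need n = {n} per group")  # 13 with these inputs
```

Note the square: halving the detectable difference D quadruples the required group size, which is why hunting small effects demands large samples.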
While I do not dispute the results of Button et al. with respect to meta-analyses, their results cannot be applied to primary publications without additional consideration. In fact, I feel that the authors raise quite valid concerns... about meta-analyses. Good experimental design is a standard part of the research ethics that every author confirms when submitting an article for publication. In addition, most primary publications look for rather large effects (mean differences) and can do so with relatively small group numbers (group sizes of 6-10 are not uncommon). However, meta-analyses by their very nature are looking for small effects that are otherwise missed in small groups – otherwise it would not be necessary to combine data sets to perform the meta-analysis.
There are other factors at work that point out both the good and the bad in scientific research, but this need not be one of them. In perspective, this article in Nature Reviews Neuroscience sounds a cautionary note regarding the need for better statistical planning in meta-analysis. What the article does not do is state that all or even many Neuroscience articles share the same flaw. In particular, given that this caution concerns unwarranted conclusions, it behooves headline writers and journalists to avoid making the same type of mistake in the course of reporting!