Holding up RCTs as 'gold standard' of research is a false hope and conceals statistical pitfalls, experts say
Regina Nuzzo, PhD and Jeffrey Blumberg, PhD participated in a session titled “Clinical Research, Statistics and Other Deceptions” at the Council for Responsible Nutrition’s annual meeting, held this year in Laguna Beach, CA. Nuzzo spoke on the widespread fallacy of equating P values to high levels of certainty about the truth of underlying hypotheses, whereas Blumberg laid out how the RCT model is inappropriate for most of the questions that nutritional science asks.
The question is a burning one as the Federal Trade Commission for the last several years has started to require that companies that sign consent decrees have at least two RCTs under their belts to substantiate future claims on products. While the agency has said that this standard applies only to those companies that signed decrees and is meant as a ‘fencing in’ measure for companies that were making outlandish claims, the development has had a chilling effect on the broad spectrum of nutritional research.
False promise of P values
Part of the move to hold RCTs up as a gold standard of evidence lies in a fallacy about what their results actually mean, said Nuzzo, a professor of statistics at Gallaudet University in Washington, DC. There is widespread belief among the mainstream media and even among some regulators that a P value of 0.05 (a measure of statistical probability) equates to a 95% certainty that the hypothesis is true. This is a simplistic view of what a P value actually represents, Nuzzo said.
“All a P value can do is to summarize the data assuming a specific null hypothesis. It can’t work backwards and make statements about the underlying reality,” Nuzzo said. What’s really necessary to make a judgement about reality using a P value as a reference point is the knowledge of how likely the original hypothesis was to be true in the first place. A P value of 0.05 or lower on a study provides additional support for something you were already pretty sure was true, Nuzzo said. A P = 0.05 result for a very uncertain hypothesis might boost confidence in the truth of the underlying reality only to something like 50%, Nuzzo said. It doesn’t mean that a P = 0.05 study is true and valuable and a P = 0.07 study is not, and drawing this sort of bright line is something that FTC has been increasingly leaning toward.
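The dependence on the starting plausibility can be made concrete with a rough sketch. The calculation below is an illustration, not Nuzzo's own arithmetic: it assumes the Sellke-Bayarri-Berger calibration, which says a P value p (below 1/e) can shift the odds against the null hypothesis by a factor of at most 1 / (-e · p · ln p). The function names and the example prior probabilities are hypothetical, chosen only for illustration.

```python
import math


def min_bayes_factor(p):
    """Lower bound on the Bayes factor in favor of the null hypothesis.

    Assumption: the Sellke-Bayarri-Berger calibration, -e * p * ln(p),
    which is only valid for 0 < p < 1/e.
    """
    if not 0 < p < 1 / math.e:
        raise ValueError("calibration requires 0 < p < 1/e")
    return -math.e * p * math.log(p)


def max_posterior(prior, p):
    """Best-case probability that the effect is real after seeing P value p.

    prior is the probability, before the study, that the effect is real.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1 - prior)
    # Dividing by the bound gives the largest shift the data can justify.
    posterior_odds = prior_odds / min_bayes_factor(p)
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical priors: a long shot, a toss-up, and a near-certainty.
    for prior in (0.05, 0.5, 0.9):
        for p in (0.05, 0.07):
            best = max_posterior(prior, p)
            print(f"prior={prior:.2f}  P={p:.2f}  best-case posterior={best:.2f}")
```

Under these assumptions, a toss-up hypothesis with P = 0.05 ends up at no better than about 71% probability of being true, and with P = 0.07 at about 66%, so the two results differ in degree rather than falling on opposite sides of a bright line. A long-shot hypothesis stays unlikely even after a P = 0.05 result.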
The concept of the P value has been around for about seven decades, and researchers have had reservations about the validity of using it as a tool to measure a study’s quality for almost that whole time, Nuzzo said. While it can be a valuable reference point, it was originally meant as a measure of whether a given line of research seemed promising enough to deserve a second look, she said.
“Some journals are moving away from P values altogether,” she said. “What’s really needed is more sharing, more transparency and more collaboration to build statistical power from related studies.”
Blumberg, who heads the Antioxidants Research Laboratory at Tufts University in Medford, MA, said FTC is trying to tie the research question up in a neat bow using the two-RCT standard, but the tool the agency has chosen is not fit for the task at hand.
“RCTs were in fact designed to test drugs,” Blumberg said. “Drugs have large effects and target specific systems. Nutrients by definition are pan-systemic. RCTs have very limited generalized ability to test the effects of nutrients.”
And then there are ethical questions, Blumberg said. To have a true control group, a deprivation of some sort would have to be in place. While this can be done with rats and mice, it obviously can’t happen with humans in most cases. And in nutrition science numerous other confounding factors can come into play, he said.
“At best you can test high intake levels and compare them to what low levels look like,” he said.
Totality of evidence as the standard
So what is a researcher to do? The effort to drive evidence-based decision making into ever more rigid channels is something that responsible members of industry and the research community have to struggle against, Blumberg said. The question is not black or white, but rather how dark a shade of gray you can achieve.
“I believe in the power of nutrition, but the effects are modest and aggregated across multiple systems. Nutrients usually have multiple thresholds that are often under homeostatic control. Rather than using an RCT ‘gold standard’ we should consider the totality of research approaches,” Blumberg said.
“We should be able to use scientific judgement. We need to look at things like benefit and risk. If the shoe doesn’t fit, we shouldn't have to wear it. We need to be able to use the power of nutrition to advance public health without being held to a false standard like the RCT,” he said.