This blog post is by Donald Trump, MD, FACP, CEO and Executive Director of the Inova Dwight and Martha Schar Cancer Institute. Dr. Trump and other members of the Schar Cancer Institute will be blogging on current topics in cancer, research and developments.
- The use of sophisticated data mining techniques in medical research is growing.
- Some have even suggested that these emerging data mining techniques can replace clinical trials.
- However, data mining techniques have inherent research flaws that can lead to misleading insights and results around associations and causation.
- While examinations of large databases can produce compelling hypotheses for medical research, they should not replace time-tested and proven clinical trials.
There has been considerable press coverage and attention recently about the role of big data – particularly the use of sophisticated data mining techniques – in medical research. Some experts have suggested that such techniques can or should replace well-designed clinical trials. Certainly, the potential that data mining has for moving medical research forward is compelling. However, recent coverage from NPR and The Washington Post about the link between the use of antacid drugs and cardiovascular risk goes too far.
Both articles cover a recent report in PLoS One by Dr. Nigam Shah and colleagues at Stanford University, who used sophisticated “data mining” techniques to search over 16 million clinical documents from the electronic medical records of 2.9 million individuals. Their aim was to examine whether the use of antacid drugs in a class called proton pump inhibitors (PPIs; think Prilosec, Nexium, Prevacid) was associated with cardiovascular risk (CVR). While there is experimental evidence that these drugs may have unfavorable effects on heart muscle cells, no substantial increase in CVR was detected in the large randomized trials done during the development of these drugs. However, based on the data analysis, Dr. Shah does make the connection between these drugs and CVR*. (You can see more detailed information on the study results below.)
Thankfully, the NPR story did include a caution by Dr. David Juurlink, a University of Toronto drug-safety researcher, who noted that such studies can provide misleading results: for example, were factors such as obesity or cigarette smoking – conditions that lead to PPI use – controlled for? People who smoke cigarettes and are overweight are more likely to need this type of medication. Of course, people who are obese and smoke are also more likely to have a heart attack.
Herein lies the problem: an association between PPIs and heart attacks may be found through data mining, but that doesn’t mean that one caused the other. To that end, Juurlink notes that association does not prove causation. However, associations derived via data mining from such large numbers of observations (here, 2.9 million patients) tend to be viewed as special and intrinsically valid. Consistent with this perception that the value of a scientific finding depends on the size of the population studied, the Washington Post reported this story without presenting any “counterpoint” argument such as the one Dr. Juurlink made.
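A small, purely hypothetical illustration may make this concrete. The counts below are invented for the sketch and do not come from the PLoS One report. They assume that heart attack risk depends only on whether a person smokes or is obese, and that such people are simply more likely to take a PPI. The group names and the helper functions `risk` and `risk_ratio` are likewise assumptions made for this example.

```python
# Hypothetical counts, invented for illustration only (NOT from the Stanford study).
# Each row: (risk group, takes a PPI?, number of people, number of heart attacks)
rows = [
    ("smoker_or_obese", True, 600, 60),   # 10% risk, on a PPI
    ("smoker_or_obese", False, 400, 40),  # 10% risk, not on a PPI
    ("neither", True, 100, 2),            # 2% risk, on a PPI
    ("neither", False, 900, 18),          # 2% risk, not on a PPI
]


def risk(selected):
    """Heart attacks divided by people, pooled over the selected rows."""
    people = sum(r[2] for r in selected)
    events = sum(r[3] for r in selected)
    return events / people


def risk_ratio(data):
    """Risk among PPI users divided by risk among non-users."""
    users = [r for r in data if r[1]]
    non_users = [r for r in data if not r[1]]
    return risk(users) / risk(non_users)


# Crude comparison: ignore smoking and obesity entirely.
crude_rr = risk_ratio(rows)

# Stratified comparison: compare users and non-users within each risk group.
stratified_rr = {
    group: risk_ratio([r for r in rows if r[0] == group])
    for group in ("smoker_or_obese", "neither")
}

print(f"Crude risk ratio: {crude_rr:.2f}")
for group, rr in stratified_rr.items():
    print(f"Risk ratio within '{group}': {rr:.2f}")
```

In this made-up population, PPI users appear to have roughly twice the heart attack risk of non-users when everyone is pooled together, yet within each group the risk ratio is exactly 1: the drug has no effect at all. The apparent link comes entirely from who ends up taking the drug, which is exactly the kind of confounding Dr. Juurlink cautions about.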
In my view, examinations of large databases are important, but primarily as hypothesis-generating exercises. I do not believe that big data exercises that determine associations can or should replace well-designed prospective clinical trials. Let’s take a look at how the two approaches compare:
How Do They Compare?
Clinical Trial Research
Big Data Research
Clinical trials certainly have their own flaws: they are time and resource intensive, and they often recruit patients who differ from the “real world” populations in which the treatment will be used. Even so, they are the gold standard in medical research. Big data analyses are not a replacement for the prospective clinical trial and careful thought by the practicing physician.
What do you think? I’d love to hear your opinion in the comments section.