“Tweaking” Experimental Data
Earlier today, I read a blog post by Mark Chu-Carroll titled “Selective Data and Global Warming.” The post is primarily about Michael Duffy, a global warming “denialist” who dishonestly presented global climate data to force it to fit his anti-global warming agenda. [1]
While reading it, I couldn’t help but be reminded that this type of dishonesty happens all the time in science. Most often, scientific experiments do not give simple conclusive results. The data must be “interpreted,” and statistical methods must be “applied.” I’ve seen cases where researchers sat and “tweaked” the statistics to favor their hypothesis with the same aggressive dishonesty as this global warming denialist.
Software for real-time PCR machines is a perfect example of how deeply dishonest representation of data has become embedded in the industry of science. Most real-time PCR software allows you to adjust parameters in the data interpretation. Why? When the initial results do not support your hypothesis, the software makes it trivial to “play around” until the data fits. The data itself is not changed, merely its interpretation. To avoid this problem, experiments should be repeated in different ways to ensure the interpreted results are the actual results. If several different runs with different controls and different samples are performed, the real results cannot be hidden by these manipulations. However, in my experience, experiments are repeated only if the results are not as expected.
Experiments don’t always work. That is why experiments are supposed to be repeated several times with multiple levels of controls to ensure the reliability and accuracy of the results. However, I know far too many scientists who will take the first experimental results that match what they want, and then never repeat the experiment again. Or, if an experiment produces the expected results in only 10% of runs, scientists will simply report the “good” results and ignore the rest, sometimes claiming “there must have been an error in 90% of the runs.” They are not faking data, but they are selecting the data they want rather than upholding their scientific obligation to report reality without bias.
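To make the 10% scenario concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the run and sample counts, and the noise model are invented, and it is not taken from any real experiment. Each simulated run compares “treated” and “control” samples drawn from the same distribution, so there is no real effect to find. Keeping only the best-looking 10% of runs still produces what looks like a solid result.

```python
import random
import statistics


def simulate_runs(n_runs=1000, n_samples=5, seed=0):
    """Simulate runs where treated and control come from the SAME distribution.

    Hypothetical setup: the true effect is zero, so any measured
    difference between the two groups is pure noise.
    """
    rng = random.Random(seed)
    effects = []
    for _ in range(n_runs):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        effects.append(statistics.mean(treated) - statistics.mean(control))
    return effects


def select_good_runs(effects, keep_fraction=0.10):
    """Keep only the runs that best 'match' the hypothesis (largest effects)."""
    n_keep = max(1, int(len(effects) * keep_fraction))
    return sorted(effects, reverse=True)[:n_keep]


effects = simulate_runs()
good = select_good_runs(effects)

mean_all = statistics.mean(effects)   # honest summary over every run
mean_good = statistics.mean(good)     # summary after discarding the "failed" runs

print(f"runs performed:            {len(effects)}")
print(f"runs reported:             {len(good)}")
print(f"mean effect, all runs:     {mean_all:+.3f}")
print(f"mean effect, 'good' runs:  {mean_good:+.3f}")
```

The average over all runs sits near zero, which is the truth in this toy setup, while the average over the selected runs is clearly positive. No number was faked. The only dishonest step is the selection.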
The greatest enemy of data integrity is Photoshop. Every scientist knows how to use Photoshop. It’s needed for many legitimate purposes, such as preparing photos for publication. Unfortunately, it is also used to dishonestly manipulate data. For example, Photoshop can make a band on an agarose gel seem darker or lighter, or even combine different experiments into one image. While these manipulations are sometimes perfectly acceptable, they also allow results to be mixed and matched to fit the hypothesis. There is simply no way to know from the final images whether they were manipulated honestly, or manipulated at all.
So why do scientists “tweak” their data? Maybe vanity, or arrogance, but I think the real problem stems from the nature of scientific funding and the incessant pressure to publish, publish, publish. The livelihood of many scientists, especially those in the biological sciences, depends on NIH grants and other applicant funding. To get these grants and earn university tenure, scientists must show progress, and progress is measured in published papers. However, wrong hypotheses don’t produce published papers; only right ones do. So if a scientist spends a year investigating a hypothesis, and it turns out that the data doesn’t support it, he often has a problem: publish or starve. So, the data is made to fit.
If these practices continue, it will seriously hurt scientific progress. While many scientists do follow correct practices and don’t manipulate their data or its interpretations, there unfortunately are also many who do.
[1] Mark refers to a post by Tim Lambert, which refers to Michael Duffy at the Sydney Morning Herald.