Posted on September 2, 2004 at 9:35 a.m.

BIRMINGHAM, AL - Researchers at the University of Alabama at Birmingham (UAB) have developed guidelines to help scientists evaluate the validity of statistical methods used to interpret and assess large amounts of complex data. The guidelines are published in the Aug. 30 issue of Nature Genetics.

“Guidelines apply to high dimensional biology or ‘omic’ research, such as genomics, in which researchers examine the entire genome or many genes of an organism at once,” said David B. Allison, Ph.D., professor of biostatistics and head of the section on statistical genetics at UAB. “This is a younger, less mature discipline. The guidelines, which we’ve put in writing for the first time, are designed to answer: ‘What is a valid statistical method?’ and ‘How do you go about determining a statistical method’s validity?’”

The UAB researchers engaged in extensive dialogue, gathered feedback while lecturing and attending seminars, and reviewed the published literature. “What resonated from this research is there is much confusion between illustration and demonstration,” Allison said. “We found many studies that included formulas with a lot of razzle dazzle that lacked sound reasoning.”

Allison cautions that some writers also fail to qualify their claims. “The field of omic research moves fast, and we need new methodologies. We all are willing to expect errors will happen, but we need to know the risk, or how often we can expect an error to occur. If we did the calculation an infinite number of times, how often would it be wrong?”

To test the validity of a method, statisticians rely on mathematical proofs, computer simulations, or datasets for which the ‘truth’ is known. “Datasets are used for this purpose,” Allison said. “We need a near-infinite number of datasets to draw firm conclusions. We also need a public archive of artificially constructed ‘plasmode’ datasets for methodologists to benchmark their method’s performance against.”
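As a rough illustration of how a computer simulation can answer the question "how often would it be wrong?", the Python sketch below generates many artificial datasets in which no gene has any real effect, applies a simple test to each gene, and counts how often at least one gene is falsely flagged. It is not taken from the Nature Genetics paper. The function names, the 20-gene setup, the sample size, and the choice of a z-test with a Bonferroni correction are all assumptions made for the example.

```python
import random
from statistics import NormalDist, fmean


def p_value(sample):
    """Two-sided z-test p-value for a true mean of 0.

    Assumes (for illustration only) that the data have known unit variance.
    """
    z = fmean(sample) * len(sample) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def family_wise_error_rate(alpha_per_test, n_genes=20, n_samples=10,
                           n_sims=2000, seed=0):
    """Estimate how often at least one truly null gene is declared significant.

    Every simulated gene has no real effect, so the 'truth' is known and
    any gene that passes the threshold is a false positive.
    """
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(n_sims):
        # One artificial dataset: n_genes genes, each measured n_samples times.
        pvals = [
            p_value([rng.gauss(0.0, 1.0) for _ in range(n_samples)])
            for _ in range(n_genes)
        ]
        if min(pvals) < alpha_per_test:
            false_alarms += 1
    return false_alarms / n_sims


# Hypothetical comparison of two analysis rules on the same kind of data.
uncorrected = family_wise_error_rate(0.05)       # test each gene at 0.05
bonferroni = family_wise_error_rate(0.05 / 20)   # Bonferroni-adjusted threshold

print(f"uncorrected: {uncorrected:.3f}")
print(f"Bonferroni:  {bonferroni:.3f}")
```

With 20 independent null genes, theory puts the uncorrected rate near 1 - 0.95^20, or about 0.64, and the Bonferroni rate near 0.05. The simulated estimates only approach those values as the number of artificial datasets grows, which is the point behind the call for a very large number of datasets.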

More information about UAB’s Section on Statistical Genetics is available online at www.soph.uab.edu/ssg or by contacting Richard Sarver at (205) 975-9169 or by e-mail at rsarver@uab.edu.

UAB researchers who collaborated on the paper are Tapan Mehta, a graduate student in electrical engineering, and Murat Tanik, Ph.D., professor of engineering with the department of electrical and computer engineering.