David Allison was sitting in the office of fellow researcher Stephen Barnes, helping prepare a large grant application to the National Institutes of Health, when the question came up: What size sample should be prepared?
There was no answer — no right answer at least, only a guess.
So Allison, Grier Page and other UAB colleagues set out to take the guesswork out of grant preparation for researchers, creating what they call a one-of-a-kind database that estimates the sample size required for good statistical power.
The PowerAtlas is a storehouse of information that could save genetics researchers time and money in performing microarray studies.
“We think this is incredibly groundbreaking,” says Page, associate professor of public health and lead author on the study. “The PowerAtlas allows justification for sample size and allows researchers to minimize resource expenditures while maximizing power, or the probability that you will get the right answer.”
Microarrays are a 7-year-old technology that has become quite prevalent in research in the past two or three years. They permit biologists to simultaneously measure the RNA abundance of thousands of genes.
The problem researchers encounter in microarray experiments is the same question that caused Barnes and Allison to scratch their heads: How do you estimate the sample size required for good statistical power?
That’s where the PowerAtlas can be so beneficial, say its developers. It enables researchers to plan studies appropriately by building upon prior studies with similar experimental characteristics. Researchers also can upload their own pilot data and generate an appropriate sample size based on the information.
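For readers curious about the arithmetic behind a sample-size estimate, here is a minimal sketch of a classical textbook calculation. It is not the PowerAtlas's own method, which the article does not describe. It assumes a simple two-group comparison for each gene, a normal approximation and a Bonferroni correction for the number of genes tested. The function name `samples_per_group` and its parameters are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8, n_genes=1):
    """Approximate samples needed per group for a two-sided, two-group test.

    effect_size: expected difference in means divided by the standard
                 deviation (e.g. estimated from pilot data or a similar study).
    alpha:       overall false-positive rate across all genes tested.
    power:       desired probability of detecting a true difference.
    n_genes:     number of genes tested; alpha is split evenly among them
                 (Bonferroni), which is an assumption of this sketch.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if n_genes < 1:
        raise ValueError("n_genes must be at least 1")

    per_gene_alpha = alpha / n_genes
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - per_gene_alpha / 2)
    z_power = standard_normal.inv_cdf(power)

    # Standard normal-approximation formula: n = 2 * ((z_alpha + z_power) / d)^2
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # One gene versus a 10,000-gene array, same effect size and power.
    print(samples_per_group(1.0))
    print(samples_per_group(1.0, n_genes=10_000))
```

The point of the sketch is that the required sample size depends heavily on the expected effect size and on how many genes are tested at once, which is why an estimate drawn from pilot data or from a comparable prior study matters so much.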
The PowerAtlas can even help researchers who don’t have pilot data for their project.
“Let’s say a Dr. Smith wants to do a microarray study investigating the effects of eating figs on rat liver fatty acid content,” explains Allison, a professor of public health. “She’s going to write a grant application to do this, but in order to calculate power she has to collect pilot data. That might cost $50,000. It might be that he or she would be satisfied that there is a similar study out there. Maybe there’s a study giving apples to rats and Dr. Smith’s hypothesis is that figs will do the same thing apples did. Now Dr. Smith can go and write her grant proposal and submit it to the NIH without having to spend the $50,000 to do the pilot work first.”
The PowerAtlas took 18 months to design, build and compile and has been used for 15 to 20 studies on campus during the past year. Funding for the project came from the National Science Foundation through a Small Grant for Exploratory Research (SGER) award, a high-risk, high-yield grant.
The PowerAtlas now has approximately 1,053 studies available to researchers. Page says the database, which was designed by UAB programmers, is due for an update this month that should increase the number of studies available to close to 1,500.
“It is a live Web resource that we update every six months,” he says, explaining that the updates come via new datasets from the Gene Expression Omnibus (GEO) and other databases such as The Nottingham Arabidopsis Stock Center.
What kind of studies will researchers currently find in the PowerAtlas?
Page says many of the studies in the database are on cancer. Other major study areas in the PowerAtlas include nutrition and obesity, diabetes, heart disease, stroke and yeast, among others. The majority of the studies are in humans or model organisms such as rats, mice, flies, worms and plants.
Microarray experiments also can be conducted here on campus, saving even more time and money for UAB scientists.
“It’s cheaper to do it on campus than with any of the for-profit companies,” Page says.
Allison says he pictured the PowerAtlas working like a book in which every page turned shows a picture of the study the researcher is seeking. He says that vision for the project has been achieved.
“But instead of a book, you have the computer and instead of the picture of, say, a fruit fly exposed to lead, you have a publication-quality graphic of the power and sample size results,” Allison says. “I know of no other tool that will allow investigators to immediately access hundreds of studies and in a matter of minutes have sophisticated power calculations on the basis of any one of those studies.
“I think this is something that could be aimed for in the future in other areas as well. I think in the areas of genetics and genomics we will see these types of things because it is becoming the standard in the field for investigators to make their data publicly available.”