When your research team extends from the Caribbean to Chesapeake Bay and is responsible for hundreds of billions of DNA base-pairs of data, you have more than a scientific mystery on your hands — you have a logistical tangle.
Jeffrey Edberg, Ph.D., professor in the Division of Clinical Immunology and Rheumatology, is a member of a long-term clinical study of systemic lupus erythematosus (SLE) that has enrolled more than 8,600 patients and healthy controls at UAB and other institutions from Baltimore to Puerto Rico. The project’s data trove includes more than 8,000 genome-wide association studies, more than 6,500 copy number variant assays, 600 targeted exome-sequencing studies, more than 75 whole-genome sequencing studies and a host of biobanked samples, including plasma and serum, from almost all study participants. Finding new connections among all of those information sources could offer important clues to understanding the puzzling variety in lupus cases, including why the disease occurs more frequently in women and why minority patient populations have worse outcomes than other groups.
"We had an enormous collection of data," said Edberg, who directs the Center for Clinical and Translational Science’s Specimen Processing, Analysis and Biorepository unit. "The problem was it was all split up."
Or it was until Edberg joined the first field test for a new tool designed to break down scientific data silos: UAB Biomedical Research Information Technology Enhancement, known as U-BRITE.
‘It even works on a cellphone’
U-BRITE gives investigators access to secure, high-volume storage for all the files that accompany a modern biomedical research project, along with a pipeline for clinical data and the high-performance computing resources to analyze all this information, says James Cimino, M.D., director of the UAB Informatics Institute, which created U-BRITE, and co-director of the CCTS, which featured the project in its recent grant renewal. The U-BRITE framework is designed to facilitate work among the investigators on a research team and interactions with the data scientists who support them.
In November 2018, after a six-month U-BRITE pilot, the lupus team demonstrated its progress along with three other groups of researchers and bioinformaticians from the UAB Informatics Institute. The lupus researchers had built the SLE Rosetta Database, a searchable platform connecting participant data. Rosetta allows them to visually analyze genetic variations seen by race and ethnicity, cluster recruited patients by genotype and visit time and explore the availability of different clinical data in the study over time.
Radiation oncologist Christopher Willey, M.D., Ph.D., is exploring the signaling of protein kinases to develop new treatments for the brain cancer glioblastoma multiforme. He demonstrated an intuitive, web-based interface that lets users “see the correlations” among a host of data points, including gene-sequencing, kinase activity profiles, tumor growth rates and treatment response. “It even works on a cellphone,” said Willey, who recently received a U01 grant with a computational aim that will use the U-BRITE initiative.
Everybody’s problem
“Everybody in the scientific community has similar problems and is clamoring for bioinformatics support,” Cimino said. The Informatics Institute and CCTS provide that support to UAB researchers, but each project, with its unique aims and scientific questions, generally results in a custom solution. “If you want to ask a different question with the data, that doesn’t help you,” Cimino said.
The Informatics Institute and the CCTS envision a solution that could scale across UAB and beyond. To test U-BRITE, the institute issued a request for proposals, seeking a variety of knotty, real-world research problems. In addition to the lupus and brain cancer projects, the institute funded grants for teams with translational projects in rheumatoid arthritis, led by S. Louis Bridges Jr., M.D., Ph.D., and precision genomics, led by Eddy Yang, M.D., Ph.D. “We figured if we could solve their problems, we had solved problems for translational genomic/phenomic team science,” Cimino said. “And if we did it for several of them, it would be a generic solution, not a one-off.”
‘Easy and stress-free’
U-BRITE gives researchers a secure, HIPAA-compliant repository for data and includes a project-management interface to wrangle masses of files, says Jake Chen, Ph.D., the institute’s chief bioinformatics officer and U-BRITE project lead. It also offers direct access to Cheaha, UAB’s supercomputer — one of the five fastest at academic institutions in the Southeast — for self-service analysis of next-generation genomic-sequencing data.
Beyond the technical offerings, U-BRITE aims to demystify data analysis. “What we’re trying to do is provide an easy and stress-free data science environment to empower team science and drive advanced informatics method development,” Chen said.
To that end, U-BRITE includes access to Jupyter Notebooks, web-based applications running on the Cheaha supercomputer that let groups share live code and visualizations. Some coding is still required, “but that can be done by scientists in a principal investigator’s lab who have taken our courses,” Chen said. “The process is open so that investigators can understand how code produces graphs and other visualizations and they have more input. And if they get stuck, they can call in our bioinformaticians. We want to get people who are not traditional data scientists used to the idea that data science is not intimidating.”
Bringing everyone together
At the U-BRITE Day demonstration in November, research teams showed what they had built in collaboration with the data scientists at the institute. The Bridges team integrated data from i2b2, a self-service application that enables UAB researchers to access de-identified patient records, with the Rheumatology Arthritis Database and Repository (RADAR) maintained at UAB. The integration has enabled the team to test correlations among genetic markers, disease severity and a host of patient traits, including gender, smoking status and the presence of CCP antibodies. The goal is to develop a model to predict radiographic joint damage in patients with rheumatoid arthritis and improve treatment.
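For readers curious what this kind of integration looks like in a notebook, here is a minimal sketch in Python. It is not the Bridges team's code. The field names (`patient_id`, `risk_alleles`, `damage_score`), the helper functions and every value are invented for illustration, and neither i2b2 nor RADAR is actually queried. It only shows the general idea of joining two sources on a shared identifier and then testing a correlation.

```python
from math import sqrt

# Hypothetical, made-up rows standing in for a de-identified clinical export.
clinical = [
    {"patient_id": "P1", "smoker": True, "ccp_positive": True},
    {"patient_id": "P2", "smoker": False, "ccp_positive": False},
    {"patient_id": "P3", "smoker": True, "ccp_positive": True},
    {"patient_id": "P4", "smoker": False, "ccp_positive": True},
]

# Hypothetical registry rows: a risk-allele count and a joint damage score.
registry = [
    {"patient_id": "P1", "risk_alleles": 2, "damage_score": 30.0},
    {"patient_id": "P2", "risk_alleles": 0, "damage_score": 10.0},
    {"patient_id": "P3", "risk_alleles": 1, "damage_score": 20.0},
    {"patient_id": "P5", "risk_alleles": 2, "damage_score": 25.0},
]


def join_on_patient(left, right, key="patient_id"):
    """Inner join: keep only patients that appear in both sources."""
    right_by_id = {row[key]: row for row in right}
    return [
        {**row, **right_by_id[row[key]]}
        for row in left
        if row[key] in right_by_id
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


merged = join_on_patient(clinical, registry)
r = pearson(
    [row["risk_alleles"] for row in merged],
    [row["damage_score"] for row in merged],
)
print(len(merged), "patients in both sources; r =", r)
```

Because the rows are invented, the correlation it prints carries no scientific meaning. A real analysis would involve far more patients, handle missing values and adjust for traits such as smoking status.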
Curtis Hendrickson, a CCTS research associate and member of Edberg’s team, demonstrated the use of Jupyter Notebooks and Cheaha to integrate genomic, kinomic and clinical data with data stored in UAB’s biobank. “The only way we were able to bring all this together was to bring everyone in the same room with U-BRITE,” Hendrickson said.
U-BRITE enabled Yang’s team to store and analyze sequencing data from the 500-plus patient tumors sequenced at UAB through the STRATA trial. “Understanding the tumor genomic landscape at UAB can generate new avenues of future research,” said Yang, who demonstrated a dashboard interface that enables researchers to explore the distribution of mutations and tumor types for subsets of cancer patients. He also demonstrated a proof-of-concept link between clinical data from i2b2 and the NIH’s clinicaltrials.gov website, which could “aid in identifying eligible patients for clinical trials,” he said.
Next steps
Lessons from the 2018 pilot and new features will be incorporated into the next version of U-BRITE. A second pilot, one that incorporates image data, may be launched with the CCTS based on favorable reviews of its renewal grant.
The goal is to create “not just a piece of software, but a specification” that will enable U-BRITE to be adapted to new research problems and shared with other universities, Cimino said. “Our mission is to expand the scope and scale of available data and knowledge resources to support team science and enhance the expertise of our research teams through training and collaboration. This isn’t a substitute for the science anyone is doing. We want to enhance it.”
For inquiries about how to participate in U-BRITE projects, contact Jake Chen at jakechen@uab.edu or visit ubrite.informatics.uab.edu/.