On Monday, Sept. 10, 2019, CCTS Co-Director and UAB Informatics Institute Director Dr. James Cimino presented “Informatics for Translating Clinical Research Findings to Populations: Small and Big Data” to a packed room of public health students and faculty. He discussed the distinct roles clinical informatics and bioinformatics play in creating both unique data sets and the tools and techniques that enable investigators to query those data and produce meaningful insights that improve population health. “Investigators just need the right skills to ask the right questions,” he said.
Informatics, Translational Science, Public Health, and Big Data: Synergistic Interconnections
Starting with the basics, Cimino described the fields of biomedical informatics and translational science and how they are interrelated at every phase of the translational spectrum, from genetic and molecular mechanisms in preclinical and animal studies (T0) all the way through environmental and social determinants of health and their effect on human health at the population level (T4).
He also described five characteristics of Big Data: volume (required), variety, velocity, value, and veracity. “You can get the signal out of the noise at the population level without worrying too much about value and veracity,” he said, calling those two characteristics “the enemy of the good.”
Cimino reviewed a number of techniques for working with Big Data in public health, including data mining, text mining, machine learning, and deep learning. He presented examples of flu trends drawn from Google and Twitter data and how mashing up population data sets and/or geocoding public health data can result in insightful correlations.
Cimino also demonstrated i2b2, a tool developed by CCTS Informatics that enables investigators to search the Big Data in UAB’s electronic health record (EHR) on their own. He showed attendees “just how easy it is to get detailed clinical data on a million patients.”
Advantages of Reusing EHR Data
Why reuse EHR data for population health studies? Cimino provided several reasons:
- Data are relatively high quality
- Data are about the “phenome” rather than the genome
- Data are almost free, requiring mostly an investment of a researcher’s time to learn a tool such as i2b2
- Data can be used for several types of studies, including validation and replication of research studies and post-market surveillance (e.g., Phase 4 studies)
Caveats of Reusing EHR Data
He warned attendees about several limitations of working with clinical data, including:
- EHRs do not offer “the whole story” on patients, who may have records outside the UAB health system; those outside records represent missing data
- Data in clinical notes are hard to access, representing more missing data
- Data from research differ from clinical care data; protocols do not always follow standard of care
- “Controlled terminologies” can make data messy, e.g., when International Classification of Diseases (ICD) codes change from year to year
- Bias exists and must be taken into account (for instance, because of the “healthy person” effect, people who get many lab tests are more likely to have worse outcomes than those who get fewer, which skews the data)
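The terminology caveat above can be made concrete with a small sketch. It is not drawn from the talk or from i2b2. It assumes one well-known instance of changing codes, the U.S. switch from ICD-9-CM to ICD-10-CM on Oct. 1, 2015, and uses a tiny hand-picked crosswalk. The concept labels, the `to_concept` and `patients_by_concept` helpers, and the sample records are all hypothetical; a real study would rely on a full, curated mapping.

```python
from datetime import date

# Assumption for this sketch: U.S. billing data switched from ICD-9-CM
# to ICD-10-CM for services on or after Oct. 1, 2015.
ICD10_START = date(2015, 10, 1)

# Tiny illustrative crosswalk to hypothetical concept labels.
# A real study would use a complete, curated mapping instead.
ICD9_TO_CONCEPT = {
    "250.00": "type_2_diabetes",
    "401.9": "essential_hypertension",
}
ICD10_TO_CONCEPT = {
    "E11.9": "type_2_diabetes",
    "I10": "essential_hypertension",
}


def to_concept(code, service_date):
    """Map a diagnosis code to a stable concept, using the code system
    implied by the service date. Returns None when the code is unmapped."""
    if service_date >= ICD10_START:
        table = ICD10_TO_CONCEPT
    else:
        table = ICD9_TO_CONCEPT
    return table.get(code)


def patients_by_concept(records):
    """Group patient IDs by concept, skipping codes the crosswalk lacks."""
    grouped = {}
    for patient_id, code, service_date in records:
        concept = to_concept(code, service_date)
        if concept is None:
            continue  # unmapped codes would need manual review in practice
        grouped.setdefault(concept, set()).add(patient_id)
    return grouped


# Made-up sample records: (patient ID, diagnosis code, service date).
records = [
    ("p1", "250.00", date(2014, 3, 2)),   # ICD-9 era
    ("p1", "E11.9", date(2016, 7, 19)),   # same condition, ICD-10 era
    ("p2", "401.9", date(2015, 9, 30)),   # last day of the ICD-9 era
    ("p3", "I10", date(2015, 10, 1)),     # first day of the ICD-10 era
    ("p4", "999.99", date(2013, 1, 1)),   # deliberately unmapped code
]

cohorts = patients_by_concept(records)
```

Grouping by concept rather than by raw code keeps patient p1 in a single diabetes cohort, even though that patient's code changed between visits.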
After highlighting several resources for informatics support at UAB, including CCTS Informatics experts, Cimino encouraged the audience to recognize the value of their field in improving outcomes and fueling important scientific discoveries, saying “Public health research is a natural last step in translational science and helps feed other phases of the translational spectrum. Informatics offers the conceptual abstraction and computational methods and tools to help extract knowledge from public health Big Data. I look forward to fruitful collaborations between UAB's School of Public Health and Informatics Institute investigators.”
To learn more about this talk, see Dr. Cimino’s slides. To learn more about i2b2 and translational research, see “April Forum Explores the Benefits of Using i2b2 for Translational Research” and “CCTS and UAB Informatics Institute Announce i2b2 Abstract Contest Winners.”