First, there are the myriad sources of patient data, collected for different purposes, using different standard and nonstandard terminologies. Then, these data are moved to a system, such as an enterprise data warehouse, where their original nature may be obscured or the data themselves may be altered. Then there are the myriad tools available for extracting a preferred subset of the data. Researchers who manage to clear these hurdles might be forgiven for thinking they have crossed the finish line.
But the real challenge isn’t accessing these data, explained Dr. James Cimino of the UAB Informatics Institute. It’s recognizing their limitations and structuring both hypotheses and analyses accordingly. “These data are almost free, and are rapidly becoming almost universally available, but should come with a caveat emptor warning due to the likelihood of hidden flaws.”
EHRs, he noted, are often incomplete, inaccurate, and coded for purposes other than research (e.g., billing). They rarely capture the complete patient history, lacking, for instance, both medications and outcomes. Repeat admissions can result in different diagnoses; notes may not be recoverable; and the data granularity may not match a researcher’s needs. Further, those who reuse the data are not the people who originally collected them, and there may be no way to establish communication between the two.
Dr. Cimino concluded by outlining a roadmap for dealing with the inherent limitations of EHR data, based on techniques for extracting best evidence from the medical literature and incorporating standards for terminologies, data representation, and data exchange.
Catch the entire talk on the CCTS YouTube channel.