A cancer analysis web portal at the University of Alabama at Birmingham is serving cancer clinicians and researchers across the world in their search for cancer biomarkers, therapeutic targets and precision treatments to help patients.
As of December, the UALCAN data-mining platform has had more than 1,080,000 page views from more than 100 countries. The original site, launched in 2017, allowed users to easily compare gene expression among tumor subgroups in more than 30 types of cancer. It made the huge databases in collections such as The Cancer Genome Atlas, or TCGA, and the Clinical Proteomic Tumor Analysis Consortium, or CPTAC, accessible for in-depth analysis without the need for bioinformatics or programming skills. Users could also compare gene expression, protein levels and patient survival across variables such as pathological stage, tumor grade, patient gender, tumor molecular subtype, patient race, alcohol consumption and smoking.
By 2022, UALCAN had updated its integrated cancer data analysis platform to include data on other molecular actors that can alter gene expression or reveal pathways perturbed in cancer, including microRNAs, long noncoding RNAs, promoter DNA methylation and mass-spectrometry proteomics. The gene expression and survival analysis now includes about 20,500 protein-coding genes from 33 different tumor types.
The reason for this big-data complexity? Cancer is not a single condition or disease; rather, it is a large number of different diseases, driven by complex molecular alterations that cause the uncontrolled growth and spread of abnormal cells in different organs.
“Cancer is a heterogeneous disease, rarely detected at its initial stages,” said Sooryanarayana Varambally, Ph.D., professor in the UAB Department of Pathology, Division of Molecular and Cellular Pathology and director of Translational Oncologic Pathology Research. Varambally led the development of the UALCAN discovery platform, together with Darshan Chandrashekar, Ph.D., assistant professor in the UAB Department of Pathology’s Division of Genomic Diagnostics and Bioinformatics.
“The advent of high-throughput technologies, such as whole or targeted exome sequencing, whole-genome sequencing, large-scale RNA sequencing, chromatin immunoprecipitation followed by sequencing, and mass spectrometry-based proteomics, has accelerated cancer research and has resulted in a large volume of publicly available data,” Varambally said.
“Clinicians and cancer researchers involved in detecting, discovering and validating cancer biomarkers, therapeutic targets and treatments find it difficult to access, process, integrate and interpret high-throughput data. Hence, easy-to-use, web-based/standalone tools enable cancer researchers and clinicians to access large-scale ‘Omic’ data and perform multilevel analyses.”
Omics refers to the comprehensive, large-scale measurement of different classes of biomolecules in cells, including genes (genomics), messenger RNA (transcriptomics), proteins (proteomics) and metabolites (metabolomics). This is a massive task because the human body has more than 35 trillion cells and trillions of molecules. The human genome contains about 3 billion nucleotide pairs, and each diploid cell carries two copies of it. The TCGA alone holds 2.5 petabytes of data, enough to fill 500,000 DVDs. A petabyte is equal to 1 million gigabytes.
Multi-omics data are needed to help understand the molecular basis of cancer and to uncover potential therapeutic targets that can benefit cancer patients.
“The updates and upgrades that we have made in UALCAN have enhanced the portal’s functionality,” Varambally said. “They now allow researchers to cross-compare the data for protein-coding gene expression with both noncoding RNAs and proteomic changes. Furthermore, the inclusion of epigenetic data, including promoter methylation, enables researchers to identify potential regulators of gene expression by these mechanisms. The inclusion of microRNA and long noncoding RNA expression and survival analysis adds another dimension to biomarker discovery and analysis of gene expression regulation.”
UALCAN has been cited in more than 3,500 research articles. A 2022 study by Varambally and colleagues analyzed 2,002 tumors, integrated the results into the UALCAN site and identified 11 distinct proteome-based subtypes spanning multiple tissue-based cancer types. Two of the subtypes were enriched for brain tumors.
Varambally says users and experts have contributed to UALCAN by suggesting additional tumor classifications, and feedback from the cancer research community has been positive.
A user at the University of Texas Southwestern Medical Center wrote, “UALCAN is like Google for cancer researchers.” A user at the National Cancer Institute, part of the National Institutes of Health, wrote, “I was very impressed by the ease of use and richness of data.”
“It’s really cool. I could not find this anywhere before,” said a user at the University of Essex in England.
Varambally and Chandrashekar plan continued improvements for UALCAN. “Moving forward, we intend to obtain, analyze and incorporate additional publicly available transcriptome sequencing datasets, so that they can be used as validation datasets for the observations made with the TCGA transcriptome sequencing data,” Varambally said. “We will also incorporate additional proteomic data and single cell transcriptome sequencing data for the analysis.
“We will also incorporate multiple relevant chromatin immunoprecipitation followed by sequencing datasets, and the dot-plot/jitter bioinformatics feature, to help identify the expression of outlier genes. We will also continue to incorporate the suggestions from end-users, as and when they are applicable and feasible.
“We are also working on a UALCAN app for both Android and iOS mobile devices with the goal of putting these valuable cancer data at cancer researchers’ fingertips.”
UALCAN stands for “The University of ALabama CANcer portal, Yes! You All Can.”
Additional authors in the 2017 and 2022 manuscripts describing the UALCAN launch and updates are Bhuwan Bashel, Sai Akshaya Hodigere Balasubramanya, Israel Ponce-Rodriguez, Balabhadrapatruni V.S.K. Chakravarthi, Santhosh Kumar Karthikeyan, Praveen Kumar Korla, Henalben Patel, Ahmedur Rahman Shovon, Mohammad Athar, Sidharth Kumar, Upender Manne and George J. Netto, UAB departments of Pathology and Computer Science, and the O’Neal Comprehensive Cancer Center at UAB; Chad J. Creighton, Baylor College of Medicine, Houston, Texas; and Zhaohui S. Qin, Emory University, Atlanta, Georgia.
Varambally is a scientist in the O’Neal Comprehensive Cancer Center at the UAB Marnix E. Heersink School of Medicine. Pathology is a department in the Heersink School of Medicine, and Computer Science is a department in the UAB College of Arts and Sciences.
UAB support for UALCAN comes from the Heersink School of Medicine, the O’Neal Comprehensive Cancer Center and the Department of Pathology. This support is critical to developing and sustaining this extremely valuable data-sharing resource for accelerating cancer research, Varambally says.