Non-protein-coding RNAs (ncRNAs) are the molecular output of the 95-99% of the pervasively transcribed human genome that does not code for protein, yet most of the transcriptional events characterized to date map to the 1-2% of the genome that is protein-coding. Increasing evidence from work by us and others has shown that ncRNAs are functionally relevant in diseases such as cancer, where they are often mis-regulated (reviewed elsewhere, for example by Nguyen and Carninci, 2018). Our ability to delineate their functional consequences is currently limited: ncRNAs are often expressed only in specific cell types, and their detection remains challenging because most existing tools depend on prior annotations. Further, because most transcriptome datasets from healthy and diseased individuals profile bulk tissue, the ncRNA signal arising from specific cell types is lost.
Methodology: In this project, we propose to use recent single-cell and spatially resolved RNA sequencing technologies to discover ncRNAs in healthy and cancer tissues. We will develop a novel bioinformatics pipeline and apply it to a collection of very large in-house and public scRNA-seq and spatial transcriptomics (ST-seq) cancer datasets spanning multiple organs, including brain (medulloblastoma), skin (squamous cell carcinoma (SCC), basal cell carcinoma (BCC), and melanoma), kidney, colorectum, prostate, and breast. The results will be computationally validated against published CAGE-seq data (from the lncRNA atlas generated by the FANTOM consortium) and by re-aligning previously published datasets such as those from The Cancer Genome Atlas (TCGA), the Human Cell Atlas, and the Human Tumor Atlas Network. We will then assign putative functions to the validated, newly discovered ncRNAs using co-expression networks and machine learning.
The PIs at IITD (Gupta et al., 2018; Johri et al., 2020) and UQ (Willis et al., 2020; Pham et al., 2020; Tran et al., 2020; Tan et al., 2020) have both pioneered computational pipelines for gene expression analysis and have led the adoption of single-cell gene expression profiling in the Asia-Pacific region, making this an ideal collaboration.
The project will have three main outcomes:
Development of open-source computational pipelines for ncRNA discovery from single-cell and spatial transcriptomics datasets. These pipelines could also be applied to other model and non-model organisms to discover novel ncRNAs in the context of development, aging, and disease.
The project will create a pan-cancer atlas of ncRNAs with comprehensive gene expression annotations, helping us better understand the complexity of the information encoded in the human genome and how its disruption can cause diseases such as cancer. The atlas will serve as a valuable reference resource for the research community.
Functional annotations of these novel ncRNAs will also help guide future experiments, such as CRISPR-based genetic screens, that improve our molecular understanding of oncogenesis and of the role ncRNAs play in it.