Non-protein-coding RNAs (ncRNAs) are the molecular output of the 95-99% of the pervasively transcribed human genome that does not encode proteins, yet most of the transcriptional events we know of map to the 1-2% of the genome that is protein-coding. Increasing evidence from our work and that of others shows that ncRNAs are functionally relevant in diseases such as cancer, where they are often mis-regulated (reviewed elsewhere, for example by Nguyen and Carninci, 2018). Our ability to delineate their functional consequences is currently limited: they are often expressed only in specific cell types, and they remain difficult to detect because most existing tools depend on prior annotations. Further, because most transcriptome datasets from healthy and diseased individuals profile bulk tissue, the ncRNA signal from specific cell types is lost.
Methodology: In this project, we propose to use recent single-cell and spatially resolved RNA sequencing technologies to discover ncRNAs in healthy and cancer tissues. To do so, we will develop a new bioinformatics pipeline and apply it to a collection of very large in-house and public scRNA-seq and spatial transcriptomics (ST-seq) cancer datasets across different organs, including brain (medulloblastoma), skin (SCC, BCC, and melanoma), kidney, colorectal, prostate, and breast. The results will be validated computationally against published datasets, such as CAGE-seq data (from the lncRNA atlas generated by the FANTOM consortium), and by re-aligning previously published datasets such as The Cancer Genome Atlas (TCGA), the Human Cell Atlas, and the Human Tumor Atlas Network. We will then assign putative functions to the validated, newly discovered ncRNAs using co-expression networks and machine learning.
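As an illustration only, the co-expression step could follow a simple "guilt-by-association" scheme: correlate a candidate ncRNA's expression with that of annotated genes across cells or cell-type clusters, keep the strongest positive correlates, and let their functional annotations vote on a putative function. The sketch below assumes pseudobulk profiles per cell type, Pearson correlation, and the hypothetical helper names and thresholds (`coexpression_neighbours`, `guess_function`, `min_r=0.5`, `top_k=5`); it is not the project's pipeline, and the machine-learning component is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A constant profile has undefined correlation; treat it as no association.
        return 0.0
    return cov / (sx * sy)


def coexpression_neighbours(target, profiles, min_r=0.5, top_k=5):
    """Return up to top_k annotated genes whose profiles correlate with the
    target ncRNA at r >= min_r, strongest first (hypothetical thresholds)."""
    scored = [(gene, pearson(target, expr)) for gene, expr in profiles.items()]
    kept = [(gene, r) for gene, r in scored if r >= min_r]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


def guess_function(neighbours, annotations):
    """Guilt by association: each neighbour votes for its annotation terms,
    weighted by its correlation with the target; highest total first."""
    votes = {}
    for gene, r in neighbours:
        for term in annotations.get(gene, ()):
            votes[term] = votes.get(term, 0.0) + r
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical pseudobulk expression of one candidate ncRNA and three
    # annotated genes across five cell-type clusters (toy numbers).
    ncrna_x = [1, 2, 3, 4, 5]
    profiles = {
        "GENE_A": [2, 4, 6, 8, 10],
        "GENE_B": [5, 4, 3, 2, 1],
        "GENE_C": [1, 2, 3, 4, 6],
    }
    annotations = {
        "GENE_A": ["cell cycle"],
        "GENE_B": ["apoptosis"],
        "GENE_C": ["cell cycle", "DNA repair"],
    }
    neighbours = coexpression_neighbours(ncrna_x, profiles)
    print(neighbours)
    print(guess_function(neighbours, annotations))
```

With these toy profiles, GENE_B is anti-correlated with the candidate and is filtered out, so the candidate inherits the "cell cycle" term from GENE_A and GENE_C. Correlations computed on sparse single-cell counts are noisy, which is one reason to aggregate cells into clusters (or to use a rank-based measure) before building such a network; this is a design assumption of the sketch, not a description of the proposed method.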
The PIs at IITD (Gupta et al., 2018; Johri et al., 2020) and UQ (Willis et al., 2020; Pham et al., 2020; Tran et al., 2020; Tan et al., 2020) have both developed pioneering computational pipelines for gene expression analysis and have led the adoption of single-cell gene expression profiling in the Asia-Pacific region, making this an ideal collaboration.
The project will have three main outcomes:
B.Tech from IITs (in Biological Sciences, Biotechnology or related biological sciences fields); dual degree from IITs (with one degree in Biological Sciences, Biotechnology or related biological sciences fields); BS-MS in Biological Sciences from any IISER or IISc; B.Tech in Computer Science or Bioinformatics; dual degree from BITS Pilani (with Biological Sciences in combination with Computer Science, Electronics, Information Systems, or EnI); M.Sc. in Bioinformatics; BS-MS in Electrical Engineering; BS-MS in Computer Science.