Manifold Learning Yields Insight into Complex Biological State Space
Recent advances in single-cell technologies enable deep insights into cellular development, gene regulation, and phenotypic diversity by measuring gene expression and epigenetic information for thousands of single cells in a single experiment. While these technologies hold great potential for improving our understanding of cellular states and progression, they also pose new challenges in terms of scale, complexity, noise, and measurement artifacts, which require advanced mathematical and algorithmic tools to extract the underlying biological signals. In this talk, I cover one of the most promising techniques for tackling these problems: manifold learning, and the related manifold assumption in data analysis. Manifold learning provides a powerful structure for algorithmic approaches to naturally process and visualize the data, understand progressions, and find phenotypic diversity and infer patterns in it. I will cover two alternative approaches to manifold learning, diffusion-based and deep learning-based, and show results from several projects, including: 1) MAGIC (Markov Affinity-based Graph Imputation of Cells), an algorithm for denoising and transcript recovery of single cells, applied to single-cell RNA sequencing data from the epithelial-to-mesenchymal transition in breast cancer; 2) PHATE (Potential of Heat-diffusion Affinity-based Transition Embedding), a visualization technique that offers an alternative to tSNE in that it emphasizes progressions and branching structures rather than cluster separations, shown on several datasets including a newly generated embryoid body differentiation dataset; 3) SAUCIE (Sparse AutoEncoders for Clustering Imputation and Embedding), a novel autoencoder architecture that performs denoising, batch normalization, clustering, and visualization simultaneously for massive single-cell datasets from multi-patient cohorts, shown on mass cytometry data from Dengue patients; and 4) the transcoder, which learns to predict state transitions and trajectories after being trained on samples from dynamic systems and single-cell trajectories.