🍩 Database of Original & Non-Theoretical Uses of Topology

(found 17 matches in 0.006946s)
  1. Gene Expression Data Classification Using Topology and Machine Learning Models (2022)

    Tamal K. Dey, Sayan Mandal, Soham Mukherjee
    Abstract Interpretation of high-throughput gene expression data continues to require mathematical tools in data analysis that recognizes the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from the predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors which are used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. This representatives of the entire data facilitates better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.
  2. A Topological Data Analysis Approach On Predicting Phenotypes From Gene Expression Data (2020)

    Sayan Mandal, Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida
    Abstract The goal of this study was to investigate if gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore we devised an approach augmenting machine learning with topological data analysis., We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson’s disease phenotype prediction when measured against standard machine learning methods., This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high dimensional data reveals itself when considering the intricate topological connections between expressed genes.
  3. Topological Data Analysis of Single-Cell Hi-C Contact Maps (2020)

    Mathieu Carrière, Raúl Rabadán
    Abstract Due to recent breakthroughs in high-throughput sequencing, it is now possible to use chromosome conformation capture (CCC) to understand the three dimensional conformation of DNA at the whole genome level, and to characterize it with the so-called contact maps. This is very useful since many biological processes are correlated with DNA folding, such as DNA transcription. However, the methods for the analysis of such conformations are still lacking mathematical guarantees and statistical power. To handle this issue, we propose to use the Mapper, which is a standard tool of Topological Data Analysis (TDA) that allows one to efficiently encode the inherent continuity and topology of underlying biological processes in data, in the form of a graph with various features such as branches and loops. In this article, we show how recent statistical techniques developed in TDA for the Mapper algorithm can be extended and leveraged to formally define and statistically quantify the presence of topological structures coming from biological phenomena, such as the cell cyle, in datasets of CCC contact maps.
  4. Persistent Homology Analysis of Brain Transcriptome Data in Autism (2019)

    Daniel Shnier, Mircea A. Voineagu, Irina Voineagu
    Abstract Persistent homology methods have found applications in the analysis of multiple types of biological data, particularly imaging data or data with a spatial and/or temporal component. However, few studies have assessed the use of persistent homology for the analysis of gene expression data. Here we apply persistent homology methods to investigate the global properties of gene expression in post-mortem brain tissue (cerebral cortex) of individuals with autism spectrum disorders (ASD) and matched controls. We observe a significant difference in the geometry of inter-sample relationships between autism and healthy controls as measured by the sum of the death times of zero-dimensional components and the Euler characteristic. This observation is replicated across two distinct datasets, and we interpret it as evidence for an increased heterogeneity of gene expression in autism. We also assessed the topology of gene-level point clouds and did not observe significant differences between ASD and control transcriptomes, suggesting that the overall transcriptome organization is similar in ASD and healthy cerebral cortex. Overall, our study provides a novel framework for persistent homology analyses of gene expression data for genetically complex disorders.
  5. Topological Gene Expression Networks Recapitulate Brain Anatomy and Function (2019)

    Alice Patania, Pierluigi Selvaggi, Mattia Veronese, Ottavia Dipasquale, Paul Expert, Giovanni Petri
    Abstract Understanding how gene expression translates to and affects human behavior is one of the ultimate goals of neuroscience. In this paper, we present a pipeline based on Mapper, a topological simplification tool, to analyze gene co-expression data. We first validate the method by reproducing key results from the literature on the Allen Human Brain Atlas and the correlations between resting-state fMRI and gene co-expression maps. We then analyze a dopamine-related gene set and find that co-expression networks produced by Mapper return a structure that matches the well-known anatomy of the dopaminergic pathway. Our results suggest that network based descriptions can be a powerful tool to explore the relationships between genetic pathways and their association with brain function and its perturbation due to illness and/or pharmacological challenges., In this paper, we described a gene co-expression analysis pipeline that produces networks that we show to be closely related to either brain function and to neurotransmitter pathways. Our results suggest that this pipeline could be developed into a platform enabling the exploration of the effects of physiological and pathological alterations to specific gene sets, including profiling drugs effects.
  6. Molecular Phenotyping Using Networks, Diffusion, and Topology: Soft Tissue Sarcoma (2019)

    James C. Mathews, Maryam Pouryahya, Caroline Moosmüller, Yannis G. Kevrekidis, Joseph O. Deasy, Allen Tannenbaum
    Abstract Many biological datasets are high-dimensional yet manifest an underlying order. In this paper, we describe an unsupervised data analysis methodology that operates in the setting of a multivariate dataset and a network which expresses influence between the variables of the given set. The technique involves network geometry employing the Wasserstein distance, global spectral analysis in the form of diffusion maps, and topological data analysis using the Mapper algorithm. The prototypical application is to gene expression profiles obtained from RNA-Seq experiments on a collection of tissue samples, considering only genes whose protein products participate in a known pathway or network of interest. Employing the technique, we discern several coherent states or signatures displayed by the gene expression profiles of the sarcomas in the Cancer Genome Atlas along the TP53 (p53) signaling network. The signatures substantially recover the leiomyosarcoma, dedifferentiated liposarcoma (DDLPS), and synovial sarcoma histological subtype diagnoses, and they also include a new signature defined by activation and inactivation of about a dozen genes, including activation of serine endopeptidase inhibitor SERPINE1 and inactivation of TP53-family tumor suppressor gene TP73.
  7. Rootstock Effects on Scion Phenotypes in a ‘Chambourcin’ Experimental Vineyard (2019)

    Zoë Migicovsky, Zachary N Harris, Laura L Klein, Mao Li, Adam McDermaid, Daniel H Chitwood, Anne Fennell, Laszlo G Kovacs, Misha Kwasniewski, Jason P Londo, Qin Ma, Allison J Miller
    Abstract Understanding how root systems modulate shoot system phenotypes is a fundamental question in plant biology and will be useful in developing resilient agricultural crops. Grafting is a common horticultural practice that joins the roots (rootstock) of one plant to the shoot (scion) of another, providing an excellent method for investigating how these two organ systems affect each other. In this study, we used the French-American hybrid grapevine ‘Chambourcin’ (Vitis L.) as a model to explore the rootstock–scion relationship. We examined leaf shape, ion concentrations, and gene expression in ‘Chambourcin’ grown ungrafted as well as grafted to three different rootstocks (‘SO4’, ‘1103P’ and ‘3309C’) across 2 years and three different irrigation treatments. We found that a significant amount of the variation in leaf shape could be explained by the interaction between rootstock and irrigation. For ion concentrations, the primary source of variation identified was the position of a leaf in a shoot, although rootstock and rootstock by irrigation interaction also explained a significant amount of variation for most ions. Lastly, we found rootstock-specific patterns of gene expression in grafted plants when compared to ungrafted vines. Thus, our work reveals the subtle and complex effect of grafting on ‘Chambourcin’ leaf morphology, ionomics, and gene expression.
  8. Gene Coexpression Network Comparison via Persistent Homology (2018)

    Ali Nabi Duman, Harun Pirim
    Abstract Persistent homology, a topological data analysis (TDA) method, is applied to microarray data sets. Although there are a few papers referring to TDA methods in microarray analysis, the usage of persistent homology in the comparison of several weighted gene coexpression networks (WGCN) was not employed before to the very best of our knowledge. We calculate the persistent homology of weighted networks constructed from 38 Arabidopsis microarray data sets to test the relevance and the success of this approach in distinguishing the stress factors. We quantify multiscale topological features of each network using persistent homology and apply a hierarchical clustering algorithm to the distance matrix whose entries are pairwise bottleneck distance between the networks. The immunoresponses to different stress factors are distinguishable by our method. The networks of similar immunoresponses are found to be close with respect to bottleneck distance indicating the similar topological features of WGCNs. This computationally efficient technique analyzing networks provides a quick test for advanced studies.
  9. Airway Pathological Heterogeneity in Asthma: Visualization of Disease Microclusters Using Topological Data Analysis (2018)

    Salman Siddiqui, Aarti Shikotra, Matthew Richardson, Emma Doran, David Choy, Alex Bell, Cary D. Austin, Jeffrey Eastham-Anderson, Beverley Hargadon, Joseph R. Arron, Andrew Wardlaw, Christopher E. Brightling, Liam G. Heaney, Peter Bradding
    Abstract Background Asthma is a complex chronic disease underpinned by pathological changes within the airway wall. How variations in structural airway pathology and cellular inflammation contribute to the expression and severity of asthma are poorly understood. Objectives Therefore we evaluated pathological heterogeneity using topological data analysis (TDA) with the aim of visualizing disease clusters and microclusters. Methods A discovery population of 202 adult patients (142 asthmatic patients and 60 healthy subjects) and an external replication population (59 patients with severe asthma) were evaluated. Pathology and gene expression were examined in bronchial biopsy samples. TDA was applied by using pathological variables alone to create pathology-driven visual networks. Results In the discovery cohort TDA identified 4 groups/networks with multiple microclusters/regions of interest that were masked by group-level statistics. Specifically, TDA group 1 consisted of a high proportion of healthy subjects, with a microcluster representing a topological continuum connecting healthy subjects to patients with mild-to-moderate asthma. Three additional TDA groups with moderate-to-severe asthma (Airway Smooth MuscleHigh, Reticular Basement MembraneHigh, and RemodelingLow groups) were identified and contained numerous microclusters with varying pathological and clinical features. Mutually exclusive TH2 and TH17 tissue gene expression signatures were identified in all pathological groups. Discovery and external replication applied to the severe asthma subgroup identified only highly similar “pathological data shapes” through analyses of persistent homology. Conclusions We have identified and replicated novel pathological phenotypes of asthma using TDA. Our methodology is applicable to other complex chronic diseases.
  10. Omics-Based Strategies in Precision Medicine: Toward a Paradigm Shift in Inborn Errors of Metabolism Investigations (2016)

    Abdellah Tebani, Carlos Afonso, Stéphane Marret, Soumeya Bekri
    Abstract The rise of technologies that simultaneously measure thousands of data points represents the heart of systems biology. These technologies have had a huge impact on the discovery of next-generation diagnostics, biomarkers, and drugs in the precision medicine era. Systems biology aims to achieve systemic exploration of complex interactions in biological systems. Driven by high-throughput omics technologies and the computational surge, it enables multi-scale and insightful overviews of cells, organisms, and populations. Precision medicine capitalizes on these conceptual and technological advancements and stands on two main pillars: data generation and data modeling. High-throughput omics technologies allow the retrieval of comprehensive and holistic biological information, whereas computational capabilities enable high-dimensional data modeling and, therefore, accessible and user-friendly visualization. Furthermore, bioinformatics has enabled comprehensive multi-omics and clinical data integration for insightful interpretation. Despite their promise, the translation of these technologies into clinically actionable tools has been slow. In this review, we present state-of-the-art multi-omics data analysis strategies in a clinical context. The challenges of omics-based biomarker translation are discussed. Perspectives regarding the use of multi-omics approaches for inborn errors of metabolism (IEM) are presented by introducing a new paradigm shift in addressing IEM investigations in the post-genomic era.
  11. Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis (2015)

    Jose A. Perea, John Harer
    Abstract We develop in this paper a theoretical framework for the topological study of time series data. Broadly speaking, we describe geometrical and topological properties of sliding window embeddings, as seen through the lens of persistent homology. In particular, we show that maximum persistence at the point-cloud level can be used to quantify periodicity at the signal level, prove structural and convergence theorems for the resulting persistence diagrams, and derive estimates for their dependency on window size and embedding dimension. We apply this methodology to quantifying periodicity in synthetic data sets and compare the results with those obtained using state-of-the-art methods in gene expression analysis. We call this new method SW1PerS, which stands for Sliding Windows and 1-Dimensional Persistence Scoring.
  12. Delineation of a Conserved Arrestin-Biased Signaling Repertoire in Vivo (2015)

    Stuart Maudsley, Bronwen Martin, Diane Gesty-Palmer, Huey Cheung, Calvin Johnson, Shamit Patel, Kevin G. Becker, William H. Wood, Yongqing Zhang, Elin Lehrmann, Louis M. Luttrell
    Abstract Biased G protein–coupled receptor agonists engender a restricted repertoire of downstream events from their cognate receptors, permitting them to produce mixed agonist-antagonist effects in vivo. While this opens the possibility of novel therapeutics, it complicates rational drug design, since the in vivo response to a biased agonist cannot be reliably predicted from its in cellula efficacy. We have employed novel informatic approaches to characterize the in vivo transcriptomic signature of the arrestin pathway-selective parathyroid hormone analog [d-Trp12, Tyr34]bovine PTH(7-34) in six different murine tissues after chronic drug exposure. We find that [d-Trp12, Tyr34]bovine PTH(7-34) elicits a distinctive arrestin-signaling focused transcriptomic response that is more coherently regulated across tissues than that of the pluripotent agonist, human PTH(1-34). This arrestin-focused network is closely associated with transcriptional control of cell growth and development. Our demonstration of a conserved arrestin-dependent transcriptomic signature suggests a framework within which the in vivo outcomes of arrestin-biased signaling may be generalized.
  13. Identification of Copy Number Aberrations in Breast Cancer Subtypes Using Persistence Topology (2015)

    Javier Arsuaga, Tyler Borrman, Raymond Cavalcante, Georgina Gonzalez, Catherine Park
    Abstract DNA copy number aberrations (CNAs) are of biological and medical interest because they help identify regulatory mechanisms underlying tumor initiation and evolution. Identification of tumor-driving CNAs (driver CNAs) however remains a challenging task, because they are frequently hidden by CNAs that are the product of random events that take place during tumor evolution. Experimental detection of CNAs is commonly accomplished through array comparative genomic hybridization (aCGH) assays followed by supervised and/or unsupervised statistical methods that combine the segmented profiles of all patients to identify driver CNAs. Here, we extend a previously-presented supervised algorithm for the identification of CNAs that is based on a topological representation of the data. Our method associates a two-dimensional (2D) point cloud with each aCGH profile and generates a sequence of simplicial complexes, mathematical objects that generalize the concept of a graph. This representation of the data permits segmenting the data at different resolutions and identifying CNAs by interrogating the topological properties of these simplicial complexes. We tested our approach on a published dataset with the goal of identifying specific breast cancer CNAs associated with specific molecular subtypes. Identification of CNAs associated with each subtype was performed by analyzing each subtype separately from the others and by taking the rest of the subtypes as the control. Our results found a new amplification in 11q at the location of the progesterone receptor in the Luminal A subtype. Aberrations in the Luminal B subtype were found only upon removal of the basal-like subtype from the control set. Under those conditions, all regions found in the original publication, except for 17q, were confirmed; all aberrations, except those in chromosome arms 8q and 12q were confirmed in the basal-like subtype. These two chromosome arms, however, were detected only upon removal of three patients with exceedingly large copy number values. More importantly, we detected 10 and 21 additional regions in the Luminal B and basal-like subtypes, respectively. Most of the additional regions were either validated on an independent dataset and/or using GISTIC. Furthermore, we found three new CNAs in the basal-like subtype: a combination of gains and losses in 1p, a gain in 2p and a loss in 14q. Based on these results, we suggest that topological approaches that incorporate multiresolution analyses and that interrogate topological properties of the data can help in the identification of copy number changes in cancer.
  14. Extracting Insights From the Shape of Complex Data Using Topology (2013)

    P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. Carlsson
    Abstract This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find. Our method also permits the analysis of individual data sets as well as the analysis of relationships between related data sets. We illustrate the use of our method by applying it to three very different kinds of data, namely gene expression from breast tumors, voting data from the United States House of Representatives and player performance data from the NBA, in each case finding stratifications of the data which are more refined than those produced by standard methods.
  15. Topological Analysis of Gene Expression Arrays Identifies High Risk Molecular Subtypes in Breast Cancer (2012)

    Javier Arsuaga, Nils A. Baas, Daniel DeWoskin, Hideaki Mizuno, Aleksandr Pankov, Catherine Park
    Abstract Genomic technologies measure thousands of molecular signals with the goal of understanding complex biological processes. In cancer these molecular signals have been used to characterize disease subtypes, signaling pathways and to identify subsets of patients with specific prognosis. However molecular signals for any disease type are so vast and complex that novel mathematical approaches are required for further analyses. Persistent and computational homology provide a new method for these analyses. In our previous work we presented a new homology-based supervised classification method to identify copy number aberrations from comparative genomic hybridization arrays. In this work we first propose a theoretical framework for our classification method and second we extend our analysis to gene expression data. We analyze a published breast cancer data set and find that that our method can distinguish most, but not all, different breast cancer subtypes. This result suggests that specific relationships between genes, captured by our algorithm, help distinguish between breast cancer subtypes. We propose that topological methods can be used for the classification and clustering of gene expression profiles.
  16. Lipschitz Functions Have Lp-Stable Persistence (2010)

    David Cohen-Steiner, Herbert Edelsbrunner, John Harer, Yuriy Mileyko
    Abstract We prove two stability results for Lipschitz functions on triangulable, compact metric spaces and consider applications of both to problems in systems biology. Given two functions, the first result is formulated in terms of the Wasserstein distance between their persistence diagrams and the second in terms of their total persistence.