Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition (2007)

Gurjeet Singh, Facundo Mémoli, Gunnar Carlsson

Abstract

We present a computational method for extracting simple descriptions of high dimensional data sets in the form of simplicial complexes. Our method, called Mapper, is based on the idea of partial clustering of the data guided by a set of functions defined on the data. The proposed method is not dependent on any particular clustering algorithm, i.e. any clustering algorithm may be used with Mapper. We implement this method and present a few sample applications in which simple descriptions of the data present important information about its structure.
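The idea of partial clustering guided by a function can be illustrated with a small sketch. This is not the authors' implementation: the names mapper_graph and single_linkage, the parameters n_intervals, overlap and eps, and the choice of a distance-threshold clustering step are all assumptions made for illustration. The sketch covers the range of one filter function with overlapping intervals, clusters the points that fall in each interval, and joins two clusters whenever they share a point, which gives the 1-dimensional skeleton of the simplicial complex.

```python
import math
from itertools import combinations


def single_linkage(points, idx, eps):
    """Cluster the indices in idx as connected components of the eps-neighbour graph.

    Stand-in clustering step (an assumption); any clustering algorithm could be used.
    """
    clusters, unseen = [], set(idx)
    while unseen:
        stack = [unseen.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen if math.dist(points[i], points[j]) <= eps}
            unseen -= near
            comp |= near
            stack.extend(near)
        clusters.append(frozenset(comp))
    return clusters


def mapper_graph(points, filter_values, n_intervals=4, overlap=0.5, eps=0.5):
    """Return (nodes, edges): nodes are clusters of point indices, edges join overlapping clusters."""
    lo, hi = min(filter_values), max(filter_values)
    # Interval length chosen so that n_intervals overlapping intervals exactly cover [lo, hi].
    length = (hi - lo) / (n_intervals - overlap * (n_intervals - 1))
    step = length * (1 - overlap)
    nodes = []
    for k in range(n_intervals):
        a = lo + k * step
        b = a + length
        # Preimage of the k-th interval under the filter (small tolerance for rounding).
        idx = [i for i, f in enumerate(filter_values) if a - 1e-9 <= f <= b + 1e-9]
        nodes.extend(single_linkage(points, idx, eps))
    # Two clusters are connected when they have at least one data point in common.
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Example: 40 evenly spaced points on the unit circle, filtered by their x-coordinate.
circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40)) for k in range(40)]
nodes, edges = mapper_graph(circle, [p[0] for p in circle], n_intervals=4, overlap=0.5, eps=0.3)
print(len(nodes), len(edges))
```

On this circle example the sketch produces six clusters joined in a single loop, so the graph has the same shape as the sampled circle. Only the 1-skeleton is built here; higher-dimensional simplices would come from clusters that overlap three or more at a time.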

Unifying Immunology With Informatics and Multiscale Biology (2014)

Brian A Kidd, Lauren A Peters, Eric E Schadt, Joel T Dudley

Abstract

The immune system is a highly complex and dynamic system. Historically, the most common scientific and clinical practice has been to evaluate its individual components. This kind of approach cannot always expose the interconnecting pathways that control immune-system responses and does not reveal how the immune system works across multiple biological systems and scales. High-throughput technologies can be used to measure thousands of parameters of the immune system at a genome-wide scale. These system-wide surveys yield massive amounts of quantitative data that provide a means to monitor and probe immune-system function. New integrative analyses can help synthesize and transform these data into valuable biological insight. Here we review some of the computational analysis tools for high-dimensional data and how they can be applied to immunology.

Skeletonization and Partitioning of Digital Images Using Discrete Morse Theory (2015)

Olaf Delgado-Friedrichs, Vanessa Robins, Adrian Sheppard

Abstract

We show how discrete Morse theory provides a rigorous and unifying foundation for defining skeletons and partitions of grayscale digital images. We model a grayscale image as a cubical complex with a real-valued function defined on its vertices (the voxel values). This function is extended to a discrete gradient vector field using the algorithm presented in Robins, Wood, Sheppard TPAMI 33:1646 (2011). In the current paper we define basins (the building blocks of a partition) and segments of the skeleton using the stable and unstable sets associated with critical cells. The natural connection between Morse theory and homology allows us to prove the topological validity of these constructions; for example, that the skeleton is homotopic to the initial object. We simplify the basins and skeletons via Morse-theoretic cancellation of critical cells in the discrete gradient vector field using a strategy informed by persistent homology. Simple working Python code for our algorithms for efficient vector field traversal is included. Example data are taken from micro-CT images of porous materials, an application area where accurate topological models of pore connectivity are vital for fluid-flow modelling.

When Remote Sensing Meets Topological Data Analysis (2018)

Ludovic Duponchel

Abstract

Author Summary: Hyperspectral remote sensing plays an increasingly important role in many scientific domains and everyday life problems. Indeed, this imaging concept ends up in applications as varied as catching tax-evaders red-handed by locating new construction and building alterations, searching for aircraft and saving lives after fatal crashes, detecting oil spills for marine life and environmental preservation, spying on enemies with reconnaissance satellites, watching algae grow as an indicator of environmental health, forecasting weather to warn about natural disasters and much more. From an instrumental point of view, we can say that current spectrometers have rather good characteristics, even if we can always increase spatial resolution and spectral range. In order to extract ever more information from such experiments and develop new applications, we must, therefore, propose multivariate data analysis tools able to capture the shape of data sets and their specific features. Nevertheless, current methods often impose a data model which implicitly defines the geometry of the data set. The aim of the paper is thus to introduce the concept of topological data analysis in the framework of remote sensing, making no assumptions about the global shape of the data set, but also allowing the capture of its local features.

Branching and Circular Features in High Dimensional Data (2011)

B. Wang, B. Summa, V. Pascucci, M. Vejdemo-Johansson

Abstract

Large observations and simulations in scientific research give rise to high-dimensional data sets that present many challenges and opportunities in data analysis and visualization. Researchers in application domains such as engineering, computational biology, climate study, imaging and motion capture are faced with the problem of how to discover compact representations of high-dimensional data while preserving their intrinsic structure. In many applications, the original data is projected onto low-dimensional space via dimensionality reduction techniques prior to modeling. One problem with this approach is that the projection step in the process can fail to preserve structure in the data that is only apparent in high dimensions. Conversely, such techniques may create structural illusions in the projection, implying structure not present in the original high-dimensional data. Our solution is to utilize topological techniques to recover important structures in high-dimensional data that contains non-trivial topology. Specifically, we are interested in high-dimensional branching structures. We construct local circle-valued coordinate functions to represent such features. Subsequently, we perform dimensionality reduction on the data while ensuring such structures are visually preserved. Additionally, we study the effects of global circular structures on visualizations. Our results reveal never-before-seen structures on real-world data sets from a variety of applications.

Two-Tier Mapper, an Unbiased Topology-Based Clustering Method for Enhanced Global Gene Expression Analysis (2019)

Rachel Jeitziner, Mathieu Carrière, Jacques Rougemont, Steve Oudot, Kathryn Hess, Cathrin Brisken

Abstract

MOTIVATION: Unbiased clustering methods are needed to analyze growing numbers of complex datasets. Currently available clustering methods often depend on parameters that are set by the user, lack stability, and are not applicable to small datasets. To overcome these shortcomings we used topological data analysis, an emerging field of mathematics that discerns additional features and discovers hidden insights in datasets and has a wide application range. RESULTS: We have developed a topology-based clustering method called Two-Tier Mapper (TTMap) for enhanced analysis of global gene expression datasets. First, TTMap discerns divergent features in the control group, adjusts for them, and identifies outliers. Second, the deviation of each test sample from the control group in a high-dimensional space is computed, and the test samples are clustered using a new Mapper-based topological algorithm at two levels: a global tier and local tiers. All parameters are either carefully chosen or data-driven, avoiding any user-induced bias. The method is stable, different datasets can be combined for analysis, and significant subgroups can be identified. It outperforms current clustering methods in sensitivity and stability on synthetic and biological datasets, in particular when sample sizes are small; outcome is not affected by removal of control samples, by choice of normalization, or by subselection of data. TTMap is readily applicable to complex, highly variable biological samples and holds promise for personalized medicine. AVAILABILITY AND IMPLEMENTATION: TTMap is supplied as an R package in Bioconductor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

🍩 Database of Original & Non-Theoretical Uses of Topology