Ghrist Barcoded Video Frames. Application in Detecting Persistent Visual Scene Surface Shapes Captured in Videos (2019)

Abstract

This article introduces an application of Ghrist barcodes in the study of persistent Betti numbers derived from vortex nerve complexes found in triangulations of video frames. A Ghrist barcode (also called a persistence barcode) is a topology of data pictograph useful in representing the persistence of the features of changing shapes. The basic approach is to introduce a free Abelian group representation of intersecting filled polygons on the barycenters of the triangles of Alexandroff nerves. An Alexandroff nerve is a maximal collection of triangles of a common vertex in the triangulation of a finite, bounded planar region. In our case, the planar region is a video frame. A Betti number is a count of the number of generators in a finite Abelian group. The focus here is on the persistent Betti numbers across sequences of triangulated video frames. Each Betti number is mapped to an entry in a Ghrist barcode. Two main results are given, namely, vortex nerves are Edelsbrunner-Harer nerve complexes and the Betti number of a vortex nerve equals k + 2 for a vortex nerve containing k edges attached between a pair of vortex cycles in the nerve.

A Topological Representation of Branching Neuronal Morphologies (2018)

Lida Kanari, Paweł Dłotko, Martina Scolamiero, Ran Levi, Julian Shillcock, Kathryn Hess, Henry Markram

Algorithms for Topological Analysis of Spatial Data (2018)

Sergey Eremeev, Ekaterina Seltsova

Limitations of Topological Data Analysis for Event-Related fMRI (2018)

Cameron T. Ellis, Michael Lesnick, Gregory Henselman-Petrusek, Bryn Keller, Jonathan D. Cohen

A Barcode Shape Descriptor for Curve Point Cloud Data (2004)

Anne Collins, Afra Zomorodian, Gunnar Carlsson, Leonidas J. Guibas

Abstract

In this paper, we present a complete computational pipeline for extracting a compact shape descriptor for curve point cloud data (PCD). Our shape descriptor, called a barcode, is based on a blend of techniques from differential geometry and algebraic topology. We also provide a metric over the space of barcodes, enabling fast comparison of PCDs for shape recognition and clustering. To demonstrate the feasibility of our approach, we implement our pipeline and provide experimental evidence in shape classification and parametrization.

Characterizing Scales of Genetic Recombination and Antibiotic Resistance in Pathogenic Bacteria Using Topological Data Analysis (2014)

Kevin J. Emmett, Raul Rabadan

Abstract

Pathogenic bacteria present a large disease burden on human health. Control of these pathogens is hampered by rampant lateral gene transfer, whereby pathogenic strains may acquire genes conferring resistance to common antibiotics. Here we introduce tools from topological data analysis to characterize the frequency and scale of lateral gene transfer in bacteria, focusing on a set of pathogens of significant public health relevance. As a case study, we examine the spread of antibiotic resistance in Staphylococcus aureus. Finally, we consider the possible role of the human microbiome as a reservoir for antibiotic resistance genes.

Topological Singularity Detection at Multiple Scales (2023)

Julius von Rohrscheidt, Bastian Rieck

Abstract

The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the ’manifoldness’ of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.

The Shape of Word Embeddings: Quantifying Non-Isometry With Topological Data Analysis (2024)

Ondřej Draganov, Steven Skiena

Abstract

Word embeddings represent language vocabularies as clouds of d-dimensional points. We investigate how information is conveyed by the general shape of these clouds, instead of representing the semantic meaning of each token. Specifically, we use the notion of persistent homology from topological data analysis (TDA) to measure the distances between language pairs from the shape of their unlabeled embeddings. These distances quantify the degree of non-isometry of the embeddings. To distinguish whether these differences are random training errors or capture real information about the languages, we use the computed distance matrices to construct language phylogenetic trees over 81 Indo-European languages. Careful evaluation shows that our reconstructed trees exhibit strong and statistically-significant similarities to the reference.

Persistent Homology for Breast Tumor Classification Using Mammogram Scans (2022)

Aras Asaad, Dashti Ali, Taban Majeed, Rasber Rashid

Abstract

An important tool in the field of topological data analysis is persistent homology (PH), which is used to encode an abstract representation of the homology of data at different resolutions in the form of a persistence diagram (PD). In this work we build more than one PD representation of a single image based on a landmark selection method, known as local binary patterns, that encodes different types of local textures from images. We employed different PD vectorizations using persistence landscapes, persistence images, persistence binning (Betti curve) and statistics. We tested the effectiveness of the proposed landmark-based PH on two publicly available breast abnormality detection datasets using mammogram scans. The sensitivity of landmark-based PH is over 90% in both datasets for the detection of abnormal breast scans. Finally, experimental results give new insights on using different types of PD vectorizations, which help in utilising PH in conjunction with machine learning classifiers.
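
As a rough illustration of the "persistence binning (Betti curve)" vectorization named in the abstract above, the following minimal sketch counts, for each value on a grid, how many intervals of a persistence diagram are alive there. The function name `betti_curve`, the half-open convention `birth <= t < death`, and the toy diagram are assumptions made for this example; they are not taken from the paper.

```python
def betti_curve(diagram, grid):
    """Vectorize a persistence diagram as a Betti curve.

    diagram: list of (birth, death) pairs for one homology dimension.
    grid:    filtration values at which the curve is sampled.
    Returns, for each grid value t, the number of intervals that
    contain t, using the (assumed) half-open convention [birth, death).
    """
    return [
        sum(1 for birth, death in diagram if birth <= t < death)
        for t in grid
    ]


# Hypothetical toy diagram with three intervals.
toy_diagram = [(0.0, 2.0), (1.0, 3.0), (0.5, 0.6)]
toy_grid = [0, 1, 2, 3]
print(betti_curve(toy_diagram, toy_grid))
```

The resulting fixed-length vector is the kind of input a standard classifier can consume; the grid resolution is a free parameter that would need tuning for real data.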

Topological Data Analysis on Simple English Wikipedia Articles (2020)

Matthew Wright, Xiaojun Zheng

Abstract

Single-parameter persistent homology, a key tool in topological data analysis, has been widely applied to data problems, with statistical techniques that quantify the significance of the results. In contrast, statistical techniques for two-parameter persistence, while highly desirable for real-world applications, have scarcely been considered. We present three statistical approaches for comparing geometric data using two-parameter persistent homology, and we demonstrate the applicability of these approaches on high-dimensional point-cloud data obtained from Simple English Wikipedia articles. These approaches rely on the Hilbert function, matching distance, and barcodes obtained from two-parameter persistence modules computed from the point-cloud data. We demonstrate the applicability of our methods by distinguishing certain subsets of the Wikipedia data, and by comparison with random data. Results include insights into the construction of null distributions and stability of our methods with respect to noisy data. Our statistical methods are broadly applicable for analysis of geometric data indexed by a real-valued parameter.

From Trees to Barcodes and Back Again: Theoretical and Statistical Perspectives (2020)

Lida Kanari, Adélie Garin, Kathryn Hess

Abstract

Methods of topological data analysis have been successfully applied in a wide range of fields to provide useful summaries of the structure of complex data sets in terms of topological descriptors, such as persistence diagrams. While there are many powerful techniques for computing topological descriptors, the inverse problem, i.e., recovering the input data from topological descriptors, has proved to be challenging. In this article we study in detail the Topological Morphology Descriptor (TMD), which assigns a persistence diagram to any tree embedded in Euclidean space, and a sort of stochastic inverse to the TMD, the Topological Neuron Synthesis (TNS) algorithm, gaining both theoretical and computational insights into the relation between the two. We propose a new approach to classify barcodes using symmetric groups, which provides a concrete language to formulate our results. We investigate to what extent the TNS recovers a geometric tree from its TMD and describe the effect of different types of noise on the process of tree generation from persistence diagrams. We prove moreover that the TNS algorithm is stable with respect to specific types of noise.

Persistent Brain Network Homology From the Perspective of Dendrogram (2012)

Hyekyoung Lee, Hyejin Kang, Moo K. Chung, Bung-Nyun Kim, Dong Soo Lee

Abstract

The brain network is usually constructed by estimating the connectivity matrix and thresholding it at an arbitrary level. The problem with this standard method is that we do not have any generally accepted criteria for determining a proper threshold. Thus, we propose a novel multiscale framework that models all brain networks generated over every possible threshold. Our approach is based on persistent homology and its various representations such as the Rips filtration, barcodes, and dendrograms. This new persistent homological framework enables us to quantify various persistent topological features at different scales in a coherent manner. The barcode is used to quantify and visualize the evolutionary changes of topological features such as the Betti numbers over different scales. By incorporating additional geometric information to the barcode, we obtain a single linkage dendrogram that shows the overall evolution of the network. The difference between the two networks is then measured by the Gromov-Hausdorff distance over the dendrograms. As an illustration, we modeled and differentiated the FDG-PET based functional brain networks of 24 attention-deficit hyperactivity disorder children, 26 autism spectrum disorder children, and 11 pediatric control subjects.
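
The idea of tracking the zeroth Betti number over every possible threshold, rather than at one arbitrary level, can be sketched as follows. This is a minimal example under stated assumptions, not the authors' implementation: it assumes the network is given as a symmetric dissimilarity matrix (for instance one minus correlation), and the names `h0_barcode` and `betti0` are hypothetical. Connected components are merged in order of increasing edge weight with a union-find structure, which is the same merge order that single linkage clustering follows.

```python
from itertools import combinations


def h0_barcode(dist):
    """Return the 0-dimensional barcode of a symmetric dissimilarity matrix.

    Every node is born at threshold 0. Each time an edge joins two
    different components, one component dies at that edge's weight.
    One bar never dies and is given death = infinity.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    bars = []
    for weight, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, weight))
    bars.append((0.0, float("inf")))
    return bars


def betti0(bars, threshold):
    """Count components alive at a threshold (bars taken as [birth, death))."""
    return sum(1 for birth, death in bars if birth <= threshold < death)


# Hypothetical toy network: four nodes at positions 0, 1, 3, 7 on a line.
positions = [0, 1, 3, 7]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
toy_bars = h0_barcode(toy_dist)
print(toy_bars)
print([betti0(toy_bars, t) for t in (0, 1, 2.5, 4)])
```

The finite death values are exactly the merge heights of the single linkage dendrogram, which is why the barcode and the dendrogram carry closely related information.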

Time-Inhomogeneous Diffusion Geometry and Topology (2022)

Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy

Abstract

Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic diffusion homology. We use this intrinsic topology as well as an ambient topology to study how the data changes over diffusion time. We demonstrate both homologies in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.

Learning Representations of Persistence Barcodes (2019)

Christoph D. Hofer, Roland Kwitt, Marc Niethammer

Abstract

We consider the problem of supervised learning with summary representations of topological features in data. In particular, we focus on persistent homology, the prevalent tool used in topological data analysis. As the summary representations, referred to as barcodes or persistence diagrams, come in the unusual format of multisets, equipped with computationally expensive metrics, they cannot readily be processed with conventional learning techniques. While different approaches to address this problem have been proposed, either in the context of kernel-based learning, or via carefully designed vectorization techniques, it remains an open problem how to leverage advances in representation learning via deep neural networks. Appropriately handling topological summaries as input to neural networks would address the disadvantage of previous strategies which handle this type of data in a task-agnostic manner. In particular, we propose an approach that is designed to learn a task-specific representation of barcodes. In other words, we aim to learn a representation that adapts to the learning problem while, at the same time, preserving theoretical properties (such as stability). This is done by projecting barcodes into a finite dimensional vector space using a collection of parametrized functionals, so-called structure elements, for which we provide a generic construction scheme. A theoretical analysis of this approach reveals sufficient conditions to preserve stability, and also shows that different choices of structure elements lead to great differences with respect to their suitability for numerical optimization. When implemented as a neural network input layer, our approach demonstrates compelling performance on various types of problems, including graph classification and eigenvalue prediction, the classification of 2D/3D object shapes and recognizing activities from EEG signals.

Quantifying Genetic Innovation: Mathematical Foundations for the Topological Study of Reticulate Evolution (2020)

Michael Lesnick, Raúl Rabadán, Daniel I. S. Rosenbloom

Abstract

A topological approach to the study of genetic recombination, based on persistent homology, was introduced by Chan, Carlsson, and Rabadán in 2013. This associates a sequence of signatures called barcodes to genomic data sampled from an evolutionary history. In this paper, we develop theoretical foundations for this approach. First, we present a novel formulation of the underlying inference problem. Specifically, we introduce and study the novelty profile, a simple, stable statistic of an evolutionary history which not only counts recombination events but also quantifies how recombination creates genetic diversity. We propose that the (hitherto implicit) goal of the topological approach to recombination is the estimation of novelty profiles. We then study the problem of obtaining a lower bound on the novelty profile using barcodes. We focus on a low-recombination regime, where the evolutionary history can be described by a directed acyclic graph called a galled tree, which differs from a tree only by isolated topological defects. We show that in this regime, under a complete sampling assumption, the $1^{\mathrm{st}}$ barcode yields a lower bound on the novelty profile, and hence on the number of recombination events. For $i>1$, the $i^{\mathrm{th}}$ barcode is empty. In addition, we use a stability principle to strengthen these results to ones which hold for any subsample of an arbitrary evolutionary history. To establish these results, we describe the topology of the Vietoris–Rips filtrations arising from evolutionary histories indexed by galled trees. As a step towards a probabilistic theory, we also show that for a random history indexed by a fixed galled tree and satisfying biologically reasonable conditions, the intervals of the $1^{\mathrm{st}}$ barcode are independent random variables. Using simulations, we explore the sensitivity of these intervals to recombination.

Inference of Ancestral Recombination Graphs Through Topological Data Analysis (2016)

Pablo G. Cámara, Arnold J. Levine, Raúl Rabadán

Abstract

The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands.

Evolution occurs through different mechanisms, including point mutations, gene duplication, horizontal gene transfer, and recombinations. Some of these mechanisms cannot be captured by tree graphs. We present a framework, based on the mathematical tools of computational topology, that can explicitly accommodate both recombination and mutation events across the evolutionary history of a sample of genomic sequences. This approach generates a new type of summary graph and algebraic structures that provide quantitative information on the evolutionary scale and frequency of recombination events. The accompanying software, TARGet, is applied to several examples, including migration between sexually-reproducing populations, human recombination, and recombination in Darwin’s finches.

Feasibility of Topological Data Analysis for Event-Related fMRI (2019)

Cameron T. Ellis, Michael Lesnick, Gregory Henselman-Petrusek, Bryn Keller, Jonathan D. Cohen

Abstract

Recent fMRI research shows that perceptual and cognitive representations are instantiated in high-dimensional multivoxel patterns in the brain. However, the methods for detecting these representations are limited. Topological data analysis (TDA) is a new approach, based on the mathematical field of topology, that can detect unique types of geometric features in patterns of data. Several recent studies have successfully applied TDA to study various forms of neural data; however, to our knowledge, TDA has not been successfully applied to data from event-related fMRI designs. Event-related fMRI is very common but limited in terms of the number of events that can be run within a practical time frame and the effect size that can be expected. Here, we investigate whether persistent homology, a popular TDA tool that identifies topological features in data and quantifies their robustness, can identify known signals given these constraints. We use fmrisim, a Python-based simulator of realistic fMRI data, to assess the plausibility of recovering a simple topological representation under a variety of conditions. Our results suggest that persistent homology can be used under certain circumstances to recover topological structure embedded in realistic fMRI data simulations.

How do we represent the world? In cognitive neuroscience it is typical to think representations are points in high-dimensional space. In order to study these kinds of spaces it is necessary to have tools that capture the organization of high-dimensional data. Topological data analysis (TDA) holds promise for detecting unique types of geometric features in patterns of data. Although potentially useful, TDA has not been applied to event-related fMRI data. Here we utilized a popular tool from TDA, persistent homology, to recover topological signals from event-related fMRI data. We simulated realistic fMRI data and explored the parameters under which persistent homology can successfully extract signal. We also provided extensive code and recommendations for how to make the most out of TDA for fMRI analysis.

A Primer on Topological Data Analysis to Support Image Analysis Tasks in Environmental Science (2023)

Lander Ver Hoef, Henry Adams, Emily J. King, Imme Ebert-Uphoff

Abstract

Topological data analysis (TDA) is a tool from data science and mathematics that is beginning to make waves in environmental science. In this work, we seek to provide an intuitive and understandable introduction to a tool from TDA that is particularly useful for the analysis of imagery, namely, persistent homology. We briefly discuss the theoretical background but focus primarily on understanding the output of this tool and discussing what information it can glean. To this end, we frame our discussion around a guiding example of classifying satellite images from the sugar, fish, flower, and gravel dataset produced for the study of mesoscale organization of clouds by Rasp et al. We demonstrate how persistent homology and its vectorization, persistence landscapes, can be used in a workflow with a simple machine learning algorithm to obtain good results, and we explore in detail how we can explain this behavior in terms of image-level features. One of the core strengths of persistent homology is how interpretable it can be, so throughout this paper we discuss not just the patterns we find but why those results are to be expected given what we know about the theory of persistent homology. Our goal is that readers of this paper will leave with a better understanding of TDA and persistent homology, will be able to identify problems and datasets of their own for which persistent homology could be helpful, and will gain an understanding of the results they obtain from applying the included GitHub example code.

Significance Statement

Information such as the geometric structure and texture of image data can greatly support the inference of the physical state of an observed Earth system, for example, in remote sensing to determine whether wildfires are active or to identify local climate zones. Persistent homology is a branch of topological data analysis that allows one to extract such information in an interpretable way, unlike black-box methods like deep neural networks. The purpose of this paper is to explain in an intuitive manner what persistent homology is and how researchers in environmental science can use it to create interpretable models. We demonstrate the approach to identify certain cloud patterns from satellite imagery and find that the resulting model is indeed interpretable.

🍩 Database of Original & Non-Theoretical Uses of Topology
