Topic Detection in Twitter Using Topology Data Analysis (2015)

Pablo Torres-Tramón, Hugo Hromic, Bahareh Rahmanzadeh Heravi

Abstract

The massive volume of content generated by social media greatly exceeds human capacity to manually process this data in order to identify topics of interest. As a solution, various automated topic detection approaches have been proposed, most of which are based on document clustering and burst detection. These approaches normally represent textual features in standard n-dimensional Euclidean metric spaces. However, in these cases, directly filtering noisy documents is challenging for topic detection. Instead we propose Topol, a topic detection method based on Topology Data Analysis (TDA) that transforms the Euclidean feature space into a topological space where the shapes of noisy irrelevant documents are much easier to distinguish from topically-relevant documents. This topological space is organised in a network according to the connectivity of the points, i.e. the documents, and by only filtering based on the size of the connected components we obtain competitive results compared to other state of the art topic detection methods.
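The filtering step described in this abstract, keeping only connected components of sufficient size, can be illustrated with a minimal sketch. This is not the authors' Topol implementation: the epsilon-neighbourhood graph, the Euclidean distance, and the names `connected_components`, `filter_by_component_size`, `eps` and `min_size` are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def connected_components(points, eps):
    """Group point indices into components of the eps-neighbourhood graph."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Connect every pair of points that lie within eps of each other.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def filter_by_component_size(points, eps, min_size):
    """Keep only components with at least min_size members (hypothetical rule)."""
    return [c for c in connected_components(points, eps) if len(c) >= min_size]


# Toy "documents" as 2-D feature vectors: three close together, two isolated.
docs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (9.0, 1.0)]
topics = filter_by_component_size(docs, eps=0.5, min_size=2)
```

In this toy example the two isolated points are treated as noise and only the cluster of three documents survives as a candidate topic.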

Persistent Topology for Cryo-Em Data Analysis (2015)

Kelin Xia, Guo-Wei Wei

Abstract

In this work, we introduce persistent homology for the analysis of cryo-electron microscopy (cryo-EM) density maps. We identify the topological fingerprint or topological signature of noise, which is widespread in cryo-EM data. For low signal-to-noise ratio (SNR) volumetric data, intrinsic topological features of biomolecular structures are indistinguishable from noise. To remove noise, we employ geometric flows that are found to preserve the intrinsic topological fingerprints of cryo-EM structures and diminish the topological signature of noise. In particular, persistent homology enables us to visualize the gradual separation of the topological fingerprints of cryo-EM structures from those of noise during the denoising process, which gives rise to a practical procedure for prescribing a noise threshold to extract cryo-EM structure information from noise contaminated data after certain iterations of the geometric flow equation. To further demonstrate the utility of persistent homology for cryo-EM data analysis, we consider a microtubule intermediate structure Electron Microscopy Data (EMD 1129). Three helix models, an alpha-tubulin monomer model, an alpha-tubulin and beta-tubulin model, and an alpha-tubulin and beta-tubulin dimer model, are constructed to fit the cryo-EM data. The least square fitting leads to similarly high correlation coefficients, which indicates that structure determination via optimization is an ill-posed inverse problem. However, these models have dramatically different topological fingerprints. In particular, linkages or connectivities that discriminate one model from another play little role in traditional density fitting or optimization but are very sensitive and crucial to topological fingerprints. The intrinsic topological features of the microtubule data are identified after topological denoising. By a comparison of the topological fingerprints of the original data and those of three models, we found that the third model is topologically favored. The present work offers persistent homology based new strategies for topological denoising and for resolving ill-posed inverse problems. Copyright © 2015 John Wiley & Sons, Ltd.

A Topological Framework for Deep Learning (2020)

Mustafa Hajij, Kyle Istvan

Abstract

We utilize classical facts from topology to show that the classification problem in machine learning is always solvable under very mild conditions. Furthermore, we show that a softmax classification network acts on an input topological space by a finite sequence of topological moves to achieve the classification task. Moreover, given a training dataset, we show how topological formalism can be used to suggest the appropriate architectural choices for neural networks designed to be trained as classifiers on the data. Finally, we show how the architecture of a neural network cannot be chosen independently from the shape of the underlying data. To demonstrate these results, we provide example datasets and show how they are acted upon by neural nets from this topological perspective.

Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, Wlodek Zadrozny

Abstract

While the strength of Topological Data Analysis has been explored in many studies on high dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in the text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extraction of topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as high dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe that adding topological features to the conventional features in ensemble models improves the classification results (up to 5%). On the other hand, as expected, topological features by themselves may not be sufficient for effective classification. It is an open problem to see whether TDA features from word embeddings might be sufficient, as they seem to perform within a range of a few points from top results obtained with a linear support vector classifier.

A Novel Method of Extracting Topological Features From Word Embeddings (2020)

Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny

Abstract

In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there are only a few works on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representation of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the embedding high-dimensional space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly used tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show our defined topological features may outperform conventional text mining features.

Multidimensional Persistence in Biomolecular Data (2015)

Kelin Xia, Guo-Wei Wei

Abstract

Persistent homology has emerged as a popular technique for the topological simplification of big data, including biomolecular data. Multidimensional persistence bears considerable promise to bridge the gap between geometry and topology. However, its practical and robust construction has been a challenge. We introduce two families of multidimensional persistence, namely pseudo-multidimensional persistence and multiscale multidimensional persistence. The former is generated via the repeated applications of persistent homology filtration to high dimensional data, such as results from molecular dynamics or partial differential equations. The latter is constructed via isotropic and anisotropic scales that create new simplicial complexes and associated topological spaces. The utility, robustness and efficiency of the proposed topological methods are demonstrated via protein folding, protein flexibility analysis, the topological denoising of cryo-electron microscopy data, and the scale dependence of nano particles. Topological transition between partially folded and unfolded proteins has been observed in multidimensional persistence. The separation between noise topological signatures and molecular topological fingerprints is achieved by the Laplace-Beltrami flow. The multiscale multidimensional persistent homology reveals relative local features in Betti-0 invariants and the relatively global characteristics of Betti-1 and Betti-2 invariants.

Finite Topology as Applied to Image Analysis (1989)

V. A Kovalevsky

Abstract

The notion of a cellular complex, which is well known in topology, is applied to describe the structure of images. It is shown that the topology of cellular complexes is the only possible topology of finite sets. Under this topology no contradictions or paradoxes arise when defining connected subsets and their boundaries. Ways of encoding images as cellular complexes are discussed. The process of image segmentation is considered as splitting (in the topological sense) a cellular complex into blocks of cells. The notion of a cell list is introduced as a precise and compact data structure for encoding segmented images. Some applications of this data structure to image analysis are demonstrated.

Topology-Aware Segmentation Using Discrete Morse Theory (2021)

Xiaoling Hu, Yusu Wang, Li Fuxin, Dimitris Samaras, Chao Chen

Abstract

In the segmentation of fine-scale structures from natural and biomedical images, per-pixel accuracy is not the only metric of concern. Topological correctness, such as vessel connectivity and membrane closure, is crucial for downstream analysis tasks. In this paper, we propose a new approach to train deep image segmentation networks for better topological accuracy. In particular, leveraging the power of discrete Morse theory (DMT), we identify global structures, including 1D skeletons and 2D patches, which are important for topological accuracy. Trained with a novel loss based on these global structures, the network performance is significantly improved especially near topologically challenging locations (such as weak spots of connections and membranes). On diverse datasets, our method achieves superior performance on both the DICE score and topological metrics.

Topological Data Analysis of Biological Aggregation Models (2015)

Chad M. Topaz, Lori Ziegelmeier, Tom Halverson

Abstract

We apply tools from topological data analysis to two mathematical models inspired by biological aggregations such as bird flocks, fish schools, and insect swarms. Our data consists of numerical simulation output from the models of Vicsek and D'Orsogna. These models are dynamical systems describing the movement of agents who interact via alignment, attraction, and/or repulsion. Each simulation time frame is a point cloud in position-velocity space. We analyze the topological structure of these point clouds, interpreting the persistent homology by calculating the first few Betti numbers. These Betti numbers count connected components, topological circles, and trapped volumes present in the data. To interpret our results, we introduce a visualization that displays Betti numbers over simulation time and topological persistence scale. We compare our topological results to order parameters typically used to quantify the global behavior of aggregations, such as polarization and angular momentum. The topological calculations reveal events and structure not captured by the order parameters.
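As a rough illustration of tracking Betti numbers over both simulation time and persistence scale, the sketch below computes only Betti-0 (the number of connected components) for each frame at several scales. It is not the authors' pipeline: the convention that two points are connected when their distance is at most the scale, the 2-D toy frames (the paper works in position-velocity space), and the function names are assumptions for illustration.

```python
from itertools import combinations
from math import dist


def h0_death_scales(points):
    """Scales at which connected components merge (Kruskal / single linkage)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one of them dies
            parent[ri] = rj
            deaths.append(length)
    return deaths


def betti0_curve(points, scales):
    """Number of connected components of the point cloud at each scale."""
    deaths = h0_death_scales(points)
    return [len(points) - sum(d <= s for d in deaths) for s in scales]


# Two toy simulation frames; rows of `table` are time, columns are scale.
frames = [
    [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)],  # two separated pairs
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],  # one evenly spaced chain
]
scales = [0.5, 1.0, 3.0]
table = [betti0_curve(frame, scales) for frame in frames]
```

The resulting table of component counts, indexed by frame and scale, is the kind of array that a time-versus-scale visualization would display; higher Betti numbers would need a full persistent homology library.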

Topological Regularization for Dense Prediction (2021)

Deqing Fu, Bradley J. Nelson

Abstract

Dense prediction tasks such as depth perception and semantic segmentation are important applications in computer vision that have a concrete topological description in terms of partitioning an image into connected components or estimating a function with a small number of local extrema corresponding to objects in the image. We develop a form of topological regularization based on persistent homology that can be used in dense prediction tasks with these topological descriptions. Experimental results show that the output topology can also appear in the internal activations of trained neural networks which allows for a novel use of topological regularization to the internal states of neural networks during training, reducing the computational cost of the regularization. We demonstrate that this topological regularization of internal activations leads to improved convergence and test benchmarks on several problems and architectures.

TopoGAN: A Topology-Aware Generative Adversarial Network (2020)

Fan Wang, Huidong Liu, Dimitris Samaras, Chao Chen

Abstract

Existing generative adversarial networks (GANs) focus on generating realistic images based on CNN-derived image features, but fail to preserve the structural properties of real images. This can be fatal in applications where the underlying structure (e.g., neurons, vessels, membranes, and road networks) of the image carries crucial semantic meaning. In this paper, we propose a novel GAN model that learns the topology of real images, i.e., connectedness and loopy-ness. In particular, we introduce a new loss that bridges the gap between synthetic image distribution and real image distribution in the topological feature space. By optimizing this loss, the generator produces images with the same structural topology as real images. We also propose new GAN evaluation metrics that measure the topological realism of the synthetic images. We show in experiments that our method generates synthetic images with realistic topology. We also highlight the increased performance that our method brings to downstream tasks such as segmentation.

Tenfold Topology of Crystals (2020)

Eyal Cornfeld, Shachar Carmeli

Abstract

The celebrated tenfold-way of Altland-Zirnbauer symmetry classes discerns any quantum system by its pattern of non-spatial symmetries. It lies at the core of the periodic table of topological insulators and superconductors which provided a complete classification of weakly-interacting electrons' non-crystalline topological phases for all symmetry classes. Over recent years, a plethora of topological phenomena with diverse surface states has been discovered in crystalline materials. In this paper, we obtain an exhaustive classification of topologically distinct groundstates as well as topological phases with anomalous surface states of crystalline topological insulators and superconductors for key space-groups, layer-groups, and rod-groups. This is done in a unified manner for the full tenfold-way of Altland-Zirnbauer non-spatial symmetry classes. We establish a comprehensive paradigm that harnesses the modern mathematical framework of equivariant spectra; it allows us to obtain results applicable to generic topological classification problems. In particular, this paradigm provides efficient computational tools that enable an inherently unified treatment of the full tenfold-way.

CCF-GNN: A Unified Model Aggregating Appearance, Microenvironment, and Topology for Pathology Image Classification (2023)

Hongxiao Wang, Gang Huang, Zhuo Zhao, Liang Cheng, Anna Juncker-Jensen, Máté Levente Nagy, Xin Lu, Xiangliang Zhang, Danny Z. Chen

Abstract

Pathology images contain rich information of cell appearance, microenvironment, and topology features for cancer analysis and diagnosis. Among such features, topology becomes increasingly important in analysis for cancer immunotherapy. By analyzing geometric and hierarchically structured cell distribution topology, oncologists can identify densely-packed and cancer-relevant cell communities (CCs) for making decisions. Compared to commonly-used pixel-level Convolution Neural Network (CNN) features and cell-instance-level Graph Neural Network (GNN) features, CC topology features are at a higher level of granularity and geometry. However, topological features have not been well exploited by recent deep learning (DL) methods for pathology image classification due to lack of effective topological descriptors for cell distribution and gathering patterns. In this paper, inspired by clinical practice, we analyze and classify pathology images by comprehensively learning cell appearance, microenvironment, and topology in a fine-to-coarse manner. To describe and exploit topology, we design Cell Community Forest (CCF), a novel graph that represents the hierarchical formulation process of big-sparse CCs from small-dense CCs. Using CCF as a new geometric topological descriptor of tumor cells in pathology images, we propose CCF-GNN, a GNN model that successively aggregates heterogeneous features (e.g., appearance, microenvironment) from cell-instance-level, cell-community-level, into image-level for pathology image classification. Extensive cross-validation experiments show that our method significantly outperforms alternative methods on H&E-stained and immunofluorescence images for disease grading tasks with multiple cancer types. Our proposed CCF-GNN establishes a new topological data analysis (TDA) based method, which facilitates integrating multi-level heterogeneous features of point clouds (e.g., for cells) into a unified DL framework.

Topological Portraits of Multiscale Coordination Dynamics (2020)

Mengsen Zhang, William D. Kalies, J. A. Scott Kelso, Emmanuelle Tognoli

Abstract

Living systems exhibit complex yet organized behavior on multiple spatiotemporal scales. To investigate the nature of multiscale coordination in living systems, one needs a meaningful and systematic way to quantify the complex dynamics, a challenge in both theoretical and empirical realms. The present work shows how integrating approaches from computational algebraic topology and dynamical systems may help us meet this challenge. In particular, we focus on the application of multiscale topological analysis to coordinated rhythmic processes. First, theoretical arguments are introduced as to why certain topological features and their scale-dependency are highly relevant to understanding complex collective dynamics. Second, we propose a method to capture such dynamically relevant topological information using persistent homology, which allows us to effectively construct a multiscale topological portrait of rhythmic coordination. Finally, the method is put to test in detecting transitions in real data from an experiment of rhythmic coordination in ensembles of interacting humans. The recurrence plots of topological portraits highlight collective transitions in coordination patterns that were elusive to more traditional methods. This sensitivity to collective transitions would be lost if the behavioral dynamics of individuals were treated as separate degrees of freedom instead of constituents of the topology that they collectively forge. Such multiscale topological portraits highlight collective aspects of coordination patterns that are irreducible to properties of individual parts. The present work demonstrates how the analysis of multiscale coordination dynamics can benefit from topological methods, thereby paving the way for further systematic quantification of complex, high-dimensional dynamics in living systems.

Evolutionary Homology on Coupled Dynamical Systems With Applications to Protein Flexibility Analysis (2020)

Zixuan Cang, Elizabeth Munch, Guo-Wei Wei

Abstract

While the spatial topological persistence is naturally constructed from a radius-based filtration, it has hardly been derived from a temporal filtration. Most topological models are designed for the global topology of a given object as a whole. There is no method reported in the literature for the topology of an individual component in an object to the best of our knowledge. For many problems in science and engineering, the topology of an individual component is important for describing its properties. We propose evolutionary homology (EH) constructed via a time evolution-based filtration and topological persistence. Our approach couples a set of dynamical systems or chaotic oscillators by the interactions of a physical system, such as a macromolecule. The interactions are approximated by weighted graph Laplacians. Simplices, simplicial complexes, algebraic groups and topological persistence are defined on the coupled trajectories of the chaotic oscillators. The resulting EH gives rise to time-dependent topological invariants or evolutionary barcodes for an individual component of the physical system, revealing its topology-function relationship. In conjunction with Wasserstein metrics, the proposed EH is applied to protein flexibility analysis, an important problem in computational biophysics. Numerical results for the B-factor prediction of a benchmark set of 364 proteins indicate that the proposed EH outperforms all the other state-of-the-art methods in the field.
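To make the phrase "weighted graph Laplacians" concrete, here is a minimal sketch of one way such a matrix can be built from atomic coordinates. The Gaussian distance kernel and the scale parameter `eta` are assumptions chosen for illustration; the abstract does not state which weighting the authors use.

```python
from math import dist, exp


def weighted_graph_laplacian(points, eta):
    """Return L = D - W, with assumed Gaussian weights W_ij = exp(-(d_ij / eta)^2)."""
    n = len(points)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w = exp(-((dist(points[i], points[j]) / eta) ** 2))
                lap[i][j] = -w   # off-diagonal entry: negative edge weight
                lap[i][i] += w   # diagonal entry: weighted degree
    return lap


# Three hypothetical atom positions in 3-D.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0)]
laplacian = weighted_graph_laplacian(atoms, eta=2.0)
```

Each row of the resulting matrix sums to zero and the matrix is symmetric, which are the defining properties of a graph Laplacian used as a coupling term between oscillators.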

Bayesian Computation Meets Topology (2024)

Julius von Rohrscheidt, Bastian Rieck, Sebastian M. Schmon

Abstract

Computational topology recently started to emerge as a novel paradigm for characterising the ‘shape’ of high-dimensional data, leading to powerful algorithms in (un)supervised representation learning. While capable of capturing prominent features at multiple scales, topological methods cannot readily be used for Bayesian inference. We develop a novel approach that bridges this gap, making it possible to perform parameter estimation in a Bayesian framework, using topology-based loss functions. Our method affords easy integration into topological machine learning algorithms. We demonstrate its efficacy for parameter estimation in different simulation settings.

From Trees to Barcodes and Back Again: Theoretical and Statistical Perspectives (2020)

Lida Kanari, Adélie Garin, Kathryn Hess

Abstract

Methods of topological data analysis have been successfully applied in a wide range of fields to provide useful summaries of the structure of complex data sets in terms of topological descriptors, such as persistence diagrams. While there are many powerful techniques for computing topological descriptors, the inverse problem, i.e., recovering the input data from topological descriptors, has proved to be challenging. In this article we study in detail the Topological Morphology Descriptor (TMD), which assigns a persistence diagram to any tree embedded in Euclidean space, and a sort of stochastic inverse to the TMD, the Topological Neuron Synthesis (TNS) algorithm, gaining both theoretical and computational insights into the relation between the two. We propose a new approach to classify barcodes using symmetric groups, which provides a concrete language to formulate our results. We investigate to what extent the TNS recovers a geometric tree from its TMD and describe the effect of different types of noise on the process of tree generation from persistence diagrams. We prove moreover that the TNS algorithm is stable with respect to specific types of noise.

Topological Data Analysis for Aviation Applications (2019)

Max Z. Li, Megan S. Ryerson, Hamsa Balakrishnan

Abstract

Aviation data sets are increasingly high-dimensional and sparse. Consequently, the underlying features and interactions are not easily uncovered by traditional data analysis methods. Recent advancements in applied mathematics introduce topological methods, offering a new approach to obtain these features. This paper applies the fundamental notions underlying topological data analysis and persistent homology (TDA/PH) to aviation data analytics. We review past aviation research that leverages topological methods, and present a new computational case study exploring the topology of airport surface connectivity. In each case, we connect abstract topological features with real-world processes in aviation, and highlight potential operational and managerial insights.

Topological Data Analysis for Genomics and Evolution: Topology in Biology (2019)

Raul Rabadan, Andrew J. Blumberg

Abstract

Biology has entered the age of Big Data. A technical revolution has transformed the field, and extracting meaningful information from large biological data sets is now a central methodological challenge. Algebraic topology is a well-established branch of pure mathematics that studies qualitative descriptors of the shape of geometric objects. It aims to reduce comparisons of shape to a comparison of algebraic invariants, such as numbers, which are typically easier to work with. Topological data analysis is a rapidly developing subfield that leverages the tools of algebraic topology to provide robust multiscale analysis of data sets. This book introduces the central ideas and techniques of topological data analysis and its specific applications to biology, including the evolution of viruses, bacteria and humans, genomics of cancer, and single cell characterization of developmental processes. Bridging two disciplines, the book is for researchers and graduate students in genomics and evolutionary biology as well as mathematicians interested in applied topology.

Topological Machine Learning for Mixed Numeric and Categorical Data (2020)

Chengyuan Wu, Carol Anne Hargreaves

Abstract

Topological data analysis is a relatively new branch of machine learning that excels in studying high dimensional data, and is theoretically known to be robust against noise. Meanwhile, data objects with mixed numeric and categorical attributes are ubiquitous in real-world applications. However, topological methods are usually applied to point cloud data, and to the best of our knowledge there is no available framework for the classification of mixed data using topological methods. In this paper, we propose a novel topological machine learning method for mixed data classification. In the proposed method, we use theory from topological data analysis such as persistent homology, persistence diagrams and Wasserstein distance to study mixed data. The performance of the proposed method is demonstrated by experiments on a real-world heart disease dataset. Experimental results show that our topological method outperforms several state-of-the-art algorithms in the prediction of heart disease.

Topological Detection of Alzheimer’s Disease Using Betti Curves (2021)

Ameer Saadat-Yazdi, Rayna Andreeva, Rik Sarkar

Abstract

Alzheimer’s disease is a debilitating disease in the elderly, and is an increasing burden to society due to an aging population. In this paper, we apply topological data analysis to structural MRI scans of the brain, and show that topological invariants make accurate predictors for Alzheimer’s. Using the construct of Betti curves, we first show that topology is a good predictor of age. Then we develop an approach to factor out the topological signature of age from Betti curves, and thus obtain accurate detection of Alzheimer’s disease. Experimental results show that topological features used with standard classifiers perform comparably to recently developed convolutional neural networks. These results imply that topology is a major aspect of structural changes due to aging and Alzheimer’s. We expect this relation will generate further insights for both early detection and better understanding of the disease.
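For readers unfamiliar with Betti curves, the following minimal sketch shows how one can be computed from a list of persistence intervals: at each threshold it counts the intervals that are alive. The half-open convention [birth, death) and the toy intervals are assumptions for illustration, not data or code from the paper.

```python
def betti_curve(intervals, thresholds):
    """Count persistence intervals [birth, death) alive at each threshold."""
    return [
        sum(birth <= t < death for birth, death in intervals)
        for t in thresholds
    ]


# Toy persistence intervals for one homology dimension; inf marks a feature
# that never dies within the filtration.
intervals = [(0.0, 1.0), (0.0, 2.5), (0.5, 3.0), (2.0, float("inf"))]
curve = betti_curve(intervals, thresholds=[0.0, 1.0, 2.0, 3.0])
```

Sampling the curve on a fixed grid of thresholds gives a fixed-length vector, which is what makes it convenient as input to standard classifiers.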

Atom-Specific Persistent Homology and Its Application to Protein Flexibility Analysis (2020)

David Bramer, Guo-Wei Wei

Abstract

Recently, persistent homology has had tremendous success in biomolecular data analysis. It works by examining the topological relationship or connectivity of a group of atoms in a molecule at a variety of scales, then rendering a family of topological representations of the molecule. However, persistent homology is rarely employed for the analysis of atomic properties, such as biomolecular flexibility analysis or B-factor prediction. This work introduces atom-specific persistent homology to provide a local atomic level representation of a molecule via a global topological tool. This is achieved through the construction of a pair of conjugated sets of atoms and corresponding conjugated simplicial complexes, as well as conjugated topological spaces. The difference between the topological invariants of the pair of conjugated sets is measured by Bottleneck and Wasserstein metrics and leads to an atom-specific topological representation of individual atomic properties in a molecule. Atom-specific topological features are integrated with various machine learning algorithms, including gradient boosting trees and convolutional neural networks, for protein thermal fluctuation analysis and B-factor prediction. Extensive numerical results indicate the proposed method provides a powerful topological tool for analyzing and predicting localized information in complex macromolecules.
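The bottleneck and Wasserstein metrics mentioned in this abstract compare two persistence diagrams by optimally matching their points, with unmatched points sent to the diagonal. The brute-force sketch below illustrates the idea for very small diagrams only. It assumes the L-infinity ground metric and the 1-Wasserstein variant, which the abstract does not specify, and it is not the authors' implementation; practical work would use a dedicated library.

```python
from itertools import permutations


def _cost_matrix(a, b):
    """Square cost matrix between diagrams a and b, each padded with diagonal slots."""
    n, m = len(a), len(b)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:   # point of a matched to point of b
                cost[i][j] = max(abs(a[i][0] - b[j][0]), abs(a[i][1] - b[j][1]))
            elif i < n:           # point of a matched to the diagonal
                cost[i][j] = (a[i][1] - a[i][0]) / 2
            elif j < m:           # point of b matched to the diagonal
                cost[i][j] = (b[j][1] - b[j][0]) / 2
            # diagonal matched to diagonal costs nothing
    return cost


def diagram_distances(a, b):
    """Return (bottleneck, 1-Wasserstein) by trying every matching.

    Factorial time, so only suitable for a handful of points.
    """
    cost = _cost_matrix(a, b)
    size = len(cost)
    bottleneck = wasserstein = float("inf")
    for perm in permutations(range(size)):
        matched = [cost[i][perm[i]] for i in range(size)]
        bottleneck = min(bottleneck, max(matched, default=0.0))
        wasserstein = min(wasserstein, sum(matched))
    return bottleneck, wasserstein


# Toy diagrams given as (birth, death) pairs.
diagram_a = [(0.0, 4.0), (1.0, 2.0)]
diagram_b = [(0.0, 3.0)]
bottleneck, wasserstein = diagram_distances(diagram_a, diagram_b)
```

Here the long-lived features are matched to each other and the short-lived one is matched to the diagonal, so the bottleneck distance reflects the single worst pairing while the Wasserstein distance accumulates all pairing costs.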

Reviews: Topological Distances and Losses for Brain Networks (2021)

Moo K. Chung, Alexander Smith, Gary Shiu

Abstract

Almost all statistical and machine learning methods in analyzing brain networks rely on distances and loss functions, which are mostly Euclidean or matrix norms. The Euclidean or matrix distances may fail to capture underlying subtle topological differences in brain networks. Further, Euclidean distances are sensitive to outliers. A few extreme edge weights may severely affect the distance. Thus it is necessary to use distances and loss functions that recognize the topology of data. In this review paper, we survey various topological distance and loss functions from topological data analysis (TDA) and persistent homology that can be used in brain network analysis more effectively. Although there are many recent brain imaging studies that are based on TDA methods, possibly due to the lack of method awareness, TDA has not yet been adopted as a mainstream tool in the brain imaging field. The main purpose of this paper is to provide the relevant technical survey of these powerful tools that are immediately applicable to brain network data.

The Importance of the Whole: Topological Data Analysis for the Network Neuroscientist (2019)

Ann E. Sizemore, Jennifer E. Phillips-Cremins, Robert Ghrist, Danielle S. Bassett

Abstract

Data analysis techniques from network science have fundamentally improved our understanding of neural systems and the complex behaviors that they support. Yet the restriction of network techniques to the study of pairwise interactions prevents us from taking into account intrinsic topological features such as cavities that may be crucial for system function. To detect and quantify these topological features, we must turn to algebro-topological methods that encode data as a simplicial complex built from sets of interacting nodes called simplices. We then use the relations between simplices to expose cavities within the complex, thereby summarizing its topological features. Here we provide an introduction to persistent homology, a fundamental method from applied topology that builds a global descriptor of system structure by chronicling the evolution of cavities as we move through a combinatorial object such as a weighted network. We detail the mathematics and perform demonstrative calculations on the mouse structural connectome, synapses in C. elegans, and genomic interaction data. Finally, we suggest avenues for future work and highlight new advances in mathematics ready for use in neural systems. For the network neuroscientist, this exposition aims to communicate both the mathematics and the advantages of using tools from applied topology for the study of neural systems. Using data from the mouse connectome, electrical and chemical synapses in C. elegans, and chromatin interaction data, we offer example computations and applications to further demonstrate the power of topological data analysis in neuroscience. Finally, we expose the reader to novel developments in applied topology and relate these developments to current questions and methodological difficulties in network neuroscience.

Representability of Algebraic Topology for Biomolecules in Machine Learning Based Scoring and Virtual Screening (2018)

Zixuan Cang, Lin Mu, Guo-Wei Wei

Abstract

This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.

Topological Electronic Structure and Weyl Points in Nonsymmorphic Hexagonal Materials (2020)

Rafael González-Hernández, Erick Tuiran, Bernardo Uribe

Abstract

Using topological band theory analysis we show that the nonsymmorphic symmetry operations in hexagonal lattices enforce Weyl points at the screw-invariant high-symmetry lines of the band structure. The corepresentation theory and connectivity group theory show that Weyl points are generated by band crossings in accordion-like and hourglass-like dispersion relations. These Weyl points are stable against weak perturbations and are protected by the screw rotation symmetry. Based on first-principles calculations we found complete agreement between the topologically predicted energy dispersion relations and real hexagonal materials. Topological charge (chirality) and Berry curvature calculations show the simultaneous formation of Weyl points and nodal-lines in 4d transition-metal trifluorides such as AgF3 and AuF3. Furthermore, a large intrinsic spin-Hall conductivity was found due to the combined strong spin-orbit coupling and multiple Weyl-point crossings in the electronic structure. These materials could be used for spin/charge conversion in more energy-efficient spintronic devices.

Topological Edge Modes by Smart Patterning (2018)

David J. Apigo, Kai Qian, Camelia Prodan, Emil Prodan

Abstract

We study identical coupled mechanical resonators whose collective dynamics are fully determined by the patterns in which they are arranged. In this work, we call a system topological if (1) boundary resonant modes fully fill all existing spectral gaps whenever the system is halved, and (2) if the boundary spectrum cannot be removed or gapped by any boundary condition. We demonstrate that such topological characteristics can be induced solely through patterning, in a manner entirely independent of the structure of the resonators and the details of the couplings. The existence of such patterns is proven using K theory and exemplified using an experimental platform based on magnetically coupled spinners. Topological metamaterials built on these principles can be easily engineered at any scale, providing a practical platform for applications and devices.

Localization in the Crowd With Topological Constraints (2020)

Shahira Abousamra, Minh Hoai, Dimitris Samaras, Chao Chen

Abstract

We address the problem of crowd localization, i.e., the prediction of dots corresponding to people in a crowded scene. Due to various challenges, a localization method is prone to spatial semantic errors, i.e., predicting multiple dots within the same person or collapsing multiple dots in a cluttered region. We propose a topological approach targeting these semantic errors. We introduce a topological constraint that teaches the model to reason about the spatial arrangement of dots. To enforce this constraint, we define a persistence loss based on the theory of persistent homology. The loss compares the topographic landscape of the likelihood map and the topology of the ground truth. Topological reasoning improves the quality of the localization algorithm, especially near cluttered regions. On multiple public benchmarks, our method outperforms previous localization methods. Additionally, we demonstrate the potential of our method in improving the performance in the crowd counting task.

Topological Autoencoders (2020)

Michael Moor, Max Horn, Bastian Rieck, Karsten Borgwardt

Abstract

We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we construct this loss in a differentiable manner, such that the encoding learns to retain multi-scale connectivity information. We show that our approach is theoretically well-founded and that it exhibits favourable latent representations on a synthetic manifold as well as on real-world image data sets, while preserving low reconstruction errors.
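
To make the idea of a topological loss term concrete, here is a hedged NumPy sketch restricted to 0-dimensional features. It assumes, as an illustration rather than a statement of the authors' implementation, that the topologically relevant distances are those on the minimum-spanning-tree edges of each space's distance matrix, and that the loss compares those distances across the input and latent spaces in both directions. The function names are hypothetical, and plain NumPy is not differentiable, so this shows the quantity being computed rather than trainable code.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix of an (n, d) array."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mst_edges(dist):
    """Edges (i, j) of a minimum spanning tree, via Prim's algorithm."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()               # cheapest known link to the tree
    best_from = np.zeros(n, dtype=int)  # tree endpoint of that link
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        edges.append((int(best_from[j]), j))
        in_tree[j] = True
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        best_from = np.where(closer, j, best_from)
    return edges


def topological_loss(x, z):
    """Compare distances on the MST edges of x and of z in both spaces."""
    dx, dz = pairwise_distances(x), pairwise_distances(z)
    loss = 0.0
    for edges in (mst_edges(dx), mst_edges(dz)):
        i, j = np.array(edges).T
        loss += 0.5 * np.sum((dx[i, j] - dz[i, j]) ** 2)
    return float(loss)


# Hypothetical data: three collinear input points and a stretched "latent" copy.
x = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
print(topological_loss(x, x), topological_loss(x, 2.0 * x))
```

The loss vanishes when the latent points reproduce the input distances on the selected edges and grows as those distances are distorted.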

Multivariate Data Analysis Using Persistence-Based Filtering and Topological Signatures (2012)

B. Rieck, H. Mara, H. Leitte

Abstract

The extraction of significant structures in arbitrary high-dimensional data sets is a challenging task. Moreover, classifying data points as noise in order to reduce a data set bears special relevance for many application domains. Standard methods such as clustering serve to reduce problem complexity by providing the user with classes of similar entities. However, they usually do not highlight relations between different entities and require a stopping criterion, e.g. the number of clusters to be detected. In this paper, we present a visualization pipeline based on recent advancements in algebraic topology. More precisely, we employ methods from persistent homology that enable topological data analysis on high-dimensional data sets. Our pipeline inherently copes with noisy data and data sets of arbitrary dimensions. It extracts central structures of a data set in a hierarchical manner by using a persistence-based filtering algorithm that is theoretically well-founded. We furthermore introduce persistence rings, a novel visualization technique for a class of topological features (the persistence intervals) of large data sets. Persistence rings provide a unique topological signature of a data set, which helps in recognizing similarities. In addition, we provide interactive visualization techniques that assist the user in evaluating the parameter space of our method in order to extract relevant structures. We describe and evaluate our analysis pipeline by means of two very distinct classes of data sets: First, a class of synthetic data sets containing topological objects is employed to highlight the interaction capabilities of our method. Second, in order to affirm the utility of our technique, we analyse a class of high-dimensional real-world data sets arising from current research in cultural heritage.

Persistent Homology Analysis of Protein Structure, Flexibility, and Folding (2014)

Kelin Xia, Guo-Wei Wei

Abstract

Proteins are the most important biomolecules for living organisms. The understanding of protein structure, function, dynamics, and transport is one of the most challenging tasks in biological science. In the present work, persistent homology is, for the first time, introduced for extracting molecular topological fingerprints (MTFs) based on the persistence of molecular topological invariants. MTFs are utilized for protein characterization, identification, and classification. The method of slicing is proposed to track the geometric origin of protein topological invariants. Both all-atom and coarse-grained representations of MTFs are constructed. A new cutoff-like filtration is proposed to shed light on the optimal cutoff distance in elastic network models. On the basis of the correlation between protein compactness, rigidity, and connectivity, we propose an accumulated bar length generated from persistent topological invariants for the quantitative modeling of protein flexibility. To this end, a correlation matrix-based filtration is developed. This approach gives rise to an accurate prediction of the optimal characteristic distance used in protein B-factor analysis. Finally, MTFs are employed to characterize protein topological evolution during protein folding and quantitatively predict the protein folding stability. Excellent consistency between our persistent homology prediction and molecular dynamics simulation is found. This work reveals the topology–function relationship of proteins. Copyright © 2014 John Wiley & Sons, Ltd.

Multiresolution Persistent Homology for Excessively Large Biomolecular Datasets (2015)

Kelin Xia, Zhixiong Zhao, Guo-Wei Wei

Abstract

Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.

Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining (2018)

Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny

Abstract

Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although TDA methods have recently been used in many areas of data mining, they have not been widely applied to text mining tasks. In most text processing algorithms, the order in which different entities appear or co-appear is lost. Assuming this lost order is an informative feature of the data, TDA may play a significant role in closing the resulting gap in the text processing state of the art. Once provided, the topology of different entities through a textual document may reveal additional information about the document that is not reflected in any other features from conventional text processing methods. In this paper, we introduce a novel approach that employs TDA in text processing in order to capture and use the topology of different same-type entities in textual documents. First, we will show how to extract some topological signatures in the text using persistent homology, i.e., a TDA tool that captures the topological signature of a data cloud. Then we will show how to utilize these signatures for text classification.

Topological Early Warning Signals: Quantifying Varying Routes to Extinction in a Spatially Distributed Population Model (2022)

Laura S. Storch, Sarah L. Day

Abstract

Understanding and predicting critical transitions in spatially explicit ecological systems is particularly challenging due to their complex spatial and temporal dynamics and high dimensionality. Here, we explore changes in population distribution patterns during a critical transition (an extinction event) using computational topology. Computational topology allows us to quantify certain features of a population distribution pattern, such as the level of fragmentation. We create population distribution patterns via a simple coupled patch model with Ricker map growth and nearest neighbors dispersal on a two dimensional lattice. We observe two dominant paths to extinction within the explored parameter space that depend critically on the dispersal rate d and the rate of parameter drift, Δϵ. These paths to extinction are easily topologically distinguishable, so categorization can be automated. We use this population model as a theoretical proof-of-concept for the methodology, and argue that computational topology is a powerful tool for analyzing dynamical changes in systems with noisy data that are coarsely resolved in space and/or time. In addition, computational topology can provide early warning signals for chaotic dynamical systems where traditional statistical early warning signals would fail. For these reasons, we envision this work as a helpful addition to the critical transitions prediction toolbox.

Analyzing Collective Motion With Machine Learning and Topology (2019)

Dhananjay Bhaskar, Angelika Manhart, Jesse Milzman, John T. Nardini, Kathleen M. Storey, Chad M. Topaz, Lori Ziegelmeier

Abstract

We use topological data analysis and machine learning to study a seminal model of collective motion in biology [M. R. D’Orsogna et al., Phys. Rev. Lett. 96, 104302 (2006)]. This model describes agents interacting nonlinearly via attractive-repulsive social forces and gives rise to collective behaviors such as flocking and milling. To classify the emergent collective motion in a large library of numerical simulations and to recover model parameters from the simulation data, we apply machine learning techniques to two different types of input. First, we input time series of order parameters traditionally used in studies of collective motion. Second, we input measures based on topology that summarize the time-varying persistent homology of simulation data over multiple scales. This topological approach does not require prior knowledge of the expected patterns. For both unsupervised and supervised machine learning methods, the topological approach outperforms the one that is based on traditional order parameters.

Classification of Histopathology Slides With Persistence Homology Convolutions (2025)

Shrunal Pothagoni, Benjamin Schweinhart

Abstract

Convolutional neural networks (CNNs) are a standard tool for computer vision tasks such as image classification. However, typical model architectures may result in the loss of topological information. In specific domains such as histopathology, topology is an important descriptor that can be used to distinguish between disease-indicating tissue by analyzing the shape characteristics of cells. Current literature suggests that reintroducing topological information using persistent homology can improve medical diagnostics; however, previous methods utilize global topological summaries which do not contain information about the locality of topological features. To address this gap, we present a novel method that generates local persistent homology-based data using a modified version of the convolution operator called Persistent Homology Convolutions. This method captures information about the locality and translation invariance of topological features. We perform a comparative study using various representations of histopathology slides and find that models trained with persistent homology convolutions outperform conventionally trained models and are less sensitive to hyperparameters. These results indicate that persistent homology convolutions extract meaningful geometric information from the histopathology slides.

The Growing Topology of the C. Elegans Connectome (2020)

Alec Helm, Ann S. Blevins, Danielle S. Bassett

Abstract

Probing the developing neural circuitry in Caenorhabditis elegans has enhanced our understanding of nervous systems. The C. elegans connectome, like those of other species, is characterized by a rich club of densely connected neurons embedded within a small-world architecture. This organization of neuronal connections, captured by quantitative network statistics, provides insight into the system's capacity to perform integrative computations. Yet these network measures are limited in their ability to detect weakly connected motifs, such as topological cavities, that may support the system's capacity to perform segregated computations. We address this limitation by using persistent homology to track the evolution of topological cavities in the growing C. elegans connectome throughout neural development, and assess the degree to which the growing connectome's topology is resistant to biological noise. We show that the developing connectome topology is both relatively robust to changes in neuron birth times and not captured by similar growth models. Additionally, we quantify the consequence of a neuron's specific birth time and ask if this metric tracks other biological properties of neurons. Our results suggest that the connectome's growing topology is a robust feature of the developing connectome that is distinct from other network properties, and that the growing topology is particularly sensitive to the exact birth times of a small set of predominantly motor neurons. By utilizing novel measurements that track biological features, we anticipate that our study will be helpful in the construction of more accurate models of neuronal development in C. elegans.

Time-Inhomogeneous Diffusion Geometry and Topology (2022)

Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy

Abstract

Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic diffusion homology. We use this intrinsic topology as well as an ambient topology to study how the data changes over diffusion time. We demonstrate both homologies in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.

Topology Identifies Emerging Adaptive Mutations in SARS-CoV-2 (2021)

Michael Bleher, Lukas Hahn, Juan Angel Patino-Galindo, Mathieu Carriere, Ulrich Bauer, Raul Rabadan, Andreas Ott

Abstract

The COVID-19 pandemic has led to a worldwide effort to characterize its evolution through the mapping of mutations in the genome of the coronavirus SARS-CoV-2. Ideally, one would like to quickly identify new mutations that could confer adaptive advantages (e.g. higher infectivity or immune evasion) by leveraging the large number of genomes. One way of identifying adaptive mutations is by looking at convergent mutations, mutations in the same genomic position that occur independently. However, the large number of currently available genomes precludes the efficient use of phylogeny-based techniques. Here, we establish a fast and scalable Topological Data Analysis approach for the early warning and surveillance of emerging adaptive mutations based on persistent homology. It identifies convergent events merely by their topological footprint and thus overcomes limitations of current phylogenetic inference techniques. This allows for an unbiased and rapid analysis of large viral datasets. We introduce a new topological measure for convergent evolution and apply it to the GISAID dataset as of February 2021, comprising 303,651 high-quality SARS-CoV-2 isolates collected since the beginning of the pandemic. We find that topologically salient mutations on the receptor-binding domain appear in several variants of concern and are linked with an increase in infectivity and immune escape, and for many adaptive mutations the topological signal precedes an increase in prevalence. We show that our method effectively identifies emerging adaptive mutations at an early stage. By localizing topological signals in the dataset, we extract geo-temporal information about the early occurrence of emerging adaptive mutations. The identification of these mutations can help to develop an alert system to monitor mutations of concern and guide experimentalists to focus the study of specific circulating variants.

Capturing Dynamics of Time-Varying Data via Topology (2020)

Lu Xian, Henry Adams, Chad M. Topaz, Lori Ziegelmeier

Abstract

One approach to understanding complex data is to study its shape through the lens of algebraic topology. While the early development of topological data analysis focused primarily on static data, in recent years, theoretical and applied studies have turned to data that varies in time. A time-varying collection of metric spaces as formed, for example, by a moving school of fish or flock of birds, can contain a vast amount of information. There is often a need to simplify or summarize the dynamic behavior. We provide an introduction to topological summaries of time-varying metric spaces including vineyards [17], crocker plots [52], and multiparameter rank functions [34]. We then introduce a new tool to summarize time-varying metric spaces: a crocker stack. Crocker stacks are convenient for visualization, amenable to machine learning, and satisfy a desirable stability property which we prove. We demonstrate the utility of crocker stacks for a parameter identification task involving an influential model of biological aggregations [54]. Altogether, we aim to bring the broader applied mathematics community up-to-date on topological summaries of time-varying metric spaces.
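
As a rough illustration of the kind of summary a crocker plot displays, the sketch below tabulates a Betti number as a function of both time and scale for a moving point cloud. It is a simplified assumption-laden example, not the authors' code: only Betti-0 (the number of connected components) is computed, two points are linked when their distance is at most `eps`, and the names `betti0` and `crocker_matrix` are hypothetical. The smoothing parameter that distinguishes a crocker stack is not modeled here.

```python
import numpy as np


def betti0(points, eps):
    """Number of connected components when points within eps are linked."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= eps:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j
                    components -= 1
    return components


def crocker_matrix(frames, scales):
    """Rows index scales, columns index time steps; entries are Betti-0."""
    return np.array([[betti0(frame, eps) for frame in frames] for eps in scales])


# Hypothetical data: two points that drift apart between two time steps.
frames = [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [3.0, 0.0]]]
scales = [0.5, 2.0, 4.0]
print(crocker_matrix(frames, scales))
```

Plotting the contours of such a matrix, with time on one axis and scale on the other, gives a picture in the spirit of a crocker plot for dimension 0.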

Capturing Shape Information With Multi-Scale Topological Loss Terms For 3D Reconstruction (2022)

Dominik J. E. Waibel, Scott Atwell, Matthias Meier, Carsten Marr, Bastian Rieck

Abstract

Reconstructing 3D objects from 2D images is challenging for both our brains and machine learning algorithms. To support this spatial reasoning task, contextual information about the overall shape of an object is critical. However, such information is not captured by established loss terms (e.g. Dice loss). We propose to complement geometrical shape information by including multi-scale topological features, such as connected components, cycles, and voids, in the reconstruction loss. Our method uses cubical complexes to calculate topological features of 3D volume data and employs an optimal transport distance to guide the reconstruction process. This topology-aware loss is fully differentiable, computationally efficient, and can be added to any neural network. We demonstrate the utility of our loss by incorporating it into SHAPR, a model for predicting the 3D cell shape of individual cells based on 2D microscopy images. Using a hybrid loss that leverages both geometrical and topological information of single objects to assess their shape, we find that topological information substantially improves the quality of reconstructions, thus highlighting its ability to extract more relevant features from image datasets.

Histopathological Cancer Detection With Topological Signatures (2023)

Ankur Yadav, Faisal Ahmed, Ovidiu Daescu, Reyhan Gedik, Baris Coskunuzer

Abstract

We present a transformative approach to histopathological cancer detection and grading by introducing a very powerful feature extraction method based on the latest topological data analysis tools. By analyzing the evolution of topological patterns in different color channels, we discovered that every tumor class leaves its own topological footprint in histopathological images, allowing us to extract feature vectors that can be used to reliably identify tumor classes. Our topological signatures, even when combined with traditional machine learning methods, provide very fast and highly accurate results in various settings. While most DL models work well for one type of cancer, our model easily adapts to different scenarios, and consistently gives highly competitive results with the state-of-the-art models on benchmark datasets across multiple cancer types including bone, colon, breast, cervical (cytopathology), and prostate cancer. Unlike most DL models, our proposed Topo-ML model does not need any data augmentation or pre-processing steps and works perfectly on small datasets. The model is computationally very efficient, with end-to-end processing taking only a few hours for datasets consisting of thousands of images.

Optimal Topological Cycles and Their Application in Cardiac Trabeculae Restoration (2017)

Pengxiang Wu, Chao Chen, Yusu Wang, Shaoting Zhang, Changhe Yuan, Zhen Qian, Dimitris Metaxas, Leon Axel

Abstract

In cardiac image analysis, it is important yet challenging to reconstruct the trabeculae, namely, fine muscle columns whose ends are attached to the ventricular walls. To extract these fine structures, traditional image segmentation methods are insufficient. In this paper, we propose a novel method to jointly detect salient topological handles and compute the optimal representations of them. The detected handles are considered hypothetical trabeculae structures. They are further screened using a classifier and are then included in the final segmentation. We show in experiments the significance of our contribution compared with previous standard segmentation methods without topological priors, as well as with a previous topological method in which non-optimal representations of topological handles are used.

Measuring Hidden Phenotype: Quantifying the Shape of Barley Seeds Using the Euler Characteristic Transform (2021)

Erik J. Amézquita, Michelle Y. Quigley, Tim Ophelders, Jacob B. Landis, Daniel Koenig, Elizabeth Munch, Daniel H. Chitwood

Abstract

Shape plays a fundamental role in biology. Traditional phenotypic analysis methods measure some features but fail to measure the information embedded in shape comprehensively. To extract, compare, and analyze this information embedded in a robust and concise way, we turn to Topological Data Analysis (TDA), specifically the Euler Characteristic Transform. TDA measures shape comprehensively using mathematical representations based on algebraic topology features. To study its use, we compute both traditional and topological shape descriptors to quantify the morphology of 3121 barley seeds scanned with X-ray Computed Tomography (CT) technology at 127 micron resolution. The Euler Characteristic Transform measures shape by analyzing topological features of an object at thresholds across a number of directional axes. A Kruskal-Wallis analysis of the information encoded by the topological signature reveals that the Euler Characteristic Transform picks up successfully the shape of the crease and bottom of the seeds. Moreover, while traditional shape descriptors can cluster the seeds based on their accession, topological shape descriptors can cluster them further based on their panicle. We then successfully train a support vector machine (SVM) to classify 28 different accessions of barley based exclusively on the shape of their grains. We observe that combining both traditional and topological descriptors classifies barley seeds better than using just traditional descriptors alone. This improvement suggests that TDA is thus a powerful complement to traditional morphometrics to comprehensively describe a multitude of “hidden” shape nuances which are otherwise not detected.
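
The description of the Euler Characteristic Transform as analyzing topological features at thresholds across several directions can be sketched as follows. This is a hedged toy version, not the authors' pipeline: it uses a small simplicial complex instead of voxel data from CT scans, and the function names, the example triangle, and the chosen directions and thresholds are all assumptions for illustration.

```python
import numpy as np


def euler_characteristic_curve(vertices, simplices, direction, thresholds):
    """Euler characteristic of sublevel sets of the height h(v) = <v, direction>.

    A simplex (a tuple of vertex indices, vertices included as 1-tuples) is
    present at threshold t once all of its vertices have height <= t, and it
    contributes (-1)**dim to the Euler characteristic.
    """
    heights = np.asarray(vertices, dtype=float) @ np.asarray(direction, dtype=float)
    entry = np.array([heights[list(s)].max() for s in simplices])
    signs = np.array([(-1) ** (len(s) - 1) for s in simplices])
    return [int(signs[entry <= t].sum()) for t in thresholds]


def euler_characteristic_transform(vertices, simplices, directions, thresholds):
    """One Euler characteristic curve per direction, stacked as rows."""
    return np.array([
        euler_characteristic_curve(vertices, simplices, d, thresholds)
        for d in directions
    ])


# Hypothetical shape: the boundary of a triangle (three vertices, three edges).
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
hollow = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2)]
directions = [(1.0, 0.0), (0.0, 1.0)]
thresholds = [-0.5, 0.0, 1.0]
print(euler_characteristic_transform(vertices, hollow, directions, thresholds))
```

Concatenating the rows of the resulting array gives a fixed-length feature vector of the kind that could be passed to a classifier such as an SVM.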

Feasibility of Topological Data Analysis for Event-Related fMRI (2019)

Cameron T. Ellis, Michael Lesnick, Gregory Henselman-Petrusek, Bryn Keller, Jonathan D. Cohen

Abstract

Recent fMRI research shows that perceptual and cognitive representations are instantiated in high-dimensional multivoxel patterns in the brain. However, the methods for detecting these representations are limited. Topological data analysis (TDA) is a new approach, based on the mathematical field of topology, that can detect unique types of geometric features in patterns of data. Several recent studies have successfully applied TDA to study various forms of neural data; however, to our knowledge, TDA has not been successfully applied to data from event-related fMRI designs. Event-related fMRI is very common but limited in terms of the number of events that can be run within a practical time frame and the effect size that can be expected. Here, we investigate whether persistent homology, a popular TDA tool that identifies topological features in data and quantifies their robustness, can identify known signals given these constraints. We use fmrisim, a Python-based simulator of realistic fMRI data, to assess the plausibility of recovering a simple topological representation under a variety of conditions. Our results suggest that persistent homology can be used under certain circumstances to recover topological structure embedded in realistic fMRI data simulations.

How do we represent the world? In cognitive neuroscience it is typical to think representations are points in high-dimensional space. In order to study these kinds of spaces it is necessary to have tools that capture the organization of high-dimensional data. Topological data analysis (TDA) holds promise for detecting unique types of geometric features in patterns of data. Although potentially useful, TDA has not been applied to event-related fMRI data. Here we utilized a popular tool from TDA, persistent homology, to recover topological signals from event-related fMRI data. We simulated realistic fMRI data and explored the parameters under which persistent homology can successfully extract signal. We also provided extensive code and recommendations for how to make the most out of TDA for fMRI analysis.
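
To make concrete how persistent homology "identifies topological features in data and quantifies their robustness", the sketch below computes only its simplest case: the persistence of connected components (dimension 0) of a point cloud. It is not the pipeline used in the paper. The function name is hypothetical, and the convention that an edge enters the filtration at a scale equal to the pairwise distance is an assumption (some libraries use half that distance).

```python
from itertools import combinations
from math import dist, inf


def zero_dim_persistence(points):
    """(birth, death) pairs of connected components in a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the length of the
    edge that first merges it into another component (Kruskal-style sweep).
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # this edge merges two components
            parent[root_i] = root_j
            pairs.append((0.0, length))
    if points:
        pairs.append((0.0, inf))      # one component survives at every scale
    return pairs


# Hypothetical toy data: two nearby points and one distant point.
diagram = zero_dim_persistence([(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)])
```

Long-lived pairs (large death minus birth) correspond to well-separated clusters, while short-lived ones are usually treated as noise. Loops and voids require higher-dimensional persistence, for which a dedicated library is the practical choice.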

A Topological Measurement of Protein Compressibility (2015)

Marcio Gameiro, Yasuaki Hiraoka, Shunsuke Izumi, Miroslav Kramar, Konstantin Mischaikow, Vidit Nanda

Abstract

In this paper we partially clarify the relation between the compressibility of a protein and its molecular geometric structure. To identify and understand the relevant topological features within a given protein, we model its molecule as an alpha filtration and hence obtain multi-scale insight into the structure of its tunnels and cavities. The persistence diagrams of this alpha filtration capture the sizes and robustness of such tunnels and cavities in a compact and meaningful manner. From these persistence diagrams, we extract a measure of compressibility derived from those topological features whose relevance is suggested by physical and chemical properties. Due to recent advances in combinatorial topology, this measure is efficiently and directly computable from information found in the Protein Data Bank (PDB). Our main result establishes a clear linear correlation between the topological measure and the experimentally-determined compressibility of most proteins for which both PDB information and experimental compressibility data are available. Finally, we establish that both the topological measurement and the linear correlation are stable with respect to small perturbations in the input data, such as those arising from experimental errors in compressibility and X-ray crystallography experiments.

Barcodes Distinguishing Morphology of Neuronal Tauopathy (2022)

David Beers, Despoina Goniotaki, Diane P. Hanger, Alain Goriely, Heather A. Harrington

Abstract

The geometry of neurons is known to be important for their functions. Hence, neurons are often classified by their morphology. Two recent methods, persistent homology and the topological morphology descriptor, assign a morphology descriptor called a barcode to a neuron equipped with a given function, such as the Euclidean distance from the root of the neuron. These barcodes can be converted into matrices called persistence images, which can then be averaged across groups. We show that when the defining function is the path length from the root, both the topological morphology descriptor and persistent homology are equivalent. We further show that persistence images arising from the path length procedure provide an interpretable summary of neuronal morphology. We introduce "topological morphology functions", a class of functions similar to Sholl functions, that can be recovered from the associated topological morphology descriptor. To demonstrate this topological approach, we compare healthy cortical and hippocampal mouse neurons to those affected by progressive tauopathy. We find a significant difference in the morphology of healthy neurons and those with a tauopathy at a postsymptomatic age. We use persistence images to conclude that the diseased group tends to have neurons with shorter branches as well as fewer branches far from the soma.

Topological Descriptors of Histology Images (2014)

Nikhil Singh, Heather D. Couture, J. S. Marron, Charles Perou, Marc Niethammer

Abstract

The purpose of this study is to investigate architectural characteristics of cell arrangements in breast cancer histology images. We propose the use of topological data analysis to summarize the geometric information inherent in tumor cell arrangements. Our goal is to use this information as signatures that encode robust summaries of cell arrangements in tumor tissue as captured through histology images. In particular, using ideas from algebraic topology we construct topological descriptors based on cell nucleus segmentations such as persistency charts and Betti sequences. We assess their performance on the task of discriminating the breast cancer subtypes Basal, Luminal A, Luminal B and HER2. We demonstrate that the topological features contain useful complementary information to image-appearance based features that can improve discriminatory performance of classifiers.

Substructure Topology Preserving Simplification of Tetrahedral Meshes (2011)

Fabien Vivodtzev, Georges-Pierre Bonneau, Stefanie Hahmann, Hans Hagen

Abstract

Interdisciplinary efforts in modeling and simulating phenomena have led to complex multi-physics models involving different physical properties and materials in the same system. Within a 3d domain, substructures of lower dimensions appear at the interface between different materials. Correspondingly, an unstructured tetrahedral mesh used for such a simulation includes 2d and 1d substructures embedded in the vertices, edges and faces of the mesh. The simplification of such tetrahedral meshes must preserve (1) the geometry and the topology of the 3d domain, (2) the simulated data and (3) the geometry and topology of the embedded substructures. Although intensive research has been conducted on the first two goals, the third objective has received little attention. This paper focuses on the preservation of the topology of 1d and 2d substructures embedded in an unstructured tetrahedral mesh, during edge collapse simplification. We define these substructures as simplicial sub-complexes of the mesh, which is modeled as an extended simplicial complex. We derive a robust algorithm, based on combinatorial topology results, in order to determine if an edge can be collapsed without changing the topology of both the mesh and all embedded substructures. Based on this algorithm we have developed a system for simplifying scientific datasets defined on irregular tetrahedral meshes with substructures. The implementation of our system is discussed in detail. We demonstrate the power of our system with real world scientific datasets from electromagnetism simulations.

A Topological Paradigm for Hippocampal Spatial Map Formation Using Persistent Homology (2012)

Y. Dabaghian, F. Mémoli, L. Frank, G. Carlsson

Abstract

An animal's ability to navigate through space rests on its ability to create a mental map of its environment. The hippocampus is the brain region centrally responsible for such maps, and it has been assumed to encode geometric information (distances, angles). Given, however, that hippocampal output consists of patterns of spiking across many neurons, and downstream regions must be able to translate those patterns into accurate information about an animal's spatial environment, we hypothesized that 1) the temporal pattern of neuronal firing, particularly co-firing, is key to decoding spatial information, and 2) since co-firing implies spatial overlap of place fields, a map encoded by co-firing will be based on connectivity and adjacency, i.e., it will be a topological map. Here we test this topological hypothesis with a simple model of hippocampal activity, varying three parameters (firing rate, place field size, and number of neurons) in computer simulations of rat trajectories in three topologically and geometrically distinct test environments. Using a computational algorithm based on recently developed tools from Persistent Homology theory in the field of algebraic topology, we find that the patterns of neuronal co-firing can, in fact, convey topological information about the environment in a biologically realistic length of time. Furthermore, our simulations reveal a “learning region” that highlights the interplay between the parameters in combining to produce hippocampal states that are more or less adept at map formation. For example, within the learning region a lower number of neurons firing can be compensated by adjustments in firing rate or place field size, but beyond a certain point map formation begins to fail. 
We propose that this learning region provides a coherent theoretical lens through which to view conditions that impair spatial learning by altering place cell firing rates or spatial specificity.

Our ability to navigate our environments relies on the ability of our brains to form an internal representation of the spaces we're in. The hippocampus plays a central role in forming this internal spatial map, and it is thought that the ensemble of active “place cells” (neurons that are sensitive to location) somehow encode metrical information about the environment, akin to a street map. Several considerations suggested to us, however, that the brain might be more interested in topological information (i.e., connectivity, containment, and adjacency, more akin to a subway map), so we employed new methods in computational topology to estimate how basic properties of neuronal firing affect the time required to form a hippocampal spatial map of three test environments. Our analysis suggests that, in order to encode topological information correctly and in a biologically reasonable amount of time, the hippocampal place cells must operate within certain parameters of neuronal activity that vary with both the geometric and topological properties of the environment. The interplay of these parameters forms a “learning region” in which changes in one parameter can successfully compensate for changes in the others; values beyond the limits of this region, however, impair map formation.

Topological Analysis of Population Activity in Visual Cortex (2008)

Gurjeet Singh, Facundo Memoli, Tigran Ishkhanov, Guillermo Sapiro, Gunnar Carlsson, Dario L. Ringach

Abstract

Information in the cortex is thought to be represented by the joint activity of neurons. Here we describe how fundamental questions about neural representation can be cast in terms of the topological structure of population activity. A new method, based on the concept of persistent homology, is introduced and applied to the study of population activity in primary visual cortex (V1). We found that the topological structure of activity patterns when the cortex is spontaneously active is similar to that of patterns evoked by natural image stimulation and consistent with the topology of a two-sphere. We discuss how this structure could emerge from the functional organization of orientation and spatial frequency maps and their mutual relationship. Our findings extend prior results on the relationship between spontaneous and evoked activity in V1 and illustrate how computational topology can help tackle elementary questions about the representation of information in the nervous system.

Topological Attention for Time Series Forecasting (2021)

Sebastian Zeng, Florian Graf, Christoph Hofer, Roland Kwitt

Abstract

The problem of (point) forecasting univariate time series is considered. Most approaches, ranging from traditional statistical methods to recent learning-based techniques with neural networks, directly operate on raw time series observations. As an extension, we study whether local topological properties, as captured via persistent homology, can serve as a reliable signal that provides complementary information for learning to forecast. To this end, we propose topological attention, which allows attending to local topological features within a time horizon of historical data. Our approach easily integrates into existing end-to-end trainable forecasting models, such as N-BEATS, and, in combination with the latter exhibits state-of-the-art performance on the large-scale M4 benchmark dataset of 100,000 diverse time series from different domains. Ablation experiments, as well as a comparison to recent techniques in a setting where only a single time series is available for training, corroborate the beneficial nature of including local topological information through an attention mechanism.

Cooperative Grasping Through Topological Object Representation (2014)

A. Marzinotto, J. A. Stork, D. V. Dimarogonas, D. Kragic

Abstract

We present a cooperative grasping approach based on a topological representation of objects. Using point cloud data we extract loops on objects suitable for generating entanglement. We use the Gauss Linking Integral to derive controllers for multi-agent systems that generate hooking grasps on such loops while minimizing the entanglement between robots. The approach copes well with noisy point cloud data and is computationally simple and robust. We demonstrate the method for performing object grasping and transportation, through a hooking maneuver, with two coordinated NAO robots.
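
For readers unfamiliar with the Gauss Linking Integral, the sketch below approximates it numerically for two closed curves given as polylines. It only illustrates the quantity itself, not the controllers derived in the paper. The helper names, the midpoint-rule discretization, and the two test configurations are assumptions chosen for illustration.

```python
from math import cos, sin, pi


def circle(center, radius, plane, n=100):
    """Sample a circle as n points; plane is "xy" or "xz" (hypothetical helper)."""
    cx, cy, cz = center
    points = []
    for k in range(n):
        angle = 2 * pi * k / n
        u, v = radius * cos(angle), radius * sin(angle)
        points.append((cx + u, cy + v, cz) if plane == "xy" else (cx + u, cy, cz + v))
    return points


def _segments(curve):
    """Midpoint and step vector of every segment of a closed polyline."""
    segs = []
    for k, p in enumerate(curve):
        q = curve[(k + 1) % len(curve)]
        midpoint = tuple((a + b) / 2 for a, b in zip(p, q))
        step = tuple(b - a for a, b in zip(p, q))
        segs.append((midpoint, step))
    return segs


def linking_number(curve_a, curve_b):
    """Midpoint-rule estimate of (1/4pi) * double integral of r . (da x db) / |r|^3."""
    segs_a, segs_b = _segments(curve_a), _segments(curve_b)
    total = 0.0
    for (ax, ay, az), (dax, day, daz) in segs_a:
        for (bx, by, bz), (dbx, dby, dbz) in segs_b:
            rx, ry, rz = ax - bx, ay - by, az - bz
            # Cross product of the two segment step vectors.
            cx = day * dbz - daz * dby
            cy = daz * dbx - dax * dbz
            cz = dax * dby - day * dbx
            total += (rx * cx + ry * cy + rz * cz) / (rx * rx + ry * ry + rz * rz) ** 1.5
    return total / (4 * pi)


ring = circle((0.0, 0.0, 0.0), 1.0, "xy")
hooked = linking_number(ring, circle((1.0, 0.0, 0.0), 1.0, "xz"))  # passes through the ring
apart = linking_number(ring, circle((5.0, 0.0, 0.0), 1.0, "xz"))   # far from the ring
```

The estimate is close to plus or minus one when one loop threads the other and close to zero when the loops are separate. The sign depends on the orientation chosen for each curve.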

Persistent Homology of the Cosmic Web. I: Hierarchical Topology in $\Lambda$CDM Cosmologies (2021)

Georg Wilding, Keimpe Nevenzeel, Rien van de Weygaert, Gert Vegter, Pratyush Pranav, Bernard J. T. Jones, Konstantinos Efstathiou, Job Feldbrugge

Abstract

Using a set of $\Lambda$CDM simulations of cosmic structure formation, we study the evolving connectivity and changing topological structure of the cosmic web using state-of-the-art tools of multiscale topological data analysis (TDA). We follow the development of the cosmic web topology in terms of the evolution of Betti number curves and feature persistence diagrams of the three (topological) classes of structural features: matter concentrations, filaments and tunnels, and voids. The Betti curves specify the prominence of features as a function of density level, and their evolution with cosmic epoch reflects the changing network connections between these structural features. The persistence diagrams quantify the longevity and stability of topological features. In this study we establish, for the first time, the link between persistence diagrams, the features they show, and the gravitationally driven cosmic structure formation process. By following the diagrams' development over cosmic time, the link between the multiscale topology of the cosmic web and the hierarchical buildup of cosmic structure is established. The sharp apexes in the diagrams are intimately related to key transitions in the structure formation process. The apex in the matter concentration diagrams coincides with the density level at which, typically, they detach from the Hubble expansion and begin to collapse. At that level many individual islands merge to form the network of the cosmic web and a large number of filaments and tunnels emerge to establish its connecting bridges. The location trends of the apex possess a self-similar character that can be related to the cosmic web's hierarchical buildup. We find that persistence diagrams provide a significantly higher and more profound level of information on the structure formation process than more global summary statistics like Euler characteristic or Betti numbers.
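
The relationship between a persistence diagram and a Betti curve can be shown in a few lines. This is a hedged sketch rather than the authors' code: the function name is hypothetical, and it assumes a sublevel convention in which a feature is alive from its birth level up to, but not including, its death level. A filtration that sweeps density from high to low would reverse the inequalities.

```python
def betti_curve(diagram, levels):
    """Number of features of one homology class alive at each filtration level."""
    return [sum(1 for birth, death in diagram if birth <= level < death)
            for level in levels]


# Hypothetical diagram for a single class (for example, loops).
diagram = [(0.0, 2.0), (1.0, 3.0), (1.5, float("inf"))]
curve = betti_curve(diagram, [0.5, 1.5, 2.5])
```

The curve keeps only a count per level, whereas the diagram also records which birth is paired with which death. That pairing is the extra information the abstract attributes to persistence diagrams.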

What Can Topology Tell Us About the Neural Code? (2017)

Carina Curto

Abstract

Neuroscience is undergoing a period of rapid experimental progress and expansion. New mathematical tools, previously unknown in the neuroscience community, are now being used to tackle fundamental questions and analyze emerging data sets. Consistent with this trend, the last decade has seen an uptick in the use of topological ideas and methods in neuroscience. In this paper I will survey recent applications of topology in neuroscience, and explain why topology is an especially natural tool for understanding neural codes.

Toward Automated Prediction of Manufacturing Productivity Based on Feature Selection Using Topological Data Analysis (2016)

Wei Guo, Ashis G. Banerjee

Abstract

In this paper, we extend the application of topological data analysis (TDA) to the field of manufacturing, for the first time to the best of our knowledge. We apply a particular TDA method, known as the Mapper algorithm, on a benchmark chemical processing data set. The algorithm yields a topological network that captures the intrinsic clusters and connections among the clusters present in the high-dimensional data set, which are difficult to detect using traditional methods. We select key process variables or features that impact the final product yield by analyzing the shape of this network. We then use three prediction models to evaluate the impact of the selected features. Results show that the models achieve the same level of high prediction accuracy as with all the process variables, thereby providing a way to carry out process monitoring and control in a more cost-effective manner.

A Topological Data Analysis Approach On Predicting Phenotypes From Gene Expression Data (2020)

Sayan Mandal, Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida

Abstract

The goal of this study was to investigate if gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore we devised an approach augmenting machine learning with topological data analysis. We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson’s disease phenotype prediction when measured against standard machine learning methods. This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high dimensional data reveals itself when considering the intricate topological connections between expressed genes.

Topological Differential Testing (2020)

Kristopher Ambrose, Steve Huntsman, Michael Robinson, Matvey Yutin

Abstract

We introduce topological differential testing (TDT), an approach to extracting the consensus behavior of a set of programs on a corpus of inputs. TDT uses the topological notion of a simplicial complex (and implicitly draws on richer topological notions such as sheaves and persistence) to determine inputs that cause inconsistent behavior and in turn reveal de facto input specifications. We gently introduce TDT with a toy example before detailing its application to understanding the PDF file format from the behavior of various parsers. Finally, we discuss theoretical details and other possible applications.

Combining Geometric and Topological Information in Image Segmentation (2019)

Hengrui Luo, Justin Strait

Abstract

A fundamental problem in computer vision is image segmentation, where the goal is to delineate the boundary of an object in the image. The focus of this work is on the segmentation of grayscale images and its purpose is two-fold. First, we conduct an in-depth study comparing active contour and topology-based methods in a statistical framework, two popular approaches for boundary detection of 2-dimensional images. Certain properties of the image dataset may favor one method over the other, both from an interpretability perspective as well as through evaluation of performance measures. Second, we propose the use of topological knowledge to assist an active contour method, which can potentially incorporate prior shape information. The latter is known to be extremely sensitive to algorithm initialization, and thus, we use a topological model to provide an automatic initialization. In addition, our proposed model can handle objects in images with more complex topological structures, including objects with holes and multiple objects within one image. We demonstrate this on artificially-constructed image datasets from computer vision, as well as real medical image data.

Spatial Applications of Topological Data Analysis: Cities, Snowflakes, Random Structures, and Spiders Spinning Under the Influence (2020)

Michelle Feng, Mason A. Porter

Abstract

Spatial networks are ubiquitous in social, geographic, physical, and biological applications. To understand their large-scale structure, it is important to develop methods that allow one to directly probe the effects of space on structure and dynamics. Historically, algebraic topology has provided one framework for rigorously and quantitatively describing the global structure of a space, and recent advances in topological data analysis (TDA) have given scholars a new lens for analyzing network data. In this paper, we study a variety of spatial networks --- including both synthetic and natural ones --- using novel topological methods that we recently developed specifically for analyzing spatial networks. We demonstrate that our methods are able to capture meaningful quantities, with specifics that depend on context, in spatial networks and thereby provide useful insights into the structure of those networks, including a novel approach for characterizing them based on their topological structures. We illustrate these ideas with examples of synthetic networks and dynamics on them, street networks in cities, snowflakes, and webs spun by spiders under the influence of various psychotropic substances.

A Topological Approach to Selecting Models of Biological Experiments (2019)

M. Ulmer, Lori Ziegelmeier, Chad M. Topaz

Abstract

We use topological data analysis as a tool to analyze the fit of mathematical models to experimental data. This study is built on data obtained from motion tracking groups of aphids in [Nilsen et al., PLOS One, 2013] and two random walk models that were proposed to describe the data. One model incorporates social interactions between the insects via a functional dependence on an aphid’s distance to its nearest neighbor. The second model is a control model that ignores this dependence. We compare data from each model to data from experiment by performing statistical tests based on three different sets of measures. First, we use time series of order parameters commonly used in collective motion studies. These order parameters measure the overall polarization and angular momentum of the group, and do not rely on a priori knowledge of the models that produced the data. Second, we use order parameter time series that do rely on a priori knowledge, namely average distance to nearest neighbor and percentage of aphids moving. Third, we use computational persistent homology to calculate topological signatures of the data. Analysis of the a priori order parameters indicates that the interactive model better describes the experimental data than the control model does. The topological approach performs as well as these a priori order parameters and better than the other order parameters, suggesting the utility of the topological approach in the absence of specific knowledge of mechanisms underlying the data.

Topological Data Analysis of Single-Trial Electroencephalographic Signals (2018)

Yuan Wang, Hernando Ombao, Moo K. Chung

Abstract

Epilepsy is a neurological disorder that can negatively affect the visual, audial and motor functions of the human brain. Statistical analysis of neurophysiological recordings, such as electroencephalogram (EEG), facilitates the understanding and diagnosis of epileptic seizures. Standard statistical methods, however, do not account for topological features embedded in EEG signals. In the current study, we propose a persistent homology (PH) procedure to analyze single-trial EEG signals. The procedure denoises signals with a weighted Fourier series (WFS), and tests for topological difference between the denoised signals with a permutation test based on the persistence landscapes (PL) of their PH features. Simulation studies show that the test effectively identifies topological difference and invariance between two signals. In an application to a single-trial multichannel seizure EEG dataset, our proposed PH procedure was able to identify the left temporal region as consistently showing topological invariance, suggesting that the PH features of the Fourier decomposition during seizure are similar to those of the process before seizure. This finding is important because it could not be identified from a mere visual inspection of the EEG data and was in fact missed by earlier analyses of the same dataset.
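
The persistence landscape used as the test statistic above has a simple pointwise definition, sketched here. The function name and the toy diagram are assumptions for illustration; the sketch covers neither the weighted Fourier series denoising nor the permutation test from the paper.

```python
def persistence_landscape(diagram, k, t):
    """Value at t of the k-th landscape function (k = 1 is the outermost layer).

    Each (birth, death) pair contributes a tent function peaking at the
    midpoint of its interval; the k-th landscape is the k-th largest tent value.
    """
    tents = sorted((max(0.0, min(t - birth, death - t)) for birth, death in diagram),
                   reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0


# Hypothetical diagram with two nested intervals.
diagram = [(0.0, 4.0), (1.0, 3.0)]
outer = persistence_landscape(diagram, 1, 2.0)
inner = persistence_landscape(diagram, 2, 2.0)
```

Because landscapes are ordinary functions, two signals can be compared through a norm of the difference between their landscapes. That difference is the kind of statistic a permutation test can then shuffle.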

Current Theoretical Models Fail to Predict the Topological Complexity of the Human Genome (2015)

Javier Arsuaga, Reyka G. Jayasinghe, Robert G. Scharein, Mark R. Segal, Robert H. Stolz, Mariel Vazquez

Abstract

Understanding the folding of the human genome is a key challenge of modern structural biology. The emergence of chromatin conformation capture assays (e.g., Hi-C) has revolutionized chromosome biology and provided new insights into the three dimensional structure of the genome. The experimental data are highly complex and need to be analyzed with quantitative tools. It has been argued that the data obtained from Hi-C assays are consistent with a fractal organization of the genome. A key characteristic of the fractal globule is the lack of topological complexity (knotting or inter-linking). However, the absence of topological complexity contradicts results from polymer physics showing that the entanglement of long linear polymers in a confined volume increases rapidly with the length and with decreasing volume. In vivo and in vitro assays support this claim in some biological systems. We simulate knotted lattice polygons confined inside a sphere and demonstrate that their contact frequencies agree with the human Hi-C data. We conclude that the topological complexity of the human genome cannot be inferred from current Hi-C data.

Loops Abound in the Cosmic Microwave Background: A $4\sigma$ Anomaly on Super-Horizon Scales (2021)

Pratyush Pranav

Abstract

We present a topological analysis of the temperature fluctuation maps from the Planck 2020 Data release 4 (DR4) based on the NPIPE data processing pipeline. For comparison, we also present the topological characteristics of the maps from Planck 2018 Data release 3 (DR3). We perform our analysis in terms of the homology characteristics of the maps, invoking relative homology to account for analysis in the presence of masks. We perform our analysis for a range of smoothing scales spanning sub- and super-horizon scales corresponding to $FWHM = 5', 10', 20', 40', 80', 160', 320', 640'$. Our main result indicates a significantly anomalous behavior of the loops in the observed maps compared to simulations that are modeled as isotropic and homogeneous Gaussian random fields. Specifically, we observe a $4\sigma$ deviation between the observation and simulations in the number of loops at $FWHM = 320'$ and $FWHM = 640'$, corresponding to super-horizon scales of $5$ degrees and larger. In addition, we also notice a mildly significant deviation at $2\sigma$ for all the topological descriptors for almost all the scales analyzed. Our results show a consistency across different data releases, and therefore, the anomalous behavior deserves a careful consideration regarding its origin and ramifications. Disregarding the unlikely source of the anomaly being instrumental systematics, the origin of the anomaly may be genuinely astrophysical -- perhaps due to a yet unresolved foreground, or truly primordial in nature. Given the nature of the topological descriptors, which potentially encode information of all orders, non-Gaussianities, of either primordial or late-type nature, may be potential candidates. Alternate possibilities include the Universe admitting a non-trivial global topology, including effects induced by large-scale topological defects.

Topology of Frame Field Meshing (2020)

Piotr Beben

Abstract

In the past decade frame fields have emerged as a promising approach for generating hexahedral meshes for CFD and CAE applications. One important problem asks for construction of a boundary aligned frame field with prescribed singularity constraints that correspond to a valid hexahedral mesh. We give a necessary and sufficient condition in terms of solutions to a system of monomial equations whose variables are in the binary octahedral group. Along the way we look at frame field design from an algebraic topological perspective, proving various results, some known, some new.

Testing Topological Data Analysis for Condition Monitoring of Wind Turbines (2024)

Simone Casolo, Alexander Stasik, Zhenyou Zhang, Signe Riemer-Sørensen

Abstract

We present an investigation of how topological data analysis (TDA) can be applied to condition-based monitoring (CBM) of wind turbines for energy generation. TDA is a branch of data analysis focusing on extracting meaningful information from complex datasets by analyzing their structure in state space and computing their underlying topological features. By representing data in a high-dimensional state space, TDA enables the identification of patterns, anomalies, and trends in the data that may not be apparent through traditional signal processing methods. For this study, wind turbine data was acquired from a wind park in Norway via standard vibration sensors at different locations of the turbine’s gearbox. Both the vibration acceleration data and its frequency spectra were recorded at infrequent intervals for a few seconds at high frequency and failure events were labelled as either gear-tooth or ball-bearing failures. The data processing and analysis are based on a pipeline where the time series data is first split into intervals and then transformed into multi-dimensional point clouds via a time-delay embedding. The shape of the point cloud is analyzed with topological methods such as persistent homology to generate topology-based key health indicators based on Betti numbers, information entropy and signal persistence. Such indicators are tested for CBM and diagnosis (fault detection) to identify faults in wind turbines and classify them accordingly. Topological indicators are shown to be an interesting alternative for failure identification and diagnosis of operational failures in wind turbines.
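
The first two stages of the pipeline described above (splitting the series into intervals, then turning each interval into a point cloud via a time-delay embedding) can be sketched as follows. The function names and parameter values are assumptions for illustration; the abstract does not state the window length, embedding dimension, or delay actually used.

```python
def sliding_windows(series, width, step):
    """Split a time series into (possibly overlapping) intervals of equal width."""
    return [series[start:start + width]
            for start in range(0, len(series) - width + 1, step)]


def delay_embedding(series, dimension, delay):
    """Map a scalar series to points (x[i], x[i + delay], ..., x[i + (dimension-1)*delay])."""
    n_points = len(series) - (dimension - 1) * delay
    return [tuple(series[i + j * delay] for j in range(dimension))
            for i in range(max(0, n_points))]


# Hypothetical toy signal; real vibration data would be much longer.
signal = [0, 1, 2, 3, 4, 5]
windows = sliding_windows(signal, width=4, step=2)
clouds = [delay_embedding(window, dimension=2, delay=2) for window in windows]
```

Each resulting point cloud would then be passed to a persistent homology routine. A periodic vibration traces a loop in the embedded space, which is why loop-counting indicators such as Betti numbers are natural health features for this kind of signal.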

Topological Data Analysis and Diagnostics of Compressible Magnetohydrodynamic Turbulence (2018)

Irina Makarenko, Paul Bushby, Andrew Fletcher, Robin Henderson, Nikolay Makarenko, Anvar Shukurov

Abstract

The predictions of mean-field electrodynamics can now be probed using direct numerical simulations of random flows and magnetic fields. When modelling astrophysical magnetohydrodynamics, it is important to verify that such simulations are in agreement with observations. One of the main challenges in this area is to identify robust quantitative measures to compare structures found in simulations with those inferred from astrophysical observations. A similar challenge is to compare quantitatively results from different simulations. Topological data analysis offers a range of techniques, including the Betti numbers and persistence diagrams, that can be used to facilitate such a comparison. After describing these tools, we first apply them to synthetic random fields and demonstrate that, when the data are standardized in a straightforward manner, some topological measures are insensitive to either large-scale trends or the resolution of the data. Focusing upon one particular astrophysical example, we apply topological data analysis to H i observations of the turbulent interstellar medium (ISM) in the Milky Way and to recent magnetohydrodynamic simulations of the random, strongly compressible ISM. We stress that these topological techniques are generic and could be applied to any complex, multi-dimensional random field.

Topological Persistence for Relating Microstructure and Capillary Fluid Trapping in Sandstones (2019)

A. L. Herring, V. Robins, A. P. Sheppard

Abstract

Results from a series of two-phase fluid flow experiments in Leopard, Berea, and Bentheimer sandstones are presented. Fluid configurations are characterized using laboratory-based and synchrotron-based 3-D X-ray computed tomography. All flow experiments are conducted under capillary-dominated conditions. We conduct geometry-topology analysis via persistent homology and compare this to standard topological and watershed-partition-based pore-network statistics. Metrics identified as predictors of nonwetting fluid trapping are calculated from the different analytical methods and are compared to levels of trapping measured during drainage-imbibition cycles in the experiments. Metrics calculated from pore networks (i.e., pore body-throat aspect ratio and coordination number) and topological analysis (Euler characteristic) do not correlate well with trapping in these samples. In contrast, a new metric derived from the persistent homology analysis, which incorporates counts of topological features as well as their length scale and spatial distribution, correlates very well (R² = 0.97) to trapping for all systems. This correlation encompasses a wide range of porous media and initial fluid configurations, and also applies to data sets of different imaging and image processing protocols.

Topologically Densified Distributions (2020)

Christoph Hofer, Florian Graf, Marc Niethammer, Roland Kwitt

Abstract

We study regularization in the context of small sample-size learning with over-parametrized neural networks. Specifically, we shift focus from architectural properties, such as norms on the network weights, to properties of the internal representations before a linear classifier. In particular, we impose a topological constraint on samples drawn from the probability measure induced in that space. This provably leads to mass concentration effects around the representations of training instances, i.e., a property beneficial for generalization. By leveraging previous work to impose topological constraints in a neural network setting, we provide empirical evidence (across various vision benchmarks) to support our claim for better generalization.

Persistent Homology Analysis of Ion Aggregations and Hydrogen-Bonding Networks (2018)

Kelin Xia

Abstract

Despite the great advancement of experimental tools and theoretical models, a quantitative characterization of the microscopic structures of ion aggregates and their associated water hydrogen-bonding networks still remains a challenging problem. In this paper, a newly-invented mathematical method called persistent homology is introduced, for the first time, to quantitatively analyze the intrinsic topological properties of ion aggregation systems and hydrogen-bonding networks. The two most distinguishable properties of persistent homology analysis of assembly systems are as follows. First, it does not require a predefined bond length to construct the ion or hydrogen-bonding network. Persistent homology results are determined by the morphological structure of the data only. Second, it can directly measure the size of circles or holes in ion aggregates and hydrogen-bonding networks. To validate our model, we consider two well-studied systems, i.e., NaCl and KSCN solutions, generated from molecular dynamics simulations. They are believed to represent two morphological types of aggregation, i.e., local clusters and extended ion networks. It has been found that the two aggregation types have distinguishable topological features and can be characterized by our topological model very well. Further, we construct two types of networks, i.e., O-networks and H2O-networks, for analyzing the topological properties of hydrogen-bonding networks. It is found that for both models, KSCN systems demonstrate much more dramatic variations in their local circle structures with a concentration increase. A consistent increase of large-sized local circle structures is observed and the sizes of these circles become more and more diverse. In contrast, NaCl systems show no obvious increase of large-sized circles. Instead a consistent decline of the average size of the circle structures is observed and the sizes of these circles become more and more uniform with a concentration increase. 
As far as we know, these unique intrinsic topological features in ion aggregation systems have never been pointed out before. More importantly, our models can be directly used to quantitatively analyze the intrinsic topological invariants, including circles, loops, holes, and cavities, of any network-like structures, such as nanomaterials, colloidal systems, biomolecular assemblies, among others. These topological invariants cannot be described by traditional graph and network models.
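
The claim that no predefined bond length is needed can be illustrated in degree zero: instead of fixing one cutoff, one records the distance at which each pair of connected components merges as the cutoff grows. The sketch below does this with a union-find pass over sorted pairwise distances. It is a minimal illustration, not the author's implementation; the function name `h0_death_scales` and the brute-force distance matrix are assumptions, and circles or cavities (higher-degree features) are not computed here.

```python
import numpy as np

def h0_death_scales(points):
    """Distances at which connected components merge as the cutoff grows.

    Every point starts as its own component (born at scale 0); each returned
    value is the scale at which one component dies by merging into another.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))

    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                # this edge joins two separate components
            parent[ri] = rj
            deaths.append(float(d))
    return deaths
```

For `n` points the result has `n - 1` entries, and a large gap between consecutive values suggests well-separated clusters without any cutoff having been chosen in advance.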

Euler Characteristic Surfaces (2021)

Gabriele Beltramo, Rayna Andreeva, Ylenia Giarratano, Miguel O. Bernabeu, Rik Sarkar, Primoz Skraba

Abstract

We study the use of the Euler characteristic for multiparameter topological data analysis. Euler characteristic is a classical, well-understood topological invariant that has appeared in numerous applications, including in the context of random fields. The goal of this paper is to present the extension of using the Euler characteristic in higher-dimensional parameter spaces. While topological data analysis of higher-dimensional parameter spaces using stronger invariants such as homology continues to be the subject of intense research, Euler characteristic is more manageable theoretically and computationally, and this analysis can be seen as an important intermediary step in multi-parameter topological data analysis. We show the usefulness of the techniques using artificially generated examples, and a real-world application of detecting diabetic retinopathy in retinal images.
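
To make the computational simplicity of the Euler characteristic concrete, the sketch below counts vertices, edges, and faces of the cubical complex formed by the "on" pixels of a 2D binary mask and returns V - E + F. Sweeping a single threshold gives an Euler characteristic curve; the surfaces studied in the paper vary two parameters, which this one-parameter sketch does not attempt. The function names and the closed-pixel convention (pixels touching at a corner count as connected) are assumptions for illustration.

```python
import numpy as np

def euler_characteristic(mask):
    """V - E + F of the cubical complex built from the True pixels of a 2D mask."""
    mask = np.asarray(mask, dtype=bool)
    p = np.pad(mask, 1, constant_values=False)
    faces = mask.sum()
    # An edge is present if at least one of the two pixels it borders is present.
    h_edges = (p[:-1, 1:-1] | p[1:, 1:-1]).sum()
    v_edges = (p[1:-1, :-1] | p[1:-1, 1:]).sum()
    # A vertex is present if any of the four pixels around it is present.
    vertices = (p[:-1, :-1] | p[:-1, 1:] | p[1:, :-1] | p[1:, 1:]).sum()
    return int(vertices - h_edges - v_edges + faces)

def euler_characteristic_curve(image, thresholds):
    """Euler characteristic of the sublevel set {image <= t} for each threshold t."""
    image = np.asarray(image, dtype=float)
    return [euler_characteristic(image <= t) for t in thresholds]
```

In 2D the value equals the number of connected components minus the number of holes, so a ring of pixels gives 0 and two separate blobs give 2.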

Grasping Objects With Holes: A Topological Approach (2013)

F. T. Pokorny, J. A. Stork, D. Kragic

Abstract

This work proposes a topologically inspired approach for generating robot grasps on objects with 'holes'. Starting from a noisy point-cloud, we generate a simplicial representation of an object of interest and use a recently developed method for approximating shortest homology generators to identify graspable loops. To control the movement of the robot hand, a topologically motivated coordinate system is used in order to wrap the hand around such loops. Finally, another concept from topology - namely the Gauss linking integral - is adapted to serve as evidence for secure caging grasps after a grasp has been executed. We evaluate our approach in simulation on a Barrett hand using several target objects of different sizes and shapes and present an initial experiment with real sensor data.

Topological Biomarkers for Real-Time Detection of Epileptic Seizures (2022)

Ximena Fernández, Diego Mateos

Abstract

Automated seizure detection is a fundamental problem in computational neuroscience towards improving the diagnosis and treatment of epileptic disease. We propose a real-time computational method for automated tracking and detection of epileptic seizures from raw neurophysiological recordings. Our mechanism is based on the topological analysis of the sliding-window embedding of the time series derived from simultaneously recorded channels. We extract topological biomarkers from the signals via the computation of the persistent homology of time-evolving topological spaces. Remarkably, the proposed biomarkers robustly capture the change in the brain dynamics during the ictal state. We apply our methods to different types of signals including scalp and intracranial EEG and MEG, in patients during interictal and ictal states, showing high accuracy in a range of clinical situations.

Algebraic Topology-Based Machine Learning Using MRI Predicts Outcomes in Primary Sclerosing Cholangitis (2022)

Yashbir Singh, William A. Jons, John E. Eaton, Mette Vesterhus, Tom Karlsen, Ida Bjoerk, Andreas Abildgaard, Kristin Kaasen Jorgensen, Folseraas Trine, Derek Little, Aliya F. Gulamhusein, Kosta Petrovic, Anne Negard, Gian Marco Conte, Joseph D. Sobek, Jaidip Jagtap, Sudhakar K. Venkatesh, Gregory J. Gores, Nicholas F. LaRusso, Konstantinos N. Lazaridis, Bradley J. Erickson

Abstract

Background: Primary sclerosing cholangitis (PSC) is a chronic cholestatic liver disease that can lead to cirrhosis and hepatic decompensation. However, predicting future outcomes in patients with PSC is challenging. Our aim was to extract magnetic resonance imaging (MRI) features that predict the development of hepatic decompensation by applying algebraic topology-based machine learning (ML). Methods: We conducted a retrospective multicenter study among adults with large duct PSC who underwent MRI. A topological data analysis-inspired nonlinear framework was used to predict the risk of hepatic decompensation, which was motivated by algebraic topology theory-based ML. The topological representations (persistence images) were employed as input for classification to predict who developed early hepatic decompensation within one year after their baseline MRI. Results: We reviewed 590 patients; 298 were excluded due to poor image quality or inadequate liver coverage, leaving 292 potentially eligible subjects, of which 169 subjects were included in the study. We trained our model using contrast-enhanced delayed phase T1-weighted images on a single center derivation cohort consisting of 54 patients (hepatic decompensation, n = 21; no hepatic decompensation, n = 33) and a multicenter independent validation cohort of 115 individuals (hepatic decompensation, n = 31; no hepatic decompensation, n = 84). When our model was applied in the independent validation cohort, it remained predictive of early hepatic decompensation (area under the receiver operating characteristic curve = 0.84). Conclusions: Algebraic topology-based ML is a methodological approach that can predict outcomes in patients with PSC and has the potential for application in other chronic liver diseases.

A Classification of Topological Discrepancies in Additive Manufacturing (2019)

Morad Behandish, Amir M. Mirzendehdel, Saigopal Nelaturi

Abstract

Additive manufacturing (AM) enables enormous freedom for design of complex structures. However, the process-dependent limitations that result in discrepancies between as-designed and as-manufactured shapes are not fully understood. The tradeoffs between infinitely many different ways to approximate a design by a manufacturable replica are even harder to characterize. To support design for AM (DfAM), one has to quantify local discrepancies introduced by AM processes, identify the detrimental deviations (if any) to the original design intent, and prescribe modifications to the design and/or process parameters to countervail their effects. Our focus in this work will be on topological analysis. There is ample evidence in many applications that preserving local topology (e.g., connectivity of beams in a lattice) is important even when slight geometric deviations can be tolerated. We first present a generic method to characterize local topological discrepancies due to material under- and over-deposition in AM, and show how it captures various types of defects in the as-manufactured structures. We use this information to systematically modify the as-manufactured outcomes within the limitations of available 3D printer resolution(s), which often comes at the expense of introducing more geometric deviations (e.g., thickening a beam to avoid disconnection). We validate the effectiveness of the method on 3D examples with nontrivial topologies such as lattice structures and foams.

A Topological Analysis of the Space of Recipes (2025)

Emerson G. Escolar, Yuta Shimada, Masahiro Yuasa

Abstract

In recent years, the use of data-driven methods has provided insights into underlying patterns and principles behind culinary recipes. In this exploratory work, we introduce the use of topological data analysis, especially persistent homology, in order to study the space of culinary recipes. In particular, persistent homology analysis provides a set of recipes surrounding the multiscale “holes” in the space of existing recipes. We then propose a method to generate novel ingredient combinations using combinatorial optimization on this topological information. We made biscuits using the novel ingredient combinations, which were confirmed to be acceptable enough by a sensory evaluation study. Our findings indicate that topological data analysis has the potential for providing new tools and insights in the study of culinary recipes.

The Importance of Forgetting: Limiting Memory Improves Recovery of Topological Characteristics From Neural Data (2018)

Samir Chowdhury, Bowen Dai, Facundo Mémoli

Abstract

We develop a line of work initiated by Curto and Itskov towards understanding the amount of information contained in the spike trains of hippocampal place cells via topological considerations. Previously, it was established that simply knowing which groups of place cells fire together in an animal’s hippocampus is sufficient to extract the global topology of the animal’s physical environment. We model a system where collections of place cells group and ungroup according to short-term plasticity rules. In particular, we obtain the surprising result that in experiments with spurious firing, the accuracy of the extracted topological information decreases with the persistence (beyond a certain regime) of the cell groups. This suggests that synaptic transience, or forgetting, is a mechanism by which the brain counteracts the effects of spurious place cell activity.

Uncovering the Topology of Time-Varying fMRI Data Using Cubical Persistence (2020)

Bastian Rieck, Tristan Yates, Christian Bock, Karsten Borgwardt, Guy Wolf, Nicholas Turk-Browne, Smita Krishnaswamy

Abstract

Functional magnetic resonance imaging (fMRI) is a crucial technology for gaining insights into cognitive processes in humans. Data amassed from fMRI measurements result in volumetric data sets that vary over time. However, analysing such data presents a challenge due to the large degree of noise and person-to-person variation in how information is represented in the brain. To address this challenge, we present a novel topological approach that encodes each time point in an fMRI data set as a persistence diagram of topological features, i.e. high-dimensional voids present in the data. This representation naturally does not rely on voxel-by-voxel correspondence and is robust to noise. We show that these time-varying persistence diagrams can be clustered to find meaningful groupings between participants, and that they are also useful in studying within-subject brain state trajectories of subjects performing a particular task. Here, we apply both clustering and trajectory analysis techniques to a group of participants watching the movie 'Partly Cloudy'. We observe significant differences in both brain state trajectories and overall topological activity between adults and children watching the same movie.

Development of the Functional Connectome Topology in Adolescence: Evidence From Topological Data Analysis (2021)

Zeus Gracia-Tabuenca, Juan Carlos Díaz-Patiño, Isaac Arelio, Martha Beatriz Moreno, Fernando A. Barrios, Sarael Alcauter

Abstract

Adolescence is a crucial developmental period in terms of behavior and mental health. Therefore, understanding how the brain develops during this stage is a fundamental challenge for neuroscience. Recent studies have modelled the brain as a network or connectome, mainly applying measures from graph theory, showing a change in its functional organization such as an increase in its segregation and integration. Topological Data Analysis (TDA) complements such modelling by extracting high-dimensional features across the whole range of connectivity values, instead of exploring a fixed set of connections. This study inquires into the developmental trajectories of such properties using a longitudinal sample of typically developing participants (N = 98; 53/45 F/M; 6.7-18.1 years), applying TDA to their functional connectomes. In addition, we explore the effect of puberty on the individual developmental trajectories. Results showed that compared to random networks, the adolescent brain is more segregated at the global level, but more densely connected at the local level. Furthermore, developmental effects showed nonlinear trajectories for the integration of the whole brain and fronto-parietal networks, with an inflection point and increasing trajectories after puberty onset. These results add to the insights in the development of the functional organization of the adolescent brain.

Significance Statement

Topological Data Analysis may be used to explore the topology of the brain along the whole range of connectivity values instead of selecting only a fixed set of connectivity thresholds. Here, we explored some properties of the topology of the brain functional connectome, and how they develop in adolescence. First, we show that developmental trajectories are nonlinear and better explained by the puberty status than chronological age, with an inflection point around the puberty onset. The greatest effect is the increase in functional integration for the whole brain, and particularly for the Fronto-Parietal Network when exploring functional subnetworks.

Modelling Topological Features of Swarm Behaviour in Space and Time With Persistence Landscapes (2017)

P. Corcoran, C. B. Jones

Abstract

This paper presents a model of swarm behavior that encodes the spatial-temporal characteristics of topological features, such as holes and connected components. Specifically, the persistence of topological features with respect to time is computed using zig-zag persistent homology. This information is in turn modelled as a persistence landscape, which forms a normed vector space and facilitates the application of statistical and data mining techniques. Validation of the proposed model is performed using a real data set corresponding to a swarm of fish. It is demonstrated that the proposed model may be used to perform retrieval and clustering of swarm behavior in terms of topological features. In fact, it is discovered that clustering returns clusters corresponding to the swarm behaviors of flock, torus, and disordered. These are the most frequently occurring types of behavior exhibited by swarms in general.
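
The step from persistence intervals to a persistence landscape can be sketched directly from the standard definition: each interval (b, d) contributes a tent function max(0, min(t - b, d - t)), and the k-th landscape function is the k-th largest tent value at each t. The sketch below evaluates this on a grid. It assumes the intervals are already available (the zig-zag persistence computation itself is not shown), and the function name and arguments are hypothetical.

```python
import numpy as np

def persistence_landscape(diagram, k, ts):
    """Evaluate the k-th landscape function (k = 1 is the outermost) on grid `ts`.

    `diagram` is an iterable of (birth, death) intervals with finite endpoints.
    """
    d = np.asarray(diagram, dtype=float).reshape(-1, 2)
    ts = np.asarray(ts, dtype=float)
    if k < 1 or k > len(d):
        return np.zeros_like(ts)    # fewer than k intervals: landscape is zero
    # Tent function of every interval on the grid, shape (n_intervals, n_grid).
    tents = np.maximum(0.0, np.minimum(ts[None, :] - d[:, :1], d[:, 1:] - ts[None, :]))
    # Sort each grid column in descending order and take the k-th largest value.
    return -np.sort(-tents, axis=0)[k - 1]
```

Because each landscape is an ordinary function sampled on a common grid, landscapes from different time windows can be averaged or compared with a vector norm, which is what makes clustering and retrieval straightforward.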

Hyperparameter Optimization of Topological Features for Machine Learning Applications (2019)

Francis Motta, Christopher Tralie, Rossella Bedini, Fabiano Bini, Gilberto Bini, Hamed Eramian, Marcio Gameiro, Steve Haase, Hugh Haddox, John Harer, Nick Leiby, Franco Marinozzi, Scott Novotney, Gabe Rocklin, Jed Singer, Devin Strickland, Matt Vaughn

Abstract

This paper describes a general pipeline for generating optimal vector representations of topological features of data for use with machine learning algorithms. This pipeline can be viewed as a costly black-box function defined over a complex configuration space, each point of which specifies both how features are generated and how predictive models are trained on those features. We propose using state-of-the-art Bayesian optimization algorithms to inform the choice of topological vectorization hyperparameters while simultaneously choosing learning model parameters. We demonstrate the need for and effectiveness of this pipeline using two difficult biological learning problems, and illustrate the nontrivial interactions between topological feature generation and learning model hyperparameters.

Fast Estimation of Recombination Rates Using Topological Data Analysis (2019)

Devon P. Humphreys, Melissa R. McGuirl, Michael Miyagi, Andrew J. Blumberg

Abstract

Accurate estimation of recombination rates is critical for studying the origins and maintenance of genetic diversity. Because the inference of recombination rates under a full evolutionary model is computationally expensive, we developed an alternative approach using topological data analysis (TDA) on genome sequences. We find that this method can analyze datasets larger than what can be handled by any existing recombination inference software, and has accuracy comparable to commonly used model-based methods with significantly less processing time. Previous TDA methods used information contained solely in the first Betti number (β₁) of a set of genomes, which aims to capture the number of loops that can be detected within a genealogy. These explorations have proven difficult to connect to the theory of the underlying biological process of recombination, and, consequently, have unpredictable behavior under perturbations of the data. We introduce a new topological feature, which we call ψ, with a natural connection to coalescent models, and present novel arguments relating β₁ to population genetic models. Using simulations, we show that ψ and β₁ are differentially affected by missing data, and package our approach as TREE (Topological Recombination Estimator). TREE’s efficiency and accuracy make it well suited as a first-pass estimator of recombination rate heterogeneity or hotspots throughout the genome.
Our work empirically and theoretically justifies the use of topological statistics as summaries of genome sequences and describes a new, unintuitive relationship between topological features of the distribution of sequence data and the footprint of recombination on genomes.

Gene Expression Data Classification Using Topology and Machine Learning Models (2022)

Tamal K. Dey, Sayan Mandal, Soham Mukherjee

Abstract

Interpretation of high-throughput gene expression data continues to require mathematical tools in data analysis that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors which are used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. These representatives of the entire data facilitate better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.

A Probabilistic Topological Approach to Feature Identification Using a Stochastic Robotic Swarm (2018)

Ragesh K. Ramachandran, Sean Wilson, Spring Berman

Abstract

This paper presents a novel automated approach to quantifying the topological features of an unknown environment using a swarm of robots with local sensing and limited or no access to global position information. The robots randomly explore the environment and record a time series of their estimated position and the covariance matrix associated with this estimate. After the robots’ deployment, a point cloud indicating the free space of the environment is extracted from their aggregated data. Tools from topological data analysis, in particular the concept of persistent homology, are applied to a subset of the point cloud to construct barcode diagrams, which are used to determine the numbers of different types of features in the domain. We demonstrate that our approach can correctly identify the number of topological features in simulations with zero to four features and in multi-robot experiments with one to three features.

Position: Topological Deep Learning Is the New Frontier for Relational Learning (2024)

Theodore Papamarkou, Tolga Birdal, Michael M. Bronstein, Gunnar E. Carlsson, Justin Curry, Yue Gao, Mustafa Hajij, Roland Kwitt, Pietro Lio, Paolo Di Lorenzo, Vasileios Maroulas, Nina Miolane, Farzana Nasrin, Karthikeyan Natesan Ramamurthy, Bastian Rieck, Simone Scardapane, Michael T. Schaub, Petar Veličković, Bei Wang, Yusu Wang, Guowei Wei, Ghada Zamzmi

Abstract

Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. At the same time, this paper serves as an invitation to the scientific community to actively participate in TDL research to unlock the potential of this emerging field.

Alpha, Betti and the Megaparsec Universe: On the Topology of the Cosmic Web (2011)

Rien Van De Weygaert, Gert Vegter, Herbert Edelsbrunner, Bernard J. T. Jones, Pratyush Pranav, Changbom Park, Wojciech A. Hellwing, Bob Eldering, Nico Kruithof, E. G. P. Bos, Johan Hidding, Job Feldbrugge, Eline Ten Have, Matti Van Engelen, Manuel Caroli, Monique Teillaud

Abstract

We study the topology of the Megaparsec Cosmic Web in terms of the scale-dependent Betti numbers, which formalize the topological information content of...

Topology in Cyber Research (2022)

Steve Huntsman, Jimmy Palladino, Michael Robinson

Abstract

We give an idiosyncratic overview of applications of topology to cyber research, spanning the analysis of variables/assignments and control flow in computer programs, a brief sketch of topological data analysis in one dimension, and the use of sheaves to analyze wireless networks. The text is from a chapter in the forthcoming book Mathematics in Cyber Research, to be published by Taylor and Francis.

The Topology of the Cosmic Web in Terms of Persistent Betti Numbers (2017)

Pratyush Pranav, Herbert Edelsbrunner, Rien van de Weygaert, Gert Vegter, Michael Kerber, Bernard J. T. Jones, Mathijs Wintraecken

Abstract

We introduce a multiscale topological description of the Megaparsec web-like cosmic matter distribution. Betti numbers and topological persistence of

Topology-Based Signal Separation (2004)

V. Robins, N. Rooney, E. Bradley

Revisiting Abnormalities in Brain Network Architecture Underlying Autism Using Topology-Inspired Statistical Inference (2018)

Sourabh Palande, Vipin Jose, Brandon Zielinski, Jeffrey Anderson, P. Thomas Fletcher, Bei Wang

Abstract

A large body of evidence relates autism with abnormal structural and functional brain connectivity. Structural covariance magnetic resonance imaging (scMRI) is a technique that maps brain regions with covarying gray matter densities across subjects. It provides a way to probe the anatomical structure underlying intrinsic connectivity networks (ICNs) through analysis of gray matter signal covariance. In this article, we apply topological data analysis in conjunction with scMRI to explore network-specific differences in the gray matter structure in subjects with autism versus age-, gender-, and IQ-matched controls. Specifically, we investigate topological differences in gray matter structure captured by structural correlation graphs derived from three ICNs strongly implicated in autism, namely the salience network, default mode network, and executive control network. By combining topological data analysis with statistical inference, our results provide evidence of statistically significant network-specific structural abnormalities in autism.

Identification of Topological Network Modules in Perturbed Protein Interaction Networks (2017)

Mihaela E. Sardiu, Joshua M. Gilmore, Brad Groppe, Laurence Florens, Michael P. Washburn

Abstract

Biological networks consist of functional modules; however, detecting and characterizing such modules in networks remains challenging. Perturbing networks is one strategy for identifying modules. Here we used an advanced mathematical approach named topological data analysis (TDA) to interrogate two perturbed networks. In one, we disrupted the S. cerevisiae INO80 protein interaction network by isolating complexes after protein complex components were deleted from the genome. In the second, we reanalyzed previously published data demonstrating the disruption of the human Sin3 network with a histone deacetylase inhibitor. Here we show that disrupted networks contained topological network modules (TNMs) with shared properties that mapped onto distinct locations in networks. We define TNMs as proteins that occupy close network positions depending on their coordinates in a topological space. TNMs provide new insight into networks by capturing proteins from different categories including proteins within a complex, proteins with shared biological functions, and proteins disrupted across networks.

Severe Slugging Flow Identification From Topological Indicators (2022)

Simone Casolo

Abstract

In this work, topological data analysis is used to identify the onset of severe slug flow in offshore petroleum production systems. Severe slugging is a multiphase flow regime known to be very inefficient and potentially harmful to process equipment, and it is characterized by large oscillations in the production fluid pressure. Time series from pressure sensors in subsea oil wells are processed by means of Takens embedding to produce point clouds of data. Embedded sensor data is then analyzed using persistent homology to obtain topological indicators capable of revealing the occurrence of severe slugging in a condition-based monitoring approach. A large dataset of well events consisting of both real and simulated data is used to demonstrate the possibility of automating severe slugging detection from live data via topological data analysis. Methods based on persistence diagrams are shown to accurately identify severe slugging and to classify different flow regimes from pressure signals of producing wells with supervised machine learning.
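The Takens embedding step mentioned in this abstract can be illustrated with a minimal sketch. The function name `takens_embedding`, the parameters `dimension` and `delay`, and the synthetic sine signal are illustrative assumptions, not the paper's implementation or data. The sketch only builds the point cloud; the persistent homology computation that the paper applies afterwards is not shown.

```python
import math

def takens_embedding(series, dimension=3, delay=1):
    """Map a scalar time series to delay vectors
    (x[t], x[t + delay], ..., x[t + (dimension - 1) * delay])."""
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    span = (dimension - 1) * delay  # samples consumed by one delay vector
    return [
        tuple(series[t + k * delay] for k in range(dimension))
        for t in range(len(series) - span)
    ]

# Synthetic oscillating signal with period 20 samples (hypothetical stand-in
# for a pressure trace; not data from the paper).
signal = [math.sin(2 * math.pi * t / 20) for t in range(100)]

# A delay of a quarter period turns the oscillation into points on a circle,
# which is the kind of loop that persistent homology can detect.
cloud = takens_embedding(signal, dimension=2, delay=5)
```

A periodic signal produces a closed loop in the embedded space, while an aperiodic one generally does not, which is why loop-counting topological indicators are a plausible fit for detecting oscillatory regimes.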

Advancing Precision Medicine: Algebraic Topology and Differential Geometry in Radiology and Computational Pathology (2024)

Richard M. Levenson, Yashbir Singh, Bastian Rieck, Ashok Choudhary, Gunnar Carlsson, Deepa Sarkar, Quincy A. Hathaway, Colleen Farrelly, Jennifer Rozenblit, Prateek Prasanna, Bradley Erickson

Abstract

Precision medicine aims to provide personalized care based on individual patient characteristics, rather than guideline-directed therapies for groups of diseases or patient demographics. Images—both radiology- and pathology-derived—are a major source of information on presence, type, and status of disease. Exploring the mathematical relationship of pixels in medical imaging (“radiomics”) and cellular-scale structures in digital pathology slides (“pathomics”) offers powerful tools for extracting both qualitative and, increasingly, quantitative data. These analytical approaches, however, may be significantly enhanced by applying additional methods arising from fields of mathematics such as differential geometry and algebraic topology that remain underexplored in this context. Geometry’s strength lies in its ability to provide precise local measurements, such as curvature, that can be crucial for identifying abnormalities at multiple spatial levels. These measurements can augment the quantitative features extracted in conventional radiomics, leading to more nuanced diagnostics. By contrast, topology serves as a robust shape descriptor, capturing essential features such as connected components and holes. The field of topological data analysis was initially founded to explore the shape of data, with functional network connectivity in the brain being a prominent example. Increasingly, its tools are now being used to explore organizational patterns of physical structures in medical images and digitized pathology slides. By leveraging tools from both differential geometry and algebraic topology, researchers and clinicians may be able to obtain a more comprehensive, multi-layered understanding of medical images and contribute to precision medicine’s armamentarium.

Alzheimer Disease Detection From Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning (2023)

Francesco Conti, Martina Banchelli, Valentina Bessi, Cristina Cecchi, Fabrizio Chiti, Sara Colantonio, Cristiano D’Andrea, Marella de Angelis, Davide Moroni, Benedetta Nacmias, Maria Antonietta Pascali, Sandro Sorbi, Paolo Matteini

Abstract

The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer’s disease (AD) as well as of 5 pathological controls was collected and analyzed by Raman spectroscopy (RS). We investigated whether the raw and preprocessed Raman spectra could be used to distinguish AD from controls. First, we applied standard Machine Learning (ML) methods obtaining unsatisfactory results. Then, we applied ML to a set of topological descriptors extracted from raw spectra, achieving a very good classification accuracy (>87%). Although our results are preliminary, they indicate that RS and topological analysis may provide an effective combination to confirm or disprove a clinical diagnosis of AD. The next steps include enlarging the dataset of CSF samples to validate the proposed method better and, possibly, to investigate whether topological data analysis could support the characterization of AD subtypes.

Topological Data Analysis as a Morphometric Method: Using Persistent Homology to Demarcate a Leaf Morphospace (2018)

Mao Li, Hong An, Ruthie Angelovici, Clement Bagaza, Albert Batushansky, Lynn Clark, Viktoriya Coneva, Michael J. Donoghue, Erika Edwards, Diego Fajardo, Hui Fang, Margaret H. Frank, Timothy Gallaher, Sarah Gebken, Theresa Hill, Shelley Jansky, Baljinder Kaur, Phillip C. Klahs, Laura L. Klein, Vasu Kuraparthy, Jason Londo, Zoë Migicovsky, Allison Miller, Rebekah Mohn, Sean Myles, Wagner C. Otoni, J. C. Pires, Edmond Rieffer, Sam Schmerler, Elizabeth Spriggs, Christopher N. Topp, Allen Van Deynze, Kuang Zhang, Linglong Zhu, Braden M. Zink, Daniel H. Chitwood

Abstract

Current morphometric methods that comprehensively measure shape cannot compare the disparate leaf shapes found in seed plants and are sensitive to processing artifacts. We explore the use of persistent homology, a topological method applied as a filtration across simplicial complexes (or more simply, a method to measure topological features of spaces across different spatial resolutions), to overcome these limitations. The described method isolates subsets of shape features and measures the spatial relationship of neighboring pixel densities in a shape. We apply the method to the analysis of 182,707 leaves, both published and unpublished, representing 141 plant families collected from 75 sites throughout the world. By measuring leaves from throughout the seed plants using persistent homology, a defined morphospace comparing all leaves is demarcated. Clear differences in shape between major phylogenetic groups are detected and estimates of leaf shape diversity within plant families are made. The approach predicts plant family above chance. The application of a persistent homology method, using topological features, to measure leaf shape allows for a unified morphometric framework to measure plant form, including shapes, textures, patterns, and branching architectures.

Topological Data Analysis: A Promising Big Data Exploration Tool in Biology, Analytical Chemistry and Physical Chemistry (2016)

Marc Offroy, Ludovic Duponchel

Abstract

An important feature of experimental science is that data of various kinds is being produced at an unprecedented rate. This is mainly due to the development of new instrumental concepts and experimental methodologies. It is also clear that the nature of acquired data is significantly different. Indeed, in every area of science, data take the form of ever bigger tables, where all but a few of the columns (i.e. variables) turn out to be irrelevant to the questions of interest, and further, we do not necessarily know which coordinates are the interesting ones. Big data in our lab of biology, analytical chemistry or physical chemistry is a future that might be closer than any of us suppose. It is in this sense that new tools have to be developed in order to explore and valorize such data sets. Topological data analysis (TDA) is one of these. It was developed recently by topologists who discovered that topological concepts could be useful for data analysis. The main objective of this paper is to answer the question of why topology is well suited for the analysis of big data sets in many areas and even more efficient than conventional data analysis methods. Raman analysis of single bacteria should provide a good opportunity to demonstrate the potential of TDA for the exploration of various spectroscopic data sets considering different experimental conditions (with high noise level, with/without spectral preprocessing, with wavelength shift, with different spectral resolution, with missing data).

Topological Data Analysis of Single-Cell Hi-C Contact Maps (2020)

Mathieu Carrière, Raúl Rabadán

Abstract

Due to recent breakthroughs in high-throughput sequencing, it is now possible to use chromosome conformation capture (CCC) to understand the three dimensional conformation of DNA at the whole genome level, and to characterize it with the so-called contact maps. This is very useful since many biological processes are correlated with DNA folding, such as DNA transcription. However, the methods for the analysis of such conformations are still lacking mathematical guarantees and statistical power. To handle this issue, we propose to use the Mapper, which is a standard tool of Topological Data Analysis (TDA) that allows one to efficiently encode the inherent continuity and topology of underlying biological processes in data, in the form of a graph with various features such as branches and loops. In this article, we show how recent statistical techniques developed in TDA for the Mapper algorithm can be extended and leveraged to formally define and statistically quantify the presence of topological structures coming from biological phenomena, such as the cell cycle, in datasets of CCC contact maps.

Persistent Homology Analysis of Osmolyte Molecular Aggregation and Their Hydrogen-Bonding Networks (2019)

Kelin Xia, D. Vijay Anand, Saxena Shikhar, Yuguang Mu

Abstract

Dramatically different properties have been observed for two types of osmolytes, i.e., trimethylamine N-oxide (TMAO) and urea, in a protein folding process. Great progress has been made in revealing the potential underlying mechanism of these two osmolyte systems. However, many problems still remain unsolved. In this paper, we propose to use the persistent homology to systematically study the osmolytes’ molecular aggregation and their hydrogen-bonding network from a global topological perspective. It has been found that, for the first time, TMAO and urea show two extremely different topological behaviors, i.e., an extensive network and local clusters, respectively. In general, TMAO forms highly consistent large loop or circle structures in high concentrations. In contrast, urea is more tightly aggregated locally. Moreover, the resulting hydrogen-bonding networks also demonstrate distinguishable features. With a concentration increase, TMAO hydrogen-bonding networks vary greatly in their total number of loop structures and large-sized loop structures consistently increase. In contrast, urea hydrogen-bonding networks remain relatively stable with slight reduction of the total loop number. Moreover, the persistent entropy (PE) is, for the first time, used in characterization of the topological information of the aggregation and hydrogen-bonding networks. The average PE systematically increases with the concentration for both TMAO and urea, and decreases in their hydrogen-bonding networks. But their PE variances have totally different behaviors. Finally, topological features of the hydrogen-bonding networks are found to be highly consistent with those from the ion aggregation systems, indicating that our topological invariants can characterize intrinsic features of the “structure making” and “structure breaking” systems.
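The abstract uses persistent entropy (PE) without stating a formula. As a rough sketch, assuming the commonly used definition (the Shannon entropy of the normalized bar lengths of a barcode), it could be computed as follows. The function name and the example barcodes are hypothetical and are not taken from the paper.

```python
import math

def persistent_entropy(intervals):
    """Shannon entropy of normalized bar lengths in a barcode.

    `intervals` is a list of (birth, death) pairs. Bars with zero length or
    an infinite death time are skipped (an assumption of this sketch).
    """
    lengths = [
        death - birth
        for birth, death in intervals
        if math.isfinite(death) and death > birth
    ]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in lengths)

# Hypothetical barcodes: four equally long bars versus one dominant bar.
even_bars = [(0.0, 1.0)] * 4
skewed_bars = [(0.0, 10.0), (0.0, 0.1), (0.0, 0.1), (0.0, 0.1)]

pe_even = persistent_entropy(even_bars)      # maximal for four bars: log(4)
pe_skewed = persistent_entropy(skewed_bars)  # lower: one bar dominates
```

Under this definition, PE is largest when all bars have similar lengths and shrinks when a few long bars dominate, so it summarizes how evenly topological features are spread across scales.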

Visualizing Emergent Identity of Assemblages in the Consumer Internet of Things: A Topological Data Analysis Approach (2016)

Thomas Novak, Donna L. Hoffman

Abstract

The identity of a consumer Internet of Things (IoT) assemblage emerges through a historical process of ongoing interactions among consumers, smart devices, and digital information. Topological Data Analysis (TDA), consistent with mathematical aspects of assemblage theory, is used to visualize the underlying possibility space from which individual IoT assemblages emerge.

A Topological Machine Learning Pipeline for Classification (2022)

Francesco Conti, Davide Moroni, Maria Antonietta Pascali

Abstract

In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered. Using a grid search approach, this pipeline determines optimal representation methods and parameters. The development of such a topological pipeline for Machine Learning involves two crucial steps that strongly affect its performance: firstly, digital data must be represented as an algebraic object with a proper associated filtration in order to compute its topological summary, the Persistence Diagram. Secondly, the persistence diagram must be transformed with suitable representation methods in order to be introduced in a Machine Learning algorithm. We assess the performance of our pipeline, and in parallel, we compare the different representation methods on popular benchmark datasets. This work is a first step toward both an easy and ready-to-use pipeline for data classification using persistent homology and Machine Learning, and to understand the theoretical reasons why, given a dataset and a task to be performed, a pair (filtration, topological representation) is better than another.

Object-Oriented Persistent Homology (2016)

Bao Wang, Guo-Wei Wei

Abstract

Persistent homology provides a new approach for the topological simplification of big data via measuring the life time of intrinsic topological features in a filtration process and has found its success in scientific and engineering applications. However, such a success is essentially limited to qualitative data classification and analysis. Indeed, persistent homology has rarely been employed for quantitative modeling and prediction. Additionally, the present persistent homology is a passive tool, rather than a proactive technique, for classification and analysis. In this work, we outline a general protocol to construct object-oriented persistent homology methods. By means of differential geometry theory of surfaces, we construct an objective functional, namely, a surface free energy defined on the data of interest. The minimization of the objective functional leads to a Laplace-Beltrami operator which generates a multiscale representation of the initial data and offers an objective oriented filtration process. The resulting differential geometry based object-oriented persistent homology is able to preserve desirable geometric features in the evolutionary filtration and enhances the corresponding topological persistence. The cubical complex based homology algorithm is employed in the present work to be compatible with the Cartesian representation of the Laplace-Beltrami flow. The proposed Laplace-Beltrami flow based persistent homology method is extensively validated. The consistence between Laplace-Beltrami flow based filtration and Euclidean distance based filtration is confirmed on the Vietoris-Rips complex for a large amount of numerical tests. The convergence and reliability of the present Laplace-Beltrami flow based cubical complex filtration approach are analyzed over various spatial and temporal mesh sizes. The Laplace-Beltrami flow based persistent homology approach is utilized to study the intrinsic topology of proteins and fullerene molecules. 
Based on a quantitative model which correlates the topological persistence of fullerene central cavity with the total curvature energy of the fullerene structure, the proposed method is used for the prediction of fullerene isomer stability. The efficiency and robustness of the present method are verified by more than 500 fullerene molecules. It is shown that the proposed persistent homology based quantitative model offers good predictions of total curvature energies for ten types of fullerene isomers. The present work offers the first example to design object-oriented persistent homology to enhance or preserve desirable features in the original data during the filtration process and then automatically detect or extract the corresponding topological traits from the data.

Clique Topology Reveals Intrinsic Geometric Structure in Neural Correlations (2015)

Chad Giusti, Eva Pastalkova, Carina Curto, Vladimir Itskov

Abstract

Detecting structure in neural activity is critical for understanding the function of neural circuits. The coding properties of neurons are typically investigated by correlating their responses to external stimuli. It is not clear, however, if the structure of neural activity can be inferred intrinsically, without a priori knowledge of the relevant stimuli. We introduce a novel method, called clique topology, that detects intrinsic structure in neural activity that is invariant under nonlinear monotone transformations. Using pairwise correlations of neurons in the hippocampus, we demonstrate that our method is capable of detecting geometric structure from neural activity alone, without appealing to external stimuli or receptive fields.
Detecting meaningful structure in neural activity and connectivity data is challenging in the presence of hidden nonlinearities, where traditional eigenvalue-based methods may be misleading. We introduce a novel approach to matrix analysis, called clique topology, that extracts features of the data invariant under nonlinear monotone transformations. These features can be used to detect both random and geometric structure, and depend only on the relative ordering of matrix entries. We then analyzed the activity of pyramidal neurons in rat hippocampus, recorded while the animal was exploring a 2D environment, and confirmed that our method is able to detect geometric organization using only the intrinsic pattern of neural correlations. Remarkably, we found similar results during nonspatial behaviors such as wheel running and rapid eye movement (REM) sleep. This suggests that the geometric structure of correlations is shaped by the underlying hippocampal circuits and is not merely a consequence of position coding. We propose that clique topology is a powerful new tool for matrix analysis in biological settings, where the relationship of observed quantities to more meaningful variables is often nonlinear and unknown.
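The claim that these features depend only on the relative ordering of matrix entries can be illustrated with a small sketch. This is not the authors' method: it counts triangles (3-cliques) along the sequence of graphs obtained by adding edges in order of decreasing correlation, as a simple stand-in for the homological features the paper computes. The matrix `C`, the transformation, and all function names are hypothetical.

```python
from itertools import combinations

def edge_order(matrix):
    """Node pairs (i, j), i < j, sorted by decreasing matrix entry."""
    pairs = combinations(range(len(matrix)), 2)
    return sorted(pairs, key=lambda ij: matrix[ij[0]][ij[1]], reverse=True)

def count_triangles(edges):
    """Number of 3-cliques in the graph with the given edges."""
    edge_set = set(edges)
    nodes = sorted({v for e in edges for v in e})
    return sum(
        1
        for a, b, c in combinations(nodes, 3)
        if (a, b) in edge_set and (a, c) in edge_set and (b, c) in edge_set
    )

def triangle_counts_along_filtration(matrix):
    """Triangle count after adding the k strongest edges, for each k."""
    order = edge_order(matrix)
    return [count_triangles(order[:k]) for k in range(1, len(order) + 1)]

# Hypothetical symmetric correlation matrix with distinct off-diagonal entries.
C = [
    [1.0, 0.9, 0.2, 0.5],
    [0.9, 1.0, 0.7, 0.1],
    [0.2, 0.7, 1.0, 0.4],
    [0.5, 0.1, 0.4, 1.0],
]

# A nonlinear, strictly increasing transformation of every entry.
C_transformed = [[x ** 3 for x in row] for row in C]

same_order = edge_order(C) == edge_order(C_transformed)
same_counts = (
    triangle_counts_along_filtration(C)
    == triangle_counts_along_filtration(C_transformed)
)
```

Because a strictly increasing transformation preserves the order of the entries, the sequence of graphs, and therefore any clique-based summary of it, is unchanged, whereas the eigenvalues of the matrix generally are not.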

Determining Structural Properties of Artificial Neural Networks Using Algebraic Topology (2021)

David Pérez Fernández, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marta Villegas

Abstract

Artificial Neural Networks (ANNs) are widely used for approximating complex functions. The process that is usually followed to define the most appropriate architecture for an ANN given a specific function is mostly empirical. Once this architecture has been defined, weights are usually optimized according to the error function. On the other hand, we observe that ANNs can be represented as graphs and their topological 'fingerprints' can be obtained using Persistent Homology (PH). In this paper, we describe a proposal focused on designing more principled architecture search procedures. To do this, different architectures for solving problems related to a heterogeneous set of datasets have been analyzed. The results of the evaluation corroborate that PH effectively characterizes the ANN invariants: when ANN density (layers and neurons) or sample feeding order is the only difference, PH topological invariants appear; in the opposite direction in different sub-problems (i.e. different labels), PH varies. This approach based on topological analysis helps towards the goal of designing more principled architecture search procedures and having a better understanding of ANNs.

Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials (2020)

Aditi S. Krishnapriyan, Maciej Haranczyk, Dmitriy Morozov

Abstract

Machine learning has emerged as an attractive alternative to experiments and simulations for predicting material properties. Usually, such an approach relies on specific domain knowledge for feature design: each learning target requires careful selection of features that an expert recognizes as important for the specific task. The major drawback of this approach is that computation of only a few structural features has been implemented so far, and it is difficult to tell a priori which features are important for a particular application. The latter problem has been empirically observed for predictors of guest uptake in nanoporous materials: local and global porosity features become dominant descriptors at low and high pressures, respectively. We investigate a feature representation of materials using tools from topological data analysis. Specifically, we use persistent homology to describe the geometry of nanoporous materials at various scales. We combine our topological descriptor with traditional structural features and investigate the relative importance of each to the prediction tasks. We demonstrate an application of this feature representation by predicting methane adsorption in zeolites, for pressures in the range of 1-200 bar. Our results not only show a considerable improvement compared to the baseline, but they also highlight that topological features capture information complementary to the structural features: this is especially important for the adsorption at low pressure, a task particularly difficult for the traditional features. Furthermore, by investigation of the importance of individual topological features in the adsorption model, we are able to pinpoint the location of the pores that correlate best to adsorption at different pressure, contributing to our atom-level understanding of structure-property relationships.

Topological Persistence Machine of Phase Transitions (2020)

Quoc Hoan Tran, Mark Chen, Yoshihiko Hasegawa

Abstract

The study of phase transitions from experimental data becomes challenging, especially when little prior knowledge of the system is available. Topological data analysis is an emerging framework for characterizing the shape of data and has recently achieved success in detecting structural transitions in material science such as the glass-liquid transition. However, data obtained from physical states may not have explicit shapes as structural materials do. We propose a general framework called the topological persistence machine to construct the shape of data from correlations in states, and hence decipher phase transitions via qualitative changes of the shape. Our framework enables an effective and unified approach to phase transition analysis. We demonstrate its impact in the highly precise detection of Berezinskii-Kosterlitz-Thouless phase transitions in the classical XY model, and quantum phase transitions in the transverse Ising model and Bose-Hubbard model. Intriguingly, these phase transitions have proven to be notoriously difficult for traditional methods but can be characterized in our framework without requiring prior knowledge about phases. Our approach is thus expected to be broadly applicable and to bring a remarkable perspective for exploring phases of experimental physical systems.

Chatter Diagnosis in Milling Using Supervised Learning and Topological Features Vector (2019)

Melih C. Yesilli, Sarah Tymochko, Firas A. Khasawneh, Elizabeth Munch

Abstract

Chatter detection has become a prominent subject of interest due to its effect on cutting tool life, surface finish and spindle of machine tool. Most of the existing methods in chatter detection literature are based on signal processing and signal decomposition. In this study, we use topological features of data simulating cutting tool vibrations, combined with four supervised machine learning algorithms to diagnose chatter in the milling process. Persistence diagrams, a method of representing topological features, are not easily used in the context of machine learning, so they must be transformed into a form that is more amenable. Specifically, we will focus on two different methods for featurizing persistence diagrams, Carlsson coordinates and template functions. In this paper, we provide classification results for simulated data from various cutting configurations, including upmilling and downmilling, in addition to the same data with some added noise. Our results show that Carlsson Coordinates and Template Functions yield accuracies as high as 96% and 95%, respectively. We also provide evidence that these topological methods are noise robust descriptors for chatter detection.

Can Neural Networks Learn Persistent Homology Features? (2020)

Guido Montúfar, Nina Otter, Yuguang Wang

Abstract

Topological data analysis uses tools from topology -- the mathematical area that studies shapes -- to create representations of data. In particular, in persistent homology, one studies one-parameter families of spaces associated with data, and persistence diagrams describe the lifetime of topological invariants, such as connected components or holes, across the one-parameter family. In many applications, one is interested in working with features associated with persistence diagrams rather than the diagrams themselves. In our work, we explore the possibility of learning several types of features extracted from persistence diagrams using neural networks.
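To make the idea of lifetimes of connected components concrete, here is a minimal sketch of a 0-dimensional persistence diagram for a point cloud, computed with a union-find structure. It assumes a distance-threshold filtration in which every point is born at scale 0 and two components merge when their closest points are within the current threshold. The function name and the example points are illustrative and are not taken from the paper.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """(birth, death) pairs for connected components of a point cloud.

    Components are born at scale 0 and die at the distance at which they
    merge into another component; one component never dies.
    """
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise distances from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    diagram = []
    for distance, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:           # this edge merges two components
            parent[root_i] = root_j
            diagram.append((0.0, distance))
    diagram.append((0.0, math.inf))    # the component that survives
    return diagram

# Two nearby points and one distant point (hypothetical example).
diagram = h0_persistence([(0, 0), (1, 0), (5, 0)])
```

In this example the short-lived bar ending at distance 1 reflects the two nearby points merging early, and the longer bar ending at distance 4 reflects the distant point, which is the kind of lifetime information a persistence diagram records.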

Text Classification via Network Topology: A Case Study on the Holy Quran (2019)

Mehmet Emin Aktas, Esra Akbas

Abstract

Due to the growth in the number of texts and documents available online, machine learning based text classification systems are getting more popular recently. Feature extraction, converting unstructured text into a structured feature space, is one of the essential tasks for text classification. In this paper, we propose a novel feature extraction approach for text classification using the network representation of text, network topology, and machine learning techniques. We present experimental results on classifying the Holy Quran chapters based on the place each chapter was revealed to illustrate the effectiveness of the approach.

A Persistent Weisfeiler-Lehman Procedure for Graph Classification (2019)

Bastian Rieck, Christian Bock, Karsten Borgwardt

Abstract

The Weisfeiler–Lehman graph kernel exhibits competitive performance in many graph classification tasks. However, its subtree features are not able to capture connected components and cycles, topological features known for characterising graphs. To extract such features, we leverage propagated node label information and transform unweighted graphs into metric ones. This permits us to augment the subtree features with topological information obtained using persistent homology, a concept from topological data analysis. Our method, which we formalise as a generalisation of Weisfeiler–Lehman subtree features, exhibits favourable classification accuracy and its improvements in predictive performance are mainly driven by including cycle information.

Quantifying Genetic Innovation: Mathematical Foundations for the Topological Study of Reticulate Evolution (2020)

Michael Lesnick, Raúl Rabadán, Daniel I. S. Rosenbloom

Abstract

A topological approach to the study of genetic recombination, based on persistent homology, was introduced by Chan, Carlsson, and Rabadán in 2013. This associates a sequence of signatures called barcodes to genomic data sampled from an evolutionary history. In this paper, we develop theoretical foundations for this approach. First, we present a novel formulation of the underlying inference problem. Specifically, we introduce and study the novelty profile, a simple, stable statistic of an evolutionary history which not only counts recombination events but also quantifies how recombination creates genetic diversity. We propose that the (hitherto implicit) goal of the topological approach to recombination is the estimation of novelty profiles. We then study the problem of obtaining a lower bound on the novelty profile using barcodes. We focus on a low-recombination regime, where the evolutionary history can be described by a directed acyclic graph called a galled tree, which differs from a tree only by isolated topological defects. We show that in this regime, under a complete sampling assumption, the $1^{\mathrm{st}}$ barcode yields a lower bound on the novelty profile, and hence on the number of recombination events. For $i>1$, the $i^{\mathrm{th}}$ barcode is empty. In addition, we use a stability principle to strengthen these results to ones which hold for any subsample of an arbitrary evolutionary history. To establish these results, we describe the topology of the Vietoris-Rips filtrations arising from evolutionary histories indexed by galled trees. As a step towards a probabilistic theory, we also show that for a random history indexed by a fixed galled tree and satisfying biologically reasonable conditions, the intervals of the $1^{\mathrm{st}}$ barcode are independent random variables. Using simulations, we explore the sensitivity of these intervals to recombination.

Morse Theory and Persistent Homology for Topological Analysis of 3D Images of Complex Materials (2014)

O. Delgado-Friedrichs, V. Robins, A. Sheppard

Abstract

We develop topologically accurate and compatible definitions for the skeleton and watershed segmentation of a 3D digital object that are computed by a single algorithm. These definitions are based on a discrete gradient vector field derived from a signed distance transform. This gradient vector field is amenable to topological analysis and simplification via Forman's discrete Morse theory and provides a filtration that can be used as input to persistent homology algorithms. Efficient implementations allow us to process large-scale x-ray micro-CT data of rock cores and other materials.

Topological Data Analysis of Spatial Patterning in Heterogeneous Cell Populations: Clustering and Sorting With Varying Cell-Cell Adhesion (2023)

Dhananjay Bhaskar, William Y. Zhang, Alexandria Volkening, Björn Sandstede, Ian Y. Wong

Abstract

Different cell types aggregate and sort into hierarchical architectures during the formation of animal tissues. The resulting spatial organization depends (in part) on the strength of adhesion of one cell type to itself relative to other cell types. However, automated and unsupervised classification of these multicellular spatial patterns remains challenging, particularly given their structural diversity and biological variability. Recent developments based on topological data analysis are intriguing to reveal similarities in tissue architecture, but these methods remain computationally expensive. In this article, we show that multicellular patterns organized from two interacting cell types can be efficiently represented through persistence images. Our optimized combination of dimensionality reduction via autoencoders, combined with hierarchical clustering, achieved high classification accuracy for simulations with constant cell numbers. We further demonstrate that persistence images can be normalized to improve classification for simulations with varying cell numbers due to proliferation. Finally, we systematically consider the importance of incorporating different topological features as well as information about each cell type to improve classification accuracy. We envision that topological machine learning based on persistence images will enable versatile and robust classification of complex tissue architectures that occur in development and disease.

Differentiable Euler Characteristic Transforms for Shape Classification (2023)

Ernst Röell, Bastian Rieck

Abstract

The _Euler Characteristic Transform_ (ECT) is a powerful invariant, combining geometrical and topological characteristics of shapes and graphs. However, the ECT was hitherto unable to learn task-specific representations. We overcome this issue and develop a novel computational layer that enables learning the ECT in an end-to-end fashion. Our method, the _Differentiable Euler Characteristic Transform_ (DECT) is fast and computationally efficient, while exhibiting performance on a par with more complex models in both graph and point cloud classification tasks. Moreover, we show that this seemingly simple statistic provides the same topological expressivity as more complex topological deep learning layers.

Topology-Preserving Terrain Simplification (2020)

Ulderico Fugacci, Michael Kerber, Hugo Manet

Abstract

We give necessary and sufficient criteria for elementary operations in a two-dimensional terrain to preserve the persistent homology induced by the height function. These operations are edge flips and removals of interior vertices, re-triangulating the link of the removed vertex. This problem is motivated by topological terrain simplification, which means removing as many critical vertices of a terrain as possible while maintaining geometric closeness to the original surface. Existing methods manage to reduce the maximal possible number of critical vertices, but increase thereby the number of regular vertices. Our method can be used to post-process a simplified terrain, drastically reducing its size and preserving its favorable properties.

Induction Motor Eccentricity Fault Detection and Quantification Using Topological Data Analysis (2024)

Bingnan Wang, Chungwei Lin, Hiroshi Inoue, Makoto Kanemaru

Abstract

In this paper, we propose a topological data analysis (TDA) method for the processing of induction motor stator current data, and apply it to the detection and quantification of eccentricity faults. Traditionally, physics-based models and involved signal processing techniques are required to identify and extract the subtle frequency components in current data related to a particular fault. We show that TDA offers an alternative way to extract fault related features, and effectively distinguish data from different fault conditions. We will introduce TDA method and the procedure of extracting topological features from time-domain data, and apply it to induction motor current data measured under different eccentricity fault conditions. We show that while the raw time-domain data are very challenging to distinguish, the extracted topological features from these data are distinct and highly associated with eccentricity fault level. With TDA processed data, we can effectively train machine learning models to predict fault levels with good accuracy, even for new data from eccentricity levels that are not seen in the training data. The proposed method is model-free, and only requires a small segment of time-domain data to make prediction. These advantages make it attractive for a wide range of data-driven fault detection applications.

Persistence-Based Pooling for Shape Pose Recognition (2016)

Thomas Bonis, Maks Ovsjanikov, Steve Oudot, Frédéric Chazal

Abstract

In this paper, we propose a novel pooling approach for shape classification and recognition using the bag-of-words pipeline, based on topological persistence, a recent tool from Topological Data Analysis. Our technique extends the standard max-pooling, which summarizes the distribution of a visual feature with a single number, thereby losing any notion of spatiality. Instead, we propose to use topological persistence, and the derived persistence diagrams, to provide significantly more informative and spatially sensitive characterizations of the feature functions, which can lead to better recognition performance. Unfortunately, despite their conceptual appeal, persistence diagrams are difficult to handle, since they are not naturally represented as vectors in Euclidean space and even the standard metric, the bottleneck distance, is not easy to compute. Furthermore, classical distances between diagrams, such as the bottleneck and Wasserstein distances, do not allow one to build positive definite kernels that can be used for learning. To handle this issue, we provide a novel way to transform persistence diagrams into vectors, in which comparisons are trivial. Finally, we demonstrate the performance of our construction on the Non-Rigid 3D Human Models SHREC 2014 dataset, where we show that topological pooling can provide significant improvements over the standard pooling methods for the shape pose recognition within the bag-of-words pipeline.

Topological Machine Learning With Persistence Indicator Functions (2019)

Bastian Rieck, Filip Sadlo, Heike Leitte

Abstract

Techniques from computational topology, in particular persistent homology, are becoming increasingly relevant for data analysis. Their stable metrics permit the use of many distance-based data analysis methods, such as multidimensional scaling, while providing a firm theoretical ground. Many modern machine learning algorithms, however, are based on kernels. This paper presents persistence indicator functions (PIFs), which summarize persistence diagrams, i.e., feature descriptors in topological data analysis. PIFs can be calculated and compared in linear time and have many beneficial properties, such as the availability of a kernel-based similarity measure. We demonstrate their usage in common data analysis scenarios, such as confidence set estimation and classification of complex structured data.
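As an illustration of the kind of summary described above, the following is a minimal sketch. It assumes the common definition of a persistence indicator function as the number of intervals of a diagram that contain a given threshold; the function name `persistence_indicator` and the toy diagram are hypothetical and are not taken from the paper.

```python
def persistence_indicator(diagram, eps):
    """Count the intervals [birth, death] of the diagram that contain eps.

    Assumed definition: the indicator function at eps is the number of
    topological features that are alive at threshold eps.
    """
    return sum(1 for birth, death in diagram if birth <= eps <= death)


# Toy persistence diagram given as (birth, death) pairs (illustrative only).
diagram = [(0.0, 1.0), (0.5, 2.0), (1.5, 3.0)]

# Evaluate the step function at a few thresholds.
values = [persistence_indicator(diagram, e) for e in (0.25, 0.75, 1.75, 2.5)]
```

Because the result is an ordinary real-valued step function, two diagrams can be compared by comparing their functions pointwise, which is what makes such summaries convenient for distance-based and kernel-based methods.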

Stable Topological Summaries for Analyzing the Organization of Cells in a Packed Tissue (2021)

Nieves Atienza, Maria-Jose Jimenez, Manuel Soriano-Trigueros

Abstract

We use topological data analysis tools for studying the inner organization of cells in segmented images of epithelial tissues. More specifically, for each segmented image, we compute different persistence barcodes, which codify the lifetime of homology classes (persistent homology) along different filtrations (increasing nested sequences of simplicial complexes) that are built from the regions representing the cells in the tissue. We use a complete and well-grounded set of numerical variables over those persistence barcodes, also known as topological summaries. A novel combination of normalization methods for both the set of input segmented images and the produced barcodes allows for the proven stability results for those variables with respect to small changes in the input, as well as invariance to image scale. Our study provides new insights to this problem, such as a possible novel indicator for the development of the drosophila wing disc tissue or the importance of centroids’ distribution to differentiate some tissues from their CVT-path counterpart (a mathematical model of epithelia based on Voronoi diagrams). We also show how the use of topological summaries may improve the classification accuracy of epithelial images using a Random Forest algorithm.

On the Local Behavior of Spaces of Natural Images (2008)

Gunnar Carlsson, Tigran Ishkhanov, Vin de Silva, Afra Zomorodian

Abstract

In this study we concentrate on qualitative topological analysis of the local behavior of the space of natural images. To this end, we use a space of 3 by 3 high-contrast patches ℳ. We develop a theoretical model for the high-density 2-dimensional submanifold of ℳ showing that it has the topology of the Klein bottle. Using our topological software package PLEX we experimentally verify our theoretical conclusions. We use polynomial representation to give coordinatization to various subspaces of ℳ. We find the best-fitting embedding of the Klein bottle into the ambient space of ℳ. Our results are currently being used in developing a compression algorithm based on a Klein bottle dictionary.

Topological Graph Neural Networks (2021)

Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten Borgwardt

Abstract

Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures, such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive in terms of the Weisfeiler--Lehman test of isomorphism. Augmenting GNNs with our layer leads to beneficial predictive performance, both on synthetic data sets, which can be trivially classified by humans but not by ordinary GNNs, and on real-world data.

Topological Pattern Recognition for Point Cloud Data (2014)

Gunnar Carlsson

Abstract

In this paper we discuss the adaptation of the methods of homology from algebraic topology to the problem of pattern recognition in point cloud data sets. The method is referred to as persistent homology, and has numerous applications to scientific problems. We discuss the definition and computation of homology in the standard setting of simplicial complexes and topological spaces, then show how one can obtain useful signatures, called barcodes, from finite metric spaces, thought of as sampled from a continuous object. We present several different cases where persistent homology is used, to illustrate the different ways in which the method can be applied.

Identification of Relevant Genetic Alterations in Cancer Using Topological Data Analysis (2020)

Raúl Rabadán, Yamina Mohamedi, Udi Rubin, Tim Chu, Adam N. Alghalith, Oliver Elliott, Luis Arnés, Santiago Cal, Álvaro J. Obaya, Arnold J. Levine, Pablo G. Cámara

Abstract

Large-scale cancer genomic studies enable the systematic identification of mutations that lead to the genesis and progression of tumors, uncovering the underlying molecular mechanisms and potential therapies. While some such mutations are recurrently found in many tumors, many others exist solely within a few samples, precluding detection by conventional recurrence-based statistical approaches. Integrated analysis of somatic mutations and RNA expression data across 12 tumor types reveals that mutations of cancer genes are usually accompanied by substantial changes in expression. We use topological data analysis to leverage this observation and uncover 38 elusive candidate cancer-associated genes, including inactivating mutations of the metalloproteinase ADAMTS12 in lung adenocarcinoma. We show that ADAMTS12−/− mice have a five-fold increase in the susceptibility to develop lung tumors, confirming the role of ADAMTS12 as a tumor suppressor gene. Our results demonstrate that data integration through topological techniques can increase our ability to identify previously unreported cancer-related alterations. Rare cancer mutations are often missed using recurrence-based statistical approaches, but are usually accompanied by changes in expression. Here the authors leverage this information to uncover several elusive candidate cancer-associated genes using topological data analysis.

A Machine-Learning-Based Early Warning System Boosted by Topological Data Analysis (2019)

Devraj Basu, Tieqiang Li

Abstract

We propose a novel early warning system for detecting financial market crashes that utilizes the information extracted from the shape of financial market movement. Our system incorporates Topological Data Analysis (TDA), a new set of data analytics techniques specialised in profiling the shape of data, into a more traditional machine learning framework. Incorporating TDA leads to substantial improvements in timely detecting the onset of a sharp market decline. Our framework is both able to generate new features and also unlock more value from existing factors. Our results illustrate the importance of understanding the shape of financial market data and suggest that incorporating TDA into a machine learning framework could be beneficial in a number of financial market settings.

Spatial Embedding Imposes Constraints on Neuronal Network Architectures (2018)

Jennifer Stiso, Danielle S. Bassett

Abstract

Recent progress towards understanding circuit function has capitalized on tools from network science to parsimoniously describe the spatiotemporal architecture of neural systems. Such tools often address systems topology divorced from its physical instantiation. Nevertheless, for embedded systems such as the brain, physical laws directly constrain the processes of network growth, development, and function. We review here the rules imposed by the space and volume of the brain on the development of neuronal networks, and show that these rules give rise to a specific set of complex topologies. These rules also affect the repertoire of neural dynamics that can emerge from the system, and thereby inform our understanding of network dysfunction in disease. We close by discussing new tools and models to delineate the effects of spatial embedding.

A Stable Multi-Scale Kernel for Topological Machine Learning (2015)

Jan Reininghaus, Stefan Huber, Ulrich Bauer, Roland Kwitt

Abstract

Topological data analysis offers a rich source of valuable information to study vision problems. Yet, so far we lack a theoretically sound connection to popular kernel-based learning techniques, such as kernel SVMs or kernel PCA. In this work, we establish such a connection by designing a multi-scale kernel for persistence diagrams, a stable summary representation of topological features in data. We show that this kernel is positive definite and prove its stability with respect to the 1-Wasserstein distance. Experiments on two benchmark datasets for 3D shape classification/retrieval and texture recognition show considerable performance gains of the proposed method compared to an alternative approach that is based on the recently introduced persistence landscapes.

Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis (2015)

Jose A. Perea, John Harer

Abstract

We develop in this paper a theoretical framework for the topological study of time series data. Broadly speaking, we describe geometrical and topological properties of sliding window embeddings, as seen through the lens of persistent homology. In particular, we show that maximum persistence at the point-cloud level can be used to quantify periodicity at the signal level, prove structural and convergence theorems for the resulting persistence diagrams, and derive estimates for their dependency on window size and embedding dimension. We apply this methodology to quantifying periodicity in synthetic data sets and compare the results with those obtained using state-of-the-art methods in gene expression analysis. We call this new method SW1PerS, which stands for Sliding Windows and 1-Dimensional Persistence Scoring.
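The sliding window embedding mentioned above can be sketched as follows. This is a minimal illustration, not the SW1PerS implementation: the function name `sliding_window`, the parameter names `dim` and `tau`, and the sampled sine wave are all assumptions made for the example.

```python
from math import sin, pi


def sliding_window(signal, dim, tau):
    """Return the sliding-window (delay) embedding of a 1-D signal.

    Each point is (signal[t], signal[t + tau], ..., signal[t + dim * tau]),
    so the resulting point cloud lives in R^(dim + 1).
    """
    span = dim * tau  # how far ahead of t the last coordinate reaches
    return [
        tuple(signal[t + k * tau] for k in range(dim + 1))
        for t in range(len(signal) - span)
    ]


# A sampled sine wave with period 50 samples (illustrative input).
signal = [sin(2 * pi * t / 50) for t in range(200)]

# Embed into R^3; a periodic signal traces out a closed loop in the embedding.
cloud = sliding_window(signal, dim=2, tau=5)
```

The point cloud `cloud` would then be passed to a persistent homology routine; a long-lived 1-dimensional class in the resulting diagram is the kind of feature that the abstract relates to periodicity of the signal.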

Characterizing Fluid Dynamical Systems Using Euler Characteristic Surface and Euler Metric (2023)

A. Roy, R. A. I. Haque, A. J. Mitra, S. Tarafdar, T. Dutta

Abstract

Euler characteristic ($\chi$), a topological invariant, helps to understand the topology of a network or complex. We demonstrate that the multi-scale topological information of dynamically evolving fluid flow systems can be crystallized into their Euler characteristic surfaces $\chi_s(r, t)$. Furthermore, we demonstrate that the Euler Metric (EM), introduced by the authors, can be utilized to identify the stability regime of a given flow pattern, besides distinguishing between different flow systems. The potential of the Euler characteristic surface and the Euler metric has been demonstrated first on analyzing a simulated deterministic dynamical system before being applied to analyze experimental flow patterns that develop in micrometer sized drying droplets.

Euler Characteristic Surfaces: A Stable Multiscale Topological Summary of Time Series Data (2024)

Anamika Roy, Atish J. Mitra, Tapati Dutta

Abstract

We present Euler Characteristic Surfaces as a multiscale spatiotemporal topological summary of time series data encapsulating the topology of the system at different time instants and length scales. Euler Characteristic Surfaces with an appropriate metric is used to quantify stability and locate critical changes in a dynamical system with respect to variations in a parameter, while being substantially computationally cheaper than available alternate methods such as persistent homology. The stability of the construction is demonstrated by a quantitative comparison bound with persistent homology, and a quantitative stability bound under small changes in time is established. The proposed construction is used to analyze two different kinds of simulated disordered flow situations.

PI-Net: A Deep Learning Approach to Extract Topological Persistence Images (2020)

Anirudh Som, Hongjun Choi, Karthikeyan Natesan Ramamurthy, Matthew Buman, Pavan Turaga

Abstract

Topological features such as persistence diagrams and their functional approximations like persistence images (PIs) have been showing substantial promise for machine learning and computer vision applications. This is greatly attributed to the robustness topological representations provide against different types of physical nuisance variables seen in real-world data, such as view-point, illumination, and more. However, key bottlenecks to their large scale adoption are computational expenditure and difficulty incorporating them in a differentiable architecture. We take an important step in this paper to mitigate these bottlenecks by proposing a novel one-step approach to generate PIs directly from the input data. We design two separate convolutional neural network architectures, one designed to take in multi-variate time series signals as input and another that accepts multi-channel images as input. We call these networks Signal PI-Net and Image PI-Net respectively. To the best of our knowledge, we are the first to propose the use of deep learning for computing topological features directly from data. We explore the use of the proposed PI-Net architectures on two applications: human activity recognition using tri-axial accelerometer sensor data and image classification. We demonstrate the ease of fusion of PIs in supervised deep learning architectures and speed up of several orders of magnitude for extracting PIs from data. Our code is available at https://github.com/anirudhsom/PI-Net.

Topological Data Analysis of Collective and Individual Epithelial Cells Using Persistent Homology of Loops (2021)

Dhananjay Bhaskar, William Y. Zhang, Ian Y. Wong

Abstract

Interacting, self-propelled particles such as epithelial cells can dynamically self-organize into complex multicellular patterns, which are challenging to classify without a priori information. Classically, different phases and phase transitions have been described based on local ordering, which may not capture structural features at larger length scales. Instead, topological data analysis (TDA) determines the stability of spatial connectivity at varying length scales (i.e. persistent homology), and can compare different particle configurations based on the “cost” of reorganizing one configuration into another. Here, we demonstrate a topology-based machine learning approach for unsupervised profiling of individual and collective phases based on large-scale loops. We show that these topological loops (i.e. dimension 1 homology) are robust to variations in particle number and density, particularly in comparison to connected components (i.e. dimension 0 homology). We use TDA to map out phase diagrams for simulated particles with varying adhesion and propulsion, at constant population size as well as when proliferation is permitted. Next, we use this approach to profile our recent experiments on the clustering of epithelial cells in varying growth factor conditions, which are compared to our simulations. Finally, we characterize the robustness of this approach at varying length scales, with sparse sampling, and over time. Overall, we envision TDA will be broadly applicable as a model-agnostic approach to analyze active systems with varying population size, from cytoskeletal motors to motile cells to flocking or swarming animals.

Hypothesis Testing for Shapes Using Vectorized Persistence Diagrams (2020)

Chul Moon, Nicole A. Lazar

Abstract

Topological data analysis involves the statistical characterization of the shape of data. Persistent homology is a primary tool of topological data analysis, which can be used to analyze those topological features and perform statistical inference. In this paper, we present a two-stage hypothesis test for vectorized persistence diagrams. The first stage filters elements in the vectorized persistence diagrams to reduce false positives. The second stage consists of multiple hypothesis tests, with false positives controlled by false discovery rates. We demonstrate applications of the proposed procedure on simulated point clouds and three-dimensional rock image data. Our results show that the proposed hypothesis tests can provide flexible and informative inferences on the shape of data with lower computational cost compared to the permutation test.

Topological Data Analysis of Financial Time Series: Landscapes of Crashes (2017)

Marian Gidea, Yuri Katz

Abstract

We explore the evolution of daily returns of four major US stock market indices during the technology crash of 2000, and the financial crisis of 2007-2009. Our methodology is based on topological data analysis (TDA). We use persistent homology to detect and quantify topological patterns that appear in multidimensional time series. Using a sliding window, we extract time-dependent point cloud data sets, to which we associate a topological space. We detect transient loops that appear in this space, and we measure their persistence. This is encoded in real-valued functions referred to as 'persistence landscapes'. We quantify the temporal changes in persistence landscapes via their $L^p$-norms. We test this procedure on multidimensional time series generated by various non-linear and non-equilibrium models. We find that, in the vicinity of financial meltdowns, the $L^p$-norms exhibit strong growth prior to the primary peak, which ascends during a crash. Remarkably, the average spectral density at low frequencies of the time series of $L^p$-norms of the persistence landscapes demonstrates a strong rising trend for 250 trading days prior to either the dotcom crash on 03/10/2000, or the Lehman bankruptcy on 09/15/2008. Our study suggests that TDA provides a new type of econometric analysis, which goes beyond the standard statistical measures. The method can be used to detect early warning signals of imminent market crashes. We believe that this approach can be used beyond the analysis of financial time series presented here.
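To make the landscape norms in this abstract concrete, here is a minimal sketch of how such a norm could be approximated from a persistence diagram. It assumes the standard definition of the k-th landscape function as the k-th largest "tent" value over the diagram, and a trapezoidal rule on a user-supplied grid; the function names, the grid, and the one-interval diagram are illustrative assumptions, not the authors' code.

```python
def landscape(diagram, k, t):
    """k-th persistence landscape function (k = 1, 2, ...) evaluated at t.

    Each (birth, death) pair contributes a tent max(0, min(t - birth, death - t));
    the k-th landscape is the k-th largest tent value at t.
    """
    tents = sorted(
        (max(0.0, min(t - birth, death - t)) for birth, death in diagram),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0


def landscape_norm(diagram, p, grid):
    """Approximate the L^p norm of the landscape with the trapezoidal rule."""
    total = 0.0
    for k in range(1, len(diagram) + 1):
        values = [landscape(diagram, k, t) ** p for t in grid]
        total += sum(
            (values[i] + values[i + 1]) / 2 * (grid[i + 1] - grid[i])
            for i in range(len(grid) - 1)
        )
    return total ** (1 / p)


# One loop born at 0 and dying at 2: the landscape is a single tent of height 1.
grid = [i / 100 for i in range(301)]
norm1 = landscape_norm([(0.0, 2.0)], p=1, grid=grid)
```

In the setting of the abstract, one such norm would be computed per sliding window, giving a time series of norms whose growth is then examined.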

Prediction in Cancer Genomics Using Topological Signatures and Machine Learning (2020)

Georgina Gonzalez, Arina Ushakova, Radmila Sazdanovic, Javier Arsuaga

Abstract

Copy Number Aberrations, gains and losses of genomic regions, are a hallmark of cancer and can be experimentally detected using microarray comparative genomic hybridization (aCGH). In previous works, we developed a topology based method to analyze aCGH data whose output are regions of the genome where copy number is altered in patients with a predetermined cancer phenotype. We call this method Topological Analysis of array CGH (TAaCGH). Here we combine TAaCGH with machine learning techniques to build classifiers using copy number aberrations. We chose logistic regression on two different binary phenotypes related to breast cancer to illustrate this approach. The first case consists of patients with over-expression of the ERBB2 gene. Over-expression of ERBB2 is commonly regulated by a copy number gain in chromosome arm 17q. TAaCGH found the region 17q11-q22 associated with the phenotype and using logistic regression we reduced this region to 17q12-q21.31 correctly classifying 78% of the ERBB2 positive individuals (sensitivity) in a validation data set. We also analyzed over-expression in Estrogen Receptor (ER), a second phenotype commonly observed in breast cancer patients and found that the region 5p14.3-12 together with six full arms were associated with the phenotype. Our method identified 4p, 6p and 16q as the strongest predictors correctly classifying 76% of ER positives in our validation data set. However, for this set there was a significant increase in the false positive rate (specificity). We suggest that topological and machine learning methods can be combined for prediction of phenotypes using genetic data.

Understanding Flow Features in Drying Droplets via Euler Characteristic Surfaces—A Topological Tool (2020)

A. Roy, R. A. I. Haque, A. J. Mitra, M. Dutta Choudhury, S. Tarafdar, T. Dutta

Abstract

In this paper, we propose a mathematical picture of flow in a drying multiphase droplet. The system studied consists of a suspension of microscopic polystyrene beads in water. The time development of the drying process is described by defining the “Euler characteristic surface,” which provides a multiscale topological map of this dynamical system. A novel method is adopted to analyze the images extracted from experimental video sequences. Experimental image data are converted to binary data through appropriate Gaussian filters and optimal thresholding and analyzed using the Euler characteristic determined on a hexagonal lattice. In order to do a multiscale analysis of the extracted image, we introduce the concept of Euler characteristic at a specific scale r > 0. This multiscale time evolution of the connectivity information on aggregates of polystyrene beads in water is summarized in a Euler characteristic surface and, subsequently, in a Euler characteristic level curve plot. We introduce a metric between Euler characteristic surfaces as a possible similarity measure between two flow situations. The constructions proposed by us are used to interpret flow patterns (and their stability) generated on the upper surface of the drying droplet interface. The philosophy behind the topological tools developed in this work is to produce low-dimensional signatures of dynamical systems, which may be used to efficiently summarize and distinguish topological information in various types of flow situations.
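The Euler characteristic of a thresholded image can be computed by counting cells. The sketch below is a simplified stand-in: it uses a square pixel lattice rather than the hexagonal lattice of the paper, treats each foreground pixel as a closed unit square, and does not implement the scale parameter r. The function name and the test image are hypothetical.

```python
def euler_characteristic(image):
    """Euler characteristic V - E + F of the foreground of a binary image.

    Assumption: each foreground pixel is a closed unit square, so pixels
    that touch only at a corner still count as connected.
    """
    vertices, edges, faces = set(), set(), 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if not value:
                continue
            faces += 1
            # Four corners of the pixel; shared corners are stored once.
            vertices.update([(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)])
            # Four sides of the pixel; shared sides are stored once.
            edges.update([
                ((r, c), (r, c + 1)),          # top
                ((r + 1, c), (r + 1, c + 1)),  # bottom
                ((r, c), (r + 1, c)),          # left
                ((r, c + 1), (r + 1, c + 1)),  # right
            ])
    return len(vertices) - len(edges) + faces


# A ring of foreground pixels: one connected component with one hole.
ring = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
chi_ring = euler_characteristic(ring)
```

Evaluating such a count for every video frame and every scale would yield a two-parameter array of integers, which is the general shape of the "surface" the abstract describes.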

Generalized Penalty for Circular Coordinate Representation (2020)

Hengrui Luo, Alice Patania, Jisu Kim, Mikael Vejdemo-Johansson

Abstract

Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization for high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method to adapt the circular coordinate framework to take into account sparsity in high-dimensional applications. We use a generalized penalty function instead of an $L_2$ penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analysis to support our claim that circular coordinates with generalized penalty will accommodate the sparsity in high-dimensional datasets under different sampling schemes while preserving the topological structures.

The Shape of Cancer Relapse: Topological Data Analysis Predicts Recurrence in Paediatric Acute Lymphoblastic Leukaemia (2021)

Salvador Chulián, Bernadette J. Stolz, Álvaro Martínez-Rubio, Cristina Blázquez Goñi, Juan F. Rodríguez Gutiérrez, Teresa Caballero Velázquez, Águeda Molinos Quintana, Manuel Ramírez Orellana, Ana Castillo Robleda, José Luis Fuster Soler, Alfredo Minguela Puras, María Victoria Martínez Sánchez, María Rosa, Víctor M. Pérez-García, Helen Byrne

Abstract

Acute Lymphoblastic Leukaemia (ALL) is the most frequent paediatric cancer. Modern therapies have improved survival rates, but approximately 15-20% of patients relapse. At present, patients’ risk of relapse is assessed by projecting high-dimensional flow cytometry data onto a subset of biomarkers and manually estimating the shape of this reduced data. Here, we apply methods from topological data analysis (TDA), which quantify shape in data via features such as connected components and loops, to pre-treatment ALL datasets with known outcomes. We combine these fully unsupervised analyses with machine learning to identify features in the pre-treatment data that are prognostic for risk of relapse. We find significant topological differences between relapsing and non-relapsing patients and confirm the predictive power of CD10, CD20, CD38, and CD45. Further, we are able to use the TDA descriptors to predict patients who relapsed. We propose three prognostic pipelines that readily extend to other haematological malignancies. Teaser: Topology reveals features in flow cytometry data which predict relapse of patients with acute lymphoblastic leukemia.

Weighted-Persistent-Homology-Based Machine Learning for RNA Flexibility Analysis (2020)

Chi Seng Pun, Brandon Yung Sin Yong, Kelin Xia

Abstract

With the great significance of biomolecular flexibility in biomolecular dynamics and functional analysis, various experimental and theoretical models are developed. Experimentally, Debye-Waller factor, also known as B-factor, measures atomic mean-square displacement and is usually considered as an important measurement for flexibility. Theoretically, elastic network models, Gaussian network model, flexibility-rigidity model, and other computational models have been proposed for flexibility analysis by shedding light on the biomolecular inner topological structures. Recently, a topology-based machine learning model has been proposed. By using the features from persistent homology, this model achieves a remarkably high Pearson correlation coefficient (PCC) in protein B-factor prediction. Motivated by its success, we propose weighted-persistent-homology (WPH)-based machine learning (WPHML) models for RNA flexibility analysis. Our WPH is a newly-proposed model, which incorporates physical, chemical and biological information into topological measurements using a weight function. In particular, we use local persistent homology (LPH) to focus on the topological information of local regions. Our WPHML model is validated on a well-established RNA dataset, and numerical experiments show that our model can achieve a PCC of up to 0.5822. The comparison with the previous sequence-information-based learning models shows that a consistent improvement in performance by at least 10% is achieved in our current model.

The Persistence of Large Scale Structures I: Primordial Non-Gaussianity (2020)

Matteo Biagetti, Alex Cole, Gary Shiu

Abstract

We develop an analysis pipeline for characterizing the topology of large scale structure and extracting cosmological constraints based on persistent homology. Persistent homology is a technique from topological data analysis that quantifies the multiscale topology of a data set, in our context unifying the contributions of clusters, filament loops, and cosmic voids to cosmological constraints. We describe how this method captures the imprint of primordial local non-Gaussianity on the late-time distribution of dark matter halos, using a set of N-body simulations as a proxy for real data analysis. For our best single statistic, running the pipeline on several cubic volumes of size $40~({\rm Gpc}/h)^3$, we detect $f_{\rm NL}^{\rm loc}=10$ at $97.5\%$ confidence on $\sim 85\%$ of the volumes. Additionally we test our ability to resolve degeneracies between the topological signature of $f_{\rm NL}^{\rm loc}$ and variation of $\sigma_8$ and argue that correctly identifying nonzero $f_{\rm NL}^{\rm loc}$ in this case is possible via an optimal template method. Our method relies on information living at $\mathcal{O}(10)$ Mpc/h, a complementary scale with respect to commonly used methods such as the scale-dependent bias in the halo/galaxy power spectrum. Therefore, while still requiring a large volume, our method does not require sampling long-wavelength modes to constrain primordial non-Gaussianity. Moreover, our statistics are interpretable: we are able to reproduce previous results in certain limits and we make new predictions for unexplored observables, such as filament loops formed by dark matter halos in a simulation box.

Topology Highlights Mesoscopic Functional Equivalence Between Imagery and Perception: The Case of Hypnotizability (2019)

Esther Ibáñez-Marcelo, Lisa Campioni, Angkoon Phinyomark, Giovanni Petri, Enrica L. Santarcangelo

Abstract

The functional equivalence (FE) between imagery and perception or motion has been proposed on the basis of neuroimaging evidence of large spatially overlapping activations between real and imagined sensori-motor conditions. However, similar local activation patterns do not imply the same mesoscopic integration of brain regions, which can be described by tools from Topological Data Analysis (TDA). On the basis of behavioral findings, stronger FE has been hypothesized in individuals with high hypnotizability scores (highs) with respect to low hypnotizable participants (lows), who differ from each other in the proneness to modify memory, perception and behavior according to specific imaginative suggestions. Here we present the first EEG evidence of stronger FE in highs. In fact, persistent homology shows that the highs' EEG topological asset during real and imagined sensory conditions is significantly more similar than that of the lows. As a corollary finding, persistent homology shows lower restructuring of the EEG asset in highs than in lows during both sensory and imagery tasks with respect to basal conditions. Present findings support the view that greater embodiment of mental images may be responsible for the highs' greater proneness to respond to sensori-motor suggestions and to report involuntariness in action. In addition, findings indicate hypnotizability-related sensory and cognitive information processing and suggest that the psycho-physiological trait of hypnotizability may modulate more than one aspect of everyday life.

Crystallographic Interacting Topological Phases and Equivariant Cohomology: To Assume or Not to Assume (2020)

Daniel Sheinbaum, Omar Antolín Camarena

Abstract

For symmorphic crystalline interacting gapped systems we derive a classification under adiabatic evolution. This classification is complete for non-degenerate ground states. For the degenerate case we discuss some invariants given by equivariant characteristic classes. We do not assume an emergent relativistic field theory nor that phases form a topological spectrum. We also do not assume short-range entanglement nor the existence of quasi-particles as is done in SPT and SET classifications respectively. Using a slightly generalized Bloch decomposition and Grassmannians made out of ground state spaces, we show that the $P$-equivariant cohomology of a $d$-dimensional torus gives rise to different interacting phases. We compare our results to bosonic symmorphic crystallographic SPT phases and to non-interacting fermionic crystallographic phases in class A. Finally we discuss the relation of our assumptions to those made for crystallographic SPT and SET phases.

A Topology-Based Object Representation for Clasping, Latching and Hooking (2013)

J. A. Stork, F. T. Pokorny, D. Kragic

Abstract

We present a loop-based topological object representation for objects with holes. The representation is used to model object parts suitable for grasping, e.g. handles, and it incorporates local volume information about these. Furthermore, we present a grasp synthesis framework that utilizes this representation for synthesizing caging grasps that are robust under measurement noise. The approach is complementary to a local contact-based force-closure analysis as it depends on global topological features of the object. We perform an extensive evaluation with four robotic hands on synthetic data. Additionally, we provide real world experiments using a Kinect sensor on two robotic platforms: a Schunk dexterous hand attached to a Kuka robot arm as well as a Nao humanoid robot. In the case of the Nao platform, we provide initial experiments showing that our approach can be used to plan whole arm hooking as well as caging grasps involving only one hand.

Gene Coexpression Network Comparison via Persistent Homology (2018)

Ali Nabi Duman, Harun Pirim

Abstract

Persistent homology, a topological data analysis (TDA) method, is applied to microarray data sets. Although there are a few papers referring to TDA methods in microarray analysis, persistent homology has not, to the best of our knowledge, previously been used to compare several weighted gene coexpression networks (WGCN). We calculate the persistent homology of weighted networks constructed from 38 Arabidopsis microarray data sets to test the relevance and the success of this approach in distinguishing the stress factors. We quantify multiscale topological features of each network using persistent homology and apply a hierarchical clustering algorithm to the distance matrix whose entries are the pairwise bottleneck distances between the networks. The immunoresponses to different stress factors are distinguishable by our method. The networks of similar immunoresponses are found to be close with respect to bottleneck distance, indicating similar topological features of the WGCNs. This computationally efficient technique for analyzing networks provides a quick test for advanced studies.
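
As a rough illustration of what "multiscale topological features of a weighted network" can mean, the sketch below computes only the 0-dimensional part of persistent homology (when connected components merge) for a small weighted graph using union-find. It is not the authors' pipeline: the function name `h0_persistence`, the toy edge list, and the choice to treat edge weights as dissimilarities (smaller weight enters the filtration first) are all assumptions made for this example.

```python
def h0_persistence(num_nodes, weighted_edges):
    """Death values of 0-dimensional classes for an edge-weight filtration.

    Assumption: every node is born at filtration value 0 and an edge
    (u, v, w) enters the filtration at value w. Each time an edge joins
    two previously separate components, one component "dies" at w.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components
            deaths.append(w)
    return deaths


# Hypothetical toy network: 4 nodes, weights act as dissimilarities.
edges = [(0, 1, 0.2), (1, 2, 0.5), (0, 2, 0.9), (2, 3, 0.7)]
print(h0_persistence(4, edges))  # [0.2, 0.5, 0.7]
```

The edge of weight 0.9 closes a loop instead of merging components, so it contributes no 0-dimensional death; in a full computation it would give birth to a 1-dimensional class. Comparing diagrams with the bottleneck distance, as the abstract describes, would normally be left to a dedicated TDA library.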

Semantic Segmentation of Microscopic Neuroanatomical Data by Combining Topological Priors With Encoder–decoder Deep Networks (2020)

Samik Banerjee, Lucas Magee, Dingkang Wang, Xu Li, Bing-Xing Huo, Jaikishan Jayakumar, Katherine Matho, Meng-Kuan Lin, Keerthi Ram, Mohanasankar Sivaprakasam, Josh Huang, Yusu Wang, Partha P. Mitra

Abstract

Understanding of neuronal circuitry at cellular resolution within the brain has relied on neuron tracing methods that involve careful observation and interpretation by experienced neuroscientists. With recent developments in imaging and digitization, this approach is no longer feasible with the large-scale (terabyte to petabyte range) images. Machine-learning-based techniques, using deep networks, provide an efficient alternative to the problem. However, these methods rely on very large volumes of annotated images for training and have error rates that are too high for scientific data analysis, and thus require a substantial volume of human-in-the-loop proofreading. Here we introduce a hybrid architecture combining prior structure in the form of topological data analysis methods, based on discrete Morse theory, with the best-in-class deep-net architectures for the neuronal connectivity analysis. We show significant performance gains using our hybrid architecture on detection of topological structure (for example, connectivity of neuronal processes and local intensity maxima on axons corresponding to synaptic swellings) with precision and recall close to 90% compared with human observers. We have adapted our architecture to a high-performance pipeline capable of semantic segmentation of light-microscopic whole-brain image data into a hierarchy of neuronal compartments. We expect that the hybrid architecture incorporating discrete Morse techniques into deep nets will generalize to other data domains.

Topological Data Analysis of Contagion Maps for Examining Spreading Processes on Networks (2015)

Dane Taylor, Florian Klimm, Heather A. Harrington, Miroslav Kramár, Konstantin Mischaikow, Mason A. Porter, Peter J. Mucha

Abstract

Social and biological contagions are influenced by the spatial embeddedness of networks. Historically, many epidemics spread as a wave across part of the Earth’s surface; however, in modern contagions long-range edges—for example, due to airline transportation or communication media—allow clusters of a contagion to appear in distant locations. Here we study the spread of contagions on networks through a methodology grounded in topological data analysis and nonlinear dimension reduction. We construct ‘contagion maps’ that use multiple contagions on a network to map the nodes as a point cloud. By analysing the topology, geometry and dimensionality of manifold structure in such point clouds, we reveal insights to aid in the modelling, forecast and control of spreading processes. Our approach highlights contagion maps also as a viable tool for inferring low-dimensional structure in networks.

Statistical Topological Data Analysis - A Kernel Perspective (2015)

Roland Kwitt, Stefan Huber, Marc Niethammer, Weili Lin, Ulrich Bauer

Abstract

We consider the problem of statistical computations with persistence diagrams, a summary representation of topological features in data. These diagrams encode persistent homology, a widely used invariant in topological data analysis. While several avenues towards a statistical treatment of the diagrams have been explored recently, we follow an alternative route that is motivated by the success of methods based on the embedding of probability measures into reproducing kernel Hilbert spaces. In fact, a positive definite kernel on persistence diagrams has recently been proposed, connecting persistent homology to popular kernel-based learning techniques such as support vector machines. However, important properties of that kernel enabling a principled use in the context of probability measure embeddings remain to be explored. Our contribution is to close this gap by proving universality of a variant of the original kernel, and to demonstrate its effective use in two-sample hypothesis testing on synthetic as well as real-world data.

From Topological Analyses to Functional Modeling: The Case of Hippocampus (2021)

Yuri Dabaghian

Abstract

Topological data analyses are widely used for describing and conceptualizing large volumes of neurobiological data, e.g., for quantifying spiking outputs of large neuronal ensembles and thus understanding the functions of the corresponding networks. Below we discuss an approach in which convergent topological analyses produce insights into how information may be processed in mammalian hippocampus—a brain part that plays a key role in learning and memory. The resulting functional model provides a unifying framework for integrating spiking data at different timescales and following the course of spatial learning at different levels of spatiotemporal granularity. This approach allows accounting for contributions from various physiological phenomena into spatial cognition—the neuronal spiking statistics, the effects of spiking synchronization by different brain waves, the roles played by synaptic efficacies and so forth. In particular, it is possible to demonstrate that networks with plastic and transient synaptic architectures can encode stable cognitive maps, revealing the characteristic timescales of memory processing.

Two-Tier Mapper, an Unbiased Topology-Based Clustering Method for Enhanced Global Gene Expression Analysis (2019)

Rachel Jeitziner, Mathieu Carrière, Jacques Rougemont, Steve Oudot, Kathryn Hess, Cathrin Brisken

Abstract

MOTIVATION: Unbiased clustering methods are needed to analyze growing numbers of complex datasets. Currently available clustering methods often depend on parameters that are set by the user, lack stability, and are not applicable to small datasets. To overcome these shortcomings we used topological data analysis, an emerging field of mathematics that discerns additional features and discovers hidden insights in datasets and has a wide application range. RESULTS: We have developed a topology-based clustering method called Two-Tier Mapper (TTMap) for enhanced analysis of global gene expression datasets. First, TTMap discerns divergent features in the control group, adjusts for them, and identifies outliers. Second, the deviation of each test sample from the control group in a high-dimensional space is computed, and the test samples are clustered using a new Mapper-based topological algorithm at two levels: a global tier and local tiers. All parameters are either carefully chosen or data-driven, avoiding any user-induced bias. The method is stable, different datasets can be combined for analysis, and significant subgroups can be identified. It outperforms current clustering methods in sensitivity and stability on synthetic and biological datasets, in particular when sample sizes are small; outcome is not affected by removal of control samples, by choice of normalization, or by subselection of data. TTMap is readily applicable to complex, highly variable biological samples and holds promise for personalized medicine. AVAILABILITY AND IMPLEMENTATION: TTMap is supplied as an R package in Bioconductor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Topological Data Analysis for Electric Motor Eccentricity Fault Detection (2022)

Bingnan Wang, Chungwei Lin, Hiroshi Inoue, Makoto Kanemaru

Abstract

In this paper, we develop a topological data analysis (TDA) method for motor current signature analysis (MCSA), and apply it to induction motor eccentricity fault detection. We introduce TDA and present the procedure for extracting topological features from time-domain data, which are represented using persistence diagrams and vectorized Betti sequences. The procedure is applied to induction machine phase current signal analysis, and is shown to be highly effective in differentiating signals from different eccentricity levels. With TDA, we are able to use a simple regression model that can predict the fault levels with reasonable accuracy, even for data at eccentricity levels that are not seen in the training data. The proposed method is model-free, and requires only a small segment of time-domain data to make a prediction. These advantages make it attractive for a wide range of fault detection applications.
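
To make "vectorized Betti sequences" concrete, here is a minimal sketch of one common way to turn a persistence diagram into a fixed-length vector: count, at each threshold on a grid, how many features are alive. The function name `betti_sequence`, the half-open convention `birth <= t < death`, and the toy diagram are assumptions for illustration; the abstract does not specify the authors' exact vectorization.

```python
def betti_sequence(diagram, thresholds):
    """Count features alive at each threshold.

    diagram    : list of (birth, death) pairs; death may be float("inf").
    thresholds : filtration values at which to sample the Betti number.
    A feature is counted as alive at t when birth <= t < death (assumed
    convention for this sketch).
    """
    return [
        sum(1 for birth, death in diagram if birth <= t < death)
        for t in thresholds
    ]


# Hypothetical diagram with two finite features and one essential feature.
diagram = [(0.0, 1.0), (0.2, 0.6), (0.5, float("inf"))]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
print(betti_sequence(diagram, thresholds))  # [1, 2, 3, 2, 1]
```

The resulting list has the same length for every signal, which is what allows it to be passed to a simple regression model of the kind the abstract mentions.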

The Euler Characteristic: A General Topological Descriptor for Complex Data (2021)

Alexander Smith, Victor Zavala

Abstract

Datasets are mathematical objects (e.g., point clouds, matrices, graphs, images, fields/functions) that have shape. This shape encodes important knowledge about the system under study. Topology is an area of mathematics that provides diverse tools to characterize the shape of data objects. In this work, we study a specific tool known as the Euler characteristic (EC). The EC is a general, low-dimensional, and interpretable descriptor of topological spaces defined by data objects. We revise the mathematical foundations of the EC and highlight its connections with statistics, linear algebra, field theory, and graph theory. We discuss advantages offered by the use of the EC in the characterization of complex datasets; to do so, we illustrate its use in different applications of interest in chemical engineering such as process monitoring, flow cytometry, and microscopy. We show that the EC provides a descriptor that effectively reduces complex datasets and that this reduction facilitates tasks such as visualization, regression, classification, and clustering.
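
For readers unfamiliar with the Euler characteristic, the sketch below computes it for a small simplicial complex as the alternating sum of simplex counts, chi = (#vertices) - (#edges) + (#triangles) - ... The function name `euler_characteristic` and the input format (a list of maximal simplices given as vertex tuples) are assumptions for this example, not the authors' code.

```python
from itertools import combinations


def euler_characteristic(maximal_simplices):
    """Alternating sum of simplex counts of a simplicial complex.

    maximal_simplices : iterable of vertex tuples; all faces of each
    tuple are added automatically so the complex is closed under faces.
    """
    simplices = set()
    for simplex in maximal_simplices:
        vertices = tuple(sorted(simplex))
        for size in range(1, len(vertices) + 1):
            simplices.update(combinations(vertices, size))
    # A simplex with k vertices has dimension k - 1.
    return sum((-1) ** (len(s) - 1) for s in simplices)


hollow_triangle = [(0, 1), (1, 2), (0, 2)]          # a loop: 3 - 3 = 0
filled_triangle = [(0, 1, 2)]                       # a disk: 3 - 3 + 1 = 1
sphere = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]  # 4 - 6 + 4 = 2

print(euler_characteristic(hollow_triangle))  # 0
print(euler_characteristic(filled_triangle))  # 1
print(euler_characteristic(sphere))           # 2
```

The three toy shapes show why a single integer can already separate a loop, a disk, and a closed surface.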

Topological Data Analysis: Concepts, Computation, and Applications in Chemical Engineering (2021)

Alexander D. Smith, Paweł Dłotko, Victor M. Zavala

Abstract

A primary hypothesis that drives scientific and engineering studies is that data has structure. The dominant paradigms for describing such structure are statistics (e.g., moments, correlation functions) and signal processing (e.g., convolutional neural nets, Fourier series). Topological Data Analysis (TDA) is a field of mathematics that analyzes data from a fundamentally different perspective. TDA represents datasets as geometric objects and provides dimensionality reduction techniques that project such objects onto low-dimensional descriptors. The key properties of these descriptors (also known as topological features) are that they provide multiscale information and that they are stable under perturbations (e.g., noise, translation, and rotation). In this work, we review the key mathematical concepts and methods of TDA and present different applications in chemical engineering.

Topological Analysis Reveals State Transitions in Human Gut and Marine Bacterial Communities (2020)

William K. Chang, David VanInsberghe, Libusha Kelly

Abstract

Microbiome dynamics influence the health and functioning of human physiology and the environment and are driven in part by interactions between large numbers of microbial taxa, making large-scale prediction and modeling a challenge. Here, using topological data analysis, we identify states and dynamical features relevant to macroscopic processes. We show that gut disease processes and marine geochemical events are associated with transitions between community states, defined as topological features of the data density. We find a reproducible two-state succession during recovery from cholera in the gut microbiomes of multiple patients, evidence of dynamic stability in the gut microbiome of a healthy human after experiencing diarrhea during travel, and periodic state transitions in a marine Prochlorococcus community driven by water column cycling. Our approach bridges small-scale fluctuations in microbiome composition and large-scale changes in phenotype without details of underlying mechanisms, and provides an assessment of microbiome stability and its relation to human and environmental health.

Reconceiving the Hippocampal Map as a Topological Template (2014)

Yuri Dabaghian, Vicky L. Brandt, Loren M. Frank

Abstract

The role of the hippocampus in spatial cognition is incontrovertible yet controversial. Place cells, initially thought to be location-specifiers, turn out to respond promiscuously to a wide range of stimuli. Here we test the idea, which we have recently demonstrated in a computational model, that the hippocampal place cells may ultimately be interested in a space's topological qualities (its connectivity) more than its geometry (distances and angles); such higher-order functioning would be more consistent with other known hippocampal functions. We recorded place cell activity in rats exploring morphing linear tracks that allowed us to dissociate the geometry of the track from its topology. The resulting place fields preserved the relative sequence of places visited along the track but did not vary with the metrical features of the track or the direction of the rat's movement. These results suggest a reinterpretation of previous studies and new directions for future experiments.

Identification of Copy Number Aberrations in Breast Cancer Subtypes Using Persistence Topology (2015)

Javier Arsuaga, Tyler Borrman, Raymond Cavalcante, Georgina Gonzalez, Catherine Park

Abstract

DNA copy number aberrations (CNAs) are of biological and medical interest because they help identify regulatory mechanisms underlying tumor initiation and evolution. Identification of tumor-driving CNAs (driver CNAs) however remains a challenging task, because they are frequently hidden by CNAs that are the product of random events that take place during tumor evolution. Experimental detection of CNAs is commonly accomplished through array comparative genomic hybridization (aCGH) assays followed by supervised and/or unsupervised statistical methods that combine the segmented profiles of all patients to identify driver CNAs. Here, we extend a previously-presented supervised algorithm for the identification of CNAs that is based on a topological representation of the data. Our method associates a two-dimensional (2D) point cloud with each aCGH profile and generates a sequence of simplicial complexes, mathematical objects that generalize the concept of a graph. This representation of the data permits segmenting the data at different resolutions and identifying CNAs by interrogating the topological properties of these simplicial complexes. We tested our approach on a published dataset with the goal of identifying specific breast cancer CNAs associated with specific molecular subtypes. Identification of CNAs associated with each subtype was performed by analyzing each subtype separately from the others and by taking the rest of the subtypes as the control. Our results found a new amplification in 11q at the location of the progesterone receptor in the Luminal A subtype. Aberrations in the Luminal B subtype were found only upon removal of the basal-like subtype from the control set. Under those conditions, all regions found in the original publication, except for 17q, were confirmed; all aberrations, except those in chromosome arms 8q and 12q were confirmed in the basal-like subtype. 
These two chromosome arms, however, were detected only upon removal of three patients with exceedingly large copy number values. More importantly, we detected 10 and 21 additional regions in the Luminal B and basal-like subtypes, respectively. Most of the additional regions were either validated on an independent dataset and/or using GISTIC. Furthermore, we found three new CNAs in the basal-like subtype: a combination of gains and losses in 1p, a gain in 2p and a loss in 14q. Based on these results, we suggest that topological approaches that incorporate multiresolution analyses and that interrogate topological properties of the data can help in the identification of copy number changes in cancer.

A Topological Perspective on Regimes in Dynamical Systems (2021)

Kristian Strommen, Matthew Chantry, Joshua Dorrington, Nina Otter

Abstract

The existence and behaviour of so-called 'regimes' has been extensively studied in dynamical systems ranging from simple toy models to the atmosphere itself, due to their potential for drastically simplifying complex and chaotic dynamics. Nevertheless, no agreed-upon and clear-cut definition of a 'regime' or a 'regime system' exists in the literature. We argue here for a definition which equates the existence of regimes in a system with the existence of non-trivial topological structure. We show, using persistent homology, a tool in topological data analysis, that this definition is computationally tractable, practically informative, and accounts for a variety of different examples. We further show that alternative, stricter definitions based on clustering and/or temporal persistence criteria fail to account for one or more examples of dynamical systems typically thought of as having regimes. We finally discuss how our methodology can shed light on regime behaviour in the atmosphere, and discuss future prospects.

Weighted Persistent Homology for Osmolyte Molecular Aggregation and Hydrogen-Bonding Network Analysis (2020)

D. Vijay Anand, Zhenyu Meng, Kelin Xia, Yuguang Mu

Abstract

It has long been observed that trimethylamine N-oxide (TMAO) and urea demonstrate dramatically different properties in a protein folding process. Even with the enormous theoretical and experimental research work on these two osmolytes, various aspects of their underlying mechanisms still remain largely elusive. In this paper, we propose to use weighted persistent homology to systematically study osmolyte molecular aggregation and the associated hydrogen-bonding networks from a local topological perspective. We consider two weighted models, i.e., localized persistent homology (LPH) and interactive persistent homology (IPH). Boltzmann persistent entropy (BPE) is proposed to quantitatively characterize the topological features from LPH and IPH, together with the persistent Betti number (PBN). More specifically, from the localized persistent homology models, we have found that TMAO and urea have very different local topology. TMAO is found to exhibit a local network structure. As the concentration increases, the circle elements in these networks show a clear increase in their total numbers and a decrease in their relative sizes. In contrast, urea shows two types of local topological patterns, i.e., local clusters around 6 Å and a few global circle elements at around 12 Å. From the interactive persistent homology models, it has been found that our persistent radial distribution function (PRDF) from the global-scale IPH has the same physical properties as the traditional radial distribution function. Moreover, PRDFs from the local-scale IPH can also be generated and used to characterize the local interaction information. Other than the clear difference in the first peak value of the PRDFs at filtration size 4 Å, TMAO and urea also show very different behaviors in the second peak region from filtration size 5 Å to 10 Å. These differences are also reflected in the PBNs and BPEs of the local-scale IPH. This localized topological information has never been revealed before. 
Since graphs can be transferred into simplicial complexes by the clique complex, our weighted persistent homology models can be used in the analysis of various networks and graphs from any molecular structures and aggregation systems.
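
The clique complex mentioned above can be sketched in a few lines: every set of vertices that is pairwise connected in the graph becomes a simplex. The brute-force enumeration below is only suitable for small graphs, and the function name `clique_complex`, the `max_dim` cutoff, and the toy graph are assumptions for this illustration rather than anything taken from the paper.

```python
from itertools import combinations


def clique_complex(vertices, edges, max_dim=2):
    """Simplices of the clique complex of a graph, up to dimension max_dim.

    A tuple of k vertices is a (k - 1)-simplex when every pair of its
    vertices is joined by an edge. Brute force: checks all vertex subsets.
    """
    edge_set = {frozenset(edge) for edge in edges}
    ordered = sorted(vertices)
    simplices = [(v,) for v in ordered]  # 0-simplices
    for size in range(2, max_dim + 2):
        for candidate in combinations(ordered, size):
            if all(frozenset(pair) in edge_set
                   for pair in combinations(candidate, 2)):
                simplices.append(candidate)
    return simplices


# Hypothetical graph: a triangle 0-1-2 with a pendant edge 2-3.
complex_ = clique_complex([0, 1, 2, 3], [(0, 1), (1, 2), (0, 2), (2, 3)])
print(complex_)
# [(0,), (1,), (2,), (3,), (0, 1), (0, 2), (1, 2), (2, 3), (0, 1, 2)]
```

The triangle 0-1-2 is filled in because all three of its edges are present, while the pendant edge 2-3 contributes only a 1-simplex.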

Knowledge Gaps in the Early Growth of Semantic Feature Networks (2018)

Ann E. Sizemore, Elisabeth A. Karuza, Chad Giusti, Danielle S. Bassett

Abstract

Understanding language learning and more general knowledge acquisition requires the characterization of inherently qualitative structures. Recent work has applied network science to this task by creating semantic feature networks, in which words correspond to nodes and connections correspond to shared features, and then by characterizing the structure of strongly interrelated groups of words. However, the importance of sparse portions of the semantic network—knowledge gaps—remains unexplored. Using applied topology, we query the prevalence of knowledge gaps, which we propose manifest as cavities in the growing semantic feature network of toddlers. We detect topological cavities of multiple dimensions and find that, despite word order variation, the global organization remains similar. We also show that nodal network measures correlate with filling cavities better than basic lexical properties. Finally, we discuss the importance of semantic feature network topology in language learning and speculate that the progression through knowledge gaps may be a robust feature of knowledge acquisition.

Cliques and Cavities in the Human Connectome (2018)

Ann E. Sizemore, Chad Giusti, Ari Kahn, Jean M. Vettel, Richard F. Betzel, Danielle S. Bassett

Abstract

Encoding brain regions and their connections as a network of nodes and edges captures many of the possible paths along which information can be transmitted as humans process and perform complex behaviors. Because cognitive processes involve large, distributed networks of brain areas, principled examinations of multi-node routes within larger connection patterns can offer fundamental insights into the complexities of brain function. Here, we investigate both densely connected groups of nodes that could perform local computations as well as larger patterns of interactions that would allow for parallel processing. Finding such structures necessitates that we move from considering exclusively pairwise interactions to capturing higher order relations, concepts naturally expressed in the language of algebraic topology. These tools can be used to study mesoscale network structures that arise from the arrangement of densely connected substructures called cliques in otherwise sparsely connected brain networks. We detect cliques (all-to-all connected sets of brain regions) in the average structural connectomes of 8 healthy adults scanned in triplicate and discover the presence of more large cliques than expected in null networks constructed via wiring minimization, providing architecture through which brain networks can perform rapid, local processing. We then locate topological cavities of different dimensions, around which information may flow in either diverging or converging patterns. These cavities exist consistently across subjects, differ from those observed in null model networks, and – importantly – link regions of early and late evolutionary origin in long loops, underscoring their unique role in controlling brain function. These results offer a first demonstration that techniques from algebraic topology offer a novel perspective on structural connectomics, highlighting loop-like paths as crucial features in the human brain’s structural architecture.

Topological Detection of Phenomenological Bifurcations With Unreliable Kernel Density Estimates (2024)

Sunia Tanweer, Firas A. Khasawneh

Abstract

Phenomenological (P-type) bifurcations are qualitative changes in stochastic dynamical systems whereby the stationary probability density function (PDF) changes its topology. The current state of the art for detecting these bifurcations requires reliable kernel density estimates computed from an ensemble of system realizations. However, in several real world signals such as Big Data, only a single system realization is available—making it impossible to estimate a reliable kernel density. This study presents an approach for detecting P-type bifurcations using unreliable density estimates. The approach creates an ensemble of objects from Topological Data Analysis (TDA) called persistence diagrams from the system’s sole realization and statistically analyzes the resulting set. We compare several methods for replicating the original persistence diagram including Gibbs point process modelling, Pairwise Interaction Point Modelling, and subsampling. We show that for the purpose of predicting a bifurcation, the simple method of subsampling exceeds the other two methods of point process modelling in performance.

A Novel Approach for Wafer Defect Pattern Classification Based on Topological Data Analysis (2023)

Seungchan Ko, Dowan Koo

Abstract

In semiconductor manufacturing, wafer map defect patterns provide critical information for facility maintenance and yield management, so the classification of defect patterns is one of the most important tasks in the manufacturing process. In this paper, we propose a novel way to represent the shape of the defect pattern as a finite-dimensional vector, which is used as the input to a neural network algorithm for classification. The main idea is to extract the topological features of each pattern by using the theory of persistent homology from topological data analysis (TDA). Through experiments with a simulated dataset, we show that the proposed method is faster and much more efficient in training, with higher accuracy, compared with the method using convolutional neural networks (CNN), which is the most common approach for wafer map defect pattern classification. Moreover, our method outperforms the CNN-based method when the training data are scarce and imbalanced.

Single-Cell Topological RNA-Seq Analysis Reveals Insights Into Cellular Differentiation and Development (2017)

Abbas H. Rizvi, Pablo G. Camara, Elena K. Kandror, Thomas J. Roberts, Ira Schieren, Tom Maniatis, Raul Rabadan

Abstract

Transcriptional programs control cellular lineage commitment and differentiation during development. Understanding cell fate has been advanced by studying single-cell RNA-seq, but is limited by the assumptions of current analytic methods regarding the structure of data. We present single-cell topological data analysis (scTDA), an algorithm for topology-based computational analyses to study temporal, unbiased transcriptional regulation. Compared to other methods, scTDA is a non-linear, model-independent, unsupervised statistical framework that can characterize transient cellular states. We applied scTDA to the analysis of murine embryonic stem cell (mESC) differentiation in vitro in response to inducers of motor neuron differentiation. scTDA resolved asynchrony and continuity in cellular identity over time, and identified four transient states (pluripotent, precursor, progenitor, and fully differentiated cells) based on changes in stage-dependent combinations of transcription factors, RNA-binding proteins and long non-coding RNAs. scTDA can be applied to study asynchronous cellular responses to either developmental cues or environmental perturbations.

TDA-Net: Fusion of Persistent Homology and Deep Learning Features for COVID-19 Detection From Chest X-Ray Images (2021)

Mustafa Hajij, Ghada Zamzmi, Fawwaz Batayneh

Abstract

Topological Data Analysis (TDA) has emerged recently as a robust tool to extract and compare the structure of datasets. TDA identifies features in data (e.g., connected components and holes) and assigns a quantitative measure to these features. Several studies reported that topological features extracted by TDA tools provide unique information about the data, discover new insights, and determine which feature is more related to the outcome. On the other hand, the overwhelming success of deep neural networks in learning patterns and relationships has been proven on various data applications including images. To capture the characteristics of both worlds, we propose TDA-Net, a novel ensemble network that fuses topological and deep features for the purpose of enhancing model generalizability and accuracy. We apply the proposed TDA-Net to a critical application, which is the automated detection of COVID-19 from CXR images. Experimental results showed that the proposed network achieved excellent performance and suggested the applicability of our method in practice.

A Topological Data Analysis Based Classification Method for Multiple Measurements (2019)

Henri Riihimäki, Wojciech Chachólski, Jakob Theorell, Jan Hillert, Ryan Ramanujam

Abstract

Background: Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements which samples from the data space and builds a network graph based on the data topology. When applying this to two case studies, accuracy exceeds alternative models with additional benefits such as reporting data subsets with high purity along with feature values. Results: For 300 examples of 3 tree species, the accuracy reached 80% after 30 datapoints, which was improved to 90% after increased sampling to 400 datapoints. Using data from 100 examples of each of 6 point processes, the classifier achieved 96.8% accuracy. In both datasets, the TDA classifier outperformed an alternative model. Conclusions: This algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool.

Topological Data Analysis Generates High-Resolution, Genome-Wide Maps of Human Recombination (2016)

Pablo G. Camara, Daniel I. S. Rosenbloom, Kevin J. Emmett, Arnold J. Levine, Raul Rabadan

Abstract

Meiotic recombination is a fundamental evolutionary process driving diversity in eukaryotes. In mammals, recombination is known to occur preferentially at specific genomic regions. Using topological data analysis (TDA), a branch of applied topology that extracts global features from large data sets, we developed an efficient method for mapping recombination at fine scales. When compared to standard linkage-based methods, TDA can deal with a larger number of SNPs and genomes without incurring prohibitive computational costs. We applied TDA to 1,000 Genomes Project data and constructed high-resolution whole-genome recombination maps of seven human populations. Our analysis shows that recombination is generally under-represented within transcription start sites. However, the binding sites of specific transcription factors are enriched for sites of recombination. These include transcription factors that regulate the expression of meiosis- and gametogenesis-specific genes, cell cycle progression, and differentiation blockage. Additionally, our analysis identifies an enrichment for sites of recombination at repeat-derived loci matched by piwi-interacting RNAs.

Confinement in Non-Abelian Lattice Gauge Theory via Persistent Homology (2022)

Daniel Spitz, Julian M. Urban, Jan M. Pawlowski

Abstract

We investigate the structure of confining and deconfining phases in SU(2) lattice gauge theory via persistent homology, which gives us access to the topology of a hierarchy of combinatorial objects constructed from given data. Specifically, we use filtrations by traced Polyakov loops, topological densities, holonomy Lie algebra fields, as well as electric and magnetic fields. This allows for a comprehensive picture of confinement. In particular, topological densities form spatial lumps which show signatures of the classical probability distribution of instanton-dyons. Signatures of well-separated dyons located at random positions are encoded in holonomy Lie algebra fields, following the semi-classical temperature dependence of the instanton appearance probability. Debye screening discriminating between electric and magnetic fields is visible in persistent homology and pronounced at large gauge coupling. All employed constructions are gauge-invariant without a priori assumptions on the configurations under study. This work showcases the versatility of persistent homology for statistical and quantum physics studies, barely explored to date.

Persistent Homology in Cosmic Shear: Constraining Parameters With Topological Data Analysis (2021)

Sven Heydenreich, Benjamin Brück, Joachim Harnois-Déraps

Abstract

In recent years, cosmic shear has emerged as a powerful tool for studying the statistical distribution of matter in our Universe. Apart from the standard two-point correlation functions, several alternative methods such as peak count statistics offer competitive results. Here we show that persistent homology, a tool from topological data analysis, can extract more cosmological information than previous methods from the same data set. For this, we use persistent Betti numbers to efficiently summarise the full topological structure of weak lensing aperture mass maps. This method can be seen as an extension of the peak count statistics, in which we additionally capture information about the environment surrounding the maxima. We first demonstrate the performance in a mock analysis of the KiDS+VIKING-450 data: We extract the Betti functions from a suite of N-body simulations and use these to train a Gaussian process emulator that provides rapid model predictions; we next run a Markov chain Monte Carlo analysis on independent mock data to infer the cosmological parameters and their uncertainties. When comparing our results, we recover the input cosmology and achieve a constraining power on S8 that is 3% tighter than that on peak count statistics. Performing the same analysis on 100 deg² of Euclid-like simulations, we are able to improve the constraints on S8 and Ωm by 19% and 12%, respectively, while breaking some of the degeneracy between S8 and the dark energy equation of state. To our knowledge, the methods presented here are the most powerful topological tools for constraining cosmological parameters with lensing data.

The Accumulated Persistence Function, a New Useful Functional Summary Statistic for Topological Data Analysis, With a View to Brain Artery Trees and Spatial Point Process Applications (2019)

C.A.N. Biscio, J. Møller

Abstract

We start with a simple introduction to topological data analysis where the most popular tool is called a persistence diagram. Briefly, a persistence diagram is a multiset of points in the plane describing the persistence of topological features of a compact set when a scale parameter varies. Since statistical methods are difficult to apply directly on persistence diagrams, various alternative functional summary statistics have been suggested, but either they do not contain the full information of the persistence diagram or they are two-dimensional functions. We suggest a new functional summary statistic that is one-dimensional and hence easier to handle, and which under mild conditions contains the full information of the persistence diagram. Its usefulness is illustrated in statistical settings concerned with point clouds and brain artery trees. The supplementary materials include additional methods and examples, technical details, and the R code used for all examples.
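As an illustration of the one-dimensional summary described above, the following is a minimal Python sketch of an accumulated persistence function. It assumes the usual formulation in which each diagram point (birth, death) contributes its lifetime, death minus birth, once the threshold m reaches its mean age, (birth + death) / 2. The function name and the toy diagram are hypothetical and are not taken from the paper's R code.

```python
def accumulated_persistence(diagram, m):
    """Sum the lifetimes of all (birth, death) points whose mean age is at most m.

    Assumed formulation: lifetime = death - birth, mean age = (birth + death) / 2.
    """
    total = 0.0
    for birth, death in diagram:
        lifetime = death - birth
        mean_age = (birth + death) / 2.0
        if mean_age <= m:
            total += lifetime
    return total


# Toy persistence diagram (hypothetical values chosen for illustration).
diagram = [(0.0, 1.0), (0.5, 2.5), (1.0, 1.5)]

# Evaluate the step function at a few thresholds; it is non-decreasing in m.
apf_values = [accumulated_persistence(diagram, m) for m in (0.0, 0.5, 1.25, 1.5)]
print(apf_values)
```

Because the result is an ordinary non-decreasing step function of a single variable, curves from several diagrams can be averaged or compared pointwise, which is the kind of handling the abstract refers to.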

Topological Data Analysis Distinguishes Parameter Regimes in the Anderson-Chaplain Model of Angiogenesis (2021)

John T. Nardini, Bernadette J. Stolz, Kevin B. Flores, Heather A. Harrington, Helen M. Byrne

Abstract

Angiogenesis is the process by which blood vessels form from pre-existing vessels. It plays a key role in many biological processes, including embryonic development and wound healing, and contributes to many diseases including cancer and rheumatoid arthritis. The structure of the resulting vessel networks determines their ability to deliver nutrients and remove waste products from biological tissues. Here we simulate the Anderson-Chaplain model of angiogenesis at different parameter values and quantify the vessel architectures of the resulting synthetic data. Specifically, we propose a topological data analysis (TDA) pipeline for systematic analysis of the model. TDA is a vibrant and relatively new field of computational mathematics for studying the shape of data. We compute topological and standard descriptors of model simulations generated by different parameter values. We show that TDA of model simulation data stratifies parameter space into regions with similar vessel morphology. The methodologies proposed here are widely applicable to other synthetic and experimental data including wound healing, development, and plant biology.

Inferring COVID-19 Biological Pathways From Clinical Phenotypes via Topological Analysis (2021)

Negin Karisani, Daniel E. Platt, Saugata Basu, Laxmi Parida

Abstract

COVID-19 has caused thousands of deaths around the world and also resulted in a large international economic disruption. Identifying the pathways associated with this illness can help medical researchers to better understand the properties of the condition. This process can be carried out by analyzing the medical records. It is crucial to develop tools and models that can aid researchers with this process in a timely manner. However, medical records are often unstructured clinical notes, and this poses significant challenges to developing the automated systems. In this article, we propose a pipeline to aid practitioners in analyzing clinical notes and revealing the pathways associated with this disease. Our pipeline relies on topological properties and consists of three steps: 1) pre-processing the clinical notes to extract the salient concepts, 2) constructing a feature space of the patients to characterize the extracted concepts, and finally, 3) leveraging the topological properties to distill the available knowledge and visualize the result. Our experiments on a publicly available dataset of COVID-19 clinical notes testify that our pipeline can indeed extract meaningful pathways.

Identification of Key Features Using Topological Data Analysis for Accurate Prediction of Manufacturing System Outputs (2017)

Wei Guo, Ashis G. Banerjee

Abstract

Topological data analysis (TDA) has emerged as one of the most promising approaches to extract insights from high-dimensional data of varying types such as images, point clouds, and meshes, in an unsupervised manner. To the best of our knowledge, here, we provide the first successful application of TDA in the manufacturing systems domain. We apply a widely used TDA method, known as the Mapper algorithm, on two benchmark data sets for chemical process yield prediction and semiconductor wafer fault detection, respectively. The algorithm yields topological networks that capture the intrinsic clusters and connections among the clusters present in the data sets, which are difficult to detect using traditional methods. We select key process variables or features that impact the system outcomes by analyzing the network shapes. We then use predictive models to evaluate the impact of the selected features. Results show that the models achieve at least the same level of high prediction accuracy as with all the process variables, thereby, providing a way to carry out process monitoring and control in a more cost-effective manner.

The Topology of Higher-Order Complexes Associated With Brain Hubs in Human Connectomes (2020)

Miroslav Andjelković, Bosiljka Tadić, Roderick Melnik

Abstract

Higher-order connectivity in complex systems described by simplexes of different orders provides a geometry for simplex-based dynamical variables and interactions. Simplicial complexes that constitute a functional geometry of the human connectome can be crucial for the brain complex dynamics. In this context, the best-connected brain areas, designated as hub nodes, play a central role in supporting integrated brain function. Here, we study the structure of simplicial complexes attached to eight global hubs in the female and male connectomes and identify the core networks among the affected brain regions. These eight hubs (Putamen, Caudate, Hippocampus and Thalamus-Proper in the left and right cerebral hemisphere) are the highest-ranking according to their topological dimension, defined as the number of simplexes of all orders in which the node participates. Furthermore, we analyse the weight-dependent heterogeneity of simplexes. We demonstrate changes in the structure of identified core networks and topological entropy when the threshold weight is gradually increased. These results highlight the role of higher-order interactions in human brain networks and provide additional evidence for (dis)similarity between the female and male connectomes.

Geometric Anomaly Detection in Data (2020)

Bernadette J. Stolz, Jared Tanner, Heather A. Harrington, Vidit Nanda

Abstract

The quest for low-dimensional models which approximate high-dimensional data is pervasive across the physical, natural, and social sciences. The dominant paradigm underlying most standard modeling techniques assumes that the data are concentrated near a single unknown manifold of relatively small intrinsic dimension. Here, we present a systematic framework for detecting interfaces and related anomalies in data which may fail to satisfy the manifold hypothesis. By computing the local topology of small regions around each data point, we are able to partition a given dataset into disjoint classes, each of which can be individually approximated by a single manifold. Since these manifolds may have different intrinsic dimensions, local topology discovers singular regions in data even when none of the points have been sampled precisely from the singularities. We showcase this method by identifying the intersection of two surfaces in the 24-dimensional space of cyclo-octane conformations and by locating all of the self-intersections of a Henneberg minimal surface immersed in 3-dimensional space. Due to the local nature of the topological computations, the algorithmic burden of performing such data stratification is readily distributable across several processors.

HERMES: Persistent Spectral Graph Software (2020)

Rui Wang, Rundong Zhao, Emily Ribando-Gros, Jiahui Chen, Yiying Tong, Guo-Wei Wei

Abstract

Persistent homology (PH) is one of the most popular tools in topological data analysis (TDA), while graph theory has had a significant impact on data science. Our earlier work introduced the persistent spectral graph (PSG) theory as a unified multiscale paradigm to encompass TDA and geometric analysis. In PSG theory, families of persistent Laplacians (PLs) corresponding to various topological dimensions are constructed via a filtration to sample a given dataset at multiple scales. The harmonic spectra from the null spaces of PLs offer the same topological invariants, namely persistent Betti numbers, at various dimensions as those provided by PH, while the non-harmonic spectra of PLs give rise to additional geometric analysis of the shape of the data. In this work, we develop an open-source software package, called highly efficient robust multidimensional evolutionary spectra (HERMES), to enable broad applications of PSGs in science, engineering, and technology. To ensure the reliability and robustness of HERMES, we have validated the software with simple geometric shapes and complex datasets from three-dimensional (3D) protein structures. We found that the smallest non-zero eigenvalues are very sensitive to data abnormality.

Persistent Homology Based Graph Convolution Network for Fine-Grained 3D Shape Segmentation (2021)

Chi-Chong Wong, Chi-Man Vong

Abstract

Fine-grained 3D segmentation is an important task in 3D object understanding, especially in applications such as intelligent manufacturing or parts analysis for 3D objects. However, many challenges involved in this problem are yet to be solved, such as i) interpreting the complex structures located in different regions of 3D objects; ii) capturing fine-grained structures with sufficient topological correctness. Current deep learning and graph machine learning methods fail to tackle such challenges and thus provide inferior performance in fine-grained 3D analysis. In this work, methods from topological data analysis are combined with a geometric deep learning model for the task of fine-grained segmentation of 3D objects. We propose a novel neural network model called Persistent Homology based Graph Convolution Network (PHGCN), which i) integrates persistent homology into a graph convolution network to capture multi-scale structural information that can accurately represent complex structures of 3D objects; ii) applies a novel Persistence Diagram Loss (ℒPD) that provides sufficient topological correctness for segmentation over the fine-grained structures. Extensive experiments on fine-grained 3D segmentation validate the effectiveness of the proposed PHGCN model and show significant improvements over current state-of-the-art methods.

Persistent Homology and the Branching Topologies of Plants (2017)

Mao Li, Keith Duncan, Christopher N. Topp, Daniel H. Chitwood

Characterising Epithelial Tissues Using Persistent Entropy (2019)

N. Atienza, L. M. Escudero, M. J. Jimenez, M. Soriano-Trigueros

Abstract

In this paper, we apply persistent entropy, a novel topological statistic, for characterization of images of epithelial tissues. We have found out that persistent entropy is able to summarize topological and geometric information encoded by α-complexes and persistent homology. After using some statistical tests, we can guarantee the existence of significant differences in the studied tissues.
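For readers unfamiliar with the statistic, the sketch below computes persistent entropy under its common definition: the Shannon entropy of the bar lifetimes after normalising them to sum to one. The natural logarithm is an assumption (some authors use base 2), and the function name and sample diagrams are hypothetical rather than taken from the paper.

```python
import math


def persistent_entropy(diagram):
    """Shannon entropy of normalised lifetimes in a list of (birth, death) pairs.

    Assumes finite bars only; zero-length bars are ignored.
    """
    lifetimes = [death - birth for birth, death in diagram if death > birth]
    total = sum(lifetimes)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in lifetimes)


# Four equally long bars give the maximum entropy for four bars, log(4).
uniform = [(0.0, 1.0)] * 4
# One dominant bar plus short ones gives a lower value.
skewed = [(0.0, 10.0), (0.0, 0.1), (0.0, 0.1), (0.0, 0.1)]

print(persistent_entropy(uniform), persistent_entropy(skewed))
```

The value is largest when all bars have equal length and shrinks as a few bars dominate, which is why it can act as a single-number summary of a barcode.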

Topological Characteristics of Oil and Gas Reservoirs and Their Applications (2017)

V. A. Baikov, R. R. Gilmanov, I. A. Taimanov, A. A. Yakovlev

Abstract

We demonstrate applications of topological characteristics of oil and gas reservoirs considered as three-dimensional bodies to geological modeling.

ChainNet: Learning on Blockchain Graphs With Topological Features (2019)

N. C. Abay, C. G. Akcora, Y. R. Gel, M. Kantarcioglu, U. D. Islambekov, Y. Tian, B. Thuraisingham

Abstract

The following topics are dealt with: learning (artificial intelligence); graph theory; neural nets; pattern classification; data mining; feature extraction; recommender systems; pattern clustering; social networking (online); optimisation.

Measurement of the Topological Dimension of Hippocampal Place Cell Activity (2018)

Steven E. Fox, James B. Ranck

Complex Politics: A Quantitative Semantic and Topological Analysis of Uk House of Commons Debates (2015)

Stefano Gurciullo, Michael Smallegan, María Pereda, Federico Battiston, Alice Patania, Sebastian Poledna, Daniel Hedblom, Bahattin Tolga Oztan, Alexander Herzog, Peter John

Persistence Weighted Gaussian Kernel for Topological Data Analysis (2016)

Genki Kusano, Yasuaki Hiraoka, Kenji Fukumizu

Abstract

Topological data analysis (TDA) is an emerging mathematical concept for characterizing shapes in complex data. In TDA, persistence diagrams are widely recognized as a useful descriptor of data, and...

Topological Feature Tracking for Submesoscale Eddies (2022)

Sam Voisin, Jay Hineman, James B. Polly, Gary Koplik, Ken Ball, Paul Bendich, Joseph D'Addezio, Gregg A. Jacobs, Tamay Özgökmen

Topo-Cxr: Chest X-Ray TB and Pneumonia Screening With Topological Machine Learning (2023)

Faisal Ahmed, Brighton Nuwagira, Furkan Torlak, Baris Coskunuzer

Topology of Viral Evolution (2013)

Joseph Minhow Chan, Gunnar Carlsson, Raul Rabadan

Abstract

The tree structure is currently the accepted paradigm to represent evolutionary relationships between organisms, species or other taxa. However, horizontal, or reticulate, genomic exchanges are pervasive in nature and confound characterization of phylogenetic trees. Drawing from algebraic topology, we present a unique evolutionary framework that comprehensively captures both clonal and reticulate evolution. We show that whereas clonal evolution can be summarized as a tree, reticulate evolution exhibits nontrivial topology of dimension greater than zero. Our method effectively characterizes clonal evolution, reassortment, and recombination in RNA viruses. Beyond detecting reticulate evolution, we succinctly recapitulate the history of complex genetic exchanges involving more than two parental strains, such as the triple reassortment of H7N9 avian influenza and the formation of circulating HIV-1 recombinants. In addition, we identify recurrent, large-scale patterns of reticulate evolution, including frequent PB2-PB1-PA-NP cosegregation during avian influenza reassortment. Finally, we bound the rate of reticulate events (i.e., 20 reassortments per year in avian influenza). Our method provides an evolutionary perspective that not only captures reticulate events precluding phylogeny, but also indicates the evolutionary scales where phylogenetic inference could be accurate.

A Sheaf and Topology Approach to Generating Local Branch Numbers in Digital Images (2020)

Chuan-Shen Hu, Yu-Min Chung

Abstract

This paper concerns a theoretical approach that combines topological data analysis (TDA) and sheaf theory. Topological data analysis, a rising field in mathematics and computer science, concerns the shape of the data and has been proven effective in many scientific disciplines. Sheaf theory, a mathematics subject in algebraic geometry, provides a framework for describing the local consistency in geometric objects. Persistent homology (PH) is one of the main driving forces in TDA, and the idea is to track changes of geometric objects at different scales. The persistence diagram (PD) summarizes the information of PH in the form of a multi-set. While PD provides useful information about the underlying objects, it lacks fine relations about the local consistency of specific pairs of generators in PD, such as the merging relation between two connected components in the PH. The sheaf structure provides a novel point of view for describing the merging relation of local objects in PH. It is the goal of this paper to establish a theoretic framework that utilizes the sheaf theory to uncover finer information from the PH. We also show that the proposed theory can be applied to identify the branch numbers of local objects in digital images.

Topological Eulerian Synthesis of Slow Motion Periodic Videos (2018)

Christopher Tralie, Matthew Berger

Abstract

We consider the problem of taking a video that is comprised of multiple periods of repetitive motion, and reordering the frames of the video into a single period, producing a detailed, single cycle video of motion. This problem is challenging, as such videos often contain noise, drift due to camera motion and from cycle to cycle, and irrelevant background motion/occlusions, and these factors can confound the relevant periodic motion we seek in the video. To address these issues in a simple and efficient manner, we introduce a tracking free Eulerian approach for synthesizing a single cycle of motion. Our approach is geometric: we treat each frame as a point in high-dimensional Euclidean space, and analyze the sliding window embedding formed by this sequence of points, which yields samples along a topological loop regardless of the type of periodic motion. We combine tools from topological data analysis and spectral geometric analysis to estimate the phase of each window, and we exploit the sliding window structure to robustly reorder frames. We show quantitative results that highlight the robustness of our technique to camera shake, noise, and occlusions, and qualitative results of single-cycle motion synthesis across a variety of scenarios.
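The sliding window embedding mentioned above can be sketched on a toy scalar signal. This is only an illustration of the general construction: the paper works with video frames as high-dimensional points, whereas the function name, the parameters `dim` and `tau`, and the sample signal here are assumptions made for the example.

```python
def sliding_window(series, dim, tau=1):
    """Return the windows (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).

    dim is the window length and tau the step between samples inside a window.
    """
    span = (dim - 1) * tau
    return [
        tuple(series[i + j * tau] for j in range(dim))
        for i in range(len(series) - span)
    ]


# Three periods of a period-4 signal.
signal = [0, 1, 0, -1] * 3
windows = sliding_window(signal, dim=2, tau=1)

# The embedded points revisit the same four positions, tracing a closed loop.
print(len(windows), sorted(set(windows)))
```

In this toy case the eleven windows collapse onto four distinct points arranged around a loop, so windows from different periods that share a phase land on the same point. That is the property a phase-based reordering of frames would rely on.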

Data-Driven and Automatic Surface Texture Analysis Using Persistent Homology (2021)

Melih C. Yesilli, Firas A. Khasawneh

Abstract

Surface roughness plays an important role in analyzing engineering surfaces. It quantifies the surface topography and can be used to determine whether the resulting surface finish is acceptable or not. Nevertheless, while several existing tools and standards are available for computing surface roughness, these methods rely heavily on user input thus slowing down the analysis and increasing manufacturing costs. Therefore, fast and automatic determination of the roughness level is essential to avoid costs resulting from surfaces with unacceptable finish, and user-intensive analysis. In this study, we propose a Topological Data Analysis (TDA) based approach to classify the roughness level of synthetic surfaces using both their areal images and profiles. We utilize persistent homology from TDA to generate persistence diagrams that encapsulate information on the shape of the surface. We then obtain feature matrices for each surface or profile using Carlsson coordinates, persistence images, and template functions. We compare our results to two widely used methods in the literature: Fast Fourier Transform (FFT) and Gaussian filtering. The results show that our approach yields mean accuracies as high as 97%. We also show that, in contrast to existing surface analysis tools, our TDA-based approach is fully automatable and provides adaptive feature extraction.

Unexpected Topology of the Temperature Fluctuations in the Cosmic Microwave Background (2019)

Pratyush Pranav, Robert J. Adler, Thomas Buchert, Herbert Edelsbrunner, Bernard J. T. Jones, Armin Schwartzman, Hubert Wagner, Rien van de Weygaert

Abstract

We study the topology generated by the temperature fluctuations of the cosmic microwave background (CMB) radiation, as quantified by the number of components and holes, formally given by the Betti numbers, in the growing excursion sets. We compare CMB maps observed by the Planck satellite with a thousand simulated maps generated according to the ΛCDM paradigm with Gaussian distributed fluctuations. The comparison is multi-scale, being performed on a sequence of degraded maps with mean pixel separation ranging from 0.05 to 7.33°. The survey of the CMB over 𝕊² is incomplete due to obfuscation effects by bright point sources and other extended foreground objects like our own galaxy. To deal with such situations, where analysis in the presence of “masks” is of importance, we introduce the concept of relative homology. The parametric χ²-test shows differences between observations and simulations, yielding p-values at percent to less than permil levels roughly between 2 and 7°, with the difference in the number of components and holes peaking at more than 3σ sporadically at these scales. The highest observed deviation between the observations and simulations for b₀ and b₁ is approximately between 3σ and 4σ at scales of 3–7°. There are reports of mildly unusual behaviour of the Euler characteristic at 3.66° in the literature, computed from independent measurements of the CMB temperature fluctuations by Planck’s predecessor, the Wilkinson Microwave Anisotropy Probe (WMAP) satellite. The mildly anomalous behaviour of the Euler characteristic is phenomenologically related to the strongly anomalous behaviour of components and holes, or the zeroth and first Betti numbers, respectively. Further, since these topological descriptors show consistent anomalous behaviour over independent measurements of Planck and WMAP, instrumental and systematic errors may be an unlikely source.
These are also the scales at which the observed maps exhibit low variance compared to the simulations, and approximately the range of scales at which the power spectrum exhibits a dip with respect to the theoretical model. Non-parametric tests show even stronger differences at almost all scales. Crucially, Gaussian simulations based on power-spectrum matching the characteristics of the observed dipped power spectrum are not able to resolve the anomaly. Understanding the origin of the anomalies in the CMB, whether cosmological in nature or arising due to late-time effects, is an extremely challenging task. Regardless, beyond the trivial possibility that this may still be a manifestation of an extreme Gaussian case, these observations, along with the super-horizon scales involved, may motivate the study of primordial non-Gaussianity. Alternative scenarios worth exploring may be models with non-trivial topology, including topological defect models.
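To make the notion of counting components in growing excursion sets concrete, here is a minimal Python sketch that computes the zeroth Betti number of the sets where a field is at least a given threshold, as the threshold is lowered. It works on a small flat grid with 4-neighbour connectivity, which is a simplifying assumption: the paper analyses masked maps on the sphere and also counts holes. The function name and the sample field are hypothetical.

```python
def count_components(field, threshold):
    """Count 4-connected components of the cells with value >= threshold."""
    rows, cols = len(field), len(field[0])
    seen = set()
    components = 0
    for r in range(rows):
        for c in range(cols):
            if field[r][c] < threshold or (r, c) in seen:
                continue
            # New component found: flood-fill it with an explicit stack.
            components += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and field[ny][nx] >= threshold:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return components


# Toy "temperature" field (hypothetical values).
field = [
    [3, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 1],
    [2, 0, 1, 1],
]

# Lowering the threshold grows the excursion set: components appear, then merge.
b0 = [count_components(field, t) for t in (3, 2, 1, 0)]
print(b0)
```

Tracking how this count changes with the threshold, and comparing the resulting curve between observed and simulated maps, is the kind of comparison the abstract describes at each resolution.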

Cell Complex Neural Networks (2020)

Mustafa Hajij, Kyle Istvan, Ghada Zamzmi

Abstract

Cell complexes are topological spaces constructed from simple blocks called cells. They generalize graphs, simplicial complexes, and polyhedral complexes that form important domains for practical applications. We propose a general, combinatorial, and unifying construction for performing neural network-type computations on cell complexes. Furthermore, we introduce inter-cellular message passing schemes, message passing schemes on cell complexes that take the topology of the underlying space into account. In particular, our method generalizes many of the most popular types of graph neural networks.

Relational Persistent Homology for Multispecies Data With Application to the Tumor Microenvironment (2023)

Bernadette J. Stolz, Jagdeep Dhesi, Joshua A. Bull, Heather A. Harrington, Helen M. Byrne, Iris H. R. Yoon

Abstract

Topological data analysis (TDA) is an active field of mathematics for quantifying shape in complex data. Standard methods in TDA such as persistent homology (PH) are typically focused on the analysis of data consisting of a single entity (e.g., cells or molecular species). However, state-of-the-art data collection techniques now generate exquisitely detailed multispecies data, prompting a need for methods that can examine and quantify the relations among them. Such heterogeneous data types arise in many contexts, ranging from biomedical imaging, geospatial analysis, to species ecology. Here, we propose two methods for encoding spatial relations among different data types that are based on Dowker complexes and Witness complexes. We apply the methods to synthetic multispecies data of a tumor microenvironment and analyze topological features that capture relations between different cell types, e.g., blood vessels, macrophages, tumor cells, and necrotic cells. We demonstrate that relational topological features can extract biological insight, including the dominant immune cell phenotype (an important predictor of patient prognosis) and the parameter regimes of a data-generating model. The methods provide a quantitative perspective on the relational analysis of multispecies spatial data, overcome the limits of traditional PH, and are readily computable.

A Novel Quality Clustering Methodology on Fab-Wide Wafer Map Images in Semiconductor Manufacturing (2022)

Yuan-Ming Hsu, Xiaodong Jia, Wenzhe Li, Jay Lee

Abstract

In semiconductor manufacturing, clustering the fab-wide wafer map images is of critical importance for practitioners to understand the subclusters of wafer defects, recognize novel clusters or anomalies, and develop fast reactions to quality issues. However, due to the high-mix manufacturing of diversified wafer products of different sizes and technologies, it is difficult to cluster the wafer map images across the fab. This paper addresses this challenge by proposing a novel methodology for fab-wide wafer map data clustering. In the proposed methodology, a well-known deep learning technique, vision transformer with multi-head attention is first trained to convert binary wafer images of different sizes into condensed feature vectors for efficient clustering. Then, the Topological Data Analysis (TDA), which is widely used in biomedical applications, is employed to visualize the data clusters and identify the anomalies. The TDA yields a topological representation of high-dimensional big data as well as its local clusters by creating a graph that shows nodes corresponding to the clusters within the data. The effectiveness of the proposed methodology is demonstrated by clustering the public wafer map dataset WM-811k from the real application which has a total of 811,457 wafer map images. We further demonstrate the potential applicability of topology data analytics in the semiconductor area by visualization.

Parametric Inference Using Persistence Diagrams: a Case Study in Population Genetics (2014)

Kevin Emmett, Daniel Rosenbloom, Pablo Camara, Raul Rabadan

Abstract

Persistent homology computes topological invariants from point cloud data. Recent work has focused on developing statistical methods for data analysis in this framework. We show that, in certain models, parametric inference can be performed using statistics defined on the computed invariants. We develop this idea with a model from population genetics, the coalescent with recombination. We apply our model to an influenza dataset, identifying two scales of topological structure which have a distinct biological interpretation.

Exploring Surface Texture Quantification in Piezo Vibration Striking Treatment (PVST) Using Topological Measures (2022)

Melih C. Yesilli, Max M. Chumley, Jisheng Chen, Firas A. Khasawneh, Yang Guo

Abstract

Surface texture influences wear and tribological properties of manufactured parts, and it plays a critical role in end-user products. Therefore, quantifying the order or structure of a manufactured surface provides important information on the quality and life expectancy of the product. Although texture can be intentionally introduced to enhance aesthetics or to satisfy a design function, sometimes it is an inevitable byproduct of surface treatment processes such as Piezo Vibration Striking Treatment (PVST). Measures of order for surfaces have been characterized using statistical, spectral, and geometric approaches. For nearly hexagonal lattices, topological tools have also been used to measure the surface order. This paper explores utilizing tools from Topological Data Analysis for measuring surface texture. We compute measures of order based on optical digital microscope images of surfaces treated using PVST. These measures are applied to the grid obtained from estimating the centers of tool impacts, and they quantify the grid’s deviations from the nominal one. Our results show that TDA provides a convenient framework for characterization of pattern type that bypasses some limitations of existing tools such as difficult manual processing of the data and the need for an expert user to analyze and interpret the surface images.

Hepatic Tumor Classification Using Texture and Topology Analysis of Non-Contrast-Enhanced Three-Dimensional T1-Weighted MR Images With a Radiomics Approach (2019)

Asuka Oyama, Yasuaki Hiraoka, Ippei Obayashi, Yusuke Saikawa, Shigeru Furui, Kenshiro Shiraishi, Shinobu Kumagai, Tatsuya Hayashi, Jun’ichi Kotoku

Abstract

The purpose of this study is to evaluate the accuracy for classification of hepatic tumors by characterization of T1-weighted magnetic resonance (MR) images using two radiomics approaches with machine learning models: texture analysis and topological data analysis using persistent homology. This study assessed non-contrast-enhanced fat-suppressed three-dimensional (3D) T1-weighted images of 150 hepatic tumors. The lesions included 50 hepatocellular carcinomas (HCCs), 50 metastatic tumors (MTs), and 50 hepatic hemangiomas (HHs) found respectively in 37, 23, and 33 patients. For classification, texture features were calculated, and also persistence images of three types (degree 0, degree 1 and degree 2) were obtained for each lesion from the 3D MR imaging data. We used three classification models. In the classification of HCC and MT (resp. HCC and HH, HH and MT), we obtained accuracy of 92% (resp. 90%, 73%) by texture analysis, and the highest accuracy of 85% (resp. 84%, 74%) when degree 1 (resp. degree 1, degree 2) persistence images were used. Our methods using texture analysis or topological data analysis allow for classification of the three hepatic tumors with considerable accuracy, and thus might be useful when applied for computer-aided diagnosis with MR images.

Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome (2014)

David Romano, Monica Nicolau, Eve-Marie Quintin, Paul K. Mazaika, Amy A. Lightbody, Heather Cody Hazlett, Joseph Piven, Gunnar Carlsson, Allan L. Reiss

Abstract

Fragile X syndrome (FXS), due to mutations of the FMR1 gene, is the most common known inherited cause of developmental disability as well as the most common single-gene risk factor for autism. Our goal was to examine variation in brain structure in FXS with topological data analysis (TDA), and to assess how such variation is associated with measures of IQ and autism-related behaviors. To this end, we analyzed imaging and behavioral data from young boys (n = 52; aged 1.57–4.15 years) diagnosed with FXS. Application of topological methods to structural MRI data revealed two large subgroups within the study population. Comparison of these subgroups showed significant between-subgroup neuroanatomical differences similar to those previously reported to distinguish children with FXS from typically developing controls (e.g., enlarged caudate). In addition to neuroanatomy, the groups showed significant differences in IQ and autism severity scores. These results suggest that despite arising from a single gene mutation, FXS may encompass two biologically, and clinically separable phenotypes. In addition, these findings underscore the potential of TDA as a powerful tool in the search for biological phenotypes of neuropsychiatric disorders. Hum Brain Mapp 35:4904–4915, 2014. © 2014 Wiley Periodicals, Inc.

Machine Learning and Topological Data Analysis Identify Unique Features of Human Papillae in 3D Scans (2023)

Rayna Andreeva, Anwesha Sarkar, Rik Sarkar

Abstract

The tongue surface houses a range of papillae that are integral to the mechanics and chemistry of taste and textural sensation. Although gustatory function of papillae is well investigated, the uniqueness of papillae within and across individuals remains elusive. Here, we present the first machine learning framework on 3D microscopic scans of human papillae (n = 2092), uncovering the uniqueness of geometric and topological features of papillae. The finer differences in shapes of papillae are investigated computationally based on a number of features derived from discrete differential geometry and computational topology. Interpretable machine learning techniques show that persistent homology features of the papillae shape are the most effective in predicting the biological variables. Models trained on these features with small volumes of data samples predict the type of papillae with an accuracy of 85%. The papillae type classification models can map the spatial arrangement of filiform and fungiform papillae on a surface. Remarkably, the papillae are found to be distinctive across individuals and an individual can be identified with an accuracy of 48% among the 15 participants from a single papilla. Collectively, this is the first evidence demonstrating that tongue papillae can serve as a unique identifier, inspiring new research directions for food preferences and oral diagnostics.

Detecting Bifurcations in Dynamical Systems With CROCKER Plots (2022)

İsmail Güzel, Elizabeth Munch, Firas A. Khasawneh

Abstract

Existing tools for bifurcation detection from signals of dynamical systems typically are either limited to a special class of systems or they require carefully chosen input parameters and a significant expertise to interpret the results. Therefore, we describe an alternative method based on persistent homology, a tool from topological data analysis, that utilizes Betti numbers and CROCKER plots. Betti numbers are topological invariants of topological spaces, while the CROCKER plot is a coarsened but easy to visualize data representation of a one-parameter varying family of persistence barcodes. The specific bifurcations we investigate are transitions from periodic to chaotic behavior or vice versa in a one-parameter collection of differential equations. We validate our methods using numerical experiments on ten dynamical systems and contrast the results with existing tools that use the maximum Lyapunov exponent. We further prove the relationship between the Wasserstein distance to the empty diagram and the norm of the Betti vector, which shows that an even more simplified version of the information has the potential to provide insight into the bifurcation parameter. The results show that our approach reveals more information about the shape of the periodic attractor than standard tools, and it has more favorable computational time in comparison with the Rosenstein algorithm for computing the maximum Lyapunov exponent.
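
To make the CROCKER construction in the abstract above concrete, here is a minimal sketch, assuming the persistence barcodes have already been computed by some other tool. Each barcode is reduced to a Betti vector (the number of bars alive at each scale), and the vectors for successive parameter values are stacked into a matrix. The names `betti_curve` and `crocker_matrix` and the toy barcodes are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Bar = Tuple[float, float]  # (birth, death); death may be float("inf")


def betti_curve(bars: Sequence[Bar], scales: Sequence[float]) -> List[int]:
    """For each scale s, count the bars alive at s (birth <= s < death)."""
    return [sum(1 for birth, death in bars if birth <= s < death) for s in scales]


def crocker_matrix(
    barcodes: Sequence[Sequence[Bar]], scales: Sequence[float]
) -> List[List[int]]:
    """Row i is the Betti vector of the barcode at the i-th parameter value."""
    return [betti_curve(bars, scales) for bars in barcodes]


# Toy degree-1 barcodes for three values of an assumed system parameter.
barcodes_h1 = [
    [(0.2, 1.0)],                             # one long-lived loop
    [(0.2, 0.9), (0.2, 0.6)],                 # two loops
    [(0.1, 0.3), (0.2, 0.4), (0.25, 0.35)],   # several short-lived loops
]
scales = [0.0, 0.25, 0.5, 0.75]

crocker = crocker_matrix(barcodes_h1, scales)
for row in crocker:
    print(row)
```

Plotting this matrix as a heat map, with the system parameter on one axis and the scale on the other, gives the kind of coarsened picture the abstract refers to; a change in the pattern of rows is what one would inspect for a possible transition.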

Possible Clinical Use of Big Data: Personal Brain Connectomics (2018)

Dong Soo Lee

Abstract

The biggest data is brain imaging data, which waited for clinical use during the last three decades. Topographic data interpretation prevailed for the first two decades, and only during the last decade, connectivity or connectomics data began to be analyzed properly. Owing to topological data interpretation and timely introduction of likelihood method based on hierarchical generalized linear model, we now foresee the clinical use of personal connectomics for classification and prediction of disease prognosis for brain diseases without any clue by currently available diagnostic methods.

Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury (2015)

Jessica L. Nielson, Jesse Paquette, Aiwen W. Liu, Cristian F. Guandique, C. Amy Tovar, Tomoo Inoue, Karen-Amanda Irvine, John C. Gensel, Jennifer Kloke, Tanya C. Petrossian, Pek Y. Lum, Gunnar E. Carlsson, Geoffrey T. Manley, Wise Young, Michael S. Beattie, Jacqueline C. Bresnahan, Adam R. Ferguson

Abstract

Data-driven discovery in complex neurological disorders has potential to extract meaningful knowledge from large, heterogeneous datasets. Here the authors apply topological data analysis to assess therapeutic effects in preclinical traumatic brain injury and spinal cord injury research studies.

The Classification of Endoscopy Images With Persistent Homology (2016)

Olga Dunaeva, Herbert Edelsbrunner, Anton Lukyanov, Michael Machin, Daria Malkova, Roman Kuvaev, Sergey Kashin

Abstract

Aiming at the automatic diagnosis of tumors using narrow band imaging (NBI) magnifying endoscopic (ME) images of the stomach, we combine methods from image processing, topology, geometry, and machine learning to classify patterns into three classes: oval, tubular and irregular. Training the algorithm on a small number of images of each type, we achieve a high rate of correct classifications. The analysis of the learning algorithm reveals that a handful of geometric and topological features are responsible for the overwhelming majority of decisions.

Topological Data Analysis in Medical Imaging: Current State of the Art (2023)

Yashbir Singh, Colleen M. Farrelly, Quincy A. Hathaway, Tim Leiner, Jaidip Jagtap, Gunnar E. Carlsson, Bradley J. Erickson

Abstract

Machine learning, and especially deep learning, is rapidly gaining acceptance and clinical usage in a wide range of image analysis applications and is regarded as providing high performance in detecting anatomical structures and identification and classification of patterns of disease in medical images. However, there are many roadblocks to the widespread implementation of machine learning in clinical image analysis, including differences in data capture leading to different measurements, high dimensionality of imaging and other medical data, and the black-box nature of machine learning, with a lack of insight into relevant features. Techniques such as radiomics have been used in traditional machine learning approaches to model the mathematical relationships between adjacent pixels in an image and provide an explainable framework for clinicians and researchers. Newer paradigms, such as topological data analysis (TDA), have recently been adopted to design and develop innovative image analysis schemes that go beyond the abilities of pixel-to-pixel comparisons. TDA can automatically construct filtrations of topological shapes of image texture through a technique known as persistent homology (PH); these features can then be fed into machine learning models that provide explainable outputs and can distinguish different image classes in a computationally more efficient way, when compared to other currently used methods. The aim of this review is to introduce PH and its variants and to review TDA’s recent successes in medical imaging studies.

Segmentation of Biomedical Images by a Computational Topology Framework (2017)

Rodrigo Rojas Moraleda, Wei Xiong, Niels Halama, Katja Breitkopf-Heinlein, Steven Dooley, Luis Salinas, Dieter W. Heermann, Nektarios A. Valous

Abstract

The segmentation of cell nuclei is an important step towards the automated analysis of histological images. The presence of a large number of nuclei in whole-slide images necessitates methods that are computationally tractable in addition to being effective. In this work, a method is developed for the robust segmentation of cell nuclei in histological images based on the principles of persistent homology. More specifically, an abstract simplicial homology approach for image segmentation is established. Essentially, the approach deals with the persistence of disconnected sets in the image, thus identifying salient regions that express patterns of persistence. By introducing an image representation based on topological features, the task of segmentation is less dependent on variations of color or texture. This results in a novel approach that generalizes well and provides stable performance. The method conceptualizes regions of interest (cell nuclei) pertinent to their topological features in a successful manner. The time cost of the proposed approach is lower-bounded by an almost linear behavior and upper-bounded by O(n^2) in a worst-case scenario. Time complexity matches a quasilinear behavior which is O(n^(1+ε)) for ε < 1. Images acquired from histological sections of liver tissue are used as a case study to demonstrate the effectiveness of the approach. The histological landscape consists of hepatocytes and non-parenchymal cells. The accuracy of the proposed methodology is verified against an automated workflow created by the output of a conventional filter bank (validated by experts) and the supervised training of a random forest classifier. The results are obtained on a per-object basis. The proposed workflow successfully detected both hepatocyte and non-parenchymal cell nuclei with an accuracy of 84.6%, and hepatocyte cell nuclei only with an accuracy of 86.2%.
A public histological dataset with supplied ground-truth data is also used for evaluating the performance of the proposed approach (accuracy: 94.5%). Further validations are carried out with a publicly available dataset and ground-truth data from the Gland Segmentation in Colon Histology Images Challenge (GlaS) contest. The proposed method is useful for obtaining unsupervised robust initial segmentations that can be further integrated in image/data processing and management pipelines. The development of a fully automated system supporting a human expert provides tangible benefits in the context of clinical decision-making.

Congestion Barcodes: Exploring the Topology of Urban Congestion Using Persistent Homology (2017)

Yu Wu, Gabriel Shindnes, Vaibhav Karve, Derrek Yager, Daniel B. Work, Arnab Chakraborty, Richard B. Sowers

Abstract

This work presents a new method to quantify connectivity in transportation networks. Inspired by the field of topological data analysis, we propose a novel approach to explore the robustness of road network connectivity in the presence of congestion on the roadway. The robustness of the pattern is summarized in a congestion barcode, which can be constructed directly from traffic datasets commonly used for navigation. As an initial demonstration, we illustrate the main technique on a publicly available traffic dataset in a neighborhood in New York City.

A Topological Framework for Identifying Phenomenological Bifurcations in Stochastic Dynamical Systems (2024)

Sunia Tanweer, Firas A. Khasawneh, Elizabeth Munch, Joshua R. Tempelman

Abstract

Changes in the parameters of dynamical systems can cause the state of the system to shift between different qualitative regimes. These shifts, known as bifurcations, are critical to study as they can indicate when the system is about to undergo harmful changes in its behavior. In stochastic dynamical systems, there is particular interest in P-type (phenomenological) bifurcations, which can include transitions from a monostable state to multi-stable states, the appearance of stochastic limit cycles and other features in the probability density function (PDF) of the system’s state. Current practices are limited to systems with small state spaces, cannot detect all possible behaviors of the PDFs and mandate human intervention for visually identifying the change in the PDF. In contrast, this study presents a new approach based on Topological Data Analysis that uses superlevel persistence to mathematically quantify P-type bifurcations in stochastic systems through a “homological bifurcation plot”, which shows the changing ranks of 0th and 1st homology groups, through Betti vectors. Using these plots, we demonstrate the successful detection of P-bifurcations on the stochastic Duffing, Rayleigh-Van der Pol and Quintic Oscillators given their analytical PDFs, and elaborate on how to generate an estimated homological bifurcation plot given a kernel density estimate (KDE) of these systems by employing a tool for finding topological consistency between PDFs and KDEs.

Investigation of Flash Crash via Topological Data Analysis (2020)

Wonse Kim, Younng-Jin Kim, Gihyun Lee, Woong Kook

Abstract

Topological data analysis has been acknowledged as one of the most successful mathematical data analytic methodologies in various fields including medicine, genetics, and image analysis. In this paper, we explore the potential of this methodology in finance by applying persistence landscape and dynamic time series analysis to analyze an extreme event in the stock market, known as Flash Crash. We will provide results of our empirical investigation to confirm the effectiveness of our new method not only for the characterization of this extreme event but also for its prediction purposes.

Topological Data Analysis for Data Mining Small Educational Samples With Application to Studies of the Gifted (2017)

Colleen Farrelly

Abstract

Studies of highly and profoundly gifted children typically involve small sample sizes, as the population is relatively rare, and many statistical methods cannot handle these small sample sizes well. However, topological data analysis (TDA) tools are robust, even with very small samples, and can provide useful information as well as robust statistical tests. This study demonstrates these capabilities on data simulated from previous talent search results (small and large samples), as well as a subset of data from Ruf’s cohort of gifted children. TDA methods show strong, robust performance and uncover insight into sample characteristics and subgroups, including the appearance of similar subgroups across assessment populations.

Topological Data Analysis for Arrhythmia Detection Through Modular Neural Networks (2020)

Meryll Dindin, Yuhei Umeda, Frederic Chazal

Abstract

This paper presents an innovative and generic deep learning approach to monitor heart conditions from ECG signals. We focus our attention on both the detection and classification of abnormal heartbeats, known as arrhythmia. We strongly insist on generalization throughout the construction of a shallow deep-learning model that turns out to be effective for new unseen patients. The novelty of our approach relies on the use of topological data analysis to deal with individual differences. We show that our structure reaches the performances of the state-of-the-art methods for both arrhythmia detection and classification.

Using Persistent Homology to Quantify a Diurnal Cycle in Hurricanes (2020)

Sarah Tymochko, Elizabeth Munch, Jason Dunion, Kristen Corbosiero, Ryan Torn

Abstract

The diurnal cycle of tropical cyclones (TCs) is a daily cycle in clouds that appears in satellite images and may have implications for TC structure and intensity. The diurnal pattern can be seen in infrared (IR) satellite imagery as cyclical pulses in the cloud field that propagate radially outward from the center of nearly all Atlantic-basin TCs. These diurnal pulses, a distinguishing characteristic of this diurnal cycle, begin forming in the storm’s inner core near sunset each day, appearing as a region of cooling cloud-top temperatures. The area of cooling takes on a ring-like appearance as cloud-top warming occurs on its inside edge and the cooling moves away from the storm overnight, reaching several hundred kilometers from the circulation center by the following afternoon. The state-of-the-art TC diurnal cycle measurement in IR satellite imagery has a limited ability to analyze the behavior beyond qualitative observations. We present a method for quantifying the TC diurnal cycle using one-dimensional persistent homology, a tool from Topological Data Analysis, by tracking maximum persistence and quantifying the cycle using the discrete Fourier transform. Using Geostationary Operational Environmental Satellite IR imagery from Hurricanes Felix and Ivan, our method is able to detect an approximate daily cycle.
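
The last step described in the abstract above, quantifying a cycle with the discrete Fourier transform, can be sketched as follows. This is a hedged illustration only: it assumes a time series of maximum-persistence values is already available, and it replaces real satellite-derived values with a synthetic signal. The helper names `dft_magnitudes` and `dominant_period` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(signal: Sequence[float]) -> List[float]:
    """Magnitudes of the DFT of the mean-removed signal, frequencies 0..n//2."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(centered)))
        for k in range(n // 2 + 1)
    ]


def dominant_period(signal: Sequence[float], sample_spacing_hours: float) -> float:
    """Period (in hours) of the strongest non-constant frequency component."""
    mags = dft_magnitudes(signal)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return len(signal) * sample_spacing_hours / k


# Synthetic stand-in for a maximum-persistence series: four days sampled
# every 3 hours with an exact 24-hour oscillation (assumed, not real data).
spacing = 3.0
signal = [1.0 + 0.5 * math.sin(2 * math.pi * (spacing * t) / 24.0) for t in range(32)]

period = dominant_period(signal, spacing)
print(period)
```

With real data the spectrum would be noisier and the peak only approximately at 24 hours, which matches the abstract's wording of detecting "an approximate daily cycle".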

Characterizing Scales of Genetic Recombination and Antibiotic Resistance in Pathogenic Bacteria Using Topological Data Analysis (2014)

Kevin J. Emmett, Raul Rabadan

Abstract

Pathogenic bacteria present a large disease burden on human health. Control of these pathogens is hampered by rampant lateral gene transfer, whereby pathogenic strains may acquire genes conferring resistance to common antibiotics. Here we introduce tools from topological data analysis to characterize the frequency and scale of lateral gene transfer in bacteria, focusing on a set of pathogens of significant public health relevance. As a case study, we examine the spread of antibiotic resistance in Staphylococcus aureus. Finally, we consider the possible role of the human microbiome as a reservoir for antibiotic resistance genes.

Persistent Betti Numbers for a Noise Tolerant Shape-Based Approach to Image Retrieval (2011)

Patrizio Frosini, Claudia Landi

Abstract

In content-based image retrieval a major problem is the presence of noisy shapes. It is well known that persistent Betti numbers are a shape descriptor that admits a dissimilarity distance, the matching distance, stable under continuous shape deformations. In this paper we focus on the problem of dealing with noise that changes the topology of the studied objects. We present a general method to turn persistent Betti numbers into stable descriptors also in the presence of topological changes. Retrieval tests on the Kimia-99 database show the effectiveness of the method.

Raman Spectroscopy and Topological Machine Learning for Cancer Grading (2023)

Francesco Conti, Mario D’Acunto, Claudia Caudai, Sara Colantonio, Raffaele Gaeta, Davide Moroni, Maria Antonietta Pascali

Abstract

Over the last decade, Raman spectroscopy has been establishing itself as a highly promising technique for the classification of tumour tissues, as it yields biochemical maps of the tissues under investigation, making it possible to observe changes among different tissues in terms of biochemical constituents (proteins, lipid structures, DNA, vitamins, and so on). In this paper, we aim to show that techniques emerging from the cross-fertilization of persistent homology and machine learning can support the classification of Raman spectra extracted from cancerous tissues for tumour grading. In more detail, topological features of Raman spectra and machine learning classifiers are trained in combination as an automatic classification pipeline in order to select the best-performing pair. The case study is the grading of chondrosarcoma in four classes: cross and leave-one-patient-out validations have been used to assess the classification accuracy of the method. The binary classification achieves a validation accuracy of 81% and a test accuracy of 90%. Moreover, the test dataset has been collected at a different time and with different equipment. Such results are achieved by a support vector classifier trained with the Betti Curve representation of the topological features extracted from the Raman spectra, and are excellent compared with the existing literature. The added value of such results is that the model for the prediction of the chondrosarcoma grading could easily be implemented in clinical practice, possibly integrated into the acquisition system.

Persistent Homology on Grassmann Manifolds for Analysis of Hyperspectral Movies (2016)

Sofya Chepushtanova, Michael Kirby, Chris Peterson, Lori Ziegelmeier

Abstract

The existence of characteristic structure, or shape, in complex data sets has been recognized as increasingly important for mathematical data analysis. This realization has motivated the development of new tools such as persistent homology for exploring topological invariants, or features, in large data sets. In this paper, we apply persistent homology to the characterization of gas plumes in time dependent sequences of hyperspectral cubes, i.e. the analysis of 4-way arrays. We investigate hyperspectral movies of Long-Wavelength Infrared data monitoring an experimental release of chemical simulant into the air. Our approach models regions of interest within the hyperspectral data cubes as points on the real Grassmann manifold G(k, n), whose points parameterize the k-dimensional subspaces of ℝ^n, contrasting our approach with the more standard framework in Euclidean space. An advantage of this approach is that it allows a sequence of time slices in a hyperspectral movie to be collapsed to a sequence of points in such a way that some of the key structure within and between the slices is encoded by the points on the Grassmann manifold. This motivates the search for topological features, associated with the evolution of the frames of a hyperspectral movie, within the corresponding points on the Grassmann manifold. The proposed mathematical model affords the processing of large data sets while retaining valuable discriminatory information. In this paper, we discuss how embedding our data in the Grassmann manifold, together with topological data analysis, captures dynamical events that occur as the chemical plume is released and evolves.

A Multi-Parameter Persistence Framework for Mathematical Morphology (2021)

Yu-Min Chung, Sarah Day, Chuan-Shen Hu

Abstract

The field of mathematical morphology offers well-studied techniques for image processing. In this work, we view morphological operations through the lens of persistent homology, a tool at the heart of the field of topological data analysis. We demonstrate that morphological operations naturally form a multiparameter filtration and that persistent homology can then be used to extract information about both topology and geometry in the images as well as to automate methods for optimizing the study and rendering of structure in images. For illustration, we apply this framework to analyze noisy binary, grayscale, and color images.

Manifold Learning for Coherent Design Interpolation Based on Geometrical and Topological Descriptors (2023)

D. Muñoz, O. Allix, F. Chinesta, J. J. Ródenas, E. Nadal

Abstract

In the context of intellectual property in the manufacturing industry, know-how refers to practical knowledge of how to accomplish a specific task. This know-how is often difficult to synthesise into a set of rules or steps, as it resides in the intuition and expertise of engineers, designers, and other professionals. Today, a new line of research on this topic has emerged thanks to the explosion of Artificial Intelligence and Machine Learning algorithms and their alliance with Computational Mechanics and Optimisation tools. However, a key issue in industrial design is the scarcity of available data, which makes it problematic to rely on deep-learning approaches. Assuming that the existing designs live in a manifold, in this paper we propose a synergistic use of existing Machine Learning tools to infer a reduced manifold from the existing limited set of designs and then to use it to interpolate between the individuals, working as a generator basis, to create new and coherent designs. For this, a key aspect is to be able to properly interpolate in the reduced manifold, which requires a proper clustering of the individuals. From our experience, due to the scarcity of data, adding topological descriptors to geometrical ones considerably improves the quality of the clustering. Thus, a distance mixing topology and geometry is proposed. This distance is used both for the clustering and for the interpolation. For the interpolation, relying on optimal transport appears to be mandatory. Examples of growing complexity are proposed to illustrate the quality of the method.

Learning Representations of Persistence Barcodes (2019)

Christoph D. Hofer, Roland Kwitt, Marc Niethammer

Abstract

We consider the problem of supervised learning with summary representations of topological features in data. In particular, we focus on persistent homology, the prevalent tool used in topological data analysis. As the summary representations, referred to as barcodes or persistence diagrams, come in the unusual format of multi sets, equipped with computationally expensive metrics, they can not readily be processed with conventional learning techniques. While different approaches to address this problem have been proposed, either in the context of kernel-based learning, or via carefully designed vectorization techniques, it remains an open problem how to leverage advances in representation learning via deep neural networks. Appropriately handling topological summaries as input to neural networks would address the disadvantage of previous strategies which handle this type of data in a task-agnostic manner. In particular, we propose an approach that is designed to learn a task-specific representation of barcodes. In other words, we aim to learn a representation that adapts to the learning problem while, at the same time, preserving theoretical properties (such as stability). This is done by projecting barcodes into a finite dimensional vector space using a collection of parametrized functionals, so called structure elements, for which we provide a generic construction scheme. A theoretical analysis of this approach reveals sufficient conditions to preserve stability, and also shows that different choices of structure elements lead to great differences with respect to their suitability for numerical optimization. When implemented as a neural network input layer, our approach demonstrates compelling performance on various types of problems, including graph classification and eigenvalue prediction, the classification of 2D/3D object shapes and recognizing activities from EEG signals.

Uncovering Precision Phenotype-Biomarker Associations in Traumatic Brain Injury Using Topological Data Analysis (2017)

Jessica L. Nielson, Shelly R. Cooper, John K. Yue, Marco D. Sorani, Tomoo Inoue, Esther L. Yuh, Pratik Mukherjee, Tanya C. Petrossian, Jesse Paquette, Pek Y. Lum, Gunnar E. Carlsson, Mary J. Vassar, Hester F. Lingsma, Wayne A. Gordon, Alex B. Valadka, David O. Okonkwo, Geoffrey T. Manley, Adam R. Ferguson, Track-Tbi Investigators

Abstract

Background Traumatic brain injury (TBI) is a complex disorder that is traditionally stratified based on clinical signs and symptoms. Recent imaging and molecular biomarker innovations provide unprecedented opportunities for improved TBI precision medicine, incorporating patho-anatomical and molecular mechanisms. Complete integration of these diverse data for TBI diagnosis and patient stratification remains an unmet challenge. Methods and findings The Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) Pilot multicenter study enrolled 586 acute TBI patients and collected diverse common data elements (TBI-CDEs) across the study population, including imaging, genetics, and clinical outcomes. We then applied topology-based data-driven discovery to identify natural subgroups of patients, based on the TBI-CDEs collected. Our hypothesis was two-fold: 1) A machine learning tool known as topological data analysis (TDA) would reveal data-driven patterns in patient outcomes to identify candidate biomarkers of recovery, and 2) TDA-identified biomarkers would significantly predict patient outcome recovery after TBI using more traditional methods of univariate statistical tests. TDA algorithms organized and mapped the data of TBI patients in multidimensional space, identifying a subset of mild TBI patients with a specific multivariate phenotype associated with unfavorable outcome at 3 and 6 months after injury. Further analyses revealed that this patient subset had high rates of post-traumatic stress disorder (PTSD), and enrichment in several distinct genetic polymorphisms associated with cellular responses to stress and DNA damage (PARP1), and in striatal dopamine processing (ANKK1, COMT, DRD2). Conclusions TDA identified a unique diagnostic subgroup of patients with unfavorable outcome after mild TBI that were significantly predicted by the presence of specific genetic polymorphisms. Machine learning methods such as TDA may provide a robust method for patient stratification and treatment planning targeting identified biomarkers in future clinical trials in TBI patients. Trial Registration ClinicalTrials.gov Identifier NCT01565551

Community Resources

Code (Software)

Fruit Flies and Moduli: Interactions Between Biology and Mathematics (2015)

Ezra Miller

Abstract

Possibilities for using geometry and topology to analyze statistical problems in biology raise a host of novel questions in geometry, probability, algebra, and combinatorics that demonstrate the power of biology to influence the future of pure mathematics. This expository article is a tour through some biological explorations and their mathematical ramifications. The article starts with evolution of novel topological features in wing veins of fruit flies, which are quantified using the algebraic structure of multiparameter persistent homology. The statistical issues involved highlight mathematical implications of sampling from moduli spaces. These lead to geometric probability on stratified spaces, including the sticky phenomenon for Frechet means and the origin of this mathematical area in the reconstruction of phylogenetic trees.

Topological Data Analysis of Escherichia Coli O157:H7 and Non-O157 Survival in Soils (2014)

Abasiofiok M. Ibekwe, Jincai Ma, David E. Crowley, Ching-Hong Yang, Alexis M. Johnson, Tanya C. Petrossian, Pek Y. Lum

Abstract

Shiga toxin-producing E. coli O157:H7 and non-O157 have been implicated in many foodborne illnesses caused by the consumption of contaminated fresh produce. However, data on their persistence in soils are limited due to the complexity of datasets generated from different environmental variables and bacterial taxa. There is a continuing need to distinguish the various environmental variables and different bacterial groups to understand the relationships among these factors and pathogen survival. Using an approach called Topological Data Analysis (TDA), we reconstructed the relationship structure of E. coli O157 and non-O157 survival in 32 soils (16 organic and 16 conventionally managed soils) from California (CA) and Arizona (AZ) with a multi-resolution output. In our study, we took a community approach based on the total soil microbiome to study community-level survival and to examine the network of the community as a whole and the relationship between its topology and biological processes. TDA produces a geometric representation of complex data sets. Network analysis showed that Shiga toxin negative strain E. coli O157:H7 4554 survived significantly longer in comparison to E. coli O157:H7 EDL933, while the survival time of E. coli O157:NM was comparable to that of E. coli O157:H7 strain 933 in all of the tested soils. Two non-O157 strains, E. coli O26:H11 and E. coli O103:H2, survived much longer than E. coli O91:H21 and the three strains of E. coli O157. We show that there are complex interactions between E. coli strain survival, microbial community structures, and soil parameters.

Mapping Firms' Locations in Technological Space: A Topological Analysis of Patent Statistics (2020)

Emerson G. Escolar, Yasuaki Hiraoka, Mitsuru Igami, Yasin Ozcan

Abstract

Where do firms innovate? Mapping their locations in technological space is difficult, because it is high dimensional and unstructured. We address this issue by using a method in computational topology called the Mapper algorithm, which combines local clustering with global reconstruction. We apply this method to a panel of 333 major firms’ patent portfolios in 1976–2005 across 430 technological areas. Results suggest the Mapper graph captures salient patterns in firms’ patenting histories, and our measures of their uniqueness (the length of “flares”) are correlated with firms’ financial performances in a statistically and economically significant manner. We then compare this approach with a widely used clustering method by Jaffe (1989) to highlight additional findings.

Airway Pathological Heterogeneity in Asthma: Visualization of Disease Microclusters Using Topological Data Analysis (2018)

Salman Siddiqui, Aarti Shikotra, Matthew Richardson, Emma Doran, David Choy, Alex Bell, Cary D. Austin, Jeffrey Eastham-Anderson, Beverley Hargadon, Joseph R. Arron, Andrew Wardlaw, Christopher E. Brightling, Liam G. Heaney, Peter Bradding

Abstract

Background Asthma is a complex chronic disease underpinned by pathological changes within the airway wall. How variations in structural airway pathology and cellular inflammation contribute to the expression and severity of asthma is poorly understood. Objectives Therefore we evaluated pathological heterogeneity using topological data analysis (TDA) with the aim of visualizing disease clusters and microclusters. Methods A discovery population of 202 adult patients (142 asthmatic patients and 60 healthy subjects) and an external replication population (59 patients with severe asthma) were evaluated. Pathology and gene expression were examined in bronchial biopsy samples. TDA was applied by using pathological variables alone to create pathology-driven visual networks. Results In the discovery cohort TDA identified 4 groups/networks with multiple microclusters/regions of interest that were masked by group-level statistics. Specifically, TDA group 1 consisted of a high proportion of healthy subjects, with a microcluster representing a topological continuum connecting healthy subjects to patients with mild-to-moderate asthma. Three additional TDA groups with moderate-to-severe asthma (Airway Smooth Muscle^High, Reticular Basement Membrane^High, and Remodeling^Low groups) were identified and contained numerous microclusters with varying pathological and clinical features. Mutually exclusive TH2 and TH17 tissue gene expression signatures were identified in all pathological groups. Discovery and external replication applied to the severe asthma subgroup identified only highly similar “pathological data shapes” through analyses of persistent homology. Conclusions We have identified and replicated novel pathological phenotypes of asthma using TDA. Our methodology is applicable to other complex chronic diseases.

Classification of COVID-19 via Homology of CT-SCAN (2021)

Sohail Iqbal, H. Fareed Ahmed, Talha Qaiser, Muhammad Imran Qureshi, Nasir Rajpoot

Abstract

In this worldwide spread of SARS-CoV-2 (COVID-19) infection, it is of utmost importance to detect the disease at an early stage, especially in the hot spots of this epidemic. There are more than 110 million infected cases on the globe so far. Due to its promptness and effective results, the computed tomography (CT)-scan image is preferred to the reverse-transcription polymerase chain reaction (RT-PCR). Early detection and isolation of the patient is the only possible way of controlling the spread of the disease. Automated analysis of CT-scans can provide enormous support in this process. In this article, we propose a novel approach to detect SARS-CoV-2 using CT-scan images. Our method is based on a very intuitive and natural idea of analyzing shapes, an attempt to mimic a professional medic. We mainly trace SARS-CoV-2 features by quantifying their topological properties. We primarily use a tool called persistent homology, from Topological Data Analysis (TDA), to compute these topological properties. We train and test our model on the "SARS-CoV-2 CT-scan dataset" (Soares et al., 2020), an open-source dataset containing 2,481 CT-scans of normal and COVID-19 patients. Our model yielded an overall benchmark F1 score of 99.42%, accuracy 99.416%, precision 99.41%, and recall 99.42%. The TDA techniques have great potential that can be utilized for efficient and prompt detection of COVID-19. The immense potential of TDA may be exploited in clinics for rapid and safe detection of COVID-19 globally, in particular in the low and middle-income countries where RT-PCR labs and/or kits are in a serious crisis.

Topological Singularity Detection at Multiple Scales (2023)

Julius von Rohrscheidt, Bastian Rieck

Abstract

The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the ’manifoldness’ of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.

Conserved Abundance and Topological Features in Chromatin-Remodeling Protein Interaction Networks (2015)

Mihaela E Sardiu, Joshua M Gilmore, Brad D Groppe, Damir Herman, Sreenivasa R Ramisetty, Yong Cai, Jingji Jin, Ronald C Conaway, Joan W Conaway, Laurence Florens, Michael P Washburn

Abstract

The study of conserved protein interaction networks seeks to better understand the evolution and regulation of protein interactions. Here, we present a quantitative proteomic analysis of 18 orthologous baits from three distinct chromatin-remodeling complexes in Saccharomyces cerevisiae and Homo sapiens. We demonstrate that abundance levels of orthologous proteins correlate strongly between the two organisms and both networks have highly similar topologies. We therefore used the protein abundances in one species to cross-predict missing protein abundance levels in the other species. Lastly, we identified a novel conserved low-abundance subnetwork further demonstrating the value of quantitative analysis of networks.

Topology-Driven Trajectory Synthesis With an Example on Retinal Cell Motions (2014)

Chen Gu, Leonidas Guibas, Michael Kerber

Abstract

We design a probabilistic trajectory synthesis algorithm for generating time-varying sequences of geometric configuration data. The algorithm takes a set of observed samples (each may come from a different trajectory) and simulates the dynamic evolution of the patterns in O(n^2 log n) time. To synthesize geometric configurations with indistinct identities, we use the pair correlation function to summarize point distribution, and α-shapes to maintain topological shape features based on a fast persistence matching approach. We apply our method to build a computational model for the geometric transformation of the cone mosaic in retinitis pigmentosa, an inherited and currently untreatable retinal degeneration.

Topological Data Analysis Reveals Robust Alterations in the Whole-Brain and Frontal Lobe Functional Connectomes in Attention-Deficit/Hyperactivity Disorder (2020)

Zeus Gracia-Tabuenca, Juan Carlos Díaz-Patiño, Isaac Arelio, Sarael Alcauter

Abstract

Attention-deficit/hyperactivity disorder (ADHD) is a developmental disorder characterized by difficulty controlling one's own behavior. Neuroimaging studies have related ADHD with the interplay of fronto-parietal attention systems with the default mode network (DMN; Castellanos and Aoki, 2016). However, some results have been inconsistent, potentially due to methodological differences in the analytical strategies when defining the brain functional network, i.e., the functional connectivity threshold and/or the brain parcellation scheme. Here, we make use of topological data analysis (TDA) to explore the brain connectome as a function of the filtration value (i.e., the connectivity threshold), instead of using a static connectivity threshold. Specifically, we characterized the transition from all nodes being isolated to being connected into a single component as a function of the filtration value. We explored the utility of such a method to identify differences between 81 children with ADHD (45 male, age: 7.26–17.61 years old) and 96 typically developing children (TDC; 59 male, age: 7.17–17.96 years old), using a public dataset of resting state (rs)fMRI in human subjects. Results were highly congruent when using four different brain segmentations (atlases), and exhibited significant differences for the brain topology of children with ADHD, both at the whole-brain network and the functional subnetwork levels, particularly involving the frontal lobe and the DMN. Therefore, this is a solid approach that complements connectomics-related methods and may help identify the neurophysio-pathology of ADHD.
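The filtration idea in this abstract, tracking how isolated nodes merge into a single component as the connectivity threshold varies, can be illustrated with a small union-find sketch. This is not the authors' code: the function name `components_vs_threshold`, the toy 4-node matrix, and the convention that an edge is present when its weight is at least the threshold are all assumptions made for illustration.

```python
def components_vs_threshold(weights, thresholds):
    """Count connected components of a weighted graph at each threshold.

    weights    : symmetric n x n list of lists of connectivity values
    thresholds : iterable of filtration values
    An edge (i, j) is kept when weights[i][j] >= threshold (assumed convention).
    """
    n = len(weights)
    # Strongest edges first, so lowering the threshold only ever adds edges.
    edges = sorted(
        ((weights[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    counts = {}
    k = 0
    components = n  # every node starts isolated
    for t in sorted(thresholds, reverse=True):
        while k < len(edges) and edges[k][0] >= t:
            _, i, j = edges[k]
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
            k += 1
        counts[t] = components
    return counts


# Hypothetical connectivity matrix with two tightly coupled pairs of nodes.
toy = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
curve = components_vs_threshold(toy, [0.95, 0.85, 0.5, 0.25])
print(curve)
```

On the toy matrix the component count falls from 4 to 1 as the threshold is lowered. A curve of this kind, rather than the network at one fixed threshold, is the sort of object the abstract describes comparing between groups.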

A Framework for Topological Music Analysis (TMA) (2022)

Alberto Alcalá-Alvarez, Pablo Padilla-Longoria

Abstract

In the present article we describe and discuss a framework for applying different topological data analysis (TDA) techniques to a music fragment given as a score in traditional Western notation. We first consider different sets of points in Euclidean spaces of different dimensions that correspond to musical events in the score, and obtain their persistent homology features. Then we introduce two families of simplicial complexes that can be associated to chord sequences, and calculate their main homological descriptors. These complexes lead us to the definition of dynamical systems modeling harmonic progressions. Finally, we show the results of applying the described methods to the analysis and stylistic comparison of fragments from three Brandenburg Concertos by J.S. Bach and two Graffiti by Mexican composer Armando Luna.

Topology-Based Kernels With Application to Inference Problems in Alzheimer’s Disease (2011)

Deepti Pachauri, Chris Hinrichs, Moo K. Chung, Sterling C. Johnson, Vikas Singh

Abstract

Alzheimer’s disease (AD) research has recently witnessed a great deal of activity focused on developing new statistical learning tools for automated inference using imaging data. The workhorse for many of these techniques is the Support Vector Machine (SVM) framework (or more generally kernel based methods). Most of these require, as a first step, specification of a kernel matrix between input examples (i.e., images). The inner product between images Ii and Ij in a feature space can generally be written in closed form, and so it is convenient to treat as “given”. However, in certain neuroimaging applications such an assumption becomes problematic. As an example, it is rather challenging to provide a scalar measure of similarity between two instances of highly attributed data such as cortical thickness measures on cortical surfaces. Note that cortical thickness is known to be discriminative for neurological disorders, so leveraging such information in an inference framework, especially within a multi-modal method, is potentially advantageous. But despite being clinically meaningful, relatively few works have successfully exploited this measure for classification or regression. Motivated by these applications, our paper presents novel techniques to compute similarity matrices for such topologically-based attributed data. Our ideas leverage recent developments to characterize signals (e.g., cortical thickness) motivated by the persistence of their topological features, leading to a scheme for simple constructions of kernel matrices. As a proof of principle, on a dataset of 356 subjects from the ADNI study, we report good performance on several statistical inference tasks without any feature selection, dimensionality reduction, or parameter tuning.

Community Resources

Data

A Proof-of-Concept Investigation Into Predicting Follicular Carcinoma on Ultrasound Using Topological Data Analysis and Radiomics (2025)

Andrew M. Thomas, Ann C. Lin, Grace Deng, Yuchen Xu, Gustavo Fernandez Ranvier, Aida Taye, David S. Matteson, Denise Lee

Abstract

Background Sonographic risk patterns identified in established risk stratification systems (RSS) may not accurately stratify follicular carcinoma from adenoma, which share many similar US characteristics. The purpose of this study is to investigate the performance of a multimodal machine learning model utilizing radiomics and topological data analysis (TDA) to predict malignancy in follicular thyroid neoplasms on ultrasound. Patients & Methods This is a retrospective study of patients who underwent thyroidectomy with pathology confirmed follicular adenoma or carcinoma at a single academic medical center between 2010 and 2022. Features derived from radiomics and TDA were calculated from processed ultrasound images and high-dimensional features in each modality were projected onto their first two principal components. Logistic regression with L2 penalty was used to predict malignancy and performance was evaluated using leave-one-out cross-validation and area under the curve (AUC). Results Patients with follicular adenomas (n = 7) and follicular carcinomas (n = 11) with available imaging were included. The best multimodal model achieved an AUC of 0.88 (95% CI: [0.85, 1]), whereas the best radiomics model achieved an AUC of 0.68 (95% CI: [0.61, 0.84]). Conclusions We demonstrate that inclusion of topological features yields strong improvement over radiomics-based features alone in the prediction of follicular carcinoma on ultrasound. Despite low volume data, the TDA features explicitly capture shape information that likely augments performance of the multimodal machine learning model. This approach suggests that a quantitative based US RSS may contribute to the preoperative prediction of follicular carcinoma.

Community Resources

Code

Robust Crossings Detection in Noisy Signals Using Topological Signal Processing (2024)

Sunia Tanweer, Firas A. Khasawneh, Elizabeth Munch

Abstract

This article explores a novel method of bracketing zero-crossings for both 1-D functions and discretely sampled time series by the application of 0-D persistent homology from algebraic topology. We introduce an algorithm and demonstrate its capability of detecting crossings in noisy signals across various sampling frequencies. Compared to other software-based methods for crossing detection in signals, our approach is typically faster, shows a higher accuracy, and has the unique ability to identify all roots within the provided interval instead of detecting only one of them. We also discuss different options for mathematically estimating the persistence threshold, a parameter which impacts and controls the correct bracketing of roots. Finally, we explore the potential of extending our algorithm to higher dimensions.
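The building block named in this abstract, 0-D persistent homology of a sampled signal together with a persistence threshold, can be sketched as follows. This is not the authors' crossing-bracketing algorithm. It shows only the standard sublevel-set persistence computation on a 1-D sequence and a threshold filter. The function names and the sample values are hypothetical.

```python
def sublevel_persistence(values):
    """0-D sublevel-set persistence pairs (birth, death) of a sampled signal.

    Samples enter in increasing order of value. A local minimum starts a
    component; when two components meet, the one with the higher minimum dies.
    """
    n = len(values)
    parent = [-1] * n   # -1 marks samples not yet in the sublevel set
    birth = [0.0] * n   # birth[root] = minimum value of that component
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                a, b = find(i), find(j)
                if a == b:
                    continue
                elder, younger = (a, b) if birth[a] <= birth[b] else (b, a)
                if values[i] > birth[younger]:  # skip zero-persistence pairs
                    pairs.append((birth[younger], values[i]))
                parent[younger] = elder
    pairs.append((min(values), float("inf")))  # the global minimum never dies
    return sorted(pairs, key=lambda p: p[0] - p[1])  # most persistent first


def significant(pairs, threshold):
    """Keep pairs whose persistence (death - birth) exceeds the threshold."""
    return [p for p in pairs if p[1] - p[0] > threshold]


signal = [0.0, 2.0, 1.0, 3.0, -1.0, 0.5]  # hypothetical samples
pairs = sublevel_persistence(signal)
print(pairs)
print(significant(pairs, 1.5))
```

The threshold plays the role the abstract assigns to it: pairs with small persistence are treated as noise-level wiggles, while the remaining pairs mark the extrema that matter for locating sign changes.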

Multiphase Mixing Quantification by Computational Homology and Imaging Analysis (2011)

Jianxin Xu, Hua Wang, Hui Fang

Abstract

The purpose of this study is to introduce a new technique for quantifying the efficiency of multiphase mixing. This technique based on algebraic topology is illustrated by using the hydraulic modeling of gas agitated reactors stirred by top lance gas injection and image analysis. The zeroth Betti numbers are used to estimate the numbers of pieces in the patterns, leading to a useful parameter to characterize the mixture homogeneity. The first Betti numbers are introduced to characterize the nonhomogeneity of the mixture. The mixing efficiency can be characterized by the Betti numbers for binary images of the patterns. This novel method may be applied for studying a variety of multiphase mixing problems in which multiphase components or tracers are visually distinguishable.
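As an illustration of the two quantities used in this abstract, the sketch below counts the zeroth and first Betti numbers of a small binary image. It is a minimal example, not the authors' implementation. It assumes that foreground pixels have value 1 and are joined with 8-connectivity, that holes are bounded background regions under 4-connectivity, and the name `betti_numbers` and the test pattern are hypothetical.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def _count_components(grid, value, neighbours):
    """Count connected regions of pixels equal to `value` by flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def betti_numbers(image):
    """Return (b0, b1) for a 2-D binary image given as rows of 0/1 values."""
    cols = len(image[0])
    # Pad with background so the outer background forms exactly one region.
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in image]
              + [[0] * (cols + 2)])
    b0 = _count_components(padded, 1, N8)      # number of pieces
    b1 = _count_components(padded, 0, N4) - 1  # enclosed holes
    return b0, b1


# Hypothetical pattern: one ring (a piece with a hole) and one isolated pixel.
pattern = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0],
]
print(betti_numbers(pattern))
```

For this pattern the function reports two pieces and one hole. Which phase of a thresholded mixing image is treated as foreground is a modelling choice that the abstract does not specify.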

Topological Data Analysis Reveals a Core Gene Expression Backbone That Defines Form and Function Across Flowering Plants (2023)

Sourabh Palande, Joshua A. M. Kaste, Miles D. Roberts, Kenia Segura Abá, Carly Claucherty, Jamell Dacon, Rei Doko, Thilani B. Jayakody, Hannah R. Jeffery, Nathan Kelly, Andriana Manousidaki, Hannah M. Parks, Emily M. Roggenkamp, Ally M. Schumacher, Jiaxin Yang, Sarah Percival, Jeremy Pardo, Aman Y. Husbands, Arjun Krishnan, Beronda L. Montgomery, Elizabeth Munch, Addie M. Thompson, Alejandra Rougon-Cardoso, Daniel H. Chitwood, Robert VanBuren

Abstract

Since they emerged approximately 125 million years ago, flowering plants have evolved to dominate the terrestrial landscape and survive in the most inhospitable environments on earth. At their core, these adaptations have been shaped by changes in numerous, interconnected pathways and genes that collectively give rise to emergent biological phenomena. Linking gene expression to morphological outcomes remains a grand challenge in biology, and new approaches are needed to begin to address this gap. Here, we implemented topological data analysis (TDA) to summarize the high dimensionality and noisiness of gene expression data using lens functions that delineate plant tissue and stress responses. Using this framework, we created a topological representation of the shape of gene expression across plant evolution, development, and environment for the phylogenetically diverse flowering plants. The TDA-based Mapper graphs form a well-defined gradient of tissues from leaves to seeds, or from healthy to stressed samples, depending on the lens function. This suggests that there are distinct and conserved expression patterns across angiosperms that delineate different tissue types or responses to biotic and abiotic stresses. Genes that correlate with the tissue lens function are enriched in central processes such as photosynthetic, growth and development, housekeeping, or stress responses. Together, our results highlight the power of TDA for analyzing complex biological data and reveal a core expression backbone that defines plant form and function.

Community Resources

Code

Topological Detection of Trojaned Neural Networks (2021)

Songzhu Zheng, Yikai Zhang, Hubert Wagner, Mayank Goswami, Chao Chen

Abstract

Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when the attackers stealthily manipulate the model’s behavior through Trojaned training samples, which can later be exploited. Guided by basic neuroscientific principles, we discover subtle – yet critical – structural deviation characterizing Trojaned models. In our analysis we use topological tools. They allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from shallow to deep layers. Inspired by these observations, we devise a strategy for robust detection of Trojaned models. Compared to standard baselines it displays better performance on multiple benchmarks.

Unsupervised Topological Learning for Identification of Atomic Structures (2022)

Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse

Abstract

We propose an unsupervised learning methodology with descriptors based on topological data analysis (TDA) concepts to describe the local structural properties of materials at the atomic scale. Based only on atomic positions and without a priori knowledge, our method allows for an autonomous identification of clusters of atomic structures through a Gaussian mixture model. We successfully apply this approach to the analysis of elemental Zr in the crystalline and liquid states as well as homogeneous nucleation events under deep undercooling conditions. This opens the way to deeper and autonomous study of complex phenomena in materials at the atomic scale.

Topology of Force Networks in Granular Media Under Impact (2017)

M. X. Lim, R. P. Behringer

Abstract

We investigate the evolution of the force network in experimental systems of two-dimensional granular materials under impact. We use the first Betti number, β1, and persistence diagrams, as measures of the topological properties of the force network. We show that the structure of the network has a complex, hysteretic dependence on both the intruder acceleration and the total force response of the granular material. β1 can also distinguish between the nonlinear formation and relaxation of the force network. In addition, using the persistence diagram of the force network, we show that the size of the loops in the force network has a Poisson-like distribution, the characteristic size of which changes over the course of the impact.

The Shape of Word Embeddings: Quantifying Non-Isometry With Topological Data Analysis (2024)

Ondřej Draganov, Steven Skiena

Abstract

Word embeddings represent language vocabularies as clouds of d-dimensional points. We investigate how information is conveyed by the general shape of these clouds, instead of representing the semantic meaning of each token. Specifically, we use the notion of persistent homology from topological data analysis (TDA) to measure the distances between language pairs from the shape of their unlabeled embeddings. These distances quantify the degree of non-isometry of the embeddings. To distinguish whether these differences are random training errors or capture real information about the languages, we use the computed distance matrices to construct language phylogenetic trees over 81 Indo-European languages. Careful evaluation shows that our reconstructed trees exhibit strong and statistically-significant similarities to the reference.

Phase-Field Investigation of the Coarsening of Porous Structures by Surface Diffusion (2019)

Pierre-Antoine Geslin, Mickaël Buchet, Takeshi Wada, Hidemi Kato

Abstract

Nano- and microporous connected structures have attracted increasing attention in the past decades due to their high surface area, presenting interesting properties for a number of applications. These structures generally coarsen by surface diffusion, leading to an enlargement of the structure's characteristic length scale. We propose to study this coarsening behavior using a phase-field model for surface diffusion. In addition to reproducing the expected scaling law, our simulations enable a precise investigation of the evolution of the topological and morphological characteristics along the coarsening process. In particular, we show that after a transient regime, the coarsening is self-similar, as exhibited by the evolution of both morphological and topological features. In addition, the influence of surface anisotropy is discussed and comparisons with experimental tomographic observations are presented.

Inference of Ancestral Recombination Graphs Through Topological Data Analysis (2016)

Pablo G. Cámara, Arnold J. Levine, Raúl Rabadán

Abstract

The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands.

Evolution occurs through different mechanisms, including point mutations, gene duplication, horizontal gene transfer, and recombinations. Some of these mechanisms cannot be captured by tree graphs. We present a framework, based on the mathematical tools of computational topology, that can explicitly accommodate both recombination and mutation events across the evolutionary history of a sample of genomic sequences.
This approach generates a new type of summary graph and algebraic structures that provide quantitative information on the evolutionary scale and frequency of recombination events. The accompanying software, TARGet, is applied to several examples, including migration between sexually-reproducing populations, human recombination, and recombination in Darwin’s finches.

Lung Topology Characteristics in Patients With Chronic Obstructive Pulmonary Disease (2018)

Francisco Belchi, Mariam Pirashvili, Joy Conway, Michael Bennett, Ratko Djukanovic, Jacek Brodzki

Abstract

Quantitative features that can currently be obtained from medical imaging do not provide a complete picture of Chronic Obstructive Pulmonary Disease (COPD). In this paper, we introduce a novel analytical tool based on persistent homology that extracts quantitative features from chest CT scans to describe the geometric structure of the airways inside the lungs. We show that these new radiomic features stratify COPD patients in agreement with the GOLD guidelines for COPD and can distinguish between inspiratory and expiratory scans. These CT measurements are very different to those currently in use and we demonstrate that they convey significant medical information. The results of this study are a proof of concept that topological methods can enhance the standard methodology to create a finer classification of COPD and increase the possibilities of more personalized treatment.

Persistent Homology Machine Learning for Fingerprint Classification (2019)

N. Giansiracusa, R. Giansiracusa, C. Moon

Abstract

The fingerprint classification problem is to sort fingerprints into predetermined groups, such as arch, loop, and whorl. It was asserted in the literature that minutiae points, which are commonly used for fingerprint matching, are not useful for classification. We show that, to the contrary, near state-of-the-art classification accuracy rates can be achieved when applying topological data analysis (TDA) to 3-dimensional point clouds of oriented minutiae points. We also apply TDA to fingerprint ink-roll images, which yields a lower accuracy rate but still shows promise; moreover, combining the two approaches outperforms each one individually. These methods use supervised learning applied to persistent homology and allow us to explore feature selection on barcodes, an important topic at the interface between TDA and machine learning. We test our classification algorithms on the NIST fingerprint database SD-27.

Chatter Detection in Turning Using Persistent Homology (2016)

Firas A. Khasawneh, Elizabeth Munch

Abstract

This paper describes a new approach for ascertaining the stability of stochastic dynamical systems in their parameter space by examining their time series using topological data analysis (TDA). We illustrate the approach using a nonlinear delayed model that describes the tool oscillations due to self-excited vibrations in turning. Each time series is generated using the Euler-Maruyama method and a corresponding point cloud is obtained using the Takens embedding. The point cloud can then be analyzed using a tool from TDA known as persistent homology. The results of this study show that the described approach can be used for analyzing datasets of delay dynamical systems generated both from numerical simulation and experimental data. The contributions of this paper include presenting for the first time a topological approach for investigating the stability of a class of nonlinear stochastic delay equations, and introducing a new application of TDA to machining processes.
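
The two preprocessing steps named above can be sketched in a few lines: an Euler-Maruyama integrator produces a noisy time series, and a Takens (delay) embedding turns it into a point cloud ready for persistent homology. This is a minimal sketch under stated assumptions: the drift is a simple linear relaxation standing in for the delayed turning model, and the function names, step size, embedding dimension, and delay are all hypothetical choices rather than the paper's.

```python
import math
import random


def euler_maruyama(drift, sigma, x0, dt, n_steps, seed=0):
    """Integrate dX = drift(X) dt + sigma dW with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
        xs.append(xs[-1] + drift(xs[-1]) * dt + sigma * dw)
    return xs


def takens_embedding(series, dimension, delay):
    """Map a scalar series to points (x[t], x[t+delay], ..., x[t+(dimension-1)*delay])."""
    span = (dimension - 1) * delay
    return [
        tuple(series[t + k * delay] for k in range(dimension))
        for t in range(len(series) - span)
    ]


# Stand-in system: a noisy linear relaxation, NOT the delayed turning model of the paper.
series = euler_maruyama(drift=lambda x: -x, sigma=0.1, x0=1.0, dt=0.01, n_steps=500)
cloud = takens_embedding(series, dimension=3, delay=10)
print(len(series), len(cloud))  # 501 481
```

The resulting list of 3-tuples is the kind of point cloud that would then be passed to a persistent homology library; that step is omitted here.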

Determining Clinically Relevant Features in Cytometry Data Using Persistent Homology (2022)

Soham Mukherjee, Darren Wethington, Tamal K. Dey, Jayajit Das

Abstract

Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets are the current standard for cytometry data analysis. However, this approach is unable to capture more subtle topological features hidden in data, especially if those features are further masked by data transforms or significant batch effects or donor-to-donor variations in clinical data. We show that persistent homology, a mathematical structure that summarizes the topological features, can effectively distinguish different sources of data, such as groups of healthy donors or patients. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between single cell protein expressions in COVID-19 patients and healthy controls. We identify proteins of interest by a decision-tree based classifier, sample points randomly and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions in cytometry datasets of varying density and identify protruded structures such as ‘elbows’. We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find that systematic structural differences exist between COVID-19 patients and healthy controls in the expression data for T-bet, Eomes, and Ki-67. Further analysis shows that expression of T-bet and Eomes is significantly downregulated in COVID-19 patient non-naïve CD8+ T cells compared to healthy controls. This counter-intuitive finding may indicate that canonical effector CD8+ T cells are less prevalent in COVID-19 patients than healthy controls.
This method is applicable to any cytometry dataset for discovering novel insights through topological data analysis which may be difficult to ascertain otherwise with a standard gating strategy or existing bioinformatic tools.
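
To make the Wasserstein comparison of persistence diagrams concrete, here is a minimal brute-force sketch. It assumes the common formulation in which each diagram is padded with diagonal slots, the ground metric is the L-infinity norm, and the optimal matching is found by trying every permutation, which is only practical for a handful of points. The function names and the two toy diagrams are hypothetical; the abstract does not state which ground metric or solver was used.

```python
from itertools import permutations


def diagonal_cost(point):
    """L-infinity distance from a (birth, death) point to the diagonal."""
    birth, death = point
    return (death - birth) / 2.0


def wasserstein_distance(diagram_a, diagram_b, p=1):
    """p-Wasserstein distance between two small persistence diagrams (brute force)."""
    n, m = len(diagram_a), len(diagram_b)
    size = n + m
    if size == 0:
        return 0.0

    def cost(i, j):
        # Rows 0..n-1 are points of A, rows n.. are diagonal slots.
        # Columns 0..m-1 are points of B, columns m.. are diagonal slots.
        if i < n and j < m:
            (b1, d1), (b2, d2) = diagram_a[i], diagram_b[j]
            return max(abs(b1 - b2), abs(d1 - d2))
        if i < n:
            return diagonal_cost(diagram_a[i])
        if j < m:
            return diagonal_cost(diagram_b[j])
        return 0.0  # diagonal matched to diagonal costs nothing

    best = min(
        sum(cost(i, j) ** p for i, j in enumerate(assignment))
        for assignment in permutations(range(size))
    )
    return best ** (1.0 / p)


# Hypothetical toy diagrams standing in for one healthy control and one patient.
healthy = [(0.0, 2.0), (1.0, 1.5)]
patient = [(0.0, 2.5)]
print(wasserstein_distance(healthy, patient))  # 0.75
```

For diagrams of realistic size one would replace the permutation search with a proper assignment solver; the cost matrix construction stays the same.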

Using Persistent Homology as a New Approach for Super-Resolution Localization Microscopy Data Analysis and Classification of γH2AX Foci/Clusters (2018)

Andreas Hofmann, Matthias Krufczik, Dieter W. Heermann, Michael Hausmann

Abstract

DNA double strand breaks (DSB) are the most severe damage induced in chromatin by ionizing radiation. In response to such environmentally determined stress situations, cells have developed repair mechanisms. Although many investigations have contributed to a detailed understanding of repair processes, e.g., homologous recombination repair or non-homologous end-joining, the question of how a cell decides to apply a certain repair process at a certain damage site has not been sufficiently answered, since all different repair pathways could simultaneously occur in the same cell nucleus. One of the first processes after DSB induction is phosphorylation of the histone variant H2AX to γH2AX in the given surroundings of the damaged locus. Since the spatial organization of chromatin is not random, it is plausible that the spatial organization of γH2AX foci is also not random, and rather contributes to the accessibility of special repair proteins to the damaged site, and thus to the subsequent repair pathway at this given site. The aim of this article is to demonstrate a new approach to analyze repair foci by their topology in order to obtain a cell independent method of categorization. During the last decade, novel super-resolution fluorescence light microscopic techniques have enabled new insights into genome structure and spatial organization on the nano-scale in the order of 10 nm. One of these techniques is single molecule localization microscopy (SMLM), with which the spatial coordinates of single fluorescence molecules can be precisely determined and density and distance distributions can be calculated. This method is an appropriate tool to quantify complex changes of chromatin and to describe repair foci on the single molecule level.
Based on the pointillist information obtained by SMLM from specifically labeled heterochromatin and γH2AX foci reflecting the chromatin morphology and repair foci topology, we have developed a new analytical methodology of foci or foci cluster characterization, respectively, by means of persistent homology. This method allows, for the first time, a cell independent comparison of two point distributions (here the point distributions of two γH2AX clusters) from a selected ensemble and gives a mathematical measure of their similarity. In order to demonstrate the feasibility of this approach, cells were irradiated by low LET (linear energy transfer) radiation with different doses and the heterochromatin and γH2AX foci were fluorescently labeled by antibodies for SMLM. By means of our new analysis method, we were able to show that the topology of clusters of γH2AX foci can be categorized depending on the distance to heterochromatin. This method opens up new possibilities to categorize spatial organization of point patterns by parameterization of topological similarity.

Finding Universal Structures in Quantum Many-Body Dynamics via Persistent Homology (2020)

Daniel Spitz, Jürgen Berges, Markus K. Oberthaler, Anna Wienhard

Abstract

Inspired by topological data analysis techniques, we introduce persistent homology observables and apply them in a geometric analysis of the dynamics of quantum field theories. As a prototype application, we consider simulated data of a two-dimensional Bose gas far from equilibrium. We discover a continuous spectrum of dynamical scaling exponents, which provides a refined classification of nonequilibrium universal phenomena. A possible explanation of the underlying processes is provided in terms of mixing wave turbulence and vortex kinetics components in point clouds. We find that the persistent homology scaling exponents are inherently linked to the geometry of the system, as the derivation of a packing relation reveals. The approach opens new ways of analyzing quantum many-body dynamics in terms of robust topological structures beyond standard field theoretic techniques.

Topological Extraction and Tracking of Defects in Crystal Structures (2011)

Sebastian Grottel, Carlos A. Dietrich, João L. D. Comba, Thomas Ertl

Abstract

Interfaces between materials with different mechanical properties play an important role in technical applications. Nowadays, molecular dynamics simulations are used to observe the behavior of such compound materials at the atomic level. Due to different atom crystal sizes, dislocations in the atom crystal structure occur once external forces are applied, and it has been observed that studying the change of these dislocations can provide further understanding of macroscopic attributes like elasticity and plasticity. Standard visualization techniques such as the rendering of individual atoms work for 2D data or sectional views; however, visualizing dislocations in 3D using such methods usually fails due to occlusion and clutter. In this work we propose to extract and visualize the structure of dislocations, which summarizes the commonly employed filtered atomistic renderings into a concise representation. The benefits of our approach are clearer images while retaining relevant data and easier visual tracking of topological changes over time.

Efficient Map Reconstruction and Augmentation via Topological Methods (2015)

Suyi Wang, Yusu Wang, Yanjie Li

Abstract

In recent years, with the rapid growth in the amount of publicly available Volunteered Geographic Information (VGI) data, automatic map generation from GPS trajectories has attracted great attention. Maps generated from these data can, for example, complement commercial maps in less developed areas. Two main challenges in the automatic generation of maps from volunteered GPS data are the handling of noise and of non-homogeneous sampling of road segments (for example, roads in downtown areas can receive significantly more GPS traces than roads in residential areas). In this paper, we present a novel framework for map reconstruction based on a topological idea: Morse theory. In particular, the use of Morse theory and topological simplification allows us to handle the issues of both noise and non-homogeneous sampling in an elegant unified framework. Our algorithm is significantly simpler than previous approaches, both conceptually and in implementation. Little pre- and post-processing is required, and yet the algorithm can reconstruct robust road networks from challenging data sets (such as GPS traces for the cities of Berlin or Beijing) that are comparable to or better than the output of previous state-of-the-art approaches. The new algorithm is also orders of magnitude faster than previous approaches on large data sets (for example, the entire processing of the Berlin city data with about 27189 trajectories takes less than one minute). Furthermore, our framework can be easily extended to handle the map integration problem, where one wishes to integrate multiple maps into a single one. Here, roads in different maps can have different confidence levels, and higher-confidence roads will have larger influence on the final integrated road.
We also present an effective algorithm for a slightly different map augmentation problem, where one wishes to augment a map, say G2, using a partial but more trustworthy map G1, in the sense that in the final map, information in G1 needs to be completely preserved.

Explainable Machine Learning Approach to Yield and Quality Improvements Using Deep Topological Data Analytics (2023)

Janhavi Giri, Attila Lengyel

Abstract

In wafer fabrication, data is collected and analyzed to prevent process deviations that could affect product quality and wafer yield. However, the high-dimensional, sparse, and imbalanced nature of the data poses significant challenges to yield and quality root cause analysis. Deep Topological Data Analysis (DTDA) is an unsupervised machine learning method that clusters and models the data in the form of geometric objects such as graphs and their higher-dimensional versions. This method reduces the multidimensional dataset to two-dimensional networks or graphs, where each node represents a cluster of samples with similar characteristics, and an edge represents the presence of overlapping characteristics between the connecting nodes. DTDA provides insights into the necessary data elements required to conduct accurate analysis and helps engineers identify the features contributing to yield and quality issues, enabling corrective actions. Moreover, the approach prevents the waste of engineering resources and mitigates the impact on final manufacturing cost.
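
The node-and-edge description above resembles the Mapper construction from topological data analysis, so a Mapper-style sketch may help fix ideas. This is an assumption: the abstract does not say how DTDA builds its graph, and the lens function, interval cover, overlap fraction, and single-linkage clustering below are all hypothetical illustrative choices.

```python
import math


def single_linkage(points, members, link_dist):
    """Group `members` into clusters: chains of points within `link_dist` of each other."""
    clusters = []
    unvisited = set(members)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= link_dist}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=3, overlap=0.25, link_dist=1.0):
    """Build a Mapper-style graph: nodes are clusters, edges join clusters sharing samples."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # fall back to 1.0 if all lens values coincide
    nodes = []  # each node is a frozenset of sample indices
    for k in range(n_intervals):
        # Overlapping intervals covering the range of the lens function.
        start = lo + k * width - overlap * width
        end = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage(points, members, link_dist))
    edges = {
        (a, b)
        for a in range(len(nodes))
        for b in range(a + 1, len(nodes))
        if nodes[a] & nodes[b]
    }
    return nodes, edges


# Hypothetical toy data: ten samples on a line, with the coordinate itself as the lens.
line = [(float(i),) for i in range(10)]
nodes, edges = mapper_graph(line, lens=lambda p: p[0])
print(len(nodes), sorted(edges))  # 3 [(0, 1), (1, 2)]
```

The toy line collapses to a chain of three nodes, which is the kind of low-dimensional summary graph the abstract describes; real wafer data would need a domain-specific lens and clustering method.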

Geometric Feature Performance Under Downsampling for EEG Classification Tasks (2021)

Bryan Bischof, Eric Bunch

Abstract

We experimentally investigate a collection of feature engineering pipelines for use with a CNN for classifying eyes-open or eyes-closed from electroencephalogram (EEG) time-series from the Bonn dataset. Using the Takens' embedding (a geometric representation of time-series), we construct simplicial complexes from EEG data. We then compare ε-series of Betti numbers and ε-series of graph spectra (a novel construction), two topological invariants of the latent geometry from these complexes, to raw time series of the EEG to fill in a gap in the literature for benchmarking. These methods, inspired by Topological Data Analysis, are used for feature engineering to capture local geometry of the time-series. Additionally, we test these feature pipelines' robustness to downsampling and data reduction. This paper seeks to establish clearer expectations for both time-series classification via geometric features, and how CNNs for time-series respond to data of degraded resolution.
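
One way to read an "ε-series of Betti numbers" is as the sequence of Betti numbers of a complex built at increasing scales ε. The sketch below computes only the zeroth Betti number (the number of connected components) of the ε-neighbourhood graph, which equals that of the Vietoris-Rips complex at the same scale. The choice of complex, the closed threshold (distance <= ε), and the toy point cloud are assumptions, not details taken from the paper.

```python
import math


def betti0_series(points, epsilons):
    """Count connected components of the epsilon-neighbourhood graph for each epsilon."""
    series = []
    for eps in epsilons:
        parent = list(range(len(points)))

        def find(x):
            # Union-find lookup with path halving.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        components = len(points)
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                if math.dist(points[i], points[j]) <= eps:
                    ri, rj = find(i), find(j)
                    if ri != rj:
                        parent[ri] = rj
                        components -= 1
        series.append(components)
    return series


# Hypothetical toy cloud; in the paper's setting the points would be delay vectors of EEG.
toy_cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(betti0_series(toy_cloud, [0.5, 1.0, 1.5, 8.0]))  # [4, 2, 2, 1]
```

The resulting integer sequence is a fixed-length feature vector, which is what makes it convenient as input to a classifier such as a CNN.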

Using Multidimensional Topological Data Analysis to Identify Traits of Hip Osteoarthritis (2018)

Jasmine Rossi‐deVries, Valentina Pedoia, Michael A. Samaan, Adam R. Ferguson, Richard B. Souza, Sharmila Majumdar

Abstract

Background: Osteoarthritis (OA) is a multifaceted disease with many variables affecting diagnosis and progression. Topological data analysis (TDA) is a state-of-the-art big data analytics tool that can combine all variables into multidimensional space. TDA is used to simultaneously analyze imaging and gait analysis techniques.
Purpose: To identify biochemical and biomechanical biomarkers able to classify different disease progression phenotypes in subjects with and without radiographic signs of hip OA.
Study Type: Longitudinal study for comparison of progressive and nonprogressive subjects.
Population: In all, 102 subjects with and without radiographic signs of hip osteoarthritis.
Field Strength/Sequence: 3T, SPGR 3D MAPSS T1ρ/T2, intermediate-weighted fat-suppressed fast spin-echo (FSE).
Assessment: Multidimensional data analysis including cartilage composition, bone shape, Kellgren–Lawrence (KL) classification of osteoarthritis, scoring hip osteoarthritis with MRI (SHOMRI), hip disability and osteoarthritis outcome score (HOOS).
Statistical Tests: Analysis done using TDA, Kolmogorov–Smirnov (KS) testing, and Benjamini-Hochberg to rank P-value results to correct for multiple comparisons.
Results: Subjects in the later stages of the disease had an increased SHOMRI score (P < 0.0001), increased KL (P = 0.0012), and older age (P < 0.0001). Subjects in the healthier group showed intact cartilage and less pain. Subjects found between these two groups had a range of symptoms. Analysis of this subgroup identified knee biomechanics (P < 0.0001) as an initial marker of the disease that is noticeable before the morphological progression and degeneration. Further analysis of an OA subgroup with femoroacetabular impingement (FAI) showed anterior labral tears to be the most significant marker (P = 0.0017) between those FAI subjects with and without OA symptoms.
Data Conclusion: The data-driven analysis obtained with TDA proposes new phenotypes of these subjects that partially overlap with the radiographic-based classical disease status classification and also shows the potential for further examination of an early onset biomechanical intervention.
Level of Evidence: 2
Technical Efficacy: Stage 2
J. Magn. Reson. Imaging 2018;48:1046–1058.

Acute Lymphoblastic Leukemia Classification Using Persistent Homology (2024)

Waqar Hussain Shah, Abdullah Baloch, Rider Jaimes-Reátegui, Sohail Iqbal, Syeda Rafia Fatima, Alexander N. Pisarchik

Abstract

Acute Lymphoblastic Leukemia (ALL) is a prevalent form of childhood blood cancer characterized by the proliferation of immature white blood cells that rapidly replace normal cells in the bone marrow. The exponential growth of these leukemic cells can be fatal if not treated promptly. Classifying lymphoblasts and healthy cells poses a significant challenge, even for domain experts, due to their morphological similarities. Automated computer analysis of ALL can provide substantial support in this domain and potentially save numerous lives. In this paper, we propose a novel classification approach that involves analyzing shapes and extracting topological features of ALL cells. We employ persistent homology to capture these topological features. Our technique accurately and efficiently detects and classifies leukemia blast cells, achieving a recall of 98.2% and an F1-score of 94.6%. This approach has the potential to significantly enhance leukemia diagnosis and therapy.

Constructing Shape Spaces From a Topological Perspective (2017)

Christoph Hofer, Roland Kwitt, Marc Niethammer, Yvonne Höller, Eugen Trinka, Andreas Uhl

Abstract

We consider the task of constructing (metric) shape space(s) from a topological perspective. In particular, we present a generic construction scheme and demonstrate how to apply this scheme when shape is interpreted as the differences that remain after factoring out translation, scaling and rotation. This is achieved by leveraging a recently proposed injective functional transform of 2D/3D (binary) objects, based on persistent homology. The resulting shape space is then equipped with a similarity measure that is (1) by design robust to noise and (2) fulfills all metric axioms. From a practical point of view, analyses of object shape can then be carried out directly on segmented objects obtained from some imaging modality without any preprocessing, such as alignment, smoothing, or landmark selection. We demonstrate the utility of the approach on the problem of distinguishing segmented hippocampi from normal controls vs. patients with Alzheimer’s disease in a challenging setup where volume changes are no longer discriminative.

(Quasi)Periodicity Quantification in Video Data, Using Topology (2018)

Christopher J. Tralie, Jose A. Perea

Abstract

This work introduces a novel framework for quantifying the presence and strength of recurrent dynamics in video data. Specifically, we provide continuous measures of periodicity (perfect repetition) and quasiperiodicity (superposition of periodic modes with noncommensurate periods), in a way which does not require segmentation, training, object tracking, or 1-dimensional surrogate signals. Our methodology operates directly on video data. The approach combines ideas from nonlinear time series analysis (delay embeddings) and computational topology (persistent homology) by translating the problem of finding recurrent dynamics in video data into the problem of determining the circularity or toroidality of an associated geometric space. Through extensive testing, we show the robustness of our scores with respect to several noise models/levels; we show that our periodicity score is superior to other methods when compared to human-generated periodicity rankings; and furthermore, we show that our quasiperiodicity score clearly indicates the presence of biphonation in videos of vibrating vocal folds, which has never before been accomplished quantitatively end to end.

Theory and Algorithms for Constructing Discrete Morse Complexes From Grayscale Digital Images (2011)

V. Robins, P. J. Wood, A. P. Sheppard

Abstract

We present an algorithm for determining the Morse complex of a two or three-dimensional grayscale digital image. Each cell in the Morse complex corresponds to a topological change in the level sets (i.e., a critical point) of the grayscale image. Since more than one critical point may be associated with a single image voxel, we model digital images by cubical complexes. A new homotopic algorithm is used to construct a discrete Morse function on the cubical complex that agrees with the digital image and has exactly the number and type of critical cells necessary to characterize the topological changes in the level sets. We make use of discrete Morse theory and simple homotopy theory to prove correctness of this algorithm. The resulting Morse complex is considerably simpler than the cubical complex originally used to represent the image and may be used to compute persistent homology.

Topological Data Analysis on Simple English Wikipedia Articles (2020)

Matthew Wright, Xiaojun Zheng

Abstract

Single-parameter persistent homology, a key tool in topological data analysis, has been widely applied to data problems, with statistical techniques that quantify the significance of the results. In contrast, statistical techniques for two-parameter persistence, while highly desirable for real-world applications, have scarcely been considered. We present three statistical approaches for comparing geometric data using two-parameter persistent homology, and we demonstrate the applicability of these approaches on high-dimensional point-cloud data obtained from Simple English Wikipedia articles. These approaches rely on the Hilbert function, matching distance, and barcodes obtained from two-parameter persistence modules computed from the point-cloud data. We demonstrate the applicability of our methods by distinguishing certain subsets of the Wikipedia data, and by comparison with random data. Results include insights into the construction of null distributions and stability of our methods with respect to noisy data. Our statistical methods are broadly applicable for analysis of geometric data indexed by a real-valued parameter.

Extracting Insights From the Shape of Complex Data Using Topology (2013)

P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. Carlsson

Abstract

This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find. Our method also permits the analysis of individual data sets as well as the analysis of relationships between related data sets. We illustrate the use of our method by applying it to three very different kinds of data, namely gene expression from breast tumors, voting data from the United States House of Representatives and player performance data from the NBA, in each case finding stratifications of the data which are more refined than those produced by standard methods.

Unsupervised Topological Learning Approach of Crystal Nucleation in Pure Tantalum (2021)

Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse

Abstract

Nucleation phenomena commonly observed in our everyday life are of fundamental, technological and societal importance in many areas, but some of their most intimate mechanisms remain to be unraveled. Crystal nucleation, the early stage at which the liquid-to-solid transition occurs upon undercooling, initiates at the atomic level on nanometer length and sub-picosecond time scales and involves complex multidimensional mechanisms with local symmetry breaking that can hardly be observed experimentally in full detail. To reveal their structural features in simulations without a priori knowledge, an unsupervised learning approach founded on topological descriptors borrowed from persistent homology concepts is proposed. Applied here to a monatomic metal, namely Tantalum (Ta), it shows that both translational and orientational ordering always come into play simultaneously when homogeneous nucleation starts in regions with low five-fold symmetry.

Quantitative and Interpretable Order Parameters for Phase Transitions From Persistent Homology (2020)

Alex Cole, Gregory J. Loges, Gary Shiu

Abstract

We apply modern methods in computational topology to the task of discovering and characterizing phase transitions. As illustrations, we apply our method to four two-dimensional lattice spin models: the Ising, square ice, XY, and fully-frustrated XY models. In particular, we use persistent homology, which computes the births and deaths of individual topological features as a coarse-graining scale or sublevel threshold is increased, to summarize multiscale and high-point correlations in a spin configuration. We employ vector representations of this information called persistence images to formulate and perform the statistical task of distinguishing phases. For the models we consider, a simple logistic regression on these images is sufficient to identify the phase transition. Interpretable order parameters are then read from the weights of the regression. This method suffices to identify magnetization, frustration, and vortex-antivortex structure as relevant features for phase transitions in our models. We also define "persistence" critical exponents and study how they are related to those critical exponents usually considered.
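
A persistence image turns a persistence diagram into a fixed-size array that a logistic regression can consume. The sketch below is a simplified version under stated assumptions: each (birth, death) pair is moved to (birth, persistence) coordinates, weighted linearly by its persistence, and smeared with an unnormalised Gaussian that is evaluated at pixel centres instead of being integrated over each pixel. The grid size, extent, and bandwidth are hypothetical defaults.

```python
import math


def persistence_image(diagram, resolution=5, extent=1.0, sigma=0.1):
    """Rasterise (birth, death) pairs into a resolution x resolution grid.

    Rows index persistence (death - birth), columns index birth, both on [0, extent].
    Each pair contributes a Gaussian bump weighted linearly by its persistence.
    """
    step = extent / resolution
    centres = [(k + 0.5) * step for k in range(resolution)]
    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in diagram:
        pers = death - birth
        for r, y in enumerate(centres):
            for c, x in enumerate(centres):
                sq = (x - birth) ** 2 + (y - pers) ** 2
                image[r][c] += pers * math.exp(-sq / (2.0 * sigma ** 2))
    return image


# Hypothetical single feature born at 0.1 and dying at 0.8.
image = persistence_image([(0.1, 0.8)])
peak = max((value, r, c) for r, row in enumerate(image) for c, value in enumerate(row))
print(peak[1:])  # (3, 0): persistence row 3 (centre 0.7), birth column 0 (centre 0.1)
```

Flattening `image` row by row gives a feature vector; with one weight per pixel, the weights of a fitted linear classifier can be reshaped back onto the same grid, which is presumably how regression weights become readable as order parameters.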

Cliques of Neurons Bound Into Cavities Provide a Missing Link Between Structure and Function (2017)

Michael W. Reimann, Max Nolte, Martina Scolamiero, Katharine Turner, Rodrigo Perin, Giuseppe Chindemi, Paweł Dłotko, Ran Levi, Kathryn Hess, Henry Markram

Abstract

The lack of a formal link between neural network structure and its emergent function has hampered our understanding of how the brain processes information. We have now come closer to describing such a link by taking the direction of synaptic transmission into account, constructing graphs of a network that reflect the direction of information flow, and analyzing these directed graphs using algebraic topology. Applying this approach to a local network of neurons in the neocortex revealed a remarkably intricate and previously unseen topology of synaptic connectivity. The synaptic network contains an abundance of cliques of neurons bound into cavities that guide the emergence of correlated activity. In response to stimuli, correlated activity binds synaptically connected neurons into functional cliques and cavities that evolve in a stereotypical sequence towards peak complexity. We propose that the brain processes stimuli by forming increasingly complex functional cliques and cavities.
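
The "cliques" in a directed network can be made concrete as directed simplices: ordered groups of nodes that are all-to-all connected with a consistent direction, so that there is a single source and a single sink. The sketch below counts them by dimension with a simple recursive enumeration. The function name and toy graphs are hypothetical, and this naive enumeration is only a sketch, not the software used in the study.

```python
def count_directed_cliques(num_nodes, edges):
    """Count directed simplices by dimension in a directed graph.

    A directed k-simplex is an ordered tuple (v0, ..., vk) with an edge vi -> vj
    for every i < j, so v0 is the unique source and vk the unique sink.
    """
    successors = [set() for _ in range(num_nodes)]
    for u, v in edges:
        if u != v:
            successors[u].add(v)

    counts = {}

    def extend(size, candidates):
        # `candidates` are nodes receiving an edge from every node chosen so far.
        counts[size - 1] = counts.get(size - 1, 0) + 1
        for v in candidates:
            extend(size + 1, candidates & successors[v])

    for v in range(num_nodes):
        extend(1, successors[v])
    return counts


feed_forward = [(0, 1), (0, 2), (1, 2)]  # one source, one sink: a directed 2-simplex
cycle = [(0, 1), (1, 2), (2, 0)]         # cyclic triangle: no directed 2-simplex
print(count_directed_cliques(3, feed_forward))  # {0: 3, 1: 3, 2: 1}
print(count_directed_cliques(3, cycle))         # {0: 3, 1: 3}
```

The two toy graphs have the same undirected shape but different directed clique counts, which illustrates why taking the direction of synaptic transmission into account changes the topology. Note that with reciprocal connections the same node set can be counted under more than one ordering.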

TDAExplore: Quantitative Analysis of Fluorescence Microscopy Images Through Topology-Based Machine Learning (2021)

Parker Edwards, Kristen Skruber, Nikola Milićević, James B. Heidings, Tracy-Ann Read, Peter Bubenik, Eric A. Vitriol

Abstract

Recent advances in machine learning have greatly enhanced automatic methods to extract information from fluorescence microscopy data. However, current machine-learning-based models can require hundreds to thousands of images to train, and the most readily accessible models classify images without describing which parts of an image contributed to classification. Here, we introduce TDAExplore, a machine learning image analysis pipeline based on topological data analysis. It can classify different types of cellular perturbations after training with only 20–30 high-resolution images and performs robustly on images from multiple subjects and microscopy modes. Using only images and whole-image labels for training, TDAExplore provides quantitative, spatial information, characterizing which image regions contribute to classification. Computational requirements to train TDAExplore models are modest and a standard PC can perform training with minimal user input. TDAExplore is therefore an accessible, powerful option for obtaining quantitative information about imaging data in a wide variety of applications.

Topological Analysis of Low Dimensional Phase Space Trajectories of High Dimensional EEG Signals for Classification of Interictal Epileptiform Discharges (2023)

A. Stiehl, M. Flammer, F. Anselstetter, N. Ille, H. Bornfleth, S. Geißelsöder, C. Uhl

Abstract

A new topology based feature extraction method for classification of interictal epileptiform discharges (IEDs) in EEG recordings from patients with epilepsy is proposed. After dimension reduction of the recorded EEG signal, using dynamical component analysis (DyCA) or principal component analysis (PCA), a persistent homology analysis of the resulting phase space trajectories is performed. Features are extracted from the persistent homology analysis and used to train and evaluate a support vector machine (SVM). Classification results based on these persistent features are compared with statistical features of the dimension-reduced signals and combinations of all of these features. Combining the persistent and statistical features improves the results (accuracy 94.7 %) compared to using only statistical feature extraction, whereas applying only persistent features does not achieve sufficient performance. For this classification example the choice of the dimension reduction technique does not significantly influence the classification performance of the algorithm.

Rapid and Precise Topological Comparison With Merge Tree Neural Networks (2024)

Yu Qin, Brittany Terese Fasy, Carola Wenk, Brian Summa

Abstract

Merge trees are a valuable tool in the scientific visualization of scalar fields; however, current methods for merge tree comparisons are computationally expensive, primarily due to the exhaustive matching between tree nodes. To address this challenge, we introduce the Merge Tree Neural Network (MTNN), a learned neural network model designed for merge tree comparison. The MTNN enables rapid and high-quality similarity computation. We first demonstrate how to train graph neural networks, which emerged as effective encoders for graphs, in order to produce embeddings of merge trees in vector spaces for efficient similarity comparison. Next, we formulate the novel MTNN model that further improves the similarity comparisons by integrating the tree and node embeddings with a new topological attention mechanism. We demonstrate the effectiveness of our model on real-world data in different domains and examine our model's generalizability across various datasets. Our experimental analysis demonstrates our approach's superiority in accuracy and efficiency. In particular, we speed up the prior state-of-the-art by more than $100\times$ on the benchmark datasets while maintaining an error rate below $0.1\%$.

Statistical Topology of Bond Networks With Applications to Silica (2020)

B. Schweinhart, D. Rodney, J. K. Mason

Abstract

Whereas knowledge of a crystalline material's unit cell is fundamental to understanding the material's properties and behavior, there are no obvious analogs to unit cells for disordered materials despite the frequent existence of considerable medium-range order. This article views a material's structure as a collection of local atomic environments that are sampled from some underlying probability distribution of such environments, with the advantage of offering a unified description of both ordered and disordered materials. Crystalline materials can then be regarded as special cases where the underlying probability distribution is highly concentrated around the traditional unit cell. The $H_1$ barcode is proposed as a descriptor of local atomic environments suitable for disordered bond networks and is applied with three other descriptors to molecular dynamics simulations of silica glasses. Each descriptor reliably distinguishes the structure of glasses produced at different cooling rates, with the $H_1$ barcode and coordination profile providing the best separation. The approach is generally applicable to any system that can be represented as a sparse graph.

Community Resources

Code

Towards a Philological Metric Through a Topological Data Analysis Approach (2020)

Eduardo Paluzo-Hidalgo, Rocio Gonzalez-Diaz, Miguel A. Gutiérrez-Naranjo

Abstract

The canon of the baroque Spanish literature has been thoroughly studied with philological techniques. The major representatives of the poetry of this epoch are Francisco de Quevedo and Luis de Góngora y Argote. They are commonly classified by the literary experts in two different streams: Quevedo belongs to the Conceptismo and Góngora to the Culteranismo. Besides, traditionally, even if Quevedo is considered the most representative of the Conceptismo, Lope de Vega is also considered to be, at least, closely related to this literary trend. In this paper, we use Topological Data Analysis techniques to provide a first approach to a metric distance between the literary style of these poets. As a consequence, we reach results that are under the literary experts' criteria, locating the literary style of Lope de Vega, closer to the one of Quevedo than to the one of Góngora.

Community Resources

Data

Topological Phase Estimation Method for Reparameterized Periodic Functions (2022)

Thomas Bonis, Frédéric Chazal, Bertrand Michel, Wojciech Reise

Abstract

We consider a signal composed of several periods of a periodic function, of which we observe a noisy reparametrisation. The phase estimation problem consists of finding that reparametrisation, and, in particular, the number of observed periods. Existing methods are well-suited to the setting where the periodic function is known, or at least, simple. We consider the case when it is unknown and we propose an estimation method based on the shape of the signal. We use the persistent homology of sublevel sets of the signal to capture the temporal structure of its local extrema. We infer the number of periods in the signal by counting points in the persistence diagram and their multiplicities. Using the estimated number of periods, we construct an estimator of the reparametrisation. It is based on counting the number of sufficiently prominent local minima in the signal. This work is motivated by a vehicle positioning problem, on which we evaluated the proposed method.
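
As a rough illustration of the idea in this abstract, the sketch below computes the 0-dimensional sublevel-set persistence of a sampled 1D signal with a union-find pass and then counts sufficiently prominent local minima. It is not the authors' estimator: the function names, the toy signal, and the prominence threshold `min_persistence` are assumptions made only for this example.

```python
def sublevel_persistence(signal):
    """0-dimensional sublevel-set persistence pairs (birth, death) of a 1D signal."""
    n = len(signal)
    parent = [None] * n   # None means the sample is not yet in the sublevel set
    birth = {}            # component root -> value of the minimum that created it

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Sweep the threshold upwards by visiting samples in increasing order of value.
    for i in sorted(range(n), key=lambda k: signal[k]):
        parent[i] = i
        birth[i] = signal[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the higher minimum dies at this value.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if signal[i] > birth[young]:
                    pairs.append((birth[young], signal[i]))
                parent[young] = old
    # The component born at the global minimum never dies.
    pairs.append((min(signal), float("inf")))
    return pairs


def count_prominent_minima(pairs, min_persistence):
    """Count diagram points whose persistence exceeds a (hypothetical) threshold."""
    return sum(1 for b, d in pairs if d - b > min_persistence)


# Toy signal: three deep minima plus one shallow, noise-like dip at value 1.9.
toy = [0.0, 2.0, 0.1, 2.1, 1.9, 2.2, 0.2, 2.0]
diagram = sublevel_persistence(toy)
print(count_prominent_minima(diagram, min_persistence=1.0))  # -> 3
```

In this toy case the shallow dip yields a low-persistence point that the threshold discards, so only the three deep minima are counted. How the count is turned into a number of periods and a reparametrisation estimate is specific to the paper and is not reproduced here.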

Visual Detection of Structural Changes in Time-Varying Graphs Using Persistent Homology (2018)

Mustafa Hajij, Bei Wang, Carlos Scheidegger, Paul Rosen

Abstract

Topological data analysis is an emerging area in exploratory data analysis and data mining. Its main tool, persistent homology, has become a popular technique to study the structure of complex, high-dimensional data. In this paper, we propose a novel method using persistent homology to quantify structural changes in time-varying graphs. Specifically, we transform each instance of the time-varying graph into a metric space, extract topological features using persistent homology, and compare those features over time. We provide a visualization that assists in time-varying graph exploration and helps to identify patterns of behavior within the data. To validate our approach, we conduct several case studies on real-world datasets and show how our method can find cyclic patterns, deviations from those patterns, and one-time events in time-varying graphs. We also examine whether a persistence-based similarity measure satisfies a set of well-established, desirable properties for graph metrics.

Topological Analysis of Gene Expression Arrays Identifies High Risk Molecular Subtypes in Breast Cancer (2012)

Javier Arsuaga, Nils A. Baas, Daniel DeWoskin, Hideaki Mizuno, Aleksandr Pankov, Catherine Park

Abstract

Genomic technologies measure thousands of molecular signals with the goal of understanding complex biological processes. In cancer these molecular signals have been used to characterize disease subtypes, signaling pathways and to identify subsets of patients with specific prognosis. However, molecular signals for any disease type are so vast and complex that novel mathematical approaches are required for further analyses. Persistent and computational homology provide a new method for these analyses. In our previous work we presented a new homology-based supervised classification method to identify copy number aberrations from comparative genomic hybridization arrays. In this work we first propose a theoretical framework for our classification method and second we extend our analysis to gene expression data. We analyze a published breast cancer data set and find that our method can distinguish most, but not all, different breast cancer subtypes. This result suggests that specific relationships between genes, captured by our algorithm, help distinguish between breast cancer subtypes. We propose that topological methods can be used for the classification and clustering of gene expression profiles.

PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction (2020)

Nicolas Swenson, Aditi S. Krishnapriyan, Aydin Buluc, Dmitriy Morozov, Katherine Yelick

Abstract

Understanding protein structure-function relationships is a key challenge in computational biology, with applications across the biotechnology and pharmaceutical industries. While it is known that protein structure directly impacts protein function, many functional prediction tasks use only protein sequence. In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank in order to study the expressiveness of different structure-based prediction schemes. We present PersGNN - an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis to capture a complex set of both local and global structural features. While variations of these techniques have been successfully applied to proteins before, we demonstrate that our hybridized approach, PersGNN, outperforms either method on its own as well as a baseline neural network that learns from the same information. PersGNN achieves a 9.3% boost in area under the precision recall curve (AUPR) compared to the best individual model, as well as high F1 scores across different gene ontology categories, indicating the transferability of this approach.

RGB Image-Based Data Analysis via Discrete Morse Theory and Persistent Homology (2018)

Chuan Du, Christopher Szul, Adarsh Manawa, Nima Rasekh, Rosemary Guzman, Ruth Davidson

Abstract

Understanding and comparing images for the purposes of data analysis is currently a very computationally demanding task. A group at Australian National University (ANU) recently developed open-source code that can detect fundamental topological features of a grayscale image in a computationally feasible manner. This is made possible by the fact that computers store grayscale images as cubical cellular complexes. These complexes can be studied using the techniques of discrete Morse theory. We expand the functionality of the ANU code by introducing methods and software for analyzing images encoded in red, green, and blue (RGB), because this image encoding is very popular for publicly available data. Our methods allow the extraction of key topological information from RGB images via informative persistence diagrams by introducing novel methods for transforming RGB-to-grayscale. This paradigm allows us to perform data analysis directly on RGB images representing water scarcity variability as well as crime variability. We introduce software enabling a user to predict future image properties, towards the eventual aim of more rapid image-based data behavior prediction.

Improved Understanding of Aqueous Solubility Modeling Through Topological Data Analysis (2018)

Mariam Pirashvili, Lee Steinberg, Francisco Belchi Guillamon, Mahesan Niranjan, Jeremy G. Frey, Jacek Brodzki

Abstract

Topological data analysis is a family of recent mathematical techniques seeking to understand the ‘shape’ of data, and has been used to understand the structure of the descriptor space produced from a standard chemical informatics software from the point of view of solubility. We have used the mapper algorithm, a TDA method that creates low-dimensional representations of data, to create a network visualization of the solubility space. While descriptors with clear chemical implications are prominent features in this space, reflecting their importance to the chemical properties, an unexpected and interesting correlation between chlorine content and rings and their implication for solubility prediction is revealed. A parallel representation of the chemical space was generated using persistent homology applied to molecular graphs. Links between this chemical space and the descriptor space were shown to be in agreement with chemical heuristics. The use of persistent homology on molecular graphs, extended by the use of norms on the associated persistence landscapes, allows the conversion of discrete shape descriptors to continuous ones, and a perspective of the application of these descriptors to quantitative structure property relations is presented.

Skeletonization and Partitioning of Digital Images Using Discrete Morse Theory (2015)

Olaf Delgado-Friedrichs, Vanessa Robins, Adrian Sheppard

Abstract

We show how discrete Morse theory provides a rigorous and unifying foundation for defining skeletons and partitions of grayscale digital images. We model a grayscale image as a cubical complex with a real-valued function defined on its vertices (the voxel values). This function is extended to a discrete gradient vector field using the algorithm presented in Robins, Wood, Sheppard TPAMI 33:1646 (2011). In the current paper we define basins (the building blocks of a partition) and segments of the skeleton using the stable and unstable sets associated with critical cells. The natural connection between Morse theory and homology allows us to prove the topological validity of these constructions; for example, that the skeleton is homotopic to the initial object. We simplify the basins and skeletons via Morse-theoretic cancellation of critical cells in the discrete gradient vector field using a strategy informed by persistent homology. Simple working Python code for our algorithms for efficient vector field traversal is included. Example data are taken from micro-CT images of porous materials, an application area where accurate topological models of pore connectivity are vital for fluid-flow modelling.

Statistical Inference for Persistent Homology Applied to Simulated fMRI Time Series Data (2023)

Hassan Abdallah, Adam Regalski, Mohammad Behzad Kang, Maria Berishaj, Nkechi Nnadi, Asadur Chowdury, Vaibhav A. Diwadkar, Andrew Salch

Abstract

Time-series data are amongst the most widely-used in biomedical sciences, including domains such as functional Magnetic Resonance Imaging (fMRI). Structure within time series data can be captured by the tools of topological data analysis (TDA). Persistent homology is the most commonly used data-analytic tool in TDA, and can effectively summarize complex high-dimensional data into an interpretable 2-dimensional representation called a persistence diagram. Existing methods for statistical inference for persistent homology of data depend on an independence assumption being satisfied. While persistent homology can be computed for each time index in a time-series, time-series data often fail to satisfy the independence assumption. This paper develops a statistical test that obviates the independence assumption by implementing a multi-level block sampled Monte Carlo test with sets of persistence diagrams. Its efficacy for detecting task-dependent topological organization is then demonstrated on simulated fMRI data. This new statistical test is therefore suitable for analyzing persistent homology of fMRI data, and of non-independent data in general.

Dissecting Glial Scar Formation by Spatial Point Pattern and Topological Data Analysis (2024)

Daniel Manrique-Castano, Dhananjay Bhaskar, Ayman ElAli

Abstract

Glial scar formation represents a fundamental response to central nervous system (CNS) injuries. It is mainly characterized by a well-defined spatial rearrangement of reactive astrocytes and microglia. The mechanisms underlying glial scar formation have been extensively studied, yet quantitative descriptors of the spatial arrangement of reactive glial cells remain limited. Here, we present a novel approach using point pattern analysis (PPA) and topological data analysis (TDA) to quantify spatial patterns of reactive glial cells after experimental ischemic stroke in mice. We provide open and reproducible tools using R and Julia to quantify spatial intensity, cell covariance and conditional distribution, cell-to-cell interactions, and short/long-scale arrangement, which collectively disentangle the arrangement patterns of the glial scar. This approach unravels a substantial divergence in the distribution of GFAP+ and IBA1+ cells after injury that conventional analysis methods cannot fully characterize. PPA and TDA are valuable tools for studying the complex spatial arrangement of reactive glia and other nervous cells following CNS injuries and have potential applications for evaluating glial-targeted restorative therapies.

Community Resources

Code
Data

Molecular Phenotyping Using Networks, Diffusion, and Topology: Soft Tissue Sarcoma (2019)

James C. Mathews, Maryam Pouryahya, Caroline Moosmüller, Yannis G. Kevrekidis, Joseph O. Deasy, Allen Tannenbaum

Abstract

Many biological datasets are high-dimensional yet manifest an underlying order. In this paper, we describe an unsupervised data analysis methodology that operates in the setting of a multivariate dataset and a network which expresses influence between the variables of the given set. The technique involves network geometry employing the Wasserstein distance, global spectral analysis in the form of diffusion maps, and topological data analysis using the Mapper algorithm. The prototypical application is to gene expression profiles obtained from RNA-Seq experiments on a collection of tissue samples, considering only genes whose protein products participate in a known pathway or network of interest. Employing the technique, we discern several coherent states or signatures displayed by the gene expression profiles of the sarcomas in the Cancer Genome Atlas along the TP53 (p53) signaling network. The signatures substantially recover the leiomyosarcoma, dedifferentiated liposarcoma (DDLPS), and synovial sarcoma histological subtype diagnoses, and they also include a new signature defined by activation and inactivation of about a dozen genes, including activation of serine endopeptidase inhibitor SERPINE1 and inactivation of TP53-family tumor suppressor gene TP73.

Path Homologies of Motifs and Temporal Network Representations (2022)

Samir Chowdhury, Steve Huntsman, Matvey Yutin

Abstract

Path homology is a powerful method for attaching algebraic invariants to digraphs. While there have been growing theoretical developments on the algebro-topological framework surrounding path homology, bona fide applications to the study of complex networks have remained stagnant. We address this gap by presenting an algorithm for path homology that combines efficient pruning and indexing techniques and using it to topologically analyze a variety of real-world complex temporal networks. A crucial step in our analysis is the complete characterization of path homologies of certain families of small digraphs that appear as subgraphs in these complex networks. These families include all digraphs, directed acyclic graphs, and undirected graphs up to certain numbers of vertices, as well as some specially constructed cases. Using information from this analysis, we identify small digraphs contributing to path homology in dimension two for three temporal networks in an aggregated representation and relate these digraphs to network behavior. We then investigate alternative temporal network representations and identify complementary subgraphs as well as behavior that is preserved across representations. We conclude that path homology provides insight into temporal network structure, and in turn, emergent structures in temporal networks provide us with new subgraphs having interesting path homology.

A Primer on Topological Data Analysis to Support Image Analysis Tasks in Environmental Science (2023)

Lander Ver Hoef, Henry Adams, Emily J. King, Imme Ebert-Uphoff

Abstract

Topological data analysis (TDA) is a tool from data science and mathematics that is beginning to make waves in environmental science. In this work, we seek to provide an intuitive and understandable introduction to a tool from TDA that is particularly useful for the analysis of imagery, namely, persistent homology. We briefly discuss the theoretical background but focus primarily on understanding the output of this tool and discussing what information it can glean. To this end, we frame our discussion around a guiding example of classifying satellite images from the sugar, fish, flower, and gravel dataset produced for the study of mesoscale organization of clouds by Rasp et al. We demonstrate how persistent homology and its vectorization, persistence landscapes, can be used in a workflow with a simple machine learning algorithm to obtain good results, and we explore in detail how we can explain this behavior in terms of image-level features. One of the core strengths of persistent homology is how interpretable it can be, so throughout this paper we discuss not just the patterns we find but why those results are to be expected given what we know about the theory of persistent homology. Our goal is that readers of this paper will leave with a better understanding of TDA and persistent homology, will be able to identify problems and datasets of their own for which persistent homology could be helpful, and will gain an understanding of the results they obtain from applying the included GitHub example code.

Significance Statement

Information such as the geometric structure and texture of image data can greatly support the inference of the physical state of an observed Earth system, for example, in remote sensing to determine whether wildfires are active or to identify local climate zones. Persistent homology is a branch of topological data analysis that allows one to extract such information in an interpretable way, unlike black-box methods like deep neural networks. The purpose of this paper is to explain in an intuitive manner what persistent homology is and how researchers in environmental science can use it to create interpretable models. We demonstrate the approach to identify certain cloud patterns from satellite imagery and find that the resulting model is indeed interpretable.

Persistence Images: A Stable Vector Representation of Persistent Homology (2017)

Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, Lori Ziegelmeier

Abstract

Many data sets can be viewed as a noisy sampling of an underlying space, and tools from topological data analysis can characterize this structure for the purpose of knowledge discovery. One such tool is persistent homology, which provides a multiscale description of the homological features within a data set. A useful representation of this homological information is a persistence diagram (PD). Efforts have been made to map PDs into spaces with additional structure valuable to machine learning tasks. We convert a PD to a finite-dimensional vector representation which we call a persistence image (PI), and prove the stability of this transformation with respect to small perturbations in the inputs. The discriminatory power of PIs is compared against existing methods, showing significant performance gains. We explore the use of PIs with vector-based machine learning tools, such as linear sparse support vector machines, which identify features containing discriminating topological information. Finally, high accuracy inference of parameter values from the dynamic output of a discrete dynamical system (the linked twist map) and a partial differential equation (the anisotropic Kuramoto-Sivashinsky equation) provide a novel application of the discriminatory power of PIs.
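
The construction described in this abstract can be sketched in a few lines. The version below is a simplified illustration, not the authors' implementation: each diagram point is moved to birth-persistence coordinates, weighted linearly by its persistence, and spread with an isotropic Gaussian that is evaluated at pixel centres rather than integrated over each pixel. The grid resolution, `sigma`, and `bounds` are illustrative assumptions.

```python
import math


def persistence_image(diagram, resolution=4, sigma=0.5, bounds=(0.0, 2.0)):
    """Rasterise (birth, death) pairs into a resolution x resolution grid.

    Rows index persistence, columns index birth. The Gaussian is sampled at
    pixel centres, which only approximates the per-pixel integral.
    """
    lo, hi = bounds
    step = (hi - lo) / resolution
    centres = [lo + (k + 0.5) * step for k in range(resolution)]
    image = [[0.0] * resolution for _ in range(resolution)]
    norm = 2 * math.pi * sigma ** 2
    for birth, death in diagram:
        pers = death - birth
        weight = pers  # linear weight, so points on the diagonal contribute nothing
        for r, y in enumerate(centres):
            for c, x in enumerate(centres):
                d2 = (x - birth) ** 2 + (y - pers) ** 2
                image[r][c] += weight * math.exp(-d2 / (2 * sigma ** 2)) / norm
    return image


def flatten(image):
    """Row-major vector suitable as input to a vector-based classifier."""
    return [v for row in image for v in row]


vector = flatten(persistence_image([(0.25, 1.5), (1.0, 1.1)]))
print(len(vector))  # -> 16
```

Because the map is a weighted sum over diagram points, the image of a union of diagrams is the sum of the individual images, which is part of what makes the representation convenient for linear models.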

Topological Data Analysis for True Step Detection in Periodic Piecewise Constant Signals (2018)

Firas A. Khasawneh, Elizabeth Munch

Abstract

This paper introduces a simple yet powerful approach based on topological data analysis for detecting true steps in a periodic, piecewise constant (PWC) signal. The signal is a two-state square wave with randomly varying in-between-pulse spacing, subject to spurious steps at the rising or falling edges which we call digital ringing. We use persistent homology to derive mathematical guarantees for the resulting change detection which enables accurate identification and counting of the true pulses. The approach is tested using both synthetic and experimental data obtained using an engine lathe instrumented with a laser tachometer. The described algorithm enables accurate and automatic calculations of the spindle speed without any choice of parameters. The results are compared with the frequency and sequency methods of the Fourier and Walsh–Hadamard transforms, respectively. Both our approach and the Fourier analysis yield comparable results for pulses with regular spacing and digital ringing while the latter causes large errors using the Walsh–Hadamard method. Further, the described approach significantly outperforms the frequency/sequency analyses when the spacing between the peaks is varied. We discuss generalizing the approach to higher dimensional PWC signals, although using this extension remains an interesting question for future research.

Persistent Brain Network Homology From the Perspective of Dendrogram (2012)

Hyekyoung Lee, Hyejin Kang, Moo K. Chung, Bung-Nyun Kim, Dong Soo Lee

Abstract

The brain network is usually constructed by estimating the connectivity matrix and thresholding it at an arbitrary level. The problem with this standard method is that we do not have any generally accepted criteria for determining a proper threshold. Thus, we propose a novel multiscale framework that models all brain networks generated over every possible threshold. Our approach is based on persistent homology and its various representations such as the Rips filtration, barcodes, and dendrograms. This new persistent homological framework enables us to quantify various persistent topological features at different scales in a coherent manner. The barcode is used to quantify and visualize the evolutionary changes of topological features such as the Betti numbers over different scales. By incorporating additional geometric information to the barcode, we obtain a single linkage dendrogram that shows the overall evolution of the network. The difference between the two networks is then measured by the Gromov-Hausdorff distance over the dendrograms. As an illustration, we modeled and differentiated the FDG-PET based functional brain networks of 24 attention-deficit hyperactivity disorder children, 26 autism spectrum disorder children, and 11 pediatric control subjects.
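
To make the "every possible threshold" idea concrete, the sketch below tracks connected components of a weighted network as the distance threshold grows, using Kruskal-style union-find. The scales at which components merge are the finite endpoints of the 0-dimensional barcode and coincide with single-linkage merge heights. This is only a minimal illustration under assumptions: the toy distance matrix is invented, a distance such as one minus correlation is assumed to have been computed beforehand, and the Gromov-Hausdorff comparison of dendrograms from the paper is not implemented.

```python
def zero_dim_barcode(dist):
    """Scales at which connected components merge in the Rips filtration."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Visit edges from shortest to longest, as in Kruskal's algorithm.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(w)  # one component dies here (a single-linkage merge height)
    return deaths  # n - 1 finite bars [0, w); one bar [0, inf) always remains


def betti0(deaths, n, threshold):
    """Number of components when all edges with distance <= threshold are kept."""
    return n - sum(1 for w in deaths if w <= threshold)


# Hypothetical 4-node distance matrix with two tight pairs of nodes.
toy = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.9],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
deaths = zero_dim_barcode(toy)
print([betti0(deaths, 4, t) for t in (0.05, 0.15, 0.5, 0.7)])  # -> [4, 3, 2, 1]
```

Reading the Betti-0 curve across all thresholds avoids committing to a single cut-off, which is the motivation stated in the abstract.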

A Novel Multi-Task Machine Learning Classifier for Rare Disease Patterning Using Cardiac Strain Imaging Data (2024)

Nanda K. Siva, Yashbir Singh, Quincy A. Hathaway, Partho P. Sengupta, Naveena Yanamala

Abstract

To provide accurate predictions, current machine learning-based solutions require large, manually labeled training datasets. We implement persistent homology (PH), a topological tool for studying the pattern of data, to analyze echocardiography-based strain data and differentiate between rare diseases like constrictive pericarditis (CP) and restrictive cardiomyopathy (RCM). Patient population (retrospectively registered) included those presenting with heart failure due to CP (n = 51), RCM (n = 47), and patients without heart failure symptoms (n = 53). Longitudinal, radial, and circumferential strains/strain rates for left ventricular segments were processed into topological feature vectors using a machine learning PH workflow. In differentiating CP and RCM, the PH workflow model had a ROC AUC of 0.94 (Sensitivity = 92%, Specificity = 81%), compared with the GLS model AUC of 0.69 (Sensitivity = 65%, Specificity = 66%). In differentiating between all three conditions, the PH workflow model had an AUC of 0.83 (Sensitivity = 68%, Specificity = 84%), compared with the GLS model AUC of 0.68 (Sensitivity = 52% and Specificity = 76%). By employing persistent homology to differentiate the “pattern” of cardiac deformations, our machine-learning approach provides reasonable accuracy when evaluating small datasets and aids in understanding and visualizing patterns of cardiac imaging data in clinically challenging disease states.

Unsupervised Topological Learning Approach of Crystal Nucleation (2022)

Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse

Abstract

Nucleation phenomena commonly observed in our everyday life are of fundamental, technological and societal importance in many areas, but some of their most intimate mechanisms remain to be unravelled. Crystal nucleation, the early stages where the liquid-to-solid transition occurs upon undercooling, initiates at the atomic level on nanometre length and sub-picosecond time scales and involves complex multidimensional mechanisms with local symmetry breaking that can hardly be observed experimentally in full detail. To reveal their structural features in simulations without a priori assumptions, an unsupervised learning approach founded on topological descriptors borrowed from persistent homology concepts is proposed. Applied here to monatomic metals, it shows that both translational and orientational ordering always come into play simultaneously as a result of the strong bonding when homogeneous nucleation starts in regions with low five-fold symmetry. It also reveals the specificity of the nucleation pathways depending on the element considered, with features beyond the hypothesis of Classical Nucleation Theory.

Community Resources

Code

Topological Echoes of Primordial Physics in the Universe at Large Scales (2020)

Alex Cole, Matteo Biagetti, Gary Shiu

Abstract

We present a pipeline for characterizing and constraining initial conditions in cosmology via persistent homology. The cosmological observable of interest is the cosmic web of large scale structure, and the initial conditions in question are non-Gaussianities (NG) of primordial density perturbations. We compute persistence diagrams and derived statistics for simulations of dark matter halos with Gaussian and non-Gaussian initial conditions. For computational reasons and to make contact with experimental observations, our pipeline computes persistence in sub-boxes of full simulations and simulations are subsampled to uniform halo number. We use simulations with large NG ($f_{\rm NL}^{\rm loc}=250$) as templates for identifying data with mild NG ($f_{\rm NL}^{\rm loc}=10$), and running the pipeline on several cubic volumes of size $40~(\textrm{Gpc}/h)^3$, we detect $f_{\rm NL}^{\rm loc}=10$ at $97.5\%$ confidence on $\sim 85\%$ of the volumes for our best single statistic. Throughout we benefit from the interpretability of topological features as input for statistical inference, which allows us to make contact with previous first-principles calculations and make new predictions.

Genomics Data Analysis via Spectral Shape and Topology (2022)

Erik J. Amézquita, Farzana Nasrin, Kathleen M. Storey, Masato Yoshizawa

Abstract

Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can help to gain a better understanding of the intrinsic shape of high-dimensional genomic data and to retain information that may be lost using standard dimension-reduction algorithms. We propose a novel workflow to process and analyze RNA-seq data from tumor and healthy subjects integrating Mapper and differential gene expression. Precisely, we show that a Gaussian mixture approximation method can be used to produce graphical structures that successfully separate tumor and healthy subjects, and produce two subgroups of tumor subjects. A further analysis using DESeq2, a popular tool for the detection of differentially expressed genes, shows that these two subgroups of tumor cells bear two distinct gene regulations, suggesting two discrete paths for forming lung cancer, which could not be highlighted by other popular clustering methods, including t-SNE. Although Mapper shows promise in analyzing high-dimensional data, building tools to statistically analyze Mapper graphical structures is limited in the existing literature. In this paper, we develop a scoring method using heat kernel signatures that provides an empirical setting for statistical inferences such as hypothesis testing, sensitivity analysis, and correlation analysis.

Topological Gene Expression Networks Recapitulate Brain Anatomy and Function (2019)

Alice Patania, Pierluigi Selvaggi, Mattia Veronese, Ottavia Dipasquale, Paul Expert, Giovanni Petri

Abstract

Understanding how gene expression translates to and affects human behavior is one of the ultimate goals of neuroscience. In this paper, we present a pipeline based on Mapper, a topological simplification tool, to analyze gene co-expression data. We first validate the method by reproducing key results from the literature on the Allen Human Brain Atlas and the correlations between resting-state fMRI and gene co-expression maps. We then analyze a dopamine-related gene set and find that co-expression networks produced by Mapper return a structure that matches the well-known anatomy of the dopaminergic pathway. Our results suggest that network-based descriptions can be a powerful tool to explore the relationships between genetic pathways and their association with brain function and its perturbation due to illness and/or pharmacological challenges. In this paper, we described a gene co-expression analysis pipeline that produces networks that we show to be closely related both to brain function and to neurotransmitter pathways. Our results suggest that this pipeline could be developed into a platform enabling the exploration of the effects of physiological and pathological alterations to specific gene sets, including profiling drug effects.

Persistent Voids: A New Structural Metric for Membrane Fusion (2007)

Peter M. Kasson, Afra Zomorodian, Sanghyun Park, Nina Singhal, Leonidas J. Guibas, Vijay S. Pande

Abstract

Motivation: Membrane fusion constitutes a key stage in cellular processes such as synaptic neurotransmission and infection by enveloped viruses. Current experimental assays for fusion have thus far been unable to resolve early fusion events in fine structural detail. We have previously used molecular dynamics simulations to develop mechanistic models of fusion by small lipid vesicles. Here, we introduce a novel structural measurement of vesicle topology and fusion geometry: persistent voids. Results: Persistent voids calculations enable systematic measurement of structural changes in vesicle fusion by assessing fusion stalk widths. They also constitute a generally applicable technique for assessing lipid topological change. We use persistent voids to compute dynamic relationships between hemifusion neck widening and formation of a full fusion pore in our simulation data. We predict that a tightly coordinated process of hemifusion neck expansion and pore formation is responsible for the rapid vesicle fusion mechanism, while isolated enlargement of the hemifusion diaphragm leads to the formation of a metastable hemifused intermediate. These findings suggest that rapid fusion between small vesicles proceeds via a small hemifusion diaphragm rather than a fully expanded one. Availability: Software available upon request pending public release. Contact: kasson@cmgm.stanford.edu or pande@stanford.edu. Supplementary information: Supplementary data are available on Bioinformatics online.

Efficient Planning of Multi-Robot Collective Transport Using Graph Reinforcement Learning With Higher Order Topological Abstraction (2023)

Steve Paul, Wenyuan Li, Brian Smyth, Yuzhou Chen, Yulia Gel, Souma Chowdhury

Abstract

Efficient multi-robot task allocation (MRTA) is fundamental to various time-sensitive applications such as disaster response, warehouse operations, and construction. This paper tackles a particular class of these problems that we call MRTA-collective transport or MRTA-CT, in which tasks present varying workloads and deadlines, and robots are subject to flight range, communication range, and payload constraints. For large instances of these problems involving 100s-1000s of tasks and 10s-100s of robots, traditional non-learning solvers are often time-inefficient, and emerging learning-based policies do not scale well to larger-sized problems without costly retraining. To address this gap, we use a recently proposed encoder-decoder graph neural network involving Capsule networks and a multi-head attention mechanism, and innovatively add topological descriptors (TD) as new features to improve transferability to unseen problems of similar and larger size. Persistent homology is used to derive the TD, and proximal policy optimization is used to train our TD-augmented graph neural network. The resulting policy model compares favorably to state-of-the-art non-learning baselines while being much faster. The benefit of using TD is readily evident when scaling to test problems of size larger than those used in training.

When Remote Sensing Meets Topological Data Analysis (2018)

Ludovic Duponchel

Abstract

Hyperspectral remote sensing plays an increasingly important role in many scientific domains and everyday life problems. Indeed, this imaging concept ends up in applications as varied as catching tax-evaders red-handed by locating new construction and building alterations, searching for aircraft and saving lives after fatal crashes, detecting oil spills for marine life and environmental preservation, spying on enemies with reconnaissance satellites, watching algae grow as an indicator of environmental health, forecasting weather to warn about natural disasters and much more. From an instrumental point of view, we can say that current spectrometers have rather good characteristics, even if we can always increase spatial resolution and spectral range. In order to extract ever more information from such experiments and develop new applications, we must, therefore, propose multivariate data analysis tools able to capture the shape of data sets and their specific features. Nevertheless, current methods often impose a data model which implicitly defines the geometry of the data set. The aim of the paper is thus to introduce the concept of topological data analysis in the framework of remote sensing, making no assumptions about the global shape of the data set, but also allowing the capture of its local features.

Community Resources

Code
Data

Using Persistent Homology and Dynamical Distances to Analyze Protein Binding (2016)

Violeta Kovacev-Nikolic, Peter Bubenik, Dragan Nikolić, Giseon Heo

Abstract

Persistent homology captures the evolution of topological features of a model as a parameter changes. The most commonly used summary statistics of persistent homology are the barcode and the persistence diagram. Another summary statistic, the persistence landscape, was recently introduced by Bubenik. It is a functional summary, so it is easy to calculate sample means and variances, and it is straightforward to construct various test statistics. Implementing a permutation test, we detect conformational changes between closed and open forms of the maltose-binding protein, a large biomolecule consisting of 370 amino acid residues. Furthermore, persistence landscapes can be applied to machine learning methods. A hyperplane from a support vector machine shows the clear separation between the closed and open protein conformations. Moreover, because our approach captures dynamical properties of the protein, our results may help in identifying residues susceptible to ligand binding; we show that the majority of active site residues and allosteric pathway residues are located in the vicinity of the most persistent loop in the corresponding filtered Vietoris-Rips complex. This finding was not observed in the classical anisotropic network model.
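
The claim that landscapes make sample means easy to compute can be illustrated with a short sketch. It uses the standard definition, in which the k-th landscape function at t is the k-th largest value of max(0, min(t - b, d - t)) over the (birth, death) pairs of a diagram. The function name `persistence_landscape`, the toy diagrams and the evaluation grid are illustrative assumptions, not the authors' code.

```python
import numpy as np

def persistence_landscape(diagram, k, grid):
    """Evaluate the k-th landscape function (k = 1, 2, ...) on a grid.

    diagram: iterable of (birth, death) pairs; grid: 1-D array of t values.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]
    # One "tent" function per (birth, death) pair, evaluated on the grid.
    tents = np.maximum(0.0, np.minimum(grid[None, :] - births,
                                       deaths - grid[None, :]))
    if k > tents.shape[0]:
        return np.zeros_like(grid)
    # Sort each grid column in descending order and take the k-th largest.
    return np.sort(tents, axis=0)[::-1][k - 1]

# Toy diagrams (assumed data): landscapes are vectors, so they average directly.
grid = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
lam_a = persistence_landscape([(0, 4), (1, 3)], k=1, grid=grid)
lam_b = persistence_landscape([(1, 3)], k=1, grid=grid)
mean_landscape = (lam_a + lam_b) / 2
```

Because each landscape is an ordinary vector once sampled on a grid, pointwise means, variances and permutation-test statistics reduce to array arithmetic.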

The Persistent Homology Mathematical Framework Provides Enhanced Genotype-to-Phenotype Associations for Plant Morphology (2018)

Mao Li, Margaret H. Frank, Viktoriya Coneva, Washington Mio, Daniel H. Chitwood, Christopher N. Topp

Abstract

Efforts to understand the genetic and environmental conditioning of plant morphology are hindered by the lack of flexible and effective tools for quantifying morphology. Here, we demonstrate that persistent-homology-based topological methods can improve measurement of variation in leaf shape, serrations, and root architecture. We apply these methods to 2D images of leaves and root systems in field-grown plants of a domesticated introgression line population of tomato (Solanum pennellii). We find that compared with some commonly used conventional traits, (1) persistent-homology-based methods can more comprehensively capture morphological variation; (2) these techniques discriminate between genotypes with a larger normalized effect size and detect a greater number of unique quantitative trait loci (QTLs); (3) multivariate traits, whether statistically derived from univariate or persistent-homology-based traits, improve our ability to understand the genetic basis of phenotype; and (4) persistent-homology-based techniques detect unique QTLs compared to conventional traits or their multivariate derivatives, indicating that previously unmeasured aspects of morphology are now detectable. The QTL results further imply that genetic contributions to morphology can affect both the shoot and root, revealing a pleiotropic basis to natural variation in tomato. Persistent homology is a versatile framework to quantify plant morphology and developmental processes that complements and extends existing methods.

Representations of Energy Landscapes by Sublevelset Persistent Homology: An Example With N-Alkanes (2020)

Joshua Mirth, Yanqin Zhai, Johnathan Bush, Enrique G. Alvarado, Howie Jordan, Mark Heim, Bala Krishnamoorthy, Markus Pflaum, Aurora Clark, Y. Z, Henry Adams

Abstract

Encoding the complex features of an energy landscape is a challenging task, and often chemists pursue the most salient features (minima and barriers) along a highly reduced space, i.e. 2- or 3-dimensions. Even though disconnectivity graphs or merge trees summarize the connectivity of the local minima of an energy landscape via the lowest-barrier pathways, there is more information to be gained by also considering the topology of each connected component at different energy thresholds (or sublevelsets). We propose sublevelset persistent homology as an appropriate tool for this purpose. Our computations on the configuration phase space of n-alkanes from butane to octane allow us to conjecture, and then prove, a complete characterization of the sublevelset persistent homology of the alkane $C_m H_{2m+2}$ potential energy landscapes, for all $m$, and in all homological dimensions. We further compare both the analytical configurational potential energy landscapes and sampled data from molecular dynamics simulation, using the united and all-atom descriptions of the intramolecular interactions. In turn, this supports the application of distance metrics to quantify sampling fidelity and lays the foundation for future work regarding new metrics that quantify differences between the topological features of high-dimensional energy landscapes.

A Novel Multi-Task Machine Learning Classifier for Rare Disease Patterning Using Cardiac Strain Imaging Data (2024)

Nanda K. Siva, Yashbir Singh, Quincy A. Hathaway, Partho P. Sengupta, Naveena Yanamala

Abstract

To provide accurate predictions, current machine learning-based solutions require large, manually labeled training datasets. We implement persistent homology (PH), a topological tool for studying the pattern of data, to analyze echocardiography-based strain data and differentiate between rare diseases like constrictive pericarditis (CP) and restrictive cardiomyopathy (RCM). Patient population (retrospectively registered) included those presenting with heart failure due to CP (n = 51), RCM (n = 47), and patients without heart failure symptoms (n = 53). Longitudinal, radial, and circumferential strains/strain rates for left ventricular segments were processed into topological feature vectors using Machine learning PH workflow. In differentiating CP and RCM, the PH workflow model had a ROC AUC of 0.94 (Sensitivity = 92%, Specificity = 81%), compared with the GLS model AUC of 0.69 (Sensitivity = 65%, Specificity = 66%). In differentiating between all three conditions, the PH workflow model had an AUC of 0.83 (Sensitivity = 68%, Specificity = 84%), compared with the GLS model AUC of 0.68 (Sensitivity = 52% and Specificity = 76%). By employing persistent homology to differentiate the “pattern” of cardiac deformations, our machine-learning approach provides reasonable accuracy when evaluating small datasets and aids in understanding and visualizing patterns of cardiac imaging data in clinically challenging disease states.

Community Resources

Code
Code

Diverse 3D Cellular Patterns Underlie the Development of Cardamine Hirsuta and Arabidopsis Thaliana Ovules (2023)

Tejasvinee Atul Mody, Alexander Rolle, Nico Stucki, Fabian Roll, Ulrich Bauer, Kay Schneitz

Abstract

A fundamental question in biology is how organ morphogenesis comes about. The ovules of Arabidopsis thaliana have been established as a successful model to study numerous aspects of tissue morphogenesis; however, little is known regarding the relative contributions and dynamics of differential tissue and cellular growth and architecture in establishing ovule morphogenesis in different species. To address this issue, we generated a 3D digital atlas of Cardamine hirsuta ovule development with full cellular resolution. We combined quantitative comparative morphometrics and topological analysis to explore similarities and differences in the 3D cellular architectures underlying ovule development of the two species. We discovered that they show diversity in the way the three radial cell layers of the primordium contribute to its growth, in the formation of a new cell layer in the inner integument and, in certain cases, in the topological properties of the 3D cell architectures of homologous tissues despite their similar shape. Our work demonstrates the power of comparative 3D cellular morphometry and the importance of internal tissues and their cellular architecture in organ morphogenesis. Summary Statement: Quantitative morphometric comparison of 3D digital ovules at full cellular resolution reveals diversity in internal 3D cellular architectures between similarly shaped ovules of Cardamine hirsuta and Arabidopsis thaliana.

Community Resources

Code

Chatter Classification in Turning Using Machine Learning and Topological Data Analysis (2018)

Firas A. Khasawneh, Elizabeth Munch, Jose A. Perea

Abstract

Chatter identification and detection in machining processes has been an active area of research in the past two decades. Part of the challenge in studying chatter is that machining equations that describe its occurrence are often nonlinear delay differential equations. The majority of the available tools for chatter identification rely on defining a metric that captures the characteristics of chatter, and a threshold that signals its occurrence. The difficulty in choosing these parameters can be somewhat alleviated by utilizing machine learning techniques. However, even with a successful classification algorithm, the transferability of typical machine learning methods from one data set to another remains very limited. In this paper we combine supervised machine learning with Topological Data Analysis (TDA) to obtain a descriptor of the process which can detect chatter. The features we use are derived from the persistence diagram of an attractor reconstructed from the time series via Takens embedding. We test the approach using deterministic and stochastic turning models, where the stochasticity is introduced via the cutting coefficient term. Our results show a 97% successful classification rate on the deterministic model labeled by the stability diagram obtained using the spectral element method. The features gleaned from the deterministic model are then utilized for characterization of chatter in a stochastic turning model where there are very limited analysis methods.
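
The Takens embedding step mentioned in this abstract can be sketched in a few lines. This is a minimal illustration that assumes a fixed embedding dimension and delay chosen by hand; the function name `takens_embedding` and the sine-wave input are hypothetical and are not taken from the paper. The resulting point cloud is what would then be passed to a persistent homology library to obtain the persistence diagram.

```python
import numpy as np

def takens_embedding(series, dim, delay):
    """Map a scalar time series to delay vectors
    (x[t], x[t + delay], ..., x[t + (dim - 1) * delay])."""
    x = np.asarray(series, dtype=float)
    n_points = len(x) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("series too short for the requested dim and delay")
    # Column i holds the series shifted by i * delay samples.
    return np.column_stack(
        [x[i * delay : i * delay + n_points] for i in range(dim)]
    )

# Assumed toy signal: a periodic series embeds as a closed loop in the plane.
t = np.linspace(0, 8 * np.pi, 400)
cloud = takens_embedding(np.sin(t), dim=2, delay=25)
```

A periodic signal traces a loop in the embedded space, which is the kind of feature a one-dimensional persistence diagram is designed to detect.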

Ultrahigh-Pressure Form of $\mathrm{Si}\mathrm{O}_{2}$ Glass With Dense Pyrite-Type Crystalline Homology (2019)

M. Murakami, S. Kohara, N. Kitamura, J. Akola, H. Inoue, A. Hirata, Y. Hiraoka, Y. Onodera, I. Obayashi, J. Kalikka, N. Hirao, T. Musso, A. S. Foster, Y. Idemoto, O. Sakata, Y. Ohishi

Abstract

High-pressure synthesis of denser glass has been a longstanding interest in condensed-matter physics and materials science because of its potentially broad industrial application. Nevertheless, understanding its nature under extreme pressures has yet to be clarified due to experimental and theoretical challenges. Here we reveal the formation of OSi4 tetraclusters associated with that of SiO7 polyhedra in SiO2 glass under ultrahigh pressures to 200 gigapascal confirmed both experimentally and theoretically. Persistent homology analyses with molecular dynamics simulations found increased packing fraction of atoms whose topological diagram at ultrahigh pressures is similar to a pyrite-type crystalline phase, although the formation of tetraclusters is prohibited in the crystalline phase. This critical difference would be caused by the potential structural tolerance in the glass for distortion of oxygen clusters. Furthermore, an expanded electronic band gap demonstrates that chemical bonds survive at ultrahigh pressure. This opens up the synthesis of topologically disordered dense oxide glasses.

Community Resources

Code

Branching and Circular Features in High Dimensional Data (2011)

B. Wang, B. Summa, V. Pascucci, M. Vejdemo-Johansson

Abstract

Large observations and simulations in scientific research give rise to high-dimensional data sets that present many challenges and opportunities in data analysis and visualization. Researchers in application domains such as engineering, computational biology, climate study, imaging and motion capture are faced with the problem of how to discover compact representations of high-dimensional data while preserving their intrinsic structure. In many applications, the original data is projected onto low-dimensional space via dimensionality reduction techniques prior to modeling. One problem with this approach is that the projection step in the process can fail to preserve structure in the data that is only apparent in high dimensions. Conversely, such techniques may create structural illusions in the projection, implying structure not present in the original high-dimensional data. Our solution is to utilize topological techniques to recover important structures in high-dimensional data that contains non-trivial topology. Specifically, we are interested in high-dimensional branching structures. We construct local circle-valued coordinate functions to represent such features. Subsequently, we perform dimensionality reduction on the data while ensuring such structures are visually preserved. Additionally, we study the effects of global circular structures on visualizations. Our results reveal never-before-seen structures on real-world data sets from a variety of applications.

Dynamic State Analysis of a Driven Magnetic Pendulum Using Ordinal Partition Networks and Topological Data Analysis (2020)

Audun Myers, Firas A. Khasawneh

Abstract

The use of complex networks for time series analysis has recently been shown to be useful as a tool for detecting dynamic state changes for a wide variety of applications. In this work, we implement the commonly used ordinal partition network to transform a time series into a network for detecting these state changes for the simple magnetic pendulum. The time series that we used are obtained experimentally from a base-excited magnetic pendulum apparatus, and numerically from the corresponding governing equations. The magnetic pendulum provides a relatively simple, non-linear example demonstrating transitions from periodic to chaotic motion with the variation of system parameters. For our method, we implement persistent homology, a shape measuring tool from Topological Data Analysis (TDA), to summarize the shape of the resulting ordinal partition networks as a tool for detecting state changes. We show that this network analysis tool provides a clear distinction between periodic and chaotic time series. Another contribution of this work is the successful application of the networks-TDA pipeline, for the first time, to signals from non-autonomous nonlinear systems. This opens the door for our approach to be used as an automatic design tool for studying the effect of design parameters on the resulting system response. Other uses of this approach include fault detection from sensor signals in a wide variety of engineering operations.
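
As a rough illustration of the time-series-to-network step, the sketch below builds an ordinal partition network as a table of weighted transitions between consecutive ordinal patterns. It assumes the common construction in which each sliding window is replaced by the permutation that sorts it; the function names, the decision to keep self-loops and the toy series are illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter
import numpy as np

def ordinal_patterns(series, dim, delay=1):
    """Replace each delay window by the permutation that sorts its values."""
    x = np.asarray(series, dtype=float)
    n_windows = len(x) - (dim - 1) * delay
    return [
        tuple(np.argsort(x[i : i + (dim - 1) * delay + 1 : delay],
                         kind="stable").tolist())
        for i in range(n_windows)
    ]

def ordinal_partition_network(series, dim, delay=1):
    """Return a Counter mapping (pattern, next_pattern) edges to their counts.

    Self-loops are kept here; whether to drop them is a modelling choice.
    """
    patterns = ordinal_patterns(series, dim, delay)
    return Counter(zip(patterns[:-1], patterns[1:]))

# Assumed toy input: a periodic series visits its patterns in a closed cycle.
edges = ordinal_partition_network([1, 2, 3, 2, 1, 2, 3, 2, 1], dim=3)
```

For this periodic toy series the network is a single directed cycle through four patterns, whereas a chaotic signal would typically produce many more nodes and transitions. Shortest-path distances on such a graph are one possible input for a persistent homology computation.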

Multiscale Topology Classifies Cells in Subcellular Spatial Transcriptomics (2024)

Katherine Benjamin, Aneesha Bhandari, Jessica D. Kepple, Rui Qi, Zhouchun Shang, Yanan Xing, Yanru An, Nannan Zhang, Yong Hou, Tanya L. Crockford, Oliver McCallion, Fadi Issa, Joanna Hester, Ulrike Tillmann, Heather A. Harrington, Katherine R. Bull

Abstract

Spatial transcriptomics measures in situ gene expression at millions of locations within a tissue, hitherto with some trade-off between transcriptome depth, spatial resolution and sample size. Although integration of image-based segmentation has enabled impactful work in this context, it is limited by imaging quality and tissue heterogeneity. By contrast, recent array-based technologies offer the ability to measure the entire transcriptome at subcellular resolution across large samples. Presently, there exist no approaches for cell type identification that directly leverage this information to annotate individual cells. Here we propose a multiscale approach to automatically classify cell types at this subcellular level, using both transcriptomic information and spatial context. We showcase this on both targeted and whole-transcriptome spatial platforms, improving cell classification and morphology for human kidney tissue and pinpointing individual sparsely distributed renal mouse immune cells without reliance on image data. By integrating these predictions into a topological pipeline based on multiparameter persistent homology, we identify cell spatial relationships characteristic of a mouse model of lupus nephritis, which we validate experimentally by immunofluorescence. The proposed framework readily generalizes to new platforms, providing a comprehensive pipeline bridging different levels of biological organization from genes through to tissues.

Unveiling Patterns of International Communities in a Global City Using Mobile Phone Data (2015)

Paolo Bajardi, Matteo Delfino, André Panisson, Giovanni Petri, Michele Tizzoni

Abstract

We analyse a large mobile phone activity dataset provided by Telecom Italia for the Telecom Big Data Challenge contest. The dataset reports the international country codes of every call/SMS made and received by mobile phone users in Milan, Italy, between November and December 2013, with a spatial resolution of about 200 meters. We first show that the observed spatial distribution of international codes well matches the distribution of international communities reported by official statistics, confirming the value of mobile phone data for demographic research. Next, we define an entropy function to measure the heterogeneity of the international phone activity in space and time. By comparing the entropy function to empirical data, we show that it can be used to identify the city’s hotspots, defined by the presence of points of interests. Eventually, we use the entropy function to characterize the spatial distribution of international communities in the city. Adopting a topological data analysis approach, we find that international mobile phone users exhibit some robust clustering patterns that correlate with basic socio-economic variables. Our results suggest that mobile phone records can be used in conjunction with topological data analysis tools to study the geography of migrant communities in a global city.
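
The abstract does not give the form of the entropy function, so the sketch below assumes a plain Shannon entropy over the shares of activity per country code within one spatial cell. The function name `activity_entropy` and the example counts are hypothetical.

```python
import math

def activity_entropy(counts):
    """Shannon entropy (in nats) of activity shares across country codes.

    counts: dict mapping a country code to the number of calls/SMS in a cell.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log(c / total)
        for c in counts.values()
        if c > 0
    )

# Assumed toy cells: evenly mixed activity has higher entropy than
# activity dominated by a single country code.
mixed_cell = {"+33": 10, "+44": 10, "+49": 10, "+86": 10}
single_cell = {"+33": 40}
```

Under this assumption, a cell whose international activity is spread evenly over many country codes scores high, while a cell dominated by one code scores near zero.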

Community Resources

Code

Topological Data Analysis of Zebrafish Patterns (2020)

Melissa R. McGuirl, Alexandria Volkening, Björn Sandstede

Abstract

Self-organized pattern behavior is ubiquitous throughout nature, from fish schooling to collective cell dynamics during organism development. Qualitatively these patterns display impressive consistency, yet variability inevitably exists within pattern-forming systems on both microscopic and macroscopic scales. Quantifying variability and measuring pattern features can inform the underlying agent interactions and allow for predictive analyses. Nevertheless, current methods for analyzing patterns that arise from collective behavior capture only macroscopic features or rely on either manual inspection or smoothing algorithms that lose the underlying agent-based nature of the data. Here we introduce methods based on topological data analysis and interpretable machine learning for quantifying both agent-level features and global pattern attributes on a large scale. Because the zebrafish is a model organism for skin pattern formation, we focus specifically on analyzing its skin patterns as a means of illustrating our approach. Using a recent agent-based model, we simulate thousands of wild-type and mutant zebrafish patterns and apply our methodology to better understand pattern variability in zebrafish. Our methodology is able to quantify the differential impact of stochasticity in cell interactions on wild-type and mutant patterns, and we use our methods to predict stripe and spot statistics as a function of varying cellular communication. Our work provides an approach to automatically quantifying biological patterns and analyzing agent-based dynamics so that we can now answer critical questions in pattern formation at a much larger scale.

Homological Scaffolds of Brain Functional Networks (2014)

G. Petri, P. Expert, F. Turkheimer, R. Carhart-Harris, D. Nutt, P. J. Hellyer, F. Vaccarino

Abstract

Networks, as efficient representations of complex systems, have appealed to scientists for a long time and now permeate many areas of science, including neuroimaging (Bullmore and Sporns 2009 Nat. Rev. Neurosci.10, 186–198. (doi:10.1038/nrn2618)). Traditionally, the structure of complex networks has been studied through their statistical properties and metrics concerned with node and link properties, e.g. degree-distribution, node centrality and modularity. Here, we study the characteristics of functional brain networks at the mesoscopic level from a novel perspective that highlights the role of inhomogeneities in the fabric of functional connections. This can be done by focusing on the features of a set of topological objects—homological cycles—associated with the weighted functional network. We leverage the detected topological information to define the homological scaffolds, a new set of objects designed to represent compactly the homological features of the correlation network and simultaneously make their homological properties amenable to networks theoretical methods. As a proof of principle, we apply these tools to compare resting-state functional brain activity in 15 healthy volunteers after intravenous infusion of placebo and psilocybin—the main psychoactive component of magic mushrooms. The results show that the homological structure of the brain's functional patterns undergoes a dramatic change post-psilocybin, characterized by the appearance of many transient structures of low stability and of a small number of persistent ones that are not observed in the case of placebo.

Nonlinear Dynamic Approaches to Identify Atrial Fibrillation Progression Based on Topological Methods (2019)

Bahareh Safarbali, Seyed Mohammad Reza Hashemi Golpayegani

Abstract

In recent years, atrial fibrillation (AF) development from paroxysmal to persistent or permanent forms has become an important issue in cardiovascular disorders. Information about AF pattern of presentation (paroxysmal, persistent, or permanent) was useful in the management of algorithms in each category. This management is aimed at reducing symptoms and stopping severe problems associated with AF. AF classification has been based on time duration and episodes until now. In particular, complexity changes in Heart Rate Variation (HRV) may contain clinically relevant signals of imminent systemic dysregulation. A number of nonlinear methods based on phase space and topological properties can give more insight into HRV abnormalities such as fibrillation. Aiming to provide a nonlinear tool to qualitatively classify AF stages, we proposed two geometrical indices (fractal dimension and persistent homology) based on HRV phase space, which can successfully replicate the changes in AF progression. The study population includes 38 lone AF patients and 20 normal subjects, which are collected from the Physio-Bank database. “Time of Life (TOL)” is proposed as a new feature based on the initial and final Čech radius in the persistent homology diagram. A neural network was implemented to prove the effectiveness of both TOL and fractal dimension as classification features. The accuracy of classification performance was 93%. The proposed indices provide a signal representation framework useful to understand the dynamic changes in AF cardiac patterns and to classify normal and pathological rhythms.

Persistent Homology of Time-Dependent Functional Networks Constructed From Coupled Time Series (2017)

Bernadette J. Stolz, Heather A. Harrington, Mason A. Porter

Abstract

We use topological data analysis to study “functional networks” that we construct from time-series data from both experimental and synthetic sources. We use persistent homology with a weight rank clique filtration to gain insights into these functional networks, and we use persistence landscapes to interpret our results. Our first example uses time-series output from networks of coupled Kuramoto oscillators. Our second example consists of biological data in the form of functional magnetic resonance imaging data that were acquired from human subjects during a simple motor-learning task in which subjects were monitored for three days during a five-day period. With these examples, we demonstrate that (1) using persistent homology to study functional networks provides fascinating insights into their properties and (2) the position of the features in a filtration can sometimes play a more vital role than persistence in the interpretation of topological features, even though conventionally the latter is used to distinguish between signal and noise. We find that persistent homology can detect differences in synchronization patterns in our data sets over time, giving insight both on changes in community structure in the networks and on increased synchronization between brain regions that form loops in a functional network during motor learning. For the motor-learning data, persistence landscapes also reveal that on average the majority of changes in the network loops take place on the second of the three days of the learning process.

Extended Persistent Homology Distinguishes Simple and Complex Contagions With High Accuracy (2025)

Vahid Shamsaddini, M. Amin Rahimian

Abstract

The social contagion literature makes a distinction between simple (independent cascade or bond percolation processes that pass infections through edges) and complex contagions (bootstrap percolation or threshold processes that require local reinforcement to spread). However, distinguishing simple and complex contagions using observational data poses a significant challenge in practice. Estimating population-level activation functions from observed contagion dynamics is hindered by confounding factors that influence adoptions (other than neighborhood interactions), as well as heterogeneity in individual behaviors and modeling variations that make it difficult to design appropriate null models for inferring contagion types. Here, we show that a new tool from topological data analysis (TDA), called extended persistent homology (EPH), when applied to contagion processes over networks, can effectively detect simple and complex contagion processes, as well as predict their parameters. We train classification and regression models using EPH-based topological summaries computed on simulated simple and complex contagion dynamics on three real-world network datasets and obtain high predictive performance over a wide range of contagion parameters and under a variety of informational constraints, including uncertainty in model parameters, noise, and partial observability of contagion dynamics. EPH captures the role of cycles of varying lengths in the observed contagion dynamics and offers a useful metric to classify contagion models and predict their parameters. Analyzing geometrical features of network contagion using TDA tools such as EPH can find applications in other network problems such as seeding, vaccination, and quarantine optimization, as well as network inference and reconstruction problems.

Lean Blowout Detection Using Topological Data Analysis (2024)

Arijit Bhattacharya, Sabyasachi Mondal, Somnath De, Achintya Mukhopadhyay, Swarnendu Sen

Abstract

Modern lean premixed combustors are operated in ultra-lean mode to conform to strict emission norms. However, this causes the combustors to become prone to lean blowout (LBO). Online monitoring of combustion dynamics may help to avoid LBO and help the combustor run more safely and reliably. Previous studies have suggested various techniques to early predict LBO in single-burner combustors. In contrast, early detection of LBO in multi-burner combustors has been little explored to date. Recent studies have discovered significantly different combustion dynamics between multi-burner combustors and single-burner combustors. In the present paper, we show that some well-established early LBO detection techniques suitable for single-burner combustor are less effective in early detecting LBO in multi-burner combustors. To resolve this, we propose a novel tool, topological data analysis (TDA), for real-time LBO prediction in a wide range of combustor configurations. We find that the TDA metrics are computationally cheap and follow monotonic trends during the transition to LBO. This indicates that the TDA metrics can be used to fine-tune the LBO safety margin, which is a desirable feature from practical implementation point of view. Furthermore, we show that the sublevel set TDA metrics show approximately monotonic changes during the transition to LBO even with low sampling-rate signals. Sublevel set TDA is computationally inexpensive and does not require phase-space embedding. Therefore, TDA can potentially be used for real-time monitoring of combustor dynamics with simple, low-cost, and low sampling-rate sensors.
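
To make the idea of sublevel set persistence on a raw signal concrete, here is a minimal sketch of 0-dimensional sublevel set persistence for a one-dimensional time series, computed with a union-find structure and no phase-space embedding. The function name and the handling of ties are assumptions for this example; the specific TDA metrics used in the paper are not reproduced here.

```python
def sublevel_persistence_0d(signal):
    """Return (birth, death) pairs of 0-dim sublevel set persistence for a 1D signal.

    The component born at the global minimum never dies; its death is float('inf').
    Pairs with zero persistence are omitted.
    """
    n = len(signal)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: signal[i])  # sweep values from low to high
    parent = [None] * n   # None marks a sample not yet in the sublevel set
    birth = [0.0] * n     # birth value, meaningful at component roots
    pairs = []

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        birth[i] = signal[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the larger birth value dies.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < signal[i]:
                    pairs.append((birth[young], signal[i]))
                parent[young] = old
    pairs.append((min(signal), float("inf")))
    return pairs


pairs = sublevel_persistence_0d([0, 2, 1, 3, 0.5])
```

Each finite pair records a local minimum and the level at which its basin merges into a deeper one, so summary statistics of these pairs can be tracked over a sliding window of sensor data.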

Signal Enrichment With Strain-Level Resolution in Metagenomes Using Topological Data Analysis (2019)

Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida

Abstract

Background: A metagenome is a collection of genomes, usually in a micro-environment, and sequencing a metagenomic sample en masse is a powerful means for investigating the community of the constituent microorganisms. One of the challenges is in distinguishing between similar organisms due to rampant multiple possible assignments of sequencing reads, resulting in false positive identifications. We map the problem to a topological data analysis (TDA) framework that extracts information from the geometric structure of data. Here the structure is defined by multi-way relationships between the sequencing reads using a reference database. Results: Based primarily on the patterns of co-mapping of the reads to multiple organisms in the reference database, we use two models: one a subcomplex of a Barycentric subdivision complex and the other a Čech complex. The Barycentric subcomplex allows a natural mapping of the reads along with their coverage of organisms while the Čech complex takes simply the number of reads into account to map the problem to homology computation. Using simulated genome mixtures we show not just enrichment of signal but also microbe identification with strain-level resolution. Conclusions: In particular, in the most refractory of cases where alternative algorithms that exploit unique reads (i.e., mapped to unique organisms) fail, we show that the TDA approach continues to show consistent performance. The Čech model that uses less information is equally effective, suggesting that even partial information when augmented with the appropriate structure is quite powerful.

Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis (2019)

Lorin Crawford, Anthea Monod, Andrew X. Chen, Sayan Mukherjee, Raúl Rabadán

Abstract

Glioblastoma multiforme (GBM) is an aggressive form of human brain cancer that is under active study in the field of cancer biology. Its rapid progression and the relative time cost of obtaining molecular data make other readily available forms of data, such as images, an important resource for actionable measures in patients. Our goal is to use information given by medical images taken from GBM patients in statistical settings. To do this, we design a novel statistic—the smooth Euler characteristic transform (SECT)—that quantifies magnetic resonance images of tumors. Due to its well-defined inner product structure, the SECT can be used in a wider range of functional and nonparametric modeling approaches than other previously proposed topological summary statistics. When applied to a cohort of GBM patients, we find that the SECT is a better predictor of clinical outcomes than both existing tumor shape quantifications and common molecular assays. Specifically, we demonstrate that SECT features alone explain more of the variance in GBM patient survival than gene expression, volumetric features, and morphometric features. The main takeaways from our findings are thus 2-fold. First, they suggest that images contain valuable information that can play an important role in clinical prognosis and other medical decisions. Second, they show that the SECT is a viable tool for the broader study of medical imaging informatics. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

Topological Data Analysis for the Characterization of Atomic Scale Morphology From Atom Probe Tomography Images (2018)

Tianmu Zhang, Scott R. Broderick, Krishna Rajan

Abstract

Atom probe tomography (APT) represents a revolutionary characterization tool for materials that combine atomic imaging with a time-of-flight (TOF) mass spectrometer to provide direct space three-dimensional, atomic scale resolution images of materials with the chemical identities of hundreds of millions of atoms. It involves the controlled removal of atoms from a specimen’s surface by field evaporation and then sequentially analyzing them with a position sensitive detector and TOF mass spectrometer. A paradox in APT is that while on the one hand, it provides an unprecedented level of imaging resolution in three dimensions, it is very difficult to obtain an accurate perspective of morphology or shape outlined by atoms of similar chemistry and microstructure. The origins of this problem are numerous, including incomplete detection of atoms and the complexity of the evaporation fields of atoms at or near interfaces. Hence, unlike scattering techniques such as electron microscopy, interfaces appear diffused, not sharp. This, in turn, makes it challenging to visualize and quantitatively interpret the microstructure at the “meso” scale, where one is interested in the shape and form of the interfaces and their associated chemical gradients. It is here that the application of informatics at the nanoscale and statistical learning methods plays a critical role in both defining the level of uncertainty and helping to make quantitative, statistically objective interpretations where heuristics often dominate. In this chapter, we show how the tools of Topological Data Analysis provide a new and powerful tool in the field of nanoinformatics for materials characterization.

Toroidal Topology of Population Activity in Grid Cells (2022)

Richard J. Gardner, Erik Hermansen, Marius Pachitariu, Yoram Burak, Nils A. Baas, Benjamin A. Dunn, May-Britt Moser, Edvard I. Moser

Abstract

The medial entorhinal cortex is part of a neural system for mapping the position of an individual within a physical environment. Grid cells, a key component of this system, fire in a characteristic hexagonal pattern of locations, and are organized in modules that collectively form a population code for the animal’s allocentric position. The invariance of the correlation structure of this population code across environments and behavioural states, independent of specific sensory inputs, has pointed to intrinsic, recurrently connected continuous attractor networks (CANs) as a possible substrate of the grid pattern. However, whether grid cell networks show continuous attractor dynamics, and how they interface with inputs from the environment, has remained unclear owing to the small samples of cells obtained so far. Here, using simultaneous recordings from many hundreds of grid cells and subsequent topological data analysis, we show that the joint activity of grid cells from an individual module resides on a toroidal manifold, as expected in a two-dimensional CAN. Positions on the torus correspond to positions of the moving animal in the environment. Individual cells are preferentially active at singular positions on the torus. Their positions are maintained between environments and from wakefulness to sleep, as predicted by CAN models for grid cells but not by alternative feedforward models. This demonstration of network dynamics on a toroidal manifold provides a population-level visualization of CAN dynamics in grid cells.

Community Resources

Data

Topological Descriptors for Coral Reef Resilience Using a Stochastic Spatial Model (2022)

Robert A. McDonald, Rosanna Neuhausler, Martin Robinson, Laurel G. Larsen, Heather A. Harrington, Maria Bruna

Abstract

A complex interplay between species governs the evolution of spatial patterns in ecology. An open problem in the biological sciences is characterizing spatio-temporal data and understanding how changes at the local scale affect global dynamics/behavior. We present a toolkit of multiscale methods and use them to analyze coral reef resilience and dynamics. Here, we extend a well-studied temporal mathematical model of coral reef dynamics to include stochastic and spatial interactions and then generate data to study different ecological scenarios. We present descriptors to characterize patterns in heterogeneous spatio-temporal data surpassing spatially averaged measures. We apply these descriptors to simulated coral data and demonstrate the utility of two topological data analysis techniques--persistent homology and zigzag persistence--for characterizing the spatiotemporal evolution of reefs and generating insight into mechanisms of reef resilience. We show that the introduction of local competition between species leads to the appearance of coral clusters in the reef. Furthermore, we use our analyses to distinguish the temporal dynamics that stem from different initial configurations of coral, showing that the neighborhood composition of coral sites determines their long-term survival. Finally, we use zigzag persistence to quantify spatial behavior in the metastable regime as the level of fish grazing on algae varies and determine which spatial configurations protect coral from extinction in different environments.

Community Resources

Data

Pattern Characterization Using Topological Data Analysis: Application to Piezo Vibration Striking Treatment (2023)

Max M. Chumley, Melih C. Yesilli, Jisheng Chen, Firas A. Khasawneh, Yang Guo

Abstract

Quantifying patterns in visual or tactile textures provides important information about the process or phenomena that generated these patterns. In manufacturing, these patterns can be intentionally introduced as a design feature, or they can be a byproduct of a specific process. Since surface texture has significant impact on the mechanical properties and the longevity of the workpiece, it is important to develop tools for quantifying surface patterns and, when applicable, comparing them to their nominal counterparts. While existing tools may be able to indicate the existence of a pattern, they typically do not provide more information about the pattern structure, or how much it deviates from a nominal pattern. Further, prior works do not provide automatic or algorithmic approaches for quantifying other pattern characteristics such as depths’ consistency, and variations in the pattern motifs at different level sets. This paper leverages persistent homology from Topological Data Analysis (TDA) to derive noise-robust scores for quantifying motifs’ depth and roundness in a pattern. Specifically, sublevel persistence is used to derive scores that quantify the consistency of indentation depths at any level set in Piezo Vibration Striking Treatment (PVST) surfaces. Moreover, we combine sublevel persistence with the distance transform to quantify the consistency of the indentation radii, and to compare them with the nominal ones. Although the tool in our PVST experiments had a semi-spherical profile, we present a generalization of our approach to tools/motifs of arbitrary shapes thus making our method applicable to other pattern-generating manufacturing processes.

Community Resources

Code

Steinhaus Filtration and Stable Paths in the Mapper (2020)

Dustin L. Arendt, Matthew Broussard, Bala Krishnamoorthy, Nathaniel Saul

Abstract

Two central concepts from topological data analysis are persistence and the Mapper construction. Persistence employs a sequence of objects built on data called a filtration. A Mapper produces insightful summaries of data, and has found widespread applications in diverse areas. We define a new filtration called the cover filtration built from a single cover based on a generalized Steinhaus distance, which is a generalization of Jaccard distance. We prove a stability result: the cover filtrations of two covers are $\alpha/m$ interleaved, where $\alpha$ is a bound on bottleneck distance between covers and $m$ is the size of the smallest set in either cover. We also show our construction is equivalent to the Čech filtration under certain settings, and the Vietoris-Rips filtration completely determines the cover filtration in all cases. We then develop a theory for stable paths within this filtration. Unlike standard results on stability in topological persistence, our definition of path stability aligns exactly with the above result on stability of cover filtration. We demonstrate how our framework can be employed in a variety of applications where a metric is not obvious but a cover is readily available. First we present a new model for recommendation systems using cover filtration. For an explicit example, stable paths identified on a movies data set represent sequences of movies constituting gentle transitions from one genre to another. As a second application in explainable machine learning, we apply the Mapper for model induction, providing explanations in the form of paths between subpopulations. Stable paths in the Mapper from a supervised machine learning model trained on the FashionMNIST data set provide improved explanations of relationships between subpopulations of images.
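
For readers unfamiliar with the distance this abstract builds on, the following sketch computes a Steinhaus-style distance between finite sets using the counting measure, that is, one minus the size of the common intersection divided by the size of the union. For two sets this is the Jaccard distance. The function name, the choice of counting measure, and the convention for empty input are assumptions for this example; the paper's definition may be stated for a general measure.

```python
def steinhaus_distance(*sets):
    """Steinhaus-style distance for finite sets under the counting measure.

    Returns 1 - |A_1 ∩ ... ∩ A_k| / |A_1 ∪ ... ∪ A_k|.
    With k = 2 this equals the Jaccard distance.
    """
    if not sets:
        raise ValueError("need at least one set")
    union = set().union(*sets)
    if not union:
        return 0.0  # assumed convention when every input set is empty
    inter = set(sets[0]).intersection(*sets[1:])
    return 1.0 - len(inter) / len(union)


d_pair = steinhaus_distance({1, 2, 3}, {2, 3, 4})      # two cover elements
d_triple = steinhaus_distance({1, 2}, {2, 3}, {2, 4})  # three cover elements
```

In a cover filtration built this way, a group of cover elements would enter at the scale given by this value, so heavily overlapping elements are joined early and barely overlapping ones late.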

Topology Based Data Analysis Identifies a Subgroup of Breast Cancers With a Unique Mutational Profile and Excellent Survival (2011)

Monica Nicolau, Arnold J. Levine, Gunnar Carlsson

Abstract

High-throughput biological data, whether generated as sequencing, transcriptional microarrays, proteomic, or other means, continues to require analytic methods that address its high dimensional aspects. Because the computational part of data analysis ultimately identifies shape characteristics in the organization of data sets, the mathematics of shape recognition in high dimensions continues to be a crucial part of data analysis. This article introduces a method that extracts information from high-throughput microarray data and, by using topology, provides greater depth of information than current analytic techniques. The method, termed Progression Analysis of Disease (PAD), first identifies robust aspects of cluster analysis, then goes deeper to find a multitude of biologically meaningful shape characteristics in these data. Additionally, because PAD incorporates a visualization tool, it provides a simple picture or graph that can be used to further explore these data. Although PAD can be applied to a wide range of high-throughput data types, it is used here as an example to analyze breast cancer transcriptional data. This identified a unique subgroup of Estrogen Receptor-positive (ER+) breast cancers that express high levels of c-MYB and low levels of innate inflammatory genes. These patients exhibit 100% survival and no metastasis. No supervised step beyond distinction between tumor and healthy patients was used to identify this subtype. The group has a clear and distinct, statistically significant molecular signature, it highlights coherent biology but is invisible to cluster methods, and does not fit into the accepted classification of Luminal A/B, Normal-like subtypes of ER+ breast cancers. We denote the group as c-MYB+ breast cancer.

Felix: A Topology Based Framework for Visual Exploration of Cosmic Filaments (2016)

Nithin Shivshankar, Pratyush Pranav, Vijay Natarajan, Rien van de Weygaert, E. G. Patrick Bos, Steven Rieder

Abstract

The large-scale structure of the universe is comprised of virialized blob-like clusters, linear filaments, sheet-like walls and huge near empty three-dimensional voids. Characterizing the large scale universe is essential to our understanding of the formation and evolution of galaxies. The density ranges of clusters, walls and voids are relatively well separated, when compared to filaments, which span a relatively larger range. The large scale filamentary network thus forms an intricate part of the cosmic web. In this paper, we describe Felix, a topology based framework for visual exploration of filaments in the cosmic web. The filamentary structure is represented by the ascending manifold geometry of the 2-saddles in the Morse-Smale complex of the density field. We generate a hierarchy of Morse-Smale complexes and query for filaments based on the density ranges at the end points of the filaments. The query is processed efficiently over the entire hierarchical Morse-Smale complex, allowing for interactive visualization. We apply Felix to computer simulations based on the heuristic Voronoi kinematic model and the standard $\Lambda$CDM cosmology, and demonstrate its usefulness through two case studies. First, we extract cosmic filaments within and across cluster like regions in Voronoi kinematic simulation datasets. We demonstrate that we produce similar results to existing structure finders. Filaments that form the spine of the cosmic web, which exist in high density regions in the current epoch, are isolated using Felix. Also, filaments present in void-like regions are isolated and visualized. These filamentary structures are often overshadowed by higher density range filaments and are not easily characterizable and extractable using other filament extraction methodologies.

MRI and Biomechanics Multidimensional Data Analysis Reveals R2 - R1ρ as an Early Predictor of Cartilage Lesion Progression in Knee Osteoarthritis (2017)

Valentina Pedoia, Jenny Haefeli, Kazuhito Morioka, Hsiang-Ling Teng, Lorenzo Nardo, Richard B. Souza, Adam R. Ferguson, Sharmila Majumdar

Abstract

PURPOSE: To couple quantitative compositional MRI, gait analysis, and machine learning multidimensional data analysis to study osteoarthritis (OA). OA is a multifactorial disorder accompanied by biochemical and morphological changes in the articular cartilage, modulated by skeletal biomechanics and gait. While we can now acquire detailed information about the knee joint structure and function, we are not yet able to leverage the multifactorial factors for diagnosis and disease management of knee OA. MATERIALS AND METHODS: We mapped 178 subjects in a multidimensional space integrating: demographic, clinical information, gait kinematics and kinetics, cartilage compositional T1ρ and T2 and R2 - R1ρ (1/T2 - 1/T1ρ) acquired at 3T and whole-organ magnetic resonance imaging score morphological grading. Topological data analysis (TDA) and Kolmogorov-Smirnov test were adopted for data integration, analysis, and hypothesis generation. Regression models were used for hypothesis testing. RESULTS: The results of the TDA showed a network composed of three main patient subpopulations, thus potentially identifying new phenotypes. T2 and T1ρ values (T2 lateral femur P = 1.45 × 10^-8, T1ρ medial tibia P = 1.05 × 10^-5), the presence of femoral cartilage defects (P = 0.0013), lesions in the meniscus body (P = 0.0035), and race (P = 2.44 × 10^-4) were key markers in the subpopulation classification. Within one of the subpopulations we observed an association between the composite metric R2 - R1ρ and the longitudinal progression of cartilage lesions. CONCLUSION: The analysis presented demonstrates some of the complex multitissue biochemical and biomechanical interactions that define joint degeneration and OA using a multidimensional approach, and potentially indicates that R2 - R1ρ may be an imaging biomarker for early OA. LEVEL OF EVIDENCE: 3 Technical Efficacy: Stage 2 J. Magn. Reson. Imaging 2018;47:78-90.

Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology (2021)

Nicole Bussola, Bruno Papa, Ombretta Melaiu, Aurora Castellano, Doriana Fruci, Giuseppe Jurman

Abstract

We introduce here a novel machine learning (ML) framework to address the issue of the quantitative assessment of the immune content in neuroblastoma (NB) specimens. First, the EUNet, a U-Net with an EfficientNet encoder, is trained to detect lymphocytes on tissue digital slides stained with the CD3 T-cell marker. The training set consists of 3782 images extracted from an original collection of 54 whole slide images (WSIs), manually annotated for a total of 73,751 lymphocytes. Resampling strategies, data augmentation, and transfer learning approaches are adopted to warrant reproducibility and to reduce the risk of overfitting and selection bias. Topological data analysis (TDA) is then used to define activation maps from different layers of the neural network at different stages of the training process, described by persistence diagrams (PD) and Betti curves. TDA is further integrated with the uniform manifold approximation and projection (UMAP) dimensionality reduction and the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm for clustering, by the deep features, the relevant subgroups and structures, across different levels of the neural network. Finally, the recent TwoNN approach is leveraged to study the variation of the intrinsic dimensionality of the U-Net model. As the main task, the proposed pipeline is employed to evaluate the density of lymphocytes over the whole tissue area of the WSIs. The model achieves good results with mean absolute error 3.1 on the test set, showing significant agreement between densities estimated by our EUNet model and by trained pathologists, thus indicating the potentialities of a promising new strategy in the quantification of the immune content in NB specimens. Moreover, the UMAP algorithm unveiled interesting patterns compatible with pathological characteristics, also highlighting novel insights into the dynamics of the intrinsic dataset dimensionality at different stages of the training process. All the experiments were run on the Microsoft Azure cloud platform.

Fundamentals on Base Stations in Urban Cellular Networks: From the Perspective of Algebraic Topology (2018)

Ying Chen, Rongpeng Li, Zhifeng Zhao, Honggang Zhang

A Topological Representation of Branching Neuronal Morphologies (2018)

Lida Kanari, Paweł Dłotko, Martina Scolamiero, Ran Levi, Julian Shillcock, Kathryn Hess, Henry Markram

Using Topological Data Analysis for Diagnosis Pulmonary Embolism (2015)

M. Rucco, E. Merelli, D. Herman, D. Ramanan, T. Petrossian, L. Falsetti, C. Nitti, A. Salvi

Community Resources

Code

Modeling the Spread of the Zika Virus Using Topological Data Analysis (2018)

Derek Lo, Briton Park

A Multimodal Data Analysis Approach for Targeted Drug Discovery Involving Topological Data Analysis (TDA) (2016)

Muthuraman Alagappan, Dadi Jiang, Nicholas Denko, Albert C. Koong

Algorithms for Topological Analysis of Spatial Data (2018)

Sergey Eremeev, Ekaterina Seltsova

Dissecting Ethereum Blockchain Analytics: What We Learn From Topology and Geometry of the Ethereum Graph? (2020)

Yitao Li, Umar Islambekov, Cuneyt Akcora, Ekaterina Smirnova, Yulia R. Gel, Murat Kantarcioglu

Detecting Functional States of the Rat Brain With Topological Data Analysis (2018)

Nianqiao Ju, Ismar Volić, Michael Wiest

Topological Features in Cancer Gene Expression Data (2014)

S. Lockwood, B. Krishnamoorthy

Protein Classification With Improved Topological Data Analysis (2018)

Tamal K. Dey, Sayan Mandal

Identification of Type 2 Diabetes Subgroups Through Topological Analysis of Patient Similarity (2015)

Li Li, Wei-Yi Cheng, Benjamin S. Glicksberg, Omri Gottesman, Ronald Tamler, Rong Chen, Erwin P. Bottinger, Joel T. Dudley

Topological Feature Vectors for Chatter Detection in Turning Processes (2019)

Melih C. Yesilli, Firas A. Khasawneh, Andreas Otto

Topological Data Analysis With Metric Learning and an Application to High-Dimensional Football Data (2015)

David Alejandro Perdomo Meza

High-Throughput Screening Approach for Nanoporous Materials Genome Using Topological Data Analysis: Application to Zeolites (2018)

Yongjin Lee, Senja D. Barthel, Paweł Dłotko, Seyed Mohamad Moosavi, Kathryn Hess, Berend Smit

Topological Data Analysis Quantifies Biological Nano-Structure From Single Molecule Localization Microscopy (2020)

Jeremy A. Pike, Abdullah O. Khan, Chiara Pallini, Steven G. Thomas, Markus Mund, Jonas Ries, Natalie S. Poulter, Iain B. Styles

Abstract

Motivation: Localization microscopy data is represented by a set of spatial coordinates, each corresponding to a single detection, that form a point cl

Resting-State fMRI Functional Connectivity: Big Data Preprocessing Pipelines and Topological Data Analysis (2017)

Angkoon Phinyomark, Esther Ibáñez-Marcelo, Giovanni Petri

Towards Topological Pattern Detection Methods in Climate Data: Application to Atmospheric Blocking Events (2018)

Grzegorz Muszynski, Karthik Kashinath, Vitaliy Kurlin, Michael F. Wehner, Prabhat

A Novel Approach to Identifying a Neuroimaging Biomarker for Patients With Serious Mental Illness (2017)

Alok Madan, J. Christopher Fowler, Michelle A. Patriquin, Ramiro Salas, Philip R. Baldwin, Kenia M. Velasquez, Humsini Viswanath, David L. Molfese, Carla Sharp, Jon G. Allen

Topological Distance Between Nonplanar Transportation Networks (2018)

Ahmed Abdelkader, Geoff Boeing, Brittany Terese Fasy, David L. Millman

A Survey of Topological Data Analysis Methods for Big Data in Healthcare Intelligence (2019)

Milan Joshi, Dhananjay Joshi

Multiscale Topology Characterizes Dynamic Tumor Vascular Networks (2022)

Bernadette J. Stolz, Jakob Kaeppler, Bostjan Markelc, Franziska Braun, Florian Lipsmeier, Ruth J. Muschel, Helen M. Byrne, Heather A. Harrington

A New Approach to Investigate the Association Between Brain Functional Connectivity and Disease Characteristics of Attention-Deficit/Hyperactivity Disorder: Topological Neuroimaging Data Analysis (2015)

Sunghyon Kyeong, Seonjeong Park, Keun-Ah Cheon, Jae-Jin Kim, Dong-Ho Song, Eunjoo Kim

Topology of the Mesoscale Connectome of the Mouse Brain (2018)

Pascal Grange

Community Resources

Data

Topological Characterization of Shallow Cumulus Cloud Fields Using Persistent Homology (2018)

José Licón-Saláiz, Henri Riihimäki, Thirza W. van Laar

Topological Data Analysis and 3D Printing Technologies for Flow in Fracture Networks (2018)

Anna Suzuki, M. Miyazawa, T. Ito

Applying a Novel Integrated Persistent Feature to Understand Topographical Network Connectivity in Older Adults With Autism Spectrum Disorder (2019)

Michael Catchings

A Collaborative Visual Analytics Suite for Protein Folding Research (2014)

William Harvey, In-Hee Park, Oliver Rübel, Valerio Pascucci, Peer-Timo Bremer, Chenglong Li, Yusu Wang

Using Topological Data Analysis for Text Classification (2018)

Pratik Doshi

Community Resources

Code

Exploring Hyperspectral Imaging Data Sets With Topological Data Analysis (2017)

Ludovic Duponchel

Deep Learning With Topological Signatures (2017)

Christoph Hofer, Roland Kwitt, Marc Niethammer, Andreas Uhl

TopoResNet: A Hybrid Deep Learning Architecture and Its Application to Skin Lesion Classification (2019)

Yu-Min Chung, Chuan-Shen Hu, Austin Lawson, Clifford Smyth

Community Resources

Data

Automatic Detection of Image Morphing by Topology-Based Analysis (2018)

Sabah Jassim, Aras Asaad

Topological Approaches to Skin Disease Image Analysis (2018)

Yu-Min Chung, Chuan-Shen Hu, Austin Lawson, Clifford Smyth

Topological Data Analysis of Task-Based fMRI Data From Experiments on Schizophrenia (2021)

Bernadette J. Stolz, Tegan Emerson, Satu Nahkuri, Mason A. Porter, Heather A. Harrington

An Introduction to a New Text Classification and Visualization for Natural Language Processing Using Topological Data Analysis (2019)

Naiereh Elyasi, Mehdi Hosseini Moghadam

Use of Topological Data Analysis in Motor Intention Based Brain-Computer Interfaces (2018)

Fatih Altindis, Bulent Yilmaz, Sergey Borisenok, Kutay Icoz

Community Resources

Code (Toolbox)

Exact Topological Inference of the Resting-State Brain Networks in Twins (2019)

Moo K. Chung, Hyekyoung Lee, Hernando Ombao, Victor Solo

Topology and Geometry for Small Sample Sizes: An Application to Research on the Profoundly Gifted (2018)

Colleen Molloy Farrelly

The Topological Basis of Function in Flow and Mechanical Networks (2019)

Jason Rocks, Andrea Liu, Eleni Katifori

Limitations of Topological Data Analysis for Event-Related fMRI (2018)

Cameron T. Ellis, Michael Lesnick, Gregory Henselman-Petrusek, Bryn Keller, Jonathan D. Cohen

A Study on Topological Descriptors for the Analysis of 3D Surface Texture (2018)

Matthias Zeppelzauer, Bartosz Zieliński, Mateusz Juda, Markus Seidl

Novel Subgroups of Attention-Deficit/Hyperactivity Disorder Identified by Topological Data Analysis and Their Functional Network Modular Organizations (2017)

Sunghyon Kyeong, Jae-Jin Kim, Eunjoo Kim

Characterizing Grapevine 3D Inflorescence Architecture Using X-Ray Imaging and Advanced Morphometrics: Implications for Understanding Cluster Density (2019)

Mao Li, Laura L. Klein, Keith E. Duncan, Ni Jiang, Daniel H. Chitwood, Jason Londo, Allison J. Miller, Christopher N. Topp

ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery (2022)

Andaç Demir, Baris Coskunuzer, Yulia Gel, Ignacio Segovia-Dominguez, Yuzhou Chen, Bulent Kiziltan

Community Resources

Data
Video

Topological Data Analysis for Data Mining Small Educational Samples With Application to Studies of the Gifted (2018)

Colleen Molloy Farrelly

Topological Persistence Vineyard for Dynamic Functional Brain Connectivity During Resting and Gaming Stages (2016)

Jaejun Yoo, Eun Young Kim, Yong Min Ahn, Jong Chul Ye

Topology Across Scales on Heterogeneous Cell Data (2025)

Maria Torras-Perez, Iris H.R. Yoon, Praveen Weeratunga, Ling-Pei Ho, Helen M. Byrne, Ulrike Tillmann, Heather A. Harrington

Classification of Firn Data via Topological Features (2025)

Sarah Day, Jesse Dimino, Matt Jester, Kaitlin Keegan, Thomas Weighill

Time Delay Embeddings to Characterize the Timbre of Musical Instruments Using Topological Data Analysis: A Study on Synthetic and Real Data (2025)

Gakusei Sato, Riccardo Muolo, Hiroya Nakao

A Barcode Shape Descriptor for Curve Point Cloud Data (2004)

Anne Collins, Afra Zomorodian, Gunnar Carlsson, Leonidas J. Guibas

Abstract

In this paper, we present a complete computational pipeline for extracting a compact shape descriptor for curve point cloud data (PCD). Our shape descriptor, called a barcode, is based on a blend of techniques from differential geometry and algebraic topology. We also provide a metric over the space of barcodes, enabling fast comparison of PCDs for shape recognition and clustering. To demonstrate the feasibility of our approach, we implement our pipeline and provide experimental evidence in shape classification and parametrization.

Construction of Symbolic Dynamics From Experimental Time Series (1999)

K. Mischaikow, M. Mrozek, J. Reiss, A. Szymczak

Abstract

Symbolic dynamics play a central role in the description of the evolution of nonlinear systems. Yet there are few methods for determining symbolic dynamics of chaotic data. One difficulty is that the data contains random fluctuations associated with the experimental process. Using data obtained from a magnetoelastic ribbon experiment we show how a topological approach that allows for experimental error and bounded noise can be used to obtain a description of the dynamics in terms of subshift dynamics on a finite set of symbols.

Filtration Curves for Graph Representation (2021)

Leslie O'Bray, Bastian Rieck, Karsten Borgwardt

Abstract

The two predominant approaches to graph comparison in recent years are based on (i) enumerating matching subgraphs or (ii) comparing neighborhoods of nodes. In this work, we complement these two perspectives with a third way of representing graphs: using filtration curves from topological data analysis that capture both edge weight information and global graph structure. Filtration curves are highly efficient to compute and lead to expressive representations of graphs, which we demonstrate on graph classification benchmark datasets. Our work opens the door to a new form of graph representation in data mining.
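As a rough illustration of the idea, the sketch below thresholds a weighted graph at each distinct edge weight and records one graph descriptor per threshold. The choice of descriptor (number of connected components) and the function name `filtration_curve` are assumptions made for this example, not the authors' implementation.

```python
def filtration_curve(n_vertices, weighted_edges):
    """Return [(weight, n_components)] as edges are added in order of weight.

    weighted_edges is a list of (u, v, weight) triples. The descriptor used
    here (connected-component count) is an illustrative assumption.
    """
    parent = list(range(n_vertices))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n_vertices
    curve = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
        # Keep a single entry per distinct threshold value.
        if curve and curve[-1][0] == w:
            curve[-1] = (w, components)
        else:
            curve.append((w, components))
    return curve


# Hypothetical toy graph on 4 vertices.
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 2.0), (2, 3, 5.0)]
curve = filtration_curve(4, edges)
```

Two graphs can then be compared by evaluating their curves on a shared grid of thresholds, which is one plausible way to obtain fixed-length feature vectors.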

Hierarchical Clustering and Zeroth Persistent Homology (2020)

İsmail Güzel, Atabey Kaygun

Abstract

In this article, we show that hierarchical clustering and the zeroth persistent homology deliver the same topological information about a given data set. We show this fact using cophenetic matrices constructed out of the filtered Vietoris-Rips complex of the data set at hand. As in any cophenetic matrix, one can also display the inter-relations of zeroth homology classes via a rooted tree, also known as a dendrogram. Since homological cophenetic matrices can be calculated for higher homologies, one can also sketch similar dendrograms for higher persistent homology classes.
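The correspondence can be checked numerically on a toy example. The sketch below assumes single-linkage clustering and a Vietoris-Rips filtration in which an edge enters at its Euclidean length; both function names and the sample points are hypothetical, and the code does not reproduce the cophenetic-matrix construction of the article.

```python
from itertools import combinations
from math import dist


def h0_death_times(points):
    """Death times of 0-dimensional classes, via Kruskal with union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one 0-dimensional class dies here.
            parent[ri] = rj
            deaths.append(length)
    return deaths


def single_linkage_heights(points):
    """Merge heights of naive single-linkage agglomerative clustering."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # Closest pair of clusters, measured by their nearest members.
        h, a, b = min(
            (min(dist(p, q) for p in clusters[a] for q in clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2))
        heights.append(h)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return heights


points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (4.0, 3.0), (10.0, 0.0)]
deaths = h0_death_times(points)
heights = single_linkage_heights(points)
```

On this point set both lists coincide, which is the zeroth-homology case of the statement above; the one class that never dies (the final connected component) is not listed.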

Simplicial Neural Networks (2020)

Stefania Ebli, Michaël Defferrard, Gard Spreemann

Abstract

We present simplicial neural networks (SNNs), a generalization of graph neural networks to data that live on a class of topological spaces called simplicial complexes. These are natural multi-dimensional extensions of graphs that encode not only pairwise relationships but also higher-order interactions between vertices, allowing us to consider richer data, including vector fields and $n$-fold collaboration networks. We define an appropriate notion of convolution that we leverage to construct the desired convolutional neural networks. We test the SNNs on the task of imputing missing data on coauthorship complexes.

Topographical Transcriptome Mapping of the Mouse Medial Ganglionic Eminence by Spatially Resolved RNA-seq (2014)

Sabrina Zechel, Pawel Zajac, Peter Lönnerberg, Carlos F. Ibáñez, Sten Linnarsson

Abstract

Cortical interneurons originating from the medial ganglionic eminence, MGE, are among the most diverse cells within the CNS. Different pools of proliferating progenitor cells are thought to exist in the ventricular zone of the MGE, but whether the underlying subventricular and mantle regions of the MGE are spatially patterned has not yet been addressed. Here, we combined laser-capture microdissection and multiplex RNA-sequencing to map the transcriptome of MGE cells at a spatial resolution of 50 μm.

Persistent Homology Analysis of Brain Artery Trees (2016)

Paul Bendich, J. S. Marron, Ezra Miller, Alex Pieloch, Sean Skwerer

Abstract

New representations of tree-structured data objects, using ideas from topological data analysis, enable improved statistical analyses of a population of brain artery trees. A number of representations of each data tree arise from persistence diagrams that quantify branching and looping of vessels at multiple scales. Novel approaches to the statistical analysis, through various summaries of the persistence diagrams, lead to heightened correlations with covariates such as age and sex, relative to earlier analyses of this data set. The correlation with age continues to be significant even after controlling for correlations from earlier significant summaries.

Towards a New Approach to Reveal Dynamical Organization of the Brain Using Topological Data Analysis (2018)

Manish Saggar, Olaf Sporns, Javier Gonzalez-Castillo, Peter A. Bandettini, Gunnar Carlsson, Gary Glover, Allan L. Reiss

Abstract

Approaches describing how the brain changes to accomplish cognitive tasks tend to rely on collapsed data. Here, the authors present a new approach that maintains high dimensionality and use it to describe individual differences in how brain activity is represented and organized across different cognitive tasks.

Coordinate-Free Coverage in Sensor Networks With Controlled Boundaries via Homology (2006)

V. de Silva, R. Ghrist

Abstract

Tools from computational homology are introduced to verify coverage in an idealized sensor network. These methods are unique in that, while they are coordinate-free and assume no localization or orientation capabilities for the nodes, there are also no probabilistic assumptions. The key ingredient is the theory of homology from algebraic topology. The robustness of these tools is demonstrated by adapting them to a variety of settings, including static planar coverage, 3-D barrier coverage, and time-dependent sweeping coverage. Results are also given on hole repair, error tolerance, optimal coverage, and variable radii. An overview of implementation is given.

Coverage in Sensor Networks via Persistent Homology (2007)

Vin de Silva, Robert Ghrist

Abstract

We introduce a topological approach to a problem of covering a region in Euclidean space by balls of fixed radius at unknown locations (this problem being motivated by sensor networks with minimal sensing capabilities). In particular, we give a homological criterion to rigorously guarantee that a collection of balls covers a bounded domain based on the homology of a certain simplicial pair. This pair of (Vietoris–Rips) complexes is derived from graphs representing a coarse form of distance estimation between nodes and a proximity sensor for the boundary of the domain. The methods we introduce come from persistent homology theory and are applicable to nonlocalized sensor networks with ad hoc wireless communications.

Classification of Skin Lesions by Topological Data Analysis Alongside With Neural Network (2020)

Naiereh Elyasi, Mehdi Hosseini Moghadam

Abstract

In this paper we use TDA Mapper alongside deep convolutional neural networks in the classification of 7 major skin diseases. First, we apply Kepler Mapper, with a neural network as one of its filter steps, to classify the HAM10000 dataset. Mapper visualizes the classification result as a simplicial complex, which a neural network cannot do alone, while the neural network as a filter step helps to classify the data better. Furthermore, we apply TDA Mapper and persistent homology to understand the weights of the layers of the MobileNet network at different training epochs on HAM10000. We also use persistence diagrams to visualize the results of the analysis of the layers of the MobileNet network.

Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition (2007)

Gurjeet Singh, Facundo Mémoli, Gunnar Carlsson

Abstract

We present a computational method for extracting simple descriptions of high dimensional data sets in the form of simplicial complexes. Our method, called Mapper, is based on the idea of partial clustering of the data guided by a set of functions defined on the data. The proposed method is not dependent on any particular clustering algorithm, i.e. any clustering algorithm may be used with Mapper. We implement this method and present a few sample applications in which simple descriptions of the data present important information about its structure.
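A minimal sketch of the construction, restricted to the 1-skeleton (a graph), may help fix ideas. Everything specific here is an assumption for illustration: a single real-valued filter, a cover by equal-length overlapping intervals, and clustering by connected components of an `eps`-neighbourhood graph. Since the method does not depend on a particular clustering algorithm, that step could be swapped for any other.

```python
from itertools import combinations
from math import cos, dist, pi, sin


def connected_components(indices, points, eps):
    """Cluster by connected components of the graph joining points within eps."""
    remaining, clusters = set(indices), []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_values, n_intervals, overlap, eps):
    """Nodes are clusters within each interval's preimage; edges join clusters sharing points."""
    lo, hi = min(filter_values), max(filter_values)
    # Interval length chosen so that n_intervals overlapping intervals span [lo, hi].
    length = (hi - lo) / (n_intervals - overlap * (n_intervals - 1))
    step = length * (1 - overlap)
    tol = 1e-9  # guards against rounding at interval endpoints
    nodes = []
    for k in range(n_intervals):
        a = lo + k * step
        b = a + length
        members = [i for i, f in enumerate(filter_values) if a - tol <= f <= b + tol]
        nodes.extend(connected_components(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# Hypothetical data: 16 points on the unit circle, filtered by x-coordinate.
points = [(cos((k + 0.5) * pi / 8), sin((k + 0.5) * pi / 8)) for k in range(16)]
filter_values = [p[0] for p in points]
nodes, edges = mapper_graph(points, filter_values, n_intervals=3, overlap=0.5, eps=0.5)
```

For this circle the output is a graph with four nodes joined in a cycle, a simple description that retains the loop in the data.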

Homological Scaffold via Minimal Homology Bases (2021)

Marco Guerra, Alessandro De Gregorio, Ulderico Fugacci, Giovanni Petri, Francesco Vaccarino

Abstract

The homological scaffold leverages persistent homology to construct a topologically sound summary of a weighted network. However, its crucial dependency on the choice of representative cycles hinders the ability to trace back global features onto individual network components, unless one provides a principled way to make such a choice. In this paper, we apply recent advances in the computation of minimal homology bases to introduce a quasi-canonical version of the scaffold, called minimal, and employ it to analyze data both real and in silico. At the same time, we verify that, statistically, the standard scaffold is a good proxy of the minimal one for sufficiently complex networks.

HiDeF: Identifying Persistent Structures in Multiscale ‘Omics Data (2021)

Fan Zheng, She Zhang, Christopher Churas, Dexter Pratt, Ivet Bahar, Trey Ideker

Abstract

In any ‘omics study, the scale of analysis can dramatically affect the outcome. For instance, when clustering single-cell transcriptomes, is the analysis tuned to discover broad or specific cell types? Likewise, protein communities revealed from protein networks can vary widely in size depending on the method. Here, we use the concept of persistent homology, drawn from mathematical topology, to identify robust structures in data at all scales simultaneously. Application to mouse single-cell transcriptomes significantly expands the catalog of identified cell types, while analysis of SARS-CoV-2 protein interactions suggests hijacking of WNT. The method, HiDeF, is available via Python and Cytoscape.

Topological Data Analysis of C. Elegans Locomotion and Behavior (2021)

Ashleigh Thomas, Kathleen Bates, Alex Elchesen, Iryna Hartsock, Hang Lu, Peter Bubenik

Abstract

Video of nematodes/roundworms was analyzed using persistent homology to study locomotion and behavior. In each frame, an organism's body posture was represented by a high-dimensional vector. By concatenating points in fixed-duration segments of this time series, we created a sliding window embedding (sometimes called a time delay embedding) where each point corresponds to a sequence of postures of an organism. Persistent homology on the points in this time series detected behaviors and comparisons of these persistent homology computations detected variation in their corresponding behaviors. We used average persistence landscapes and machine learning techniques to study changes in locomotion and behavior in varying environments.
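The sliding window step described above can be sketched in a few lines. The function name, the window length, and the toy posture vectors are assumptions for illustration; the persistent homology computation that would follow is not shown.

```python
def sliding_window_embedding(frames, window, step=1):
    """Concatenate `window` consecutive frames into one point, sliding by `step`."""
    return [
        [value for frame in frames[start:start + window] for value in frame]
        for start in range(0, len(frames) - window + 1, step)
    ]


# Five hypothetical 2-dimensional posture vectors, one per video frame.
frames = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
embedded = sliding_window_embedding(frames, window=3)
```

Each embedded point has dimension `window` times the posture dimension, so a sequence of T frames yields T - window + 1 points when `step` is 1.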

Multiscale Projective Coordinates via Persistent Cohomology of Sparse Filtrations (2018)

Jose A. Perea

Abstract

We present a framework which leverages the underlying topology of a data set, in order to produce appropriate coordinate representations. In particular, we show how to construct maps to real and complex projective spaces, given appropriate persistent cohomology classes. An initial map is obtained in two steps: First, the persistent cohomology of a sparse filtration is used to compute systems of transition functions for (real and complex) line bundles over neighborhoods of the data. Next, the transition functions are used to produce explicit classifying maps for the induced bundles. A framework for dimensionality reduction in projective space (Principal Projective Components) is also developed, aimed at decreasing the target dimension of the original map. Several examples are provided as well as theorems addressing choices in the construction.

A Simplified Algorithm for Identifying Abnormal Changes in Dynamic Networks (2022)

Bouchaib Azamir, Driss Bennis, Bertrand Michel

Abstract

Topological data analysis has recently been applied to the study of dynamic networks. In this context, an algorithm was introduced that helps, among other things, to detect early warning signals of abnormal changes in the dynamic network under study. However, the complexity of this algorithm increases significantly as the database studied grows. In this paper, we propose a simplification of the algorithm without affecting its performance. We give various applications and simulations of the new algorithm on some weighted networks. The obtained results clearly show the efficiency of the introduced approach. Moreover, in some cases, the proposed algorithm makes it possible to highlight local information and sometimes early warning signals of local abnormal changes.

Mind the Gap: A Study in Global Development Through Persistent Homology (2018)

Andrew Banman, Lori Ziegelmeier

Abstract

The Gapminder project set out to use statistics to dispel simplistic notions about global development. In the same spirit, we use persistent homology, a technique from computational algebraic topology, to explore the relationship between country development and geography. For each country, four indicators (gross domestic product per capita, average life expectancy, infant mortality, and gross national income per capita) were used to quantify the development. Two analyses were performed. The first considers clusters of the countries based on these indicators, and the second uncovers cycles in the data when combined with geographic border structure. Our analysis is a multi-scale approach that reveals similarities and connections among countries at a variety of levels. We discover localized development patterns that are invisible in standard statistical methods.

Geometry and Topology of the Space of Sonar Target Echos (2018)

Michael Robinson, Sean Fennell, Brian DiZio, Jennifer Dumiak

Abstract

Successful synthetic aperture sonar target classification depends on the “shape” of the scatterers within a target signature. This article presents a workflow that computes a target-to-target distance from persistence diagrams, since the “shape” of a signature informs its persistence diagram in a structure-preserving way. The target-to-target distances derived from persistence diagrams compare favorably against those derived from spectral features and have the advantage of being substantially more compact. While spectral features produce clusters associated to each target type that are reasonably dense and well formed, the clusters are not well-separated from one another. In rather dramatic contrast, a distance derived from persistence diagrams results in highly separated clusters at the expense of some misclassification of outliers.

Using Persistent Homology to Reveal Hidden Information in Neural Data (2015)

Gard Spreemann, Benjamin Dunn, Magnus Bakke Botnan, Nils A. Baas

Abstract

We propose a method, based on persistent homology, to uncover topological properties of a priori unknown covariates of neuron activity. Our input data consist of spike train measurements of a set of neurons of interest, a candidate list of the known stimuli that govern neuron activity, and the corresponding state of the animal throughout the experiment performed. Using a generalized linear model for neuron activity and simple assumptions on the effects of the external stimuli, we infer away any contribution to the observed spike trains by the candidate stimuli. Persistent homology then reveals useful information about any further, unknown, covariates.

Community Resources

Code

Topological Feature Extraction for Comparison of Terascale Combustion Simulation Data (2011)

Ajith Mascarenhas, Ray W. Grout, Peer-Timo Bremer, Evatt R. Hawkes, Valerio Pascucci, Jacqueline H. Chen

Abstract

We describe a combinatorial streaming algorithm to extract features which identify regions of local intense rates of mixing in two terascale turbulent combustion simulations. Our algorithm allows simulation data comprised of scalar fields represented on 728x896x512 or 2025x1600x400 grids to be processed on a single relatively lightweight machine. The turbulence-induced mixing governs the rate of reaction and hence is of principal interest in these combustion simulations. We use our feature extraction algorithm to compare two very different simulations and find that in both the thickness of the extracted features grows with decreasing turbulence intensity. Simultaneous consideration of results of applying the algorithm to the HO2 mass fraction field indicates that autoignition kernels near the base of a lifted flame tend not to overlap with the high mixing rate regions.

Motor Eccentricity Fault Detection: Physics-Based and Data-Driven Approaches (2023)

Bingnan Wang, Hiroshi Inoue, Makoto Kanemaru

Abstract

Fault detection using motor current signature analysis (MCSA) is attractive for industrial applications due to its simplicity, with no additional sensor installation required. However, current components associated with faults are often very subtle and much smaller than the supply frequency component, making it challenging to detect and quantify fault levels. In this paper, we present our work on quantitative eccentricity fault diagnosis technologies for electric motors, including a physical-model approach using improved winding function theory, which can simulate motor dynamics under faulty conditions and agrees well with experimental data, and a data-driven approach using topological data analysis (TDA), which can effectively differentiate signals measured at different eccentricity levels. The advantages and limitations of each approach are discussed. Both methods can be extended to the detection and quantification of other types of electric motor faults.

Protein-Folding Analysis Using Features Obtained by Persistent Homology (2020)

Takashi Ichinomiya, Ippei Obayashi, Yasuaki Hiraoka

Abstract

Understanding the protein-folding process is an outstanding issue in biophysics; recent developments in molecular dynamics simulation have provided insights into this phenomenon. However, the large freedom of atomic motion hinders the understanding of this process. In this study, we applied persistent homology, an emerging method to analyze topological features in a data set, to reveal protein-folding dynamics. We developed a new, to our knowledge, method to characterize the protein structure based on persistent homology and applied this method to molecular dynamics simulations of chignolin. Using principal component analysis or nonnegative matrix factorization, our analysis method revealed two stable states and one saddle state, corresponding to the native, misfolded, and transition states, respectively. We also identified an unfolded state with slow dynamics in the reduced space. Our method serves as a promising tool to understand the protein-folding process.

Persistence Diagrams for Exploring the Shape Variability of Abdominal Aortic Aneurysms (2024)

Dario Arnaldo Domanin, Matteo Pegoraro, Santi Trimarchi, Maurizio Domanin, Piercesare Secchi

Abstract

Abdominal aortic aneurysm consists of a permanent dilation in the abdominal portion of the aorta and, along with its associated pathologies like calcifications and intraluminal thrombi, is one of the most important pathologies of the circulatory system. The shape of the aorta is among the primary drivers for these health issues, with particular reference to all the characteristics which affect the hemodynamics. Starting from the computed tomography angiography of a patient, we propose to summarize such information using tools derived from Topological Data Analysis, obtaining persistence diagrams which describe the irregularities of the lumen of the aorta. We showcase the effectiveness of such shape-related descriptors with a series of supervised and unsupervised case studies.

Simplicial Representation Learning With Neural $k$-Forms (2023)

Kelly Maggs, Celia Hacker, Bastian Rieck

Abstract

Geometric deep learning extends deep learning to incorporate information about the geometry and topology of data, especially in complex domains like graphs. Despite the popularity of message passing in this field, it has limitations such as the need for graph rewiring, ambiguity in interpreting data, and over-smoothing. In this paper, we take a different approach, focusing on leveraging geometric information from simplicial complexes embedded in $\mathbb{R}^n$ using node coordinates. We use differential $k$-forms in $\mathbb{R}^n$ to create representations of simplices, offering interpretability and geometric consistency without message passing. This approach also enables us to apply differential geometry tools and achieve universal approximation. Our method is efficient, versatile, and applicable to various input complexes, including graphs, simplicial complexes, and cell complexes. It outperforms existing message passing neural networks in harnessing information from geometrical graphs with node features serving as coordinates.

Graph Classification via Heat Diffusion on Simplicial Complexes (2020)

Mehmet Emin Aktas, Esra Akbas

Abstract

In this paper, we study the graph classification problem in vertex-labeled graphs. Our main goal is to classify the graphs by comparing their higher-order structures using heat diffusion on their simplices. We first represent vertex-labeled graphs as simplex-weighted super-graphs. We then define the diffusion Fréchet function over their simplices to encode the higher-order network topology and finally reach our goal by combining the function values with machine learning algorithms. Our experiments on real-world bioinformatics networks show that using the diffusion Fréchet function on simplices is promising in graph classification and more effective than the baseline methods. To the best of our knowledge, this paper is the first in the literature to use heat diffusion on higher-dimensional simplices in a graph mining problem. We believe that our method can be extended to different graph mining domains, not only the graph classification problem.

Persistent Homology and Many-Body Atomic Structure for Medium-Range Order in the Glass (2015)

Takenobu Nakamura, Yasuaki Hiraoka, Akihiko Hirata, Emerson G. Escolar, Yasumasa Nishiura

Abstract

The characterization of the medium-range order (MRO) in amorphous materials and its relation to the short-range order is discussed. A new topological approach to extract a hierarchical structure of amorphous materials is presented, which is robust against small perturbations and allows us to distinguish it from periodic or random configurations. This method is called the persistence diagram (PD) and introduces scales to many-body atomic structures to facilitate size and shape characterization. We first illustrate the representation of perfect crystalline and random structures in PDs. Then, the MRO in amorphous silica is characterized using the appropriate PD. The PD approach compresses the size of the data set significantly, to much smaller geometrical summaries, and has considerable potential for application to a wide range of materials, including complex molecular liquids, granular materials, and metallic glasses.

An Industry Case of Large-Scale Demand Forecasting of Hierarchical Components (2019)

Rodrigo Rivera-Castro, Ivan Nazarov, Yuke Xiang, Ivan Maksimov, Aleksandr Pletnev, Evgeny Burnaev

Abstract

Demand forecasting of hierarchical components is essential in manufacturing. However, its discussion in the machine-learning literature has been limited, and judgemental forecasts remain pervasive in the industry. Demand planners require easy-to-understand tools capable of delivering state-of-the-art results. This work presents an industry case of demand forecasting at one of the largest manufacturers of electronics in the world. It seeks to support practitioners with five contributions: (1) A benchmark of fourteen demand forecast methods applied to a relevant data set, (2) A data transformation technique yielding comparable results with state of the art, (3) An alternative to ARIMA based on matrix factorization, (4) A model selection technique based on topological data analysis for time series and (5) A novel data set. Organizations seeking to up-skill existing personnel and increase forecast accuracy will find value in this work.

Some Applications of TDA on Financial Markets (2022)

Miguel Angel Ruiz-Ortiz, José Carlos Gómez-Larrañaga, Jesús Rodríguez-Viorato

Abstract

Topological Data Analysis (TDA) has had many applications. However, financial markets have been studied only slightly through TDA. Here we present a quick review of some recent applications of TDA to financial markets and propose a new turbulence index based on persistent homology, the fundamental tool of TDA, that seems to capture critical transitions in financial data, based on our experiment with SP500 data before the 2020 stock market crash on February 20, 2020, due to the COVID-19 pandemic. We review applications in the early detection of turbulence periods in financial markets and how TDA can help to get new insights while investing and obtain superior risk-adjusted returns compared with investing strategies using classical turbulence indices such as the VIX and Chow's index based on the Mahalanobis distance. Furthermore, we include an introduction to persistent homology so that the reader can understand this paper without knowing TDA.

Model Comparison via Simplicial Complexes and Persistent Homology (2020)

Sean T. Vittadello, Michael P. H. Stumpf

Abstract

In many scientific and technological contexts we have only a poor understanding of the structure and details of appropriate mathematical models. We often need to compare different models. With available data we can use formal statistical model selection to compare and contrast the ability of different mathematical models to describe such data. But there is a lack of rigorous methods to compare different models a priori. Here we develop and illustrate two such approaches that allow us to compare model structures in a systematic way. Using well-developed and understood concepts from simplicial geometry we are able to define a distance based on the persistent homology applied to the simplicial complexes that captures the model structure. In this way we can identify shared topological features of different models. We then expand this, and move from a distance between simplicial complexes to studying equivalences between models in order to determine their functional relatedness.

Four-Dimensional Observation of Ductile Fracture in Sintered Iron Using Synchrotron X-Ray Laminography (2019)

Y. Ozaki, Y. Mugita, M. Aramaki, O. Furukimi, S. Oue, F. Jiang, T. Tsuji, A. Takeuchi, M. Uesugi, K. Ashizuka

Abstract

Synchrotron X-ray laminography was used to examine the time-dependent evolution of the three-dimensional (3D) morphology of micropores in sintered iron during the tensile test. 3D snapshots showed that the networked open pores grow wider than 20 µm along the tensile direction, resulting in the internal necking of the specimen. Subsequently, these pores initiated cracks perpendicular to the tensile direction by coalescing with the surrounding pre-existing microvoids or with the secondary-generated voids immediately before fracture. Topological analysis of the barycentric positions of these microvoids showed that they form two-dimensional networks within an area of ∼20 µm radius. These observations strongly indicate that microvoid coalescence could occur on shear planes formed close to the enlarged open pores or between closed pores by strain accumulation and could play an important role in crack initiation.

Pore Geometry Characterization by Persistent Homology Theory (2018)

Fei Jiang, Takeshi Tsuji, Tomoyuki Shirai

Abstract

Rock pore geometry has heterogeneous characteristics and is scale dependent. This feature in a geological formation differs significantly from artificial materials and makes it difficult to predict hydrologic and elastic properties. To characterize pore heterogeneity, we propose an evaluation method that exploits the recently developed persistent homology theory. In the proposed method, complex pore geometry is first represented as sphere cloud data using a pore-network extraction method. Then, a persistence diagram (PD) is calculated from the point cloud, which represents the spatial distribution of pore bodies. A new parameter (distance index H) derived from the PD is proposed to characterize the degree of rock heterogeneity. A low H value indicates high heterogeneity. A new empirical equation using this index H is proposed to predict the effective elastic modulus of porous media. The results indicate that the proposed PD analysis is very efficient for extracting topological features of pore geometry.

Quantitative Analysis of Phase Transitions in Two-Dimensional XY Models Using Persistent Homology (2022)

Nicholas Sale, Jeffrey Giansiracusa, Biagio Lucini

Abstract

We use persistent homology and persistence images as an observable of three different variants of the two-dimensional XY model in order to identify and study their phase transitions. We examine models with the classical XY action, a topological lattice action, and an action with an additional nematic term. In particular, we introduce a new way of computing the persistent homology of lattice spin model configurations and, by considering the fluctuations in the output of logistic regression and k-nearest neighbours models trained on persistence images, we develop a methodology to extract estimates of the critical temperature and the critical exponent of the correlation length. We put particular emphasis on finite-size scaling behaviour and producing estimates with quantifiable error. For each model we successfully identify its phase transition(s) and are able to get an accurate determination of the critical temperatures and critical exponents of the correlation length.

A Mayer–Vietoris Formula for Persistent Homology With an Application to Shape Recognition in the Presence of Occlusions (2011)

Barbara Di Fabio, Claudia Landi

Abstract

In algebraic topology it is well known that, using the Mayer–Vietoris sequence, the homology of a space X can be studied by splitting X into subspaces A and B and computing the homology of A, B, and A∩B. A natural question is: To what extent does persistent homology benefit from a similar property? In this paper we show that persistent homology has a Mayer–Vietoris sequence that is generally not exact but only of order 2. However, we obtain a Mayer–Vietoris formula involving the ranks of the persistent homology groups of X, A, B, and A∩B plus three extra terms. This implies that persistent homological features of A and B can be found either as persistent homological features of X or of A∩B. As an application of this result, we show that persistence diagrams are able to recognize an occluded shape by showing a common subset of points.

Persistent Homology for Breast Tumor Classification Using Mammogram Scans (2022)

Aras Asaad, Dashti Ali, Taban Majeed, Rasber Rashid

Abstract

An important tool in the field of topological data analysis is persistent homology (PH), which is used to encode an abstract representation of the homology of data at different resolutions in the form of a persistence diagram (PD). In this work we build more than one PD representation of a single image based on a landmark selection method, known as local binary patterns, that encodes different types of local textures from images. We employed different PD vectorizations using persistence landscapes, persistence images, persistence binning (Betti curve) and statistics. We tested the effectiveness of the proposed landmark-based PH on two publicly available breast abnormality detection datasets using mammogram scans. The sensitivity of landmark-based PH is over 90% in both datasets for the detection of abnormal breast scans. Finally, experimental results give new insights on using different types of PD vectorizations, which help in utilising PH in conjunction with machine learning classifiers.

The Extended Persistent Homology Transform of Manifolds With Boundary (2022)

Katharine Turner, Vanessa Robins, James Morgan

Abstract

The Extended Persistent Homology Transform (XPHT) is a topological transform which takes as input a shape embedded in Euclidean space, and to each unit vector assigns the extended persistence module of the height function over that shape with respect to that direction. We can define a distance between two shapes by integrating over the sphere the distance between their respective extended persistence modules. By using extended persistence we get finite distances between shapes even when they have different Betti numbers. We use Morse theory to show that the extended persistence of a height function over a manifold with boundary can be deduced from the extended persistence for that height function restricted to the boundary, alongside labels on the critical points as positive or negative critical. We study the application of the XPHT to binary images; outlining an algorithm for efficient calculation of the XPHT exploiting relationships between the PHT of the boundary curves to the extended persistence of the foreground.

Using Zigzag Persistent Homology to Detect Hopf Bifurcations in Dynamical Systems (2020)

Sarah Tymochko, Elizabeth Munch, Firas A. Khasawneh

Abstract

Bifurcations in dynamical systems characterize qualitative changes in the system behavior. Therefore, their detection is important because they can signal the transition from normal system operation to imminent failure. While standard persistent homology has been used in this setting, it usually requires analyzing a collection of persistence diagrams, which in turn drives up the computational cost considerably. Using zigzag persistence, we can capture topological changes in the state space of the dynamical system in only one persistence diagram. Here we present Bifurcations using ZigZag (BuZZ), a one-step method to study and detect bifurcations using zigzag persistence. The BuZZ method is successfully able to detect this type of behavior in two synthetic examples as well as an example dynamical system.

Community Resources

Code

Automatic Tree Ring Detection Using Jacobi Sets (2020)

Kayla Makela, Tim Ophelders, Michelle Quigley, Elizabeth Munch, Daniel Chitwood, Asia Dowtin

Abstract

Tree ring widths are an important source of climatic and historical data, but measuring these widths typically requires extensive manual work. Computer vision techniques provide promising directions towards the automation of tree ring detection, but most automated methods still require a substantial amount of user interaction to obtain high accuracy. We perform analysis on 3D X-ray CT images of a cross-section of a tree trunk, known as a tree disk. We present novel automated methods for locating the pith (center) of a tree disk, and ring boundaries. Our methods use a combination of standard image processing techniques and tools from topological data analysis. We evaluate the efficacy of our method for two different CT scans by comparing its results to manually located rings and centers and show that it is better than current automatic methods in terms of correctly counting each ring and its location. Our methods have several parameters, which we optimize experimentally by minimizing edit distances to the manually obtained locations.

The Geometry of Synchronization Problems and Learning Group Actions (2019)

Tingran Gao, Jacek Brodzki, Sayan Mukherjee

Abstract

We develop a geometric framework, based on the classical theory of fibre bundles, to characterize the cohomological nature of a large class of synchronization-type problems in the context of graph inference and combinatorial optimization. We identify each synchronization problem in topological group G on connected graph Γ with a flat principal G-bundle over Γ, thus establishing a classification result for synchronization problems using the representation variety of the fundamental group of Γ into G. We then develop a twisted Hodge theory on flat vector bundles associated with these flat principal G-bundles, and provide a geometric realization of the graph connection Laplacian as the lowest-degree Hodge Laplacian in the twisted de Rham–Hodge cochain complex. Motivated by these geometric intuitions, we propose to study the problem of learning group actions (partitioning a collection of objects based on the local synchronizability of pairwise correspondence relations) and provide a heuristic synchronization-based algorithm for solving this type of problem. We demonstrate the efficacy of this algorithm on simulated and real datasets.

Persistent Homology for Path Planning in Uncertain Environments (2015)

S. Bhattacharya, R. Ghrist, V. Kumar

Abstract

We address the fundamental problem of goal-directed path planning in an uncertain environment represented as a probability (of occupancy) map. Most methods generally use a threshold to reduce the grayscale map to a binary map before applying off-the-shelf techniques to find the best path. This raises the somewhat ill-posed question: what is the right (optimal) value at which to threshold the map? We instead suggest a persistent homology approach to the problem: a topological approach in which we seek the homology class of trajectories that is most persistent for the given probability map. In other words, we want the class of trajectories that is free of obstacles over the largest range of threshold values. In order to make this problem tractable, we use homology with ℤ₂ coefficients (instead of the standard ℤ coefficients), and describe how graph search-based algorithms can be used to find trajectories in different homology classes. Our simulation results demonstrate the efficiency and practical applicability of the algorithm proposed in this paper.

Improving Health Care Management Through Persistent Homology of Time-Varying Variability of Emergency Department Patient Flow (2018)

Mael Dugast, Guillaume Bouleux, Olivier Mory, Eric Marcon

Abstract

Excessive admissions at the Emergency Department (ED) is a phenomenon very closely linked to the propagation of viruses. It is a cause of overcrowding for EDs and a public health problem. The aim of this work is to give EDs’ leaders more time for decision making during this period. Based on the admissions time series associated with specific clinical diagnoses, we will first perform a Detrended Fluctuation Analysis (DFA) to obtain the corresponding variability time series. Next, we will embed this time series on a manifold to obtain a point cloud representation and use Topological Data Analysis (TDA), through persistent homology techniques, to propose two early real-time indicators. One is the early indicator of abnormal arrivals at the ED, whereas the second gives information on the time index of the maximum number of arrivals. The performance of the detectors is parameter dependent and can evolve each year. That is why we also propose to solve a bi-objective optimization problem to track the variations of this parameter.

Emotion Recognition in Talking-Face Videos Using Persistent Entropy and Neural Networks (2022)

Eduardo Paluzo-Hidalgo, Rocio Gonzalez-Diaz, Guillermo Aguirre-Carrazana

Abstract

The automatic recognition of a person's emotional state has become a very active research field that involves scientists specialized in different areas such as artificial intelligence, computer vision, or psychology, among others. Our main objective in this work is to develop a novel approach, using persistent entropy and neural networks as main tools, to recognise and classify emotions from talking-face videos. Specifically, we combine audio-signal and image-sequence information to compute a topology signature (a 9-dimensional vector) for each video. We prove that small changes in the video produce small changes in the signature, ensuring the stability of the method. These topological signatures are used to feed a neural network to distinguish between the following emotions: calm, happy, sad, angry, fearful, disgust, and surprised. The results reached are promising and competitive, beating the performances achieved in other state-of-the-art works found in the literature.
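Editorial illustration (not taken from the paper): persistent entropy is commonly defined as the Shannon entropy of the normalized bar lengths of a persistence barcode. The sketch below assumes a barcode given as finite `(birth, death)` pairs; the function name `persistent_entropy` is hypothetical, and how the authors assemble their 9-dimensional signature from such values is not shown here.

```python
import math


def persistent_entropy(bars):
    """Shannon entropy of the normalized bar lengths of a barcode.

    `bars` is an iterable of finite (birth, death) pairs. Bars of zero
    length are ignored because they contribute nothing to the sum.
    """
    lengths = [death - birth for birth, death in bars if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0
    # p_i = l_i / L, entropy = -sum_i p_i * log(p_i)
    return -sum((l / total) * math.log(l / total) for l in lengths)
```

Two bars of equal length give an entropy of log 2, while a barcode dominated by a single long bar gives a value close to zero.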

Persistent Homology of Geospatial Data: A Case Study With Voting (2021)

Michelle Feng, Mason A. Porter

Abstract

A crucial step in the analysis of persistent homology is the transformation of data into an appropriate topological object (which, in our case, is a simplicial complex). Software packages for computing persistent homology typically construct Vietoris--Rips or other distance-based simplicial complexes on point clouds because they are relatively easy to compute. We investigate alternative methods of constructing simplicial complexes and the effects of making associated choices during simplicial-complex construction on the output of persistent-homology algorithms. We present two new methods for constructing simplicial complexes from two-dimensional geospatial data (such as maps). We apply these methods to a California precinct-level voting data set, and we thereby demonstrate that our new constructions can capture geometric characteristics that are missed by distance-based constructions. Our new constructions can thus yield more interpretable persistence modules and barcodes for geospatial data. In particular, they are able to distinguish short-persistence features that occur only for a narrow range of distance scales (e.g., voting patterns in densely populated cities) from short-persistence noise by incorporating information about other spatial relationships between regions.

Fibers of Failure: Classifying Errors in Predictive Processes (2020)

Leo S. Carlsson, Mikael Vejdemo-Johansson, Gunnar Carlsson, Pär G. Jönsson

Abstract

Predictive models are used in many different fields of science and engineering and are always prone to make faulty predictions. These faulty predictions can be more or less malignant depending on the model application. We describe fibers of failure (FiFa), a method to classify failure modes of predictive processes. Our method uses Mapper, an algorithm from topological data analysis (TDA), to build a graphical model of input data stratified by prediction errors. We demonstrate two ways to use the failure mode groupings: either to produce a correction layer that adjusts predictions by similarity to the failure modes; or to inspect members of the failure modes to illustrate and investigate what characterizes each failure mode. We demonstrate FiFa on two scenarios: a convolutional neural network (CNN) predicting MNIST images with added noise, and an artificial neural network (ANN) predicting the electrical energy consumption of an electric arc furnace (EAF). The correction layer on the CNN model improved its prediction accuracy significantly while the inspection of failure modes for the EAF model provided guiding insights into the domain-specific reasons behind several high-error regions.

Persistent Homology Analysis of Brain Transcriptome Data in Autism (2019)

Daniel Shnier, Mircea A. Voineagu, Irina Voineagu

Abstract

Persistent homology methods have found applications in the analysis of multiple types of biological data, particularly imaging data or data with a spatial and/or temporal component. However, few studies have assessed the use of persistent homology for the analysis of gene expression data. Here we apply persistent homology methods to investigate the global properties of gene expression in post-mortem brain tissue (cerebral cortex) of individuals with autism spectrum disorders (ASD) and matched controls. We observe a significant difference in the geometry of inter-sample relationships between autism and healthy controls as measured by the sum of the death times of zero-dimensional components and the Euler characteristic. This observation is replicated across two distinct datasets, and we interpret it as evidence for an increased heterogeneity of gene expression in autism. We also assessed the topology of gene-level point clouds and did not observe significant differences between ASD and control transcriptomes, suggesting that the overall transcriptome organization is similar in ASD and healthy cerebral cortex. Overall, our study provides a novel framework for persistent homology analyses of gene expression data for genetically complex disorders.
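Editorial illustration (not the authors' code): for a Vietoris–Rips filtration on a point cloud, the finite death times of zero-dimensional classes coincide with the edge lengths of a minimum spanning tree, so the summary statistic mentioned above can be sketched with Kruskal's algorithm. The sketch assumes Euclidean distances and the convention that an edge enters the filtration at the full pairwise distance; the function name `h0_death_times` is hypothetical.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Finite H0 death times of a Vietoris-Rips filtration on `points`.

    Processes edges in increasing length (Kruskal); every edge that
    merges two components kills one zero-dimensional class.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(dist)
    return deaths


deaths = h0_death_times([(0, 0), (1, 0), (4, 0)])
total_death_time = sum(deaths)
```

A cloud of n points yields n - 1 finite death times; the one remaining class never dies and is excluded from the sum.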

Musical Stylistic Analysis: A Study of Intervallic Transition Graphs via Persistent Homology (2022)

Martín Mijangos, Alessandro Bravetti, Pablo Padilla

Abstract

Topological data analysis has been recently applied to investigate stylistic signatures and trends in musical compositions. A useful tool in this area is Persistent Homology. In this paper, we develop a novel method to represent a weighted directed graph as a finite metric space and then use persistent homology to extract useful features. We apply this method to weighted directed graphs obtained from pitch transitions information of a given musical fragment and use these techniques to the study of stylistic trends. In particular, we are interested in using these tools to make quantitative stylistic comparisons. As a first illustration, we analyze a selection of string quartets by Haydn, Mozart and Beethoven and discuss possible implications of our results in terms of different approaches by these composers to stylistic exploration and variety. We observe that Haydn is stylistically the most conservative, followed by Mozart, while Beethoven is the most innovative, expanding and modifying the string quartet as a musical form. Finally we also compare the variability of different genres, namely minuets, allegros, prestos and adagios, by a given composer and conclude that the minuet is the most stable form of the string quartet movements.

Persistent Homology in Cosmic Shear - II. A Tomographic Analysis of DES-Y1 (2022)

Sven Heydenreich, Benjamin Brück, Pierre Burger, Joachim Harnois-Déraps, Sandra Unruh, Tiago Castro, Klaus Dolag, Nicolas Martinet

Abstract

We demonstrate how to use persistent homology for cosmological parameter inference in a tomographic cosmic shear survey. We obtain the first cosmological parameter constraints from persistent homology by applying our method to the first-year data of the Dark Energy Survey. To obtain these constraints, we analyse the topological structure of the matter distribution by extracting persistence diagrams from signal-to-noise maps of aperture masses. This presents a natural extension to the widely used peak count statistics. Extracting the persistence diagrams from the cosmo-SLICS, a suite of N-body simulations with variable cosmological parameters, we interpolate the signal using Gaussian processes and marginalise over the most relevant systematic effects, including intrinsic alignments and baryonic effects. For the structure growth parameter, we find , which is in full agreement with other late-time probes. We also constrain the intrinsic alignment parameter to A = 1.54 ± 0.52, which constitutes a detection of the intrinsic alignment effect at almost 3σ.

Coexistence Holes Characterize the Assembly and Disassembly of Multispecies Systems (2021)

Marco Tulio Angulo, Aaron Kelley, Luis Montejano, Chuliang Song, Serguei Saavedra

Abstract

A central goal of ecological research has been to understand the limits on the maximum number of species that can coexist under given constraints. However, we know little about the assembly and disassembly processes under which a community can reach such a maximum number, or whether this number is in fact attainable in practice. This limitation is partly due to the challenge of performing experimental work and partly due to the lack of a formalism under which one can systematically study such processes. Here, we introduce a formalism based on algebraic topology and homology theory to study the space of species coexistence formed by a given pool of species. We show that this space is characterized by ubiquitous discontinuities that we call coexistence holes (that is, empty spaces surrounded by filled space). Using theoretical and experimental systems, we provide direct evidence showing that these coexistence holes do not occur arbitrarily—their diversity is constrained by the internal structure of species interactions and their frequency can be explained by the external factors acting on these systems. Our work suggests that the assembly and disassembly of ecological systems is a discontinuous process that tends to obey regularities.

Exploring the Geometry and Topology of Neural Network Loss Landscapes (2022)

Stefan Horoi, Jessie Huang, Bastian Rieck, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy

Abstract

Recent work has established clear links between the generalization performance of trained neural networks and the geometry of their loss landscape near the local minima to which they converge. This suggests that qualitative and quantitative examination of the loss landscape geometry could yield insights about neural network generalization performance during training. To this end, researchers have proposed visualizing the loss landscape through the use of simple dimensionality reduction techniques. However, such visualization methods have been limited by their linear nature and only capture features in one or two dimensions, thus restricting sampling of the loss landscape to lines or planes. Here, we expand and improve upon these in three ways. First, we present a novel “jump and retrain” procedure for sampling relevant portions of the loss landscape. We show that the resulting sampled data holds more meaningful information about the network’s ability to generalize. Next, we show that non-linear dimensionality reduction of the jump and retrain trajectories via PHATE, a trajectory and manifold-preserving method, allows us to visualize differences between networks that are generalizing well vs poorly. Finally, we combine PHATE trajectories with a computational homology characterization to quantify trajectory differences.

Persistence-Based Hough Transform for Line Detection (2025)

Johannes Ferner, Stefan Huber, Saverio Messineo, Angel Pop, Martin Uray

Abstract

The Hough transform is a popular and classical technique in computer vision for the detection of lines (or more general objects). It maps a pixel into a dual space -- the Hough space: each pixel is mapped to the set of lines through this pixel, which forms a curve in Hough space. The detection of lines then becomes a voting process to find those lines that received many votes by pixels. However, this voting is done by thresholding, which is susceptible to noise and other artifacts. In this work, we present an alternative voting technique to detect peaks in the Hough space based on persistent homology, which very naturally addresses limitations of simple thresholding. Experiments on synthetic data show that our method significantly outperforms the original method, while also demonstrating enhanced robustness. This work seeks to inspire future research in two key directions. First, we highlight the untapped potential of Topological Data Analysis techniques and advocate for their broader integration into existing methods, including well-established ones. Secondly, we initiate a discussion on the mathematical stability of the Hough transform, encouraging exploration of mathematically grounded improvements to enhance its robustness.
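Editorial illustration (not the authors' implementation): the idea of ranking peaks by persistence rather than by a fixed vote threshold can be sketched on a one-dimensional array of votes. The sketch sweeps the values from high to low and tracks connected components with union-find; a peak is born at its own height and dies when its component merges into one with a higher peak. The real Hough accumulator is two-dimensional, and the function name `peak_persistence` is hypothetical.

```python
def peak_persistence(values):
    """Return (index, birth, death) per local maximum of a 1D signal.

    Uses zero-dimensional persistence of the superlevel-set filtration.
    The global maximum never dies, so its death is -infinity. Results
    are sorted by persistence (birth - death), largest first.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: -values[i])
    parent = [-1] * n  # -1 marks samples not yet added to the sweep

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Each root is the index of its component's highest sample.
                young, old = (ri, rj) if values[ri] <= values[rj] else (rj, ri)
                if young != i:  # skip the trivial merge of the new sample
                    pairs.append((young, values[young], values[i]))
                parent[young] = old
    if n:
        top = order[0]
        pairs.append((top, values[top], float("-inf")))
    return sorted(pairs, key=lambda p: p[2] - p[1])


peaks = peak_persistence([0, 4, 1, 9, 6, 7, 0])
```

Keeping only peaks whose persistence exceeds a chosen value replaces the usual height threshold; the shallow bump at index 5 has persistence 1 and would be discarded before the peak at index 1, whose persistence is 3, even though the bump is higher.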

Community Resources

Code

A Klein-Bottle-Based Dictionary for Texture Representation (2014)

Jose A. Perea, Gunnar Carlsson

Abstract

A natural object of study in texture representation and material classification is the probability density function, in pixel-value space, underlying the set of small patches from the given image. Inspired by the fact that small $n\times n$ high-contrast patches from natural images in gray-scale accumulate with high density around a surface $\mathscr{K}\subset\mathbb{R}^{n^2}$ with the topology of a Klein bottle (Carlsson et al. International Journal of Computer Vision 76(1):1–12, 2008), we present in this paper a novel framework for the estimation and representation of distributions around $\mathscr{K}$ of patches from texture images. More specifically, we show that most $n\times n$ patches from a given image can be projected onto $\mathscr{K}$, yielding a finite sample $S\subset\mathscr{K}$ whose underlying probability density function can be represented in terms of Fourier-like coefficients, which in turn can be estimated from $S$. We show that image rotation acts as a linear transformation at the level of the estimated coefficients, and use this to define a multi-scale rotation-invariant descriptor. We test it by classifying the materials in three popular data sets: the CUReT, UIUCTex and KTH-TIPS texture databases.

Feature Detection and Hypothesis Testing for Extremely Noisy Nanoparticle Images Using Topological Data Analysis (2023)

Andrew M. Thomas, Peter A. Crozier, Yuchen Xu, David S. Matteson

Abstract

We propose a flexible algorithm for feature detection and hypothesis testing in images with ultra-low signal-to-noise ratio using cubical persistent homology. Our main application is in the identification of atomic columns and other features in Transmission Electron Microscopy (TEM). Cubical persistent homology is used to identify local minima and their size in subregions in the frames of nanoparticle videos, which are hypothesized to correspond to relevant atomic features. We compare the performance of our algorithm to other employed methods for the detection of columns and their intensity. Additionally, Monte Carlo goodness-of-fit testing using real-valued summaries of persistence diagrams derived from smoothed images (generated from pixels residing in the vacuum region of an image) is developed and employed to identify whether or not the proposed atomic features generated by our algorithm are due to noise. Using these summaries derived from the generated persistence diagrams, one can produce univariate time series for the nanoparticle videos, thus, providing a means for assessing fluxional behavior. A guarantee on the false discovery rate for multiple Monte Carlo testing of identical hypotheses is also established.

Community Resources

Code
Data

Ghrist Barcoded Video Frames. Application in Detecting Persistent Visual Scene Surface Shapes Captured in Videos (2019)

Arjuna P. H. Don, James F. Peters

Abstract

This article introduces an application of Ghrist barcodes in the study of persistent Betti numbers derived from vortex nerve complexes found in triangulations of video frames. A Ghrist barcode (also called a persistence barcode) is a topology of data pictograph useful in representing the persistence of the features of changing shapes. The basic approach is to introduce a free Abelian group representation of intersecting filled polygons on the barycenters of the triangles of Alexandroff nerves. An Alexandroff nerve is a maximal collection of triangles of a common vertex in the triangulation of a finite, bounded planar region. In our case, the planar region is a video frame. A Betti number is a count of the number of generators in a finite Abelian group. The focus here is on the persistent Betti numbers across sequences of triangulated video frames. Each Betti number is mapped to an entry in a Ghrist barcode. Two main results are given, namely, vortex nerves are Edelsbrunner-Harer nerve complexes and the Betti number of a vortex nerve equals k + 2 for a vortex nerve containing k edges attached between a pair of vortex cycles in the nerve.

Go With the Flow? A Large-Scale Analysis of Health Care Delivery Networks in the United States Using Hodge Theory (2021)

Thomas Gebhart, Xiaojun Fu, Russell J. Funk

Abstract

Health care delivery is a collaborative process, requiring close coordination among networks of providers with specialized expertise. Yet in the United States, care is often spread across multiple disconnected providers (e.g., primary care physicians, specialists), leading to fragmented care delivery networks, and contributing to higher costs and lower quality. While this problem is well known, there are relatively few quantitative tools available for characterizing the dynamics of care delivery networks at scale, thereby inhibiting deeper understanding of care fragmentation and efforts to address it. In this study, we conduct a large-scale analysis of care delivery networks across the United States using the discrete Hodge decomposition, an emerging method of topological data analysis. Using this technique, we decompose networks of patient flows among physicians into three orthogonal subspaces: gradient (acyclic flow), harmonic (global cyclic flow), and curl (local cyclic flow). We document substantial variation in the relative importance of each subspace, suggesting that there may be systematic differences in the organization of care delivery networks across health care markets. Moreover, we find that the relative importance of each subspace is predictive of local care cost and quality, with outcomes tending to be better with greater curl flow and worse with greater harmonic flow.
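Editorial illustration (standard notation, not taken from the paper): the three-way split described above is the discrete Hodge decomposition of the space of edge flows $C^1$ on a simplicial complex. The symbols below are assumptions introduced here: $\delta_0$ is the coboundary (gradient) operator from node potentials to edge flows, $\delta_1$ is the coboundary (curl) operator from edge flows to triangle flows, and $\Delta_1$ is the Hodge 1-Laplacian.

```latex
\[
  C^1 \;=\;
  \underbrace{\operatorname{im}(\delta_0)}_{\text{gradient}}
  \;\oplus\;
  \underbrace{\ker(\Delta_1)}_{\text{harmonic}}
  \;\oplus\;
  \underbrace{\operatorname{im}(\delta_1^{*})}_{\text{curl}},
  \qquad
  \Delta_1 \;=\; \delta_0 \delta_0^{*} + \delta_1^{*} \delta_1 .
\]
% Every edge flow f therefore splits uniquely as
\[
  f \;=\; \delta_0 s \;+\; h \;+\; \delta_1^{*} \Phi ,
  \qquad \Delta_1 h = 0 ,
\]
% with s a potential on nodes and \Phi a flow on triangles.
```

The three summands are mutually orthogonal, so the squared norm of each component as a fraction of $\|f\|^2$ gives one way to quantify the relative importance of each subspace.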

Optimizing Porosity Detection in Wire Laser Metal Deposition Processes Through Data-Driven AI Classification Techniques (2023)

Meritxell Gomez-Omella, Jon Flores, Basilio Sierra, Susana Ferreiro, Nicolas Hascoët, Francisco Chinesta

Abstract

Additive manufacturing (AM) is an attractive solution for many companies that produce geometrically complex parts. This process consists of depositing material layer by layer following a sliced CAD geometry. It brings several benefits to manufacturing capabilities, such as design freedom, reduced material waste, and short-run customization. However, one of the current challenges faced by users of the process, mainly in wire laser metal deposition (wLMD), is to avoid defects in the manufactured part, especially the porosity. This defect is caused by extreme conditions and metallurgical transformations of the process. And not only does it directly affect the mechanical performance of the parts, especially the fatigue properties, but it also means an increase in costs due to the inspection tasks to which the manufactured parts must be subjected. This work compares three operational solution approaches, product-centric, based on signal-based feature extraction and Topological Data Analysis together with statistical and Machine Learning (ML) techniques, for the early detection and prediction of porosity failure in a wLMD process. The different forecasting and validation strategies demonstrate the variety of conclusions that can be drawn with different objectives in the analysis of the monitored data in AM problems.

Hierarchical Structures of Amorphous Solids Characterized by Persistent Homology (2016)

Yasuaki Hiraoka, Takenobu Nakamura, Akihiko Hirata, Emerson G. Escolar, Kaname Matsue, Yasumasa Nishiura

Abstract

This article proposes a topological method that extracts hierarchical structures of various amorphous solids. The method is based on the persistence diagram (PD), a mathematical tool for capturing shapes of multiscale data. The input to the PDs is given by an atomic configuration and the output is expressed as 2D histograms. Then, specific distributions such as curves and islands in the PDs identify meaningful shape characteristics of the atomic configuration. Although the method can be applied to a wide variety of disordered systems, it is applied here to silica glass, the Lennard-Jones system, and Cu-Zr metallic glass as standard examples of continuous random network and random packing structures. In silica glass, the method classified the atomic rings as short-range and medium-range orders and unveiled hierarchical ring structures among them. These detailed geometric characterizations clarified a real space origin of the first sharp diffraction peak and also indicated that PDs contain information on elastic response. Even in the Lennard-Jones system and Cu-Zr metallic glass, the hierarchical structures in the atomic configurations were derived in a similar way using PDs, although the glass structures and properties substantially differ from silica glass. These results suggest that the PDs provide a unified method that extracts greater depth of geometric information in amorphous solids than conventional methods.

The Emergence of Higher-Order Structure in Scientific and Technological Knowledge Networks (2020)

Thomas Gebhart, Russell J. Funk

Abstract

The growth of science and technology is primarily a recombinative process, wherein new discoveries and inventions are generally built from prior knowledge. While the recent past has seen rapid growth in scientific and technological knowledge, relatively little is known about the manner in which science and technology develop and coalesce knowledge into larger structures that enable or constrain future breakthroughs. Network science has recently emerged as a framework for measuring the structure and dynamics of knowledge. While helpful, these existing approaches struggle to capture the global structural properties of the underlying networks, leading to conflicting observations about the nature of scientific and technological progress. We bridge this methodological gap using tools from algebraic topology to characterize the higher-order structure of knowledge networks in science and technology across scale. We observe rapid and varied growth in the high-dimensional structure in many fields of science and technology, and find this high-dimensional growth coincides with decline in lower-dimensional structure. This higher-order growth in knowledge networks has historically far outpaced the growth in scientific and technological collaboration networks. We also characterize the relationship between higher-order structure and the nature of the science and technology produced within these structural environments and find a positive relationship between the abstractness of language used within fields and increasing high-dimensional structure. We also find a robust relationship between high-dimensional structure and number of metrics for publication success, implying this high-dimensional structure may be linked to discovery and invention.

Tracking Resilience to Infections by Mapping Disease Space (2016)

Brenda Y. Torres, Jose Henrique M. Oliveira, Ann Thomas Tate, Poonam Rath, Katherine Cumnock, David S. Schneider

Abstract

Infected hosts differ in their responses to pathogens; some hosts are resilient and recover their original health, whereas others follow a divergent path and die. To quantitate these differences, we propose mapping the routes infected individuals take through “disease space.” We find that when plotting physiological parameters against each other, many pairs have hysteretic relationships that identify the current location of the host and predict the future route of the infection. These maps can readily be constructed from experimental longitudinal data, and we provide two methods to generate the maps from the cross-sectional data that is commonly gathered in field trials. We hypothesize that resilient hosts tend to take small loops through disease space, whereas nonresilient individuals take large loops. We support this hypothesis with experimental data in mice infected with Plasmodium chabaudi, finding that dying mice trace a large arc in red blood cells (RBCs) by reticulocyte space as compared to surviving mice. We find that human malaria patients who are heterozygous for sickle cell hemoglobin occupy a small area of RBCs by reticulocyte space, suggesting this approach can be used to distinguish resilience in human populations. This technique should be broadly useful in describing the in-host dynamics of infections in both model hosts and patients at both population and individual levels.

WDR76 Co-Localizes With Heterochromatin Related Proteins and Rapidly Responds to DNA Damage (2016)

Joshua M. Gilmore, Mihaela E. Sardiu, Brad D. Groppe, Janet L. Thornton, Xingyu Liu, Gerald Dayebgadoh, Charles A. Banks, Brian D. Slaughter, Jay R. Unruh, Jerry L. Workman, Laurence Florens, Michael P. Washburn

Abstract

Proteins that respond to DNA damage play critical roles in normal and diseased states in human biology. Studies have suggested that the S. cerevisiae protein CMR1/YDL156w is associated with histones and is possibly associated with DNA repair and replication processes. Through a quantitative proteomic analysis of affinity purifications here we show that the human homologue of this protein, WDR76, shares multiple protein associations with the histones H2A, H2B, and H4. Furthermore, our quantitative proteomic analysis of WDR76 associated proteins demonstrated links to proteins in the DNA damage response like PARP1 and XRCC5 and heterochromatin related proteins like CBX1, CBX3, and CBX5. Co-immunoprecipitation studies validated these interactions. Next, quantitative imaging studies demonstrated that WDR76 was recruited to laser induced DNA damage immediately after induction, and we compared the recruitment of WDR76 to laser induced DNA damage to known DNA damage proteins like PARP1, XRCC5, and RPA1. In addition, WDR76 co-localizes to puncta with the heterochromatin proteins CBX1 and CBX5, which are also recruited to DNA damage but much less intensely than WDR76. This work demonstrates the chromatin and DNA damage protein associations of WDR76 and demonstrates the rapid response of WDR76 to laser induced DNA damage.

Community Resources

Data

Community Structures in Simplicial Complexes: An Application to Wildlife Corridor Designing in Central India -- Eastern Ghats Landscape Complex, India (2020)

Saurabh Shanu, Shashankaditya Upadhyay, Arijit Roy, Raghunandan Chundawat, Sudeepto Bhattacharya

Abstract

The concept of simplicial complex from Algebraic Topology is applied to understand and model the flow of genetic information, processes and organisms between the areas of unimpaired habitats to design a network of wildlife corridors for Tigers (Panthera Tigris Tigris) in Central India Eastern Ghats landscape complex. The work extends and improves on a previous work that has made use of the concept of minimum spanning tree obtained from the weighted graph in the focal landscape, which suggested a viable corridor network for the tiger population of the Protected Areas (PAs) in the landscape complex. Centralities of the network identify the habitat patches and the critical parameters that are central to the process of tiger movement across the network. We extend the concept of vertex centrality to that of the simplicial centrality yielding inter-vertices adjacency and connection. As a result, the ecological information propagates expeditiously and even on a local scale in these networks representing a well-integrated and self-explanatory model as a community structure. A simplicial complex network based on the network centralities calculated in the landscape matrix presents a tiger corridor network in the landscape complex that is proposed to correspond better to reality than the previously proposed model. Because of the aforementioned functional and structural properties of the network, the work proposes an ecological network of corridors for the most tenable usage by the tiger populations both in the PAs and outside the PAs in the focal landscape.

TILT: Topological Interface Recovery in Limited-Angle Tomography (2024)

Elli Karvonen, Matti Lassas, Pekka Pankka, Samuli Siltanen

Abstract

A wavelet-based sparsity-promoting reconstruction method is studied in the context of tomography with severely limited projection data. Such imaging problems are ill-posed inverse problems, that is, they are very sensitive to measurement and modeling errors. The reconstruction method is based on minimizing a sum of a data discrepancy term based on an $\ell^2$-norm and another term containing an $\ell^1$-norm of a wavelet coefficient vector. Depending on the viewpoint, the method can be considered (i) as finding the Bayesian maximum a posteriori (MAP) estimate using a Besov-space $B_{11}^{1}(\mathbb{T}^2)$ prior, or (ii) as deterministic regularization with a Besov-norm penalty. The minimization is performed using a tailored primal-dual path following interior-point method, which is applicable to problems larger in scale than commercially available general-purpose optimization package algorithms. The choice of “regularization parameter” is done by a novel technique called the S-curve method, which can be used to incorporate a priori information on the sparsity of the unknown target to the reconstruction process. Numerical results are presented, focusing on uniformly sampled sparse-angle data. Both simulated and measured data are considered, and noise-robust and edge-preserving multiresolution reconstructions are achieved. In sparse-angle cases with simulated data the proposed method offers a significant improvement in reconstruction quality (measured in relative square norm error) over filtered back-projection (FBP) and Tikhonov regularization.
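
The objective described in this abstract can be written schematically as below. This is only a sketch: the symbols $A$ (forward projection operator), $m$ (measured data), $W$ (wavelet transform), and $\alpha$ (regularization parameter) are notation assumed here for illustration and are not taken from the abstract, and any scaling of the data term is omitted.

```latex
\hat{f} \;=\; \operatorname*{arg\,min}_{f}
  \Bigl\{ \, \lVert A f - m \rVert_{2}^{2} \;+\; \alpha \, \lVert W f \rVert_{1} \, \Bigr\}
```

Under viewpoint (i) the second term plays the role of the negative log-prior, and under viewpoint (ii) it is the penalty; in both readings a larger $\alpha$ favors sparser wavelet coefficient vectors.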

Community Resources

Code

The (Homological) Persistence of Gerrymandering (2021)

Moon Duchin, Tom Needham, Thomas Weighill

Abstract

We apply persistent homology, the dominant tool from the field of topological data analysis, to study electoral redistricting. We begin by combining geographic and electoral data from a districting plan to produce a persistence diagram. Then, to see beyond a particular plan and understand the possibilities afforded by the choices made in redistricting, we build methods to visualize and analyze large ensembles of alternative plans. Our detailed case studies use zero-dimensional homology (persistent components) of filtered graphs constructed from voting data to analyze redistricting in Pennsylvania and North Carolina. We find that, across large ensembles of partitions, the features cluster in the persistence diagrams in a way that corresponds strongly to geographic location, so that we can construct an average diagram for an ensemble, with each point identified with a geographical region. Using this localization lets us produce zonings of each state at Congressional, state Senate, and state House scales, show the regional non-uniformity of election shifts, and identify attributes of partitions that tend to correspond to partisan advantage.

The methods here are set up to be broadly applicable to the use of TDA on large ensembles of data. Many studies will benefit from interpretable summaries of large sets of samples or simulations, and the work here on localization and zoning will readily generalize to other partition problems, which are abundant in scientific applications. For the mathematically and politically rich problem of redistricting in particular, TDA provides a powerful and elegant summarization tool whose findings will be useful for practitioners.
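
As a rough illustration of the zero-dimensional persistence of a filtered graph mentioned in this abstract, the sketch below uses a union-find structure and the elder rule. It assumes a sublevel filtration in which each vertex enters at its own value and each edge enters at the larger of its endpoint values; the function name, the input format, and that choice of filtration are assumptions made for this example, not details taken from the paper.

```python
def zero_dim_persistence(values, edges):
    """Zero-dimensional persistence pairs of a vertex-filtered graph.

    values: dict mapping vertex -> filtration value (hypothetical input format)
    edges:  iterable of (u, v) pairs
    Returns a sorted list of (birth, death); surviving components get death = inf.
    """
    parent = {v: v for v in values}

    def find(v):
        # Find the component root, halving paths along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # An edge appears once both of its endpoints are present.
    ordered = sorted(edges, key=lambda e: max(values[e[0]], values[e[1]]))

    pairs = []
    for u, v in ordered:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle; no component dies
        t = max(values[u], values[v])
        # Elder rule: the component born later dies when two components merge.
        if values[ru] > values[rv]:
            ru, rv = rv, ru
        pairs.append((values[rv], t))
        parent[rv] = ru  # the older root stays the root

    roots = {find(v) for v in values}
    pairs.extend((values[r], float("inf")) for r in roots)
    return sorted(pairs)


# Three vertices on a path; "b" sits between "a" and "c".
diagram = zero_dim_persistence({"a": 0.1, "b": 0.5, "c": 0.2}, [("a", "b"), ("b", "c")])
```

Pairs with equal birth and death correspond to components that merge the moment they appear and are usually dropped before plotting a diagram.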

Using Persistent Homology as Preprocessing of Early Warning Signals for Critical Transition in Flood (2021)

Syed Mohamad Sadiq Syed Musa, Mohd Salmi Md Noorani, Fatimah Abdul Razak, Munira Ismail, Mohd Almie Alias, Saiful Izzuan Hussain

Abstract

Flood early warning systems (FLEWSs) contribute remarkably to reducing economic and life losses during a flood. The theory of critical slowing down (CSD) has been successfully used as a generic indicator of early warning signals in various fields. A new tool called persistent homology (PH) was recently introduced for data analysis. PH employs a qualitative approach to assess a data set and provide new information on the topological features of the data set. In the present paper, we propose the use of PH as a preprocessing step to achieve a FLEWS through CSD. We test our proposal on water level data of the Kelantan River, which tends to flood nearly every year. The results suggest that the new information obtained by PH exhibits CSD and, therefore, can be used as a signal for a FLEWS. Through further analysis of the signal, we manage to establish an early warning signal for ten of the twelve flood events recorded in the river; the two other events are detected on the first day of the flood. Finally, we compare our results with those of a FLEWS constructed directly from water level data and find that FLEWS via PH creates fewer false alarms than the conventional technique.
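
Critical slowing down is commonly monitored through rising variance and rising lag-1 autocorrelation in a sliding window. The sketch below computes both indicators for an arbitrary scalar signal; the function names, the window handling, and the choice of these two indicators are assumptions for illustration, and the abstract does not state which indicators or which PH-derived signal the authors use.

```python
from statistics import mean, pvariance


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a sequence (0.0 for a constant sequence)."""
    mu = mean(x)
    num = sum((x[i] - mu) * (x[i + 1] - mu) for i in range(len(x) - 1))
    den = sum((xi - mu) ** 2 for xi in x)
    return num / den if den > 0 else 0.0


def rolling_indicators(series, window):
    """Return (variance, lag-1 autocorrelation) for every full sliding window."""
    out = []
    for end in range(window, len(series) + 1):
        w = series[end - window:end]
        out.append((pvariance(w), lag1_autocorrelation(w)))
    return out


# Toy signal; in practice this would be a summary derived from persistence diagrams.
indicators = rolling_indicators([0.0, 1.0, 0.0, 1.0, 0.0, 1.0], window=4)
```

A sustained upward trend in either indicator ahead of an event is what would be read as an early warning signal.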

Fast and Accurate Tumor Segmentation of Histology Images Using Persistent Homology and Deep Convolutional Features (2019)

Talha Qaiser, Yee-Wah Tsang, Daiki Taniyama, Naoya Sakamoto, Kazuaki Nakane, David Epstein, Nasir Rajpoot

Abstract

Tumor segmentation in whole-slide images of histology slides is an important step towards computer-assisted diagnosis. In this work, we propose a tumor segmentation framework based on the novel concept of persistent homology profiles (PHPs). For a given image patch, the homology profiles are derived by efficient computation of persistent homology, which is an algebraic tool from homology theory. We propose an efficient way of computing topological persistence of an image, alternative to simplicial homology. The PHPs are devised to distinguish tumor regions from their normal counterparts by modeling the atypical characteristics of tumor nuclei. We propose two variants of our method for tumor segmentation: one that targets speed without compromising accuracy and the other that targets higher accuracy. The fast version is based on a selection of exemplar image patches from a convolutional neural network (CNN) and patch classification by quantifying the divergence between the PHPs of exemplars and the input image patch. Detailed comparative evaluation shows that the proposed algorithm is significantly faster than competing algorithms while achieving comparable results. The accurate version combines the PHPs and high-level CNN features and employs a multi-stage ensemble strategy for image patch labeling. Experimental results demonstrate that the combination of PHPs and CNN features outperforms competing algorithms. This study is performed on two independently collected colorectal datasets containing adenoma, adenocarcinoma, signet, and healthy cases. Collectively, the accurate tumor segmentation produces the highest average patch-level F1-score, as compared with competing algorithms, on malignant and healthy cases from both datasets. Overall, the proposed framework highlights the utility of persistent homology for histopathology image analysis.

Understanding Diffraction Patterns of Glassy, Liquid and Amorphous Materials via Persistent Homology Analyses (2019)

Yohei Onodera, Shinji Kohara, Shuta Tahara, Atsunobu Masuno, Hiroyuki Inoue, Motoki Shiga, Akihiko Hirata, Koichi Tsuchiya, Yasuaki Hiraoka, Ippei Obayashi, Koji Ohara, Akitoshi Mizuno, Osami Sakata

Abstract

The structure of glassy, liquid, and amorphous materials is still not well understood, due to the insufficient structural information from diffraction data. In this article, attempts are made to understand the origin of diffraction peaks, particularly of the first sharp diffraction peak (FSDP, Q1), the principal peak (PP, Q2), and the third peak (Q3), observed in the measured diffraction patterns of disordered materials whose structure contains tetrahedral motifs. It is confirmed that the FSDP (Q1) is not a signature of the formation of a network, because an FSDP is observed in tetrahedral molecular liquids. It is found that the PP (Q2) reflects orientational correlations of tetrahedra. Q3, which can be observed in all disordered materials, even in common liquid metals, stems from simple pair correlations. Moreover, information on the topology of disordered materials was revealed by utilizing persistent homology analyses. The persistence diagram of silica (SiO2) glass suggests that the shape of rings in the glass is similar not only to those in the crystalline phase with comparable density (α-cristobalite), but also to rings present in crystalline phases with higher density (α-quartz and coesite); this is thought to be the signature of disorder. Furthermore, we have succeeded in revealing the differences, in terms of persistent homology, between tetrahedral networks and tetrahedral molecular liquids, and the difference/similarity between liquid and amorphous (glassy) states. Our series of analyses demonstrated that a combination of diffraction data and persistent homology analyses is a useful tool for uncovering structural features hidden in the halo pattern of disordered materials.

Applications of Persistent Homology to Time Varying Systems (2013)

Elizabeth Munch

Abstract

This dissertation extends the theory of persistent homology to time varying systems. Most of the previous work has been dedicated to using this powerful tool in topological data analysis to study static point clouds. In particular, given a point cloud, we can construct its persistence diagram. Since the diagram varies continuously as the point cloud varies continuously, we study the space of time varying persistence diagrams, called vineyards when they were introduced by Cohen-Steiner, Edelsbrunner, and Morozov.

We will first show that with a good choice of metric, these vineyards are stable for small perturbations of their associated point clouds. We will also define a new mean for a set of persistence diagrams based on the work of Mileyko et al. which, unlike the previously defined mean, is continuous for geodesic vineyards.

Next, we study the sensor network problem posed by Ghrist and de Silva, and their application of persistent homology to understand when a set of sensors covers a given region. Giving each of these sensors a probability of failure over time, we show that an exact computation of the probability of failure of the whole system is NP-hard, but give an algorithm which can predict failure in the case of a monitored system.

Finally, we apply these methods to an automated system which can cluster agents moving in aerial images by their behaviors. We build a data structure for storing and querying the information in real-time, and define behavior vectors which quantify behaviors of interest. This clustering by behavior can be used to find groups of interest, for which we can also quantify behaviors in order to determine whether the group is working together to achieve a common goal, and we speculate that this work can be extended to improving tracking algorithms as well as behavioral predictors.

Continuous Indexing of Fibrosis (CIF): Improving the Assessment and Classification of MPN Patients (2022)

Hosuk Ryou, Korsuk Sirinukunwattana, Alan Aberdeen, Gillian Grindstaff, Bernadette Stolz, Helen Byrne, Heather A. Harrington, Nikolaos Sousos, Anna L. Godfrey, Claire N. Harrison, Bethan Psaila, Adam J. Mead, Gabrielle Rees, Gareth D. H. Turner, Jens Rittscher, Daniel Royston

Abstract

The detection and grading of fibrosis in myeloproliferative neoplasms (MPN) is an important component of disease classification, prognostication and disease monitoring. However, current fibrosis grading systems are only semi-quantitative and fail to capture sample heterogeneity. To improve the detection, quantitation and representation of reticulin fibrosis, we developed a machine learning (ML) approach using bone marrow trephine (BMT) samples (n = 107) from patients diagnosed with MPN or a reactive / nonneoplastic marrow. The resulting Continuous Indexing of Fibrosis (CIF) enhances the detection and monitoring of fibrosis within BMTs, and aids the discrimination of MPN subtypes. When combined with megakaryocyte feature analysis, CIF discriminates between the frequently challenging differential diagnosis of essential thrombocythemia (ET) and pre-fibrotic myelofibrosis (pre-PMF) with high predictive accuracy [area under the curve = 0.94]. CIF also shows significant promise in the identification of MPN patients at risk of disease progression; analysis of samples from 35 patients diagnosed with ET and enrolled in the Primary Thrombocythemia-1 (PT-1) trial identified features predictive of post-ET myelofibrosis (area under the curve = 0.77). In addition to these clinical applications, automated analysis of fibrosis has clear potential to further refine disease classification boundaries and inform future studies of the micro-environmental factors driving disease initiation and progression in MPN and other stem cell disorders. The image analysis methods used to generate CIF can be readily integrated with those of other key morphological features in MPNs, including megakaryocyte morphology, that lie beyond the scope of conventional histological assessment. 
Key Points: Machine learning enables an objective and quantitative description of reticulin fibrosis within the bone marrow of patients with myeloproliferative neoplasms (MPN). Automated analysis and Continuous Indexing of Fibrosis (CIF) captures heterogeneity within MPN samples and has utility in refined classification and disease monitoring. Quantitative fibrosis assessment combined with topological data analysis may help to predict patients at increased risk of progression to post-ET myelofibrosis, and assist in the discrimination of ET and pre-fibrotic PMF (pre-PMF).

Transfer Learning for Autonomous Chatter Detection in Machining (2022)

Melih C. Yesilli, Firas A. Khasawneh, Brian P. Mann

Abstract

Large-amplitude chatter vibrations are one of the most important phenomena in machining processes. They are often detrimental in cutting operations, causing a poor surface finish and decreased tool life. Therefore, chatter detection using machine learning has been an active research area over the last decade. Three challenges can be identified in applying machine learning for chatter detection at large in industry: an insufficient understanding of the universality of chatter features across different processes, the need for automating feature extraction, and the existence of limited data for each specific workpiece-machine tool combination, e.g., when machining one-off products. These three challenges can be grouped under the umbrella of transfer learning, which is concerned with studying how knowledge gained from one setting can be leveraged to obtain information in new settings. This paper studies automating chatter detection by evaluating transfer learning of prominent as well as novel chatter detection methods. We investigate chatter classification accuracy using a variety of features extracted from turning and milling experiments with different cutting configurations. The studied methods include Fast Fourier Transform (FFT), Power Spectral Density (PSD), the Auto-correlation Function (ACF), and decomposition based tools such as Wavelet Packet Transform (WPT) and Ensemble Empirical Mode Decomposition (EEMD). We also examine more recent approaches based on Topological Data Analysis (TDA) and similarity measures of time series based on Dynamic Time Warping (DTW). We evaluate transfer learning potential of each approach by training and testing both within and across the turning and milling data sets. Four supervised classification algorithms are explored: support vector machine (SVM), logistic regression, random forest classification, and gradient boosting. 
In addition to accuracy, we also comment on the automation potential of feature extraction for each approach which is integral to creating autonomous manufacturing centers. Our results show that carefully chosen time-frequency features can lead to high classification accuracies albeit at the cost of requiring manual pre-processing and the tagging of an expert user. On the other hand, we found that the TDA and DTW approaches can provide accuracies and F1-scores on par with the time-frequency methods without the need for manual preprocessing via completely automatic pipelines. Further, we discovered that the DTW approach outperforms all other methods when trained using the milling data and tested on the turning data. Therefore, TDA and DTW approaches may be preferred over the time-frequency-based approaches for fully automated chatter detection schemes. DTW and TDA also can be more advantageous when pooling data from either limited workpiece-machine tool combinations, or from small data sets of one-off processes.
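
For readers unfamiliar with the DTW similarity measure named in this abstract, the sketch below shows the textbook dynamic-programming recurrence for comparing two one-dimensional time series. It is a generic illustration with an absolute-difference local cost and no warping window; the function name is hypothetical, and nothing here is taken from the authors' pipeline.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) time-warping distance between two sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost between two samples
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] aligned with an already-used b sample
                cost[i][j - 1],      # b[j-1] aligned with an already-used a sample
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# A signal and a time-stretched copy of it align at zero cost.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

A distance of this kind can feed a nearest-neighbor classifier directly, which is one reason such pipelines need little manual feature engineering.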

Blind Swarms for Coverage in 2-D (2005)

V. de Silva, R. Ghrist, A. Muhammad

Abstract

We consider coverage problems in robot sensor networks with minimal sensing capabilities. In particular, we demonstrate that a “blind” swarm of robots with no localization and only a weak form of distance estimation can rigorously determine coverage in a bounded planar domain of unknown size and shape. The methods we introduce come from algebraic topology.

I. COVERAGE PROBLEMS

Many of the potential applications of robot swarms require information about coverage in a given domain. For example, using a swarm of robot sensors for surveillance and security applications carries with it the charge to maximize, or, preferably, guarantee coverage. Such applications include networks of security cameras, mine field sweeping via networked robots [18], and oceanographic sampling [4]. In these contexts, each robot has some coverage domain, and one wishes to know about the union of these coverage domains. Such problems are also crucial in applications not involving robots directly, e.g., communication networks.

As a preliminary analysis, we consider the static “field” coverage problem, in which robots are assumed stationary and the goal is to verify blanket coverage of a given domain. There is a large literature on this subject; see, e.g., [7], [1], [16]. In addition, there are variants on these problems involving “barrier” coverage to separate regions. Dynamic or “sweeping” coverage [3] is a common and challenging task with applications ranging from security to vacuuming. Although a sensor network composed of robots will have dynamic capabilities, we restrict attention in this brief paper to the static case in order to lay the groundwork for future inquiry.

There are two primary approaches to static coverage problems in the literature. The first uses computational geometry tools applied to exact node coordinates. This typically involves ‘ruler-and-compass’ style geometry [10] or Delaunay triangulations of the domain [16], [14], [20]. Such approaches are very rigid with regards to inputs: one must know exact node coordinates and one must know the geometry of the domain precisely to determine the Delaunay complex. To alleviate the former requirement, many authors have turned to probabilistic tools. For example, in [13], the author assumes a randomly and uniformly distributed collection of nodes in a domain with a fixed geometry and proves expected area coverage. Other approaches [15], [19] give percolation-type results about coverage and network integrity for randomly distributed nodes. The drawback of these methods is the need for strong assumptions about the exact shape of the domain, as well as the need for a uniform distribution of nodes.

In the sensor networks community, there is a compelling interest (and corresponding burgeoning literature) in determining properties of a network in which the nodes do not possess coordinate data. One example of a coordinate-free approach is in [17], which gives a heuristic method for geographic routing without coordinate data: among the large literature arising from this paper, we note in particular the mathematical analysis of this approach in [11]. To our knowledge, no one has treated the coverage problem in a coordinate-free setting.

In this note, we introduce a new set of tools for answering coverage problems in robotics and sensor networks with minimal assumptions about domain geometry and node localization. We provide a sufficiency criterion for coverage. We do not answer the problem of how the nodes should be placed in order to maximize coverage, nor the minimum number of such nodes necessary; neither do we address how to reallocate nodes to fill coverage holes.

🍩 Database of Original & Non-Theoretical Uses of Topology

Topic Detection in Twitter Using Topology Data Analysis (2015)

Persistent Topology for Cryo-Em Data Analysis (2015)

A Topological Framework for Deep Learning (2020)

Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

A Novel Method of Extracting Topological Features From Word Embeddings (2020)

Multidimensional Persistence in Biomolecular Data (2015)

Finite Topology as Applied to Image Analysis (1989)

Topology-Aware Segmentation Using Discrete Morse Theory (2021)

Topological Data Analysis of Biological Aggregation Models (2015)

Topological Regularization for Dense Prediction (2021)

TopoGAN: A Topology-Aware Generative Adversarial Network (2020)

Community Resources

Tenfold Topology of Crystals (2020)

CCF-GNN: A Unified Model Aggregating Appearance, Microenvironment, and Topology for Pathology Image Classification (2023)

Topological Portraits of Multiscale Coordination Dynamics (2020)

Evolutionary Homology on Coupled Dynamical Systems With Applications to Protein Flexibility Analysis (2020)

Bayesian Computation Meets Topology (2024)

From Trees to Barcodes and Back Again: Theoretical and Statistical Perspectives (2020)

Topological Data Analysis for Aviation Applications (2019)

Topological Data Analysis for Genomics and Evolution: Topology in Biology (2019)

Topological Machine Learning for Mixed Numeric and Categorical Data (2020)

Topological Detection of Alzheimer’s Disease Using Betti Curves (2021)

Community Resources

Atom-Specific Persistent Homology and Its Application to Protein Flexibility Analysis (2020)

Reviews: Topological Distances and Losses for Brain Networks (2021)

The Importance of the Whole: Topological Data Analysis for the Network Neuroscientist (2019)

Representability of Algebraic Topology for Biomolecules in Machine Learning Based Scoring and Virtual Screening (2018)

Topological Electronic Structure and Weyl Points in Nonsymmorphic Hexagonal Materials (2020)

Topological Edge Modes by Smart Patterning (2018)

Localization in the Crowd With Topological Constraints (2020)

Topological Autoencoders (2020)

Multivariate Data Analysis Using Persistence-Based Filtering and Topological Signatures (2012)

Persistent Homology Analysis of Protein Structure, Flexibility, and Folding (2014)

Multiresolution Persistent Homology for Excessively Large Biomolecular Datasets (2015)

Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining (2018)

Topological Early Warning Signals: Quantifying Varying Routes to Extinction in a Spatially Distributed Population Model (2022)

Analyzing Collective Motion With Machine Learning and Topology (2019)

Classification of Histopathology Slides With Persistence Homology Convolutions (2025)

Community Resources

The Growing Topology of the C. Elegans Connectome (2020)

Time-Inhomogeneous Diffusion Geometry and Topology (2022)

Topology Identifies Emerging Adaptive Mutations in SARS-CoV-2 (2021)

Community Resources

Capturing Dynamics of Time-Varying Data via Topology (2020)

Capturing Shape Information With Multi-Scale Topological Loss Terms For 3D Reconstruction (2022)

Histopathological Cancer Detection With Topological Signatures (2023)

Community Resources

Optimal Topological Cycles and Their Application in Cardiac Trabeculae Restoration (2017)

Measuring Hidden Phenotype: Quantifying the Shape of Barley Seeds Using the Euler Characteristic Transform (2021)

Feasibility of Topological Data Analysis for Event-Related fMRI (2019)

A Topological Measurement of Protein Compressibility (2015)

Barcodes Distinguishing Morphology of Neuronal Tauopathy (2022)

Topological Descriptors of Histology Images (2014)

Substructure Topology Preserving Simplification of Tetrahedral Meshes (2011)

A Topological Paradigm for Hippocampal Spatial Map Formation Using Persistent Homology (2012)

Topological Analysis of Population Activity in Visual Cortex (2008)

Topological Attention for Time Series Forecasting (2021)

Cooperative Grasping Through Topological Object Representation (2014)

Persistent Homology of the Cosmic Web. I: Hierarchical Topology in $\Lambda$CDM Cosmologies (2021)

What Can Topology Tell Us About the Neural Code? (2017)

Toward Automated Prediction of Manufacturing Productivity Based on Feature Selection Using Topological Data Analysis (2016)

A Topological Data Analysis Approach On Predicting Phenotypes From Gene Expression Data (2020)

Topological Differential Testing (2020)

Combining Geometric and Topological Information in Image Segmentation (2019)

Spatial Applications of Topological Data Analysis: Cities, Snowflakes, Random Structures, and Spiders Spinning Under the Influence (2020)

A Topological Approach to Selecting Models of Biological Experiments (2019)

Topological Data Analysis of Single-Trial Electroencephalographic Signals (2018)

Current Theoretical Models Fail to Predict the Topological Complexity of the Human Genome (2015)

Loops Abound in the Cosmic Microwave Background: A $4\sigma$ Anomaly on Super-Horizon Scales (2021)

Topology of Frame Field Meshing (2020)

Testing Topological Data Analysis for Condition Monitoring of Wind Turbines (2024)

Topological Data Analysis and Diagnostics of Compressible Magnetohydrodynamic Turbulence (2018)

Topological Persistence for Relating Microstructure and Capillary Fluid Trapping in Sandstones (2019)

Topologically Densified Distributions (2020)

Persistent Homology Analysis of Ion Aggregations and Hydrogen-Bonding Networks (2018)

Euler Characteristic Surfaces (2021)