🍩 Database of Original & Non-Theoretical Uses of Topology

(found 329 matches in 0.026649s)
  1. Induction Motor Eccentricity Fault Detection and Quantification Using Topological Data Analysis (2024)

    Bingnan Wang, Chungwei Lin, Hiroshi Inoue, Makoto Kanemaru
    Abstract In this paper, we propose a topological data analysis (TDA) method for the processing of induction motor stator current data, and apply it to the detection and quantification of eccentricity faults. Traditionally, physics-based models and involved signal processing techniques are required to identify and extract the subtle frequency components in current data related to a particular fault. We show that TDA offers an alternative way to extract fault related features, and effectively distinguish data from different fault conditions. We will introduce TDA method and the procedure of extracting topological features from time-domain data, and apply it to induction motor current data measured under different eccentricity fault conditions. We show that while the raw time-domain data are very challenging to distinguish, the extracted topological features from these data are distinct and highly associated with eccentricity fault level. With TDA processed data, we can effectively train machine learning models to predict fault levels with good accuracy, even for new data from eccentricity levels that are not seen in the training data. The proposed method is model-free, and only requires a small segment of time-domain data to make prediction. These advantages make it attractive for a wide range of data-driven fault detection applications.
  2. Statistical Inference for Persistent Homology Applied to Simulated fMRI Time Series Data (2023)

    Hassan Abdallah, Adam Regalski, Mohammad Behzad Kang, Maria Berishaj, Nkechi Nnadi, Asadur Chowdury, Vaibhav A. Diwadkar, Andrew Salch
    Abstract Time-series data are amongst the most widely-used in biomedical sciences, including domains such as functional Magnetic Resonance Imaging (fMRI). Structure within time series data can be captured by the tools of topological data analysis (TDA). Persistent homology is the mostly commonly used data-analytic tool in TDA, and can effectively summarize complex high-dimensional data into an interpretable 2-dimensional representation called a persistence diagram. Existing methods for statistical inference for persistent homology of data depend on an independence assumption being satisfied. While persistent homology can be computed for each time index in a time-series, time-series data often fail to satisfy the independence assumption. This paper develops a statistical test that obviates the independence assumption by implementing a multi-level block sampled Monte Carlo test with sets of persistence diagrams. Its efficacy for detecting task-dependent topological organization is then demonstrated on simulated fMRI data. This new statistical test is therefore suitable for analyzing persistent homology of fMRI data, and of non-independent data in general.
  3. A Primer on Topological Data Analysis to Support Image Analysis Tasks in Environmental Science (2023)

    Lander Ver Hoef, Henry Adams, Emily J. King, Imme Ebert-Uphoff
    Abstract Abstract Topological data analysis (TDA) is a tool from data science and mathematics that is beginning to make waves in environmental science. In this work, we seek to provide an intuitive and understandable introduction to a tool from TDA that is particularly useful for the analysis of imagery, namely, persistent homology. We briefly discuss the theoretical background but focus primarily on understanding the output of this tool and discussing what information it can glean. To this end, we frame our discussion around a guiding example of classifying satellite images from the sugar, fish, flower, and gravel dataset produced for the study of mesoscale organization of clouds by Rasp et al. We demonstrate how persistent homology and its vectorization, persistence landscapes, can be used in a workflow with a simple machine learning algorithm to obtain good results, and we explore in detail how we can explain this behavior in terms of image-level features. One of the core strengths of persistent homology is how interpretable it can be, so throughout this paper we discuss not just the patterns we find but why those results are to be expected given what we know about the theory of persistent homology. Our goal is that readers of this paper will leave with a better understanding of TDA and persistent homology, will be able to identify problems and datasets of their own for which persistent homology could be helpful, and will gain an understanding of the results they obtain from applying the included GitHub example code. Significance Statement Information such as the geometric structure and texture of image data can greatly support the inference of the physical state of an observed Earth system, for example, in remote sensing to determine whether wildfires are active or to identify local climate zones. Persistent homology is a branch of topological data analysis that allows one to extract such information in an interpretable way—unlike black-box methods like deep neural networks. The purpose of this paper is to explain in an intuitive manner what persistent homology is and how researchers in environmental science can use it to create interpretable models. We demonstrate the approach to identify certain cloud patterns from satellite imagery and find that the resulting model is indeed interpretable.
  4. Feature Detection and Hypothesis Testing for Extremely Noisy Nanoparticle Images Using Topological Data Analysis (2023)

    Andrew M. Thomas, Peter A. Crozier, Yuchen Xu, David S. Matteson
    Abstract We propose a flexible algorithm for feature detection and hypothesis testing in images with ultra-low signal-to-noise ratio using cubical persistent homology. Our main application is in the identification of atomic columns and other features in Transmission Electron Microscopy (TEM). Cubical persistent homology is used to identify local minima and their size in subregions in the frames of nanoparticle videos, which are hypothesized to correspond to relevant atomic features. We compare the performance of our algorithm to other employed methods for the detection of columns and their intensity. Additionally, Monte Carlo goodness-of-fit testing using real-valued summaries of persistence diagrams derived from smoothed images (generated from pixels residing in the vacuum region of an image) is developed and employed to identify whether or not the proposed atomic features generated by our algorithm are due to noise. Using these summaries derived from the generated persistence diagrams, one can produce univariate time series for the nanoparticle videos, thus, providing a means for assessing fluxional behavior. A guarantee on the false discovery rate for multiple Monte Carlo testing of identical hypotheses is also established.

    Community Resources

  5. Pattern Characterization Using Topological Data Analysis: Application to Piezo Vibration Striking Treatment (2023)

    Max M. Chumley, Melih C. Yesilli, Jisheng Chen, Firas A. Khasawneh, Yang Guo
    Abstract Quantifying patterns in visual or tactile textures provides important information about the process or phenomena that generated these patterns. In manufacturing, these patterns can be intentionally introduced as a design feature, or they can be a byproduct of a specific process. Since surface texture has significant impact on the mechanical properties and the longevity of the workpiece, it is important to develop tools for quantifying surface patterns and, when applicable, comparing them to their nominal counterparts. While existing tools may be able to indicate the existence of a pattern, they typically do not provide more information about the pattern structure, or how much it deviates from a nominal pattern. Further, prior works do not provide automatic or algorithmic approaches for quantifying other pattern characteristics such as depths’ consistency, and variations in the pattern motifs at different level sets. This paper leverages persistent homology from Topological Data Analysis (TDA) to derive noise-robust scores for quantifying motifs’ depth and roundness in a pattern. Specifically, sublevel persistence is used to derive scores that quantify the consistency of indentation depths at any level set in Piezo Vibration Striking Treatment (PVST) surfaces. Moreover, we combine sublevel persistence with the distance transform to quantify the consistency of the indentation radii, and to compare them with the nominal ones. Although the tool in our PVST experiments had a semi-spherical profile, we present a generalization of our approach to tools/motifs of arbitrary shapes thus making our method applicable to other pattern-generating manufacturing processes.
  6. Efficient Planning of Multi-Robot Collective Transport Using Graph Reinforcement Learning With Higher Order Topological Abstraction (2023)

    Steve Paul, Wenyuan Li, Brian Smyth, Yuzhou Chen, Yulia Gel, Souma Chowdhury
    Abstract Efficient multi-robot task allocation (MRTA) is fundamental to various time-sensitive applications such as disaster response, warehouse operations, and construction. This paper tackles a particular class of these problems that we call MRTA-collective transport or MRTA-CT - here tasks present varying workloads and deadlines, and robots are subject to flight range, communication range, and payload constraints. For large instances of these problems involving 100s-1000's of tasks and 10s-100s of robots, traditional non-learning solvers are often time-inefficient, and emerging learning-based policies do not scale well to larger-sized problems without costly retraining. To address this gap, we use a recently proposed encoder-decoder graph neural network involving Capsule networks and multi-head attention mechanism, and innovatively add topological descriptors (TD) as new features to improve transferability to unseen problems of similar and larger size. Persistent homology is used to derive the TD, and proximal policy optimization is used to train our TD-augmented graph neural network. The resulting policy model compares favorably to state-of-the-art non-learning baselines while being much faster. The benefit of using TD is readily evident when scaling to test problems of size larger than those used in training.
  7. A Novel Approach for Wafer Defect Pattern Classification Based on Topological Data Analysis (2023)

    Seungchan Ko, Dowan Koo
    Abstract In semiconductor manufacturing, wafer map defect pattern provides critical information for facility maintenance and yield management, so the classification of defect patterns is one of the most important tasks in the manufacturing process. In this paper, we propose a novel way to represent the shape of the defect pattern as a finite-dimensional vector, which will be used as an input for a neural network algorithm for classification. The main idea is to extract the topological features of each pattern by using the theory of persistent homology from topological data analysis (TDA). Through some experiments with a simulated dataset, we show that the proposed method is faster and much more efficient in training with higher accuracy, compared with the method using convolutional neural networks (CNN) which is the most common approach for wafer map defect pattern classification. Moreover, it was shown that our method outperforms the CNN-based method when the number of training data is not enough and is imbalanced.
  8. Optimizing Porosity Detection in Wire Laser Metal Deposition Processes Through Data-Driven AI Classification Techniques (2023)

    Meritxell Gomez-Omella, Jon Flores, Basilio Sierra, Susana Ferreiro, Nicolas Hascoët, Francisco Chinesta
    Abstract Additive manufacturing (AM) is an attractive solution for many companies that produce geometrically complex parts. This process consists of depositing material layer by layer following a sliced CAD geometry. It brings several benefits to manufacturing capabilities, such as design freedom, reduced material waste, and short-run customization. However, one of the current challenges faced by users of the process, mainly in wire laser metal deposition (wLMD), is to avoid defects in the manufactured part, especially the porosity. This defect is caused by extreme conditions and metallurgical transformations of the process. And not only does it directly affect the mechanical performance of the parts, especially the fatigue properties, but it also means an increase in costs due to the inspection tasks to which the manufactured parts must be subjected. This work compares three operational solution approaches, product-centric, based on signal-based feature extraction and Topological Data Analysis together with statistical and Machine Learning (ML) techniques, for the early detection and prediction of porosity failure in a wLMD process. The different forecasting and validation strategies demonstrate the variety of conclusions that can be drawn with different objectives in the analysis of the monitored data in AM problems.
  9. Relational Persistent Homology for Multispecies Data With Application to the Tumor Microenvironment (2023)

    Bernadette J. Stolz, Jagdeep Dhesi, Joshua A. Bull, Heather A. Harrington, Helen M. Byrne, Iris H. R. Yoon
    Abstract Topological data analysis (TDA) is an active field of mathematics for quantifying shape in complex data. Standard methods in TDA such as persistent homology (PH) are typically focused on the analysis of data consisting of a single entity (e.g., cells or molecular species). However, state-of-the-art data collection techniques now generate exquisitely detailed multispecies data, prompting a need for methods that can examine and quantify the relations among them. Such heterogeneous data types arise in many contexts, ranging from biomedical imaging, geospatial analysis, to species ecology. Here, we propose two methods for encoding spatial relations among different data types that are based on Dowker complexes and Witness complexes. We apply the methods to synthetic multispecies data of a tumor microenvironment and analyze topological features that capture relations between different cell types, e.g., blood vessels, macrophages, tumor cells, and necrotic cells. We demonstrate that relational topological features can extract biological insight, including the dominant immune cell phenotype (an important predictor of patient prognosis) and the parameter regimes of a data-generating model. The methods provide a quantitative perspective on the relational analysis of multispecies spatial data, overcome the limits of traditional PH, and are readily computable.
  10. Machine Learning and Topological Data Analysis Identify Unique Features of Human Papillae in 3D Scans (2023)

    Rayna Andreeva, Anwesha Sarkar, Rik Sarkar
    Abstract The tongue surface houses a range of papillae that are integral to the mechanics and chemistry of taste and textural sensation. Although gustatory function of papillae is well investigated, the uniqueness of papillae within and across individuals remains elusive. Here, we present the first machine learning framework on 3D microscopic scans of human papillae (n = 2092), uncovering the uniqueness of geometric and topological features of papillae. The finer differences in shapes of papillae are investigated computationally based on a number of features derived from discrete differential geometry and computational topology. Interpretable machine learning techniques show that persistent homology features of the papillae shape are the most effective in predicting the biological variables. Models trained on these features with small volumes of data samples predict the type of papillae with an accuracy of 85%. The papillae type classification models can map the spatial arrangement of filiform and fungiform papillae on a surface. Remarkably, the papillae are found to be distinctive across individuals and an individual can be identified with an accuracy of 48% among the 15 participants from a single papillae. Collectively, this is the first unprecedented evidence demonstrating that tongue papillae can serve as a unique identifier inspiring new research direction for food preferences and oral diagnostics.
  11. Topological Singularity Detection at Multiple Scales (2023)

    Julius von Rohrscheidt, Bastian Rieck
    Abstract The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the ’manifoldness’ of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.
  12. Practical Joint Human-Machine Exploration of Industrial Time Series Using the Matrix Profile (2023)

    Felix Nilsson, Mohamed-Rafik Bouguelia, Thorsteinn Rögnvaldsson
    Abstract Technological advancements and widespread adaptation of new technology in industry have made industrial time series data more available than ever before. With this development grows the need for versatile methods for mining industrial time series data. This paper introduces a practical approach for joint human-machine exploration of industrial time series data using the Matrix Profile, and presents some challenges involved. The approach is demonstrated on three real-life industrial data sets to show how it enables the user to quickly extract semantic information, detect cycles, find deviating patterns, and gain a deeper understanding of the time series. A benchmark test is also presented on ECG (electrocardiogram) data, showing that the approach works well in comparison to previously suggested methods for extracting relevant time series motifs.
  13. Motor Eccentricity Fault Detection: Physics-Based and Data-Driven Approaches (2023)

    Bingnan Wang, Hiroshi Inoue, Makoto Kanemaru
    Abstract Fault detection using motor current signature analysis (MCSA) is attractive for industrial applications due to its simplicity with no additional sensor installation required. However current components associated with faults are often very subtle and much smaller than the supply frequency component, making it challenging to detect and quantify fault levels. In this paper, we present our work on quantitative eccentricity fault diagnosis technologies for electric motors, including physical-model approach using improved winding function theory, which can simulate motor dynamics under faulty conditions and agrees well with experiment data, and data-driven approach using topological data analysis (TDA), which can effectively differentiate signals measured at different eccentricity levels. The advantages and limitations of each approach is discussed. Both methods can be extended to the detection and quantification of other types of electric motor faults.
  14. Topological Data Analysis of Spatial Patterning in Heterogeneous Cell Populations: Clustering and Sorting With Varying Cell-Cell Adhesion (2023)

    Dhananjay Bhaskar, William Y. Zhang, Alexandria Volkening, Björn Sandstede, Ian Y. Wong
    Abstract Different cell types aggregate and sort into hierarchical architectures during the formation of animal tissues. The resulting spatial organization depends (in part) on the strength of adhesion of one cell type to itself relative to other cell types. However, automated and unsupervised classification of these multicellular spatial patterns remains challenging, particularly given their structural diversity and biological variability. Recent developments based on topological data analysis are intriguing to reveal similarities in tissue architecture, but these methods remain computationally expensive. In this article, we show that multicellular patterns organized from two interacting cell types can be efficiently represented through persistence images. Our optimized combination of dimensionality reduction via autoencoders, combined with hierarchical clustering, achieved high classification accuracy for simulations with constant cell numbers. We further demonstrate that persistence images can be normalized to improve classification for simulations with varying cell numbers due to proliferation. Finally, we systematically consider the importance of incorporating different topological features as well as information about each cell type to improve classification accuracy. We envision that topological machine learning based on persistence images will enable versatile and robust classification of complex tissue architectures that occur in development and disease.
  15. Diverse 3D Cellular Patterns Underlie the Development of Cardamine Hirsuta and Arabidopsis Thaliana Ovules (2023)

    Tejasvinee Atul Mody, Alexander Rolle, Nico Stucki, Fabian Roll, Ulrich Bauer, Kay Schneitz
    Abstract A fundamental question in biology is how organ morphogenesis comes about. The ovules of Arabidopsis thaliana have been established as a successful model to study numerous aspects of tissue morphogenesis; however, little is known regarding the relative contributions and dynamics of differential tissue and cellular growth and architecture in establishing ovule morphogenesis in different species. To address this issue, we generated a 3D digital atlas of Cardamine hirsuta ovule development with full cellular resolution. We combined quantitative comparative morphometrics and topological analysis to explore similarities and differences in the 3D cellular architectures underlying ovule development of the two species. We discovered that they show diversity in the way the three radial cell layers of the primordium contribute to its growth, in the formation of a new cell layer in the inner integument and, in certain cases, in the topological properties of the 3D cell architectures of homologous tissues despite their similar shape. Our work demonstrates the power of comparative 3D cellular morphometry and the importance of internal tissues and their cellular architecture in organ morphogenesis. Summary Statement Quantitative morphometric comparison of 3D digital ovules at full cellular resolution reveals diversity in internal 3D cellular architectures between similarly shaped ovules of Cardamine hirsuta and Arabidopsis thaliana.
  16. Manifold Learning for Coherent Design Interpolation Based on Geometrical and Topological Descriptors (2023)

    D. Muñoz, O. Allix, F. Chinesta, J. J. Ródenas, E. Nadal
    Abstract In the context of intellectual property in the manufacturing industry, know-how is referred to practical knowledge on how to accomplish a specific task. This know-how is often difficult to be synthesised in a set of rules or steps as it remains in the intuition and expertise of engineers, designers, and other professionals. Today, a new research line in this concern spot-up thanks to the explosion of Artificial Intelligence and Machine Learning algorithms and its alliance with Computational Mechanics and Optimisation tools. However, a key aspect with industrial design is the scarcity of available data, making it problematic to rely on deep-learning approaches. Assuming that the existing designs live in a manifold, in this paper, we propose a synergistic use of existing Machine Learning tools to infer a reduced manifold from the existing limited set of designs and, then, to use it to interpolate between the individuals, working as a generator basis, to create new and coherent designs. For this, a key aspect is to be able to properly interpolate in the reduced manifold, which requires a proper clustering of the individuals. From our experience, due to the scarcity of data, adding topological descriptors to geometrical ones considerably improves the quality of the clustering. Thus, a distance, mixing topology and geometry is proposed. This distance is used both, for the clustering and for the interpolation. For the interpolation, relying on optimal transport appear to be mandatory. Examples of growing complexity are proposed to illustrate the goodness of the method.
  17. Path Homologies of Motifs and Temporal Network Representations (2022)

    Samir Chowdhury, Steve Huntsman, Matvey Yutin
    Abstract Path homology is a powerful method for attaching algebraic invariants to digraphs. While there have been growing theoretical developments on the algebro-topological framework surrounding path homology, bona fide applications to the study of complex networks have remained stagnant. We address this gap by presenting an algorithm for path homology that combines efficient pruning and indexing techniques and using it to topologically analyze a variety of real-world complex temporal networks. A crucial step in our analysis is the complete characterization of path homologies of certain families of small digraphs that appear as subgraphs in these complex networks. These families include all digraphs, directed acyclic graphs, and undirected graphs up to certain numbers of vertices, as well as some specially constructed cases. Using information from this analysis, we identify small digraphs contributing to path homology in dimension two for three temporal networks in an aggregated representation and relate these digraphs to network behavior. We then investigate alternative temporal network representations and identify complementary subgraphs as well as behavior that is preserved across representations. We conclude that path homology provides insight into temporal network structure, and in turn, emergent structures in temporal networks provide us with new subgraphs having interesting path homology.
  18. Confinement in Non-Abelian Lattice Gauge Theory via Persistent Homology (2022)

    Daniel Spitz, Julian M. Urban, Jan M. Pawlowski
    Abstract We investigate the structure of confining and deconfining phases in SU(2) lattice gauge theory via persistent homology, which gives us access to the topology of a hierarchy of combinatorial objects constructed from given data. Specifically, we use filtrations by traced Polyakov loops, topological densities, holonomy Lie algebra fields, as well as electric and magnetic fields. This allows for a comprehensive picture of confinement. In particular, topological densities form spatial lumps which show signatures of the classical probability distribution of instanton-dyons. Signatures of well-separated dyons located at random positions are encoded in holonomy Lie algebra fields, following the semi-classical temperature dependence of the instanton appearance probability. Debye screening discriminating between electric and magnetic fields is visible in persistent homology and pronounced at large gauge coupling. All employed constructions are gauge-invariant without a priori assumptions on the configurations under study. This work showcases the versatility of persistent homology for statistical and quantum physics studies, barely explored to date.
  19. Persistent Homology for Breast Tumor Classification Using Mammogram Scans (2022)

    Aras Asaad, Dashti Ali, Taban Majeed, Rasber Rashid
    Abstract An Important tool in the field topological data analysis is known as persistent Homology (PH) which is used to encode abstract representation of the homology of data at different resolutions in the form of persistence diagram (PD). In this work we build more than one PD representation of a single image based on a landmark selection method, known as local binary patterns, that encode different types of local textures from images. We employed different PD vectorizations using persistence landscapes, persistence images, persistence binning (Betti Curve) and statistics. We tested the effectiveness of proposed landmark based PH on two publicly available breast abnormality detection datasets using mammogram scans. Sensitivity of landmark based PH obtained is over 90% in both datasets for the detection of abnormal breast scans. Finally, experimental results give new insights on using different types of PD vectorizations which help in utilising PH in conjunction with machine learning classifiers.
  20. Quantitative Analysis of Phase Transitions in Two-Dimensional XY Models Using Persistent Homology (2022)

    Nicholas Sale, Jeffrey Giansiracusa, Biagio Lucini
    Abstract We use persistent homology and persistence images as an observable of three different variants of the two-dimensional XY model in order to identify and study their phase transitions. We examine models with the classical XY action, a topological lattice action, and an action with an additional nematic term. In particular, we introduce a new way of computing the persistent homology of lattice spin model configurations and, by considering the fluctuations in the output of logistic regression and k-nearest neighbours models trained on persistence images, we develop a methodology to extract estimates of the critical temperature and the critical exponent of the correlation length. We put particular emphasis on finite-size scaling behaviour and producing estimates with quantifiable error. For each model we successfully identify its phase transition(s) and are able to get an accurate determination of the critical temperatures and critical exponents of the correlation length.
  21. Musical Stylistic Analysis: A Study of Intervallic Transition Graphs via Persistent Homology (2022)

    Martín Mijangos, Alessandro Bravetti, Pablo Padilla
    Abstract Topological data analysis has been recently applied to investigate stylistic signatures and trends in musical compositions. A useful tool in this area is Persistent Homology. In this paper, we develop a novel method to represent a weighted directed graph as a finite metric space and then use persistent homology to extract useful features. We apply this method to weighted directed graphs obtained from pitch transitions information of a given musical fragment and use these techniques to the study of stylistic trends. In particular, we are interested in using these tools to make quantitative stylistic comparisons. As a first illustration, we analyze a selection of string quartets by Haydn, Mozart and Beethoven and discuss possible implications of our results in terms of different approaches by these composers to stylistic exploration and variety. We observe that Haydn is stylistically the most conservative, followed by Mozart, while Beethoven is the most innovative, expanding and modifying the string quartet as a musical form. Finally we also compare the variability of different genres, namely minuets, allegros, prestos and adagios, by a given composer and conclude that the minuet is the most stable form of the string quartet movements.
  22. Persistent Homology in Cosmic Shear - II. A Tomographic Analysis of DES-Y1 (2022)

    Sven Heydenreich, Benjamin Brück, Pierre Burger, Joachim Harnois-Déraps, Sandra Unruh, Tiago Castro, Klaus Dolag, Nicolas Martinet
    Abstract We demonstrate how to use persistent homology for cosmological parameter inference in a tomographic cosmic shear survey. We obtain the first cosmological parameter constraints from persistent homology by applying our method to the first-year data of the Dark Energy Survey. To obtain these constraints, we analyse the topological structure of the matter distribution by extracting persistence diagrams from signal-to-noise maps of aperture masses. This presents a natural extension to the widely used peak count statistics. Extracting the persistence diagrams from the cosmo-SLICS, a suite of \textlessi\textgreaterN\textlessi/\textgreater-body simulations with variable cosmological parameters, we interpolate the signal using Gaussian processes and marginalise over the most relevant systematic effects, including intrinsic alignments and baryonic effects. For the structure growth parameter, we find , which is in full agreement with other late-time probes. We also constrain the intrinsic alignment parameter to \textlessi\textgreaterA\textlessi/\textgreater = 1.54 ± 0.52, which constitutes a detection of the intrinsic alignment effect at almost 3\textlessi\textgreaterσ\textlessi/\textgreater.
  23. A Framework for Topological Music Analysis (TMA) (2022)

    Alberto Alcalá-Alvarez, Pablo Padilla-Longoria
    Abstract In the present article we describe and discuss a framework for applying different topological data analysis (TDA) techniques to a music fragment given as a score in traditional Western notation. We first consider different sets of points in Euclidean spaces of different dimensions that correspond to musical events in the score, and obtain their persistent homology features. Then we introduce two families of simplicial complexes that can be associated to chord sequences, and calculate their main homological descriptors. These complexes lead us to the definition of dynamical systems modeling harmonic progressions. Finally, we show the results of applying the described methods to the analysis and stylistic comparison of fragments from three Brandenburg Concertos by J.S. Bach and two Graffiti by Mexican composer Armando Luna.
  24. Transfer Learning for Autonomous Chatter Detection in Machining (2022)

    Melih C. Yesilli, Firas A. Khasawneh, Brian P. Mann
    Abstract Large-amplitude chatter vibrations are one of the most important phenomena in machining processes. It is often detrimental in cutting operations causing a poor surface finish and decreased tool life. Therefore, chatter detection using machine learning has been an active research area over the last decade. Three challenges can be identified in applying machine learning for chatter detection at large in industry: an insufficient understanding of the universality of chatter features across different processes, the need for automating feature extraction, and the existence of limited data for each specific workpiece-machine tool combination, e.g., when machining one-off products. These three challenges can be grouped under the umbrella of transfer learning, which is concerned with studying how knowledge gained from one setting can be leveraged to obtain information in new settings. This paper studies automating chatter detection by evaluating transfer learning of prominent as well as novel chatter detection methods. We investigate chatter classification accuracy using a variety of features extracted from turning and milling experiments with different cutting configurations. The studied methods include Fast Fourier Transform (FFT), Power Spectral Density (PSD), the Auto-correlation Function (ACF), and decomposition based tools such as Wavelet Packet Transform (WPT) and Ensemble Empirical Mode Decomposition (EEMD). We also examine more recent approaches based on Topological Data Analysis (TDA) and similarity measures of time series based on Discrete Time Warping (DTW). We evaluate transfer learning potential of each approach by training and testing both within and across the turning and milling data sets. Four supervised classification algorithms are explored: support vector machine (SVM), logistic regression, random forest classification, and gradient boosting. In addition to accuracy, we also comment on the automation potential of feature extraction for each approach which is integral to creating autonomous manufacturing centers. Our results show that carefully chosen time-frequency features can lead to high classification accuracies albeit at the cost of requiring manual pre-processing and the tagging of an expert user. On the other hand, we found that the TDA and DTW approaches can provide accuracies and F1-scores on par with the time-frequency methods without the need for manual preprocessing via completely automatic pipelines. Further, we discovered that the DTW approach outperforms all other methods when trained using the milling data and tested on the turning data. Therefore, TDA and DTW approaches may be preferred over the time-frequency-based approaches for fully automated chatter detection schemes. DTW and TDA also can be more advantageous when pooling data from either limited workpiece-machine tool combinations, or from small data sets of one-off processes.
  25. Time-Inhomogeneous Diffusion Geometry and Topology (2022)

    Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy
    Abstract Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic diffusion homology. We use this intrinsic topology as well as an ambient topology to study how the data changes over diffusion time. We demonstrate both homologies in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
  26. Some Applications of TDA on Financial Markets (2022)

    Miguel Angel Ruiz-Ortiz, José Carlos Gómez-Larrañaga, Jesús Rodríguez-Viorato
    Abstract The Topological Data Analysis (TDA) has had many applications. However, financial markets has been studied slightly through TDA. Here we present a quick review of some recent applications of TDA on financial markets and propose a new turbulence index based on persistent homology -- the fundamental tool for TDA -- that seems to capture critical transitions on financial data, based on our experiment with SP500 data before 2020 stock market crash in February 20, 2020, due to the COVID-19 pandemic. We review applications in the early detection of turbulence periods in financial markets and how TDA can help to get new insights while investing and obtain superior risk-adjusted returns compared with investing strategies using classical turbulence indices as VIX and the Chow's index based on the Mahalanobis distance. Furthermore, we include an introduction to persistent homology so the reader could be able to understand this paper without knowing TDA.
  27. Unsupervised Topological Learning Approach of Crystal Nucleation (2022)

    Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse
    Abstract Nucleation phenomena commonly observed in our every day life are of fundamental, technological and societal importance in many areas, but some of their most intimate mechanisms remain however to be unravelled. Crystal nucleation, the early stages where the liquid-to-solid transition occurs upon undercooling, initiates at the atomic level on nanometre length and sub-picoseconds time scales and involves complex multidimensional mechanisms with local symmetry breaking that can hardly be observed experimentally in the very details. To reveal their structural features in simulations without a priori, an unsupervised learning approach founded on topological descriptors loaned from persistent homology concepts is proposed. Applied here to monatomic metals, it shows that both translational and orientational ordering always come into play simultaneously as a result of the strong bonding when homogeneous nucleation starts in regions with low five-fold symmetry. It also reveals the specificity of the nucleation pathways depending on the element considered, with features beyond the hypothesis of Classical Nucleation Theory.
  28. Determining Clinically Relevant Features in Cytometry Data Using Persistent Homology (2022)

    Soham Mukherjee, Darren Wethington, Tamal K. Dey, Jayajit Das
    Abstract Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets is the current standard for cytometry data analysis. However, this approach is unable to capture more subtle topological features hidden in data, especially if those features are further masked by data transforms or significant batch effects or donor-to-donor variations in clinical data. We present that persistent homology, a mathematical structure that summarizes the topological features, can distinguish different sources of data, such as from groups of healthy donors or patients, effectively. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between single cell protein expressions in COVID-19 patients and healthy controls. We identify proteins of interest by a decision-tree based classifier, sample points randomly and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions in cytometry datasets of varying density and identify protruded structures such as ‘elbows’. We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find that systematic structural differences exist between COVID-19 patients and healthy controls in the expression data for T-bet, Eomes, and Ki-67. Further analysis shows that expression of T-bet and Eomes are significantly downregulated in COVID-19 patient non-naïve CD8+ T cells compared to healthy controls. This counter-intuitive finding may indicate that canonical effector CD8+ T cells are less prevalent in COVID-19 patients than healthy controls. This method is applicable to any cytometry dataset for discovering novel insights through topological data analysis which may be difficult to ascertain otherwise with a standard gating strategy or existing bioinformatic tools.

    Community Resources

  29. Barcodes Distinguish Morphology of Neuronal Tauopathy (2022)

    David Beers, Despoina Goniotaki, Diane P. Hanger, Alain Goriely, Heather A. Harrington
    Abstract The geometry of neurons is known to be important for their functions. Hence, neurons are often classified by their morphology. Two recent methods, persistent homology and the topological morphology descriptor, assign a morphology descriptor called a barcode to a neuron equipped with a given function, such as the Euclidean distance from the root of the neuron. These barcodes can be converted into matrices called persistence images, which can then be averaged across groups. We show that when the defining function is the path length from the root, both the topological morphology descriptor and persistent homology are equivalent. We further show that persistence images arising from the path length procedure provide an interpretable summary of neuronal morphology. We introduce \topological morphology functions\, a class of functions similar to Sholl functions, that can be recovered from the associated topological morphology descriptor. To demonstrate this topological approach, we compare healthy cortical and hippocampal mouse neurons to those affected by progressive tauopathy. We find a significant difference in the morphology of healthy neurons and those with a tauopathy at a postsymptomatic age. We use persistence images to conclude that the diseased group tends to have neurons with shorter branches as well as fewer branches far from the soma.
  30. Unsupervised Topological Learning for Identification of Atomic Structures (2022)

    Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse
    Abstract We propose an unsupervised learning methodology with descriptors based on topological data analysis (TDA) concepts to describe the local structural properties of materials at the atomic scale. Based only on atomic positions and without a priori knowledge, our method allows for an autonomous identification of clusters of atomic structures through a Gaussian mixture model. We apply successfully this approach to the analysis of elemental Zr in the crystalline and liquid states as well as homogeneous nucleation events under deep undercooling conditions. This opens the way to deeper and autonomous study of complex phenomena in materials at the atomic scale.
  31. Topological Biomarkers for Real-Time Detection of Epileptic Seizures (2022)

    Ximena Fernández, Diego Mateos
    Abstract Automated seizure detection is a fundamental problem in computational neuroscience towards diagnosis and treatment's improvement of epileptic disease. We propose a real-time computational method for automated tracking and detection of epileptic seizures from raw neurophysiological recordings. Our mechanism is based on the topological analysis of the sliding-window embedding of the time series derived from simultaneously recorded channels. We extract topological biomarkers from the signals via the computation of the persistent homology of time-evolving topological spaces. Remarkably, the proposed biomarkers robustly captures the change in the brain dynamics during the ictal state. We apply our methods in different types of signals including scalp and intracranial EEG and MEG, in patients during interictal and ictal states, showing high accuracy in a range of clinical situations.
  32. Gene Expression Data Classification Using Topology and Machine Learning Models (2022)

    Tamal K. Dey, Sayan Mandal, Soham Mukherjee
    Abstract Interpretation of high-throughput gene expression data continues to require mathematical tools in data analysis that recognizes the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from the predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors which are used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. This representatives of the entire data facilitates better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.
  33. Extremal Event Graphs: A (Stable) Tool for Analyzing Noisy Time Series Data (2022)

    Robin Belton, Bree Cummins, Brittany Terese Fasy, Tomáš Gedeon
    Abstract Local maxima and minima, or extremal events, in experimental time series can be used as a coarse summary to characterize data. However, the discrete sampling in recording experimental measurements suggests uncertainty on the true timing of extrema during the experiment. This in turn gives uncertainty in the timing order of extrema within the time series. Motivated by applications in genomic time series and biological network analysis, we construct a weighted directed acyclic graph (DAG) called an extremal event DAG using techniques from persistent homology that is robust to measurement noise. Furthermore, we define a distance between extremal event DAGs based on the edit distance between strings. We prove several properties including local stability for the extremal event DAG distance with respect to pairwise \$L_\\infty\\$ distances between functions in the time series data. Lastly, we provide algorithms, publicly free software, and implementations on extremal event DAG construction and comparison.
  34. Topological Phase Estimation Method for Reparameterized Periodic Functions (2022)

    Thomas Bonis, Frédéric Chazal, Bertrand Michel, Wojciech Reise
    Abstract We consider a signal composed of several periods of a periodic function, of which we observe a noisy reparametrisation. The phase estimation problem consists of finding that reparametrisation, and, in particular, the number of observed periods. Existing methods are well-suited to the setting where the periodic function is known, or at least, simple. We consider the case when it is unknown and we propose an estimation method based on the shape of the signal. We use the persistent homology of sublevel sets of the signal to capture the temporal structure of its local extrema. We infer the number of periods in the signal by counting points in the persistence diagram and their multiplicities. Using the estimated number of periods, we construct an estimator of the reparametrisation. It is based on counting the number of sufficiently prominent local minima in the signal. This work is motivated by a vehicle positioning problem, on which we evaluated the proposed method.
  35. Severe Slugging Flow Identification From Topological Indicators (2022)

    Simone Casolo
    Abstract In this work, topological data analysis is used to identify the onset of severe slug flow in offshore petroleum production systems. Severe slugging is a multiphase flow regime known to be very inefficient and potentially harmful to process equipment and it is characterized by large oscillations in the production fluid pressure. Time series from pressure sensors in subsea oil wells are processed by means of Takens embedding to produce point clouds of data. Embedded sensor data is then analyzed using persistent homology to obtain topological indicators capable of revealing the occurrence of severe slugging in a condition-based monitoring approach. A large dataset of well events consisting of both real and simulated data is used to demonstrate the possibilty of authomatizing severe slugging detection from live data via topological data analysis. Methods based on persistence diagrams are shown to accurately identify severe slugging and to classify different flow regimes from pressure signals of producing wells with supervised machine learning.
  36. A Topological Machine Learning Pipeline for Classification (2022)

    Francesco Conti, Davide Moroni, Maria Antonietta Pascali
    Abstract In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered. Using a grid search approach, this pipeline determines optimal representation methods and parameters. The development of such a topological pipeline for Machine Learning involves two crucial steps that strongly affect its performance: firstly, digital data must be represented as an algebraic object with a proper associated filtration in order to compute its topological summary, the Persistence Diagram. Secondly, the persistence diagram must be transformed with suitable representation methods in order to be introduced in a Machine Learning algorithm. We assess the performance of our pipeline, and in parallel, we compare the different representation methods on popular benchmark datasets. This work is a first step toward both an easy and ready-to-use pipeline for data classification using persistent homology and Machine Learning, and to understand the theoretical reasons why, given a dataset and a task to be performed, a pair (filtration, topological representation) is better than another.
  37. Exploring the Geometry and Topology of Neural Network Loss Landscapes (2022)

    Stefan Horoi, Jessie Huang, Bastian Rieck, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy
    Abstract Recent work has established clear links between the generalization performance of trained neural networks and the geometry of their loss landscape near the local minima to which they converge. This suggests that qualitative and quantitative examination of the loss landscape geometry could yield insights about neural network generalization performance during training. To this end, researchers have proposed visualizing the loss landscape through the use of simple dimensionality reduction techniques. However, such visualization methods have been limited by their linear nature and only capture features in one or two dimensions, thus restricting sampling of the loss landscape to lines or planes. Here, we expand and improve upon these in three ways. First, we present a novel “jump and retrain” procedure for sampling relevant portions of the loss landscape. We show that the resulting sampled data holds more meaningful information about the network’s ability to generalize. Next, we show that non-linear dimensionality reduction of the jump and retrain trajectories via PHATE, a trajectory and manifold-preserving method, allows us to visualize differences between networks that are generalizing well vs poorly. Finally, we combine PHATE trajectories with a computational homology characterization to quantify trajectory differences.
  38. Exploring Surface Texture Quantification in Piezo Vibration Striking Treatment (PVST) Using Topological Measures (2022)

    Melih C. Yesilli, Max M. Chumley, Jisheng Chen, Firas A. Khasawneh, Yang Guo
    Abstract Abstract. Surface texture influences wear and tribological properties of manufactured parts, and it plays a critical role in end-user products. Therefore, quantifying the order or structure of a manufactured surface provides important information on the quality and life expectancy of the product. Although texture can be intentionally introduced to enhance aesthetics or to satisfy a design function, sometimes it is an inevitable byproduct of surface treatment processes such as Piezo Vibration Striking Treatment (PVST). Measures of order for surfaces have been characterized using statistical, spectral, and geometric approaches. For nearly hexagonal lattices, topological tools have also been used to measure the surface order. This paper explores utilizing tools from Topological Data Analysis for measuring surface texture. We compute measures of order based on optical digital microscope images of surfaces treated using PVST. These measures are applied to the grid obtained from estimating the centers of tool impacts, and they quantify the grid’s deviations from the nominal one. Our results show that TDA provides a convenient framework for characterization of pattern type that bypasses some limitations of existing tools such as difficult manual processing of the data and the need for an expert user to analyze and interpret the surface images.
  39. Topological Descriptors for Coral Reef Resilience Using a Stochastic Spatial Model (2022)

    Robert A. McDonald, Rosanna Neuhausler, Martin Robinson, Laurel G. Larsen, Heather A. Harrington, Maria Bruna
    Abstract A complex interplay between species governs the evolution of spatial patterns in ecology. An open problem in the biological sciences is characterizing spatio-temporal data and understanding how changes at the local scale affect global dynamics/behavior. We present a toolkit of multiscale methods and use them to analyze coral reef resilience and dynamics.Here, we extend a well-studied temporal mathematical model of coral reef dynamics to include stochastic and spatial interactions and then generate data to study different ecological scenarios. We present descriptors to characterize patterns in heterogeneous spatio-temporal data surpassing spatially averaged measures. We apply these descriptors to simulated coral data and demonstrate the utility of two topological data analysis techniques--persistent homology and zigzag persistence--for characterizing the spatiotemporal evolution of reefs and generating insight into mechanisms of reef resilience. We show that the introduction of local competition between species leads to the appearance of coral clusters in the reef. Furthermore, we use our analyses to distinguish the temporal dynamics that stem from different initial configurations of coral, showing that the neighborhood composition of coral sites determines their long-term survival. Finally, we use zigzag persistence to quantify spatial behavior in the metastable regime as the level of fish grazing on algae varies and determine which spatial configurations protect coral from extinction in different environments.
  40. Topology in Cyber Research (2022)

    Steve Huntsman, Jimmy Palladino, Michael Robinson
    Abstract We give an idiosyncratic overview of applications of topology to cyber research, spanning the analysis of variables/assignments and control flow in computer programs, a brief sketch of topological data analysis in one dimension, and the use of sheaves to analyze wireless networks. The text is from a chapter in the forthcoming book Mathematics in Cyber Research, to be published by Taylor and Francis.
  41. A Simplified Algorithm for Identifying Abnormal Changes in Dynamic Networks (2022)

    Bouchaib Azamir, Driss Bennis, Bertrand Michel
    Abstract Topological data analysis has recently been applied to the study of dynamic networks. In this context, an algorithm was introduced and helps, among other things, to detect early warning signals of abnormal changes in the dynamic network under study. However, the complexity of this algorithm increases significantly once the database studied grows. In this paper, we propose a simplification of the algorithm without affecting its performance. We give various applications and simulations of the new algorithm on some weighted networks. The obtained results show clearly the efficiency of the introduced approach. Moreover, in some cases, the proposed algorithm makes it possible to highlight local information and sometimes early warning signals of local abnormal changes.
  42. Topological Data Analysis for Electric Motor Eccentricity Fault Detection (2022)

    Bingnan Wang, Chungwei Lin, Hiroshi Inoue, Makoto Kanemaru
    Abstract In this paper, we develop topological data analysis (TDA) method for motor current signature analysis (MCSA), and apply it to induction motor eccentricity fault detection. We introduce TDA and present the procedure of extracting topological features from time-domain data that will be represented using persistence diagrams and vectorized Betti sequences. The procedure is applied to induction machine phase current signal analysis, and shown to be highly effective in differentiating signals from different eccentricity levels. With TDA, we are able to use a simple regression model that can predict the fault levels with reasonable accuracy, even for the data of eccentricity levels that are not seen in the training data. The proposed method is model-free, and only requires a small segment of time-domain data to make prediction. These advantages make it attractive for a wide range of fault detection applications.
  43. Topological Early Warning Signals: Quantifying Varying Routes to Extinction in a Spatially Distributed Population Model (2022)

    Laura S. Storch, Sarah L. Day
    Abstract Understanding and predicting critical transitions in spatially explicit ecological systems is particularly challenging due to their complex spatial and temporal dynamics and high dimensionality. Here, we explore changes in population distribution patterns during a critical transition (an extinction event) using computational topology. Computational topology allows us to quantify certain features of a population distribution pattern, such as the level of fragmentation. We create population distribution patterns via a simple coupled patch model with Ricker map growth and nearest neighbors dispersal on a two dimensional lattice. We observe two dominant paths to extinction within the explored parameter space that depend critically on the dispersal rate d and the rate of parameter drift, Δϵ. These paths to extinction are easily topologically distinguishable, so categorization can be automated. We use this population model as a theoretical proof-of-concept for the methodology, and argue that computational topology is a powerful tool for analyzing dynamical changes in systems with noisy data that are coarsely resolved in space and/or time. In addition, computational topology can provide early warning signals for chaotic dynamical systems where traditional statistical early warning signals would fail. For these reasons, we envision this work as a helpful addition to the critical transitions prediction toolbox.
  44. Toroidal Topology of Population Activity in Grid Cells (2022)

    Richard J. Gardner, Erik Hermansen, Marius Pachitariu, Yoram Burak, Nils A. Baas, Benjamin A. Dunn, May-Britt Moser, Edvard I. Moser
    Abstract The medial entorhinal cortex is part of a neural system for mapping the position of an individual within a physical environment1. Grid cells, a key component of this system, fire in a characteristic hexagonal pattern of locations2, and are organized in modules3 that collectively form a population code for the animal’s allocentric position1. The invariance of the correlation structure of this population code across environments4,5 and behavioural states6,7, independent of specific sensory inputs, has pointed to intrinsic, recurrently connected continuous attractor networks (CANs) as a possible substrate of the grid pattern1,8–11. However, whether grid cell networks show continuous attractor dynamics, and how they interface with inputs from the environment, has remained unclear owing to the small samples of cells obtained so far. Here, using simultaneous recordings from many hundreds of grid cells and subsequent topological data analysis, we show that the joint activity of grid cells from an individual module resides on a toroidal manifold, as expected in a two-dimensional CAN. Positions on the torus correspond to positions of the moving animal in the environment. Individual cells are preferentially active at singular positions on the torus. Their positions are maintained between environments and from wakefulness to sleep, as predicted by CAN models for grid cells but not by alternative feedforward models12. This demonstration of network dynamics on a toroidal manifold provides a population-level visualization of CAN dynamics in grid cells.
  45. Continuous Indexing of Fibrosis (CIF): Improving the Assessment and Classification of MPN Patients (2022)

    Hosuk Ryou, Korsuk Sirinukunwattana, Alan Aberdeen, Gillian Grindstaff, Bernadette Stolz, Helen Byrne, Heather A. Harrington, Nikolaos Sousos, Anna L. Godfrey, Claire N. Harrison, Bethan Psaila, Adam J. Mead, Gabrielle Rees, Gareth D. H. Turner, Jens Rittscher, Daniel Royston
    Abstract The detection and grading of fibrosis in myeloproliferative neoplasms (MPN) is an important component of disease classification, prognostication and disease monitoring. However, current fibrosis grading systems are only semi-quantitative and fail to capture sample heterogeneity. To improve the detection, quantitation and representation of reticulin fibrosis, we developed a machine learning (ML) approach using bone marrow trephine (BMT) samples (n = 107) from patients diagnosed with MPN or a reactive / nonneoplastic marrow. The resulting Continuous Indexing of Fibrosis (CIF) enhances the detection and monitoring of fibrosis within BMTs, and aids the discrimination of MPN subtypes. When combined with megakaryocyte feature analysis, CIF discriminates between the frequently challenging differential diagnosis of essential thrombocythemia (ET) and pre-fibrotic myelofibrosis (pre-PMF) with high predictive accuracy [area under the curve = 0.94]. CIF also shows significant promise in the identification of MPN patients at risk of disease progression; analysis of samples from 35 patients diagnosed with ET and enrolled in the Primary Thrombocythemia-1 (PT-1) trial identified features predictive of post-ET myelofibrosis (area under the curve = 0.77). In addition to these clinical applications, automated analysis of fibrosis has clear potential to further refine disease classification boundaries and inform future studies of the micro-environmental factors driving disease initiation and progression in MPN and other stem cell disorders. The image analysis methods used to generate CIF can be readily integrated with those of other key morphological features in MPNs, including megakaryocyte morphology, that lie beyond the scope of conventional histological assessment. Key PointsMachine learning enables an objective and quantitative description of reticulin fibrosis within the bone marrow of patients with myeloproliferative neoplasms (MPN),Automated analysis and Continuous Indexing of Fibrosis (CIF) captures heterogeneity within MPN samples and has utility in refined classification and disease monitoringQuantitative fibrosis assessment combined with topological data analysis may help to predict patients at increased risk of progression to post-ET myelofibrosis, and assist in the discrimination of ET and pre-fibrotic PMF (pre-PMF)
  46. Homological Scaffold via Minimal Homology Bases (2021)

    Marco Guerra, Alessandro De Gregorio, Ulderico Fugacci, Giovanni Petri, Francesco Vaccarino
    Abstract The homological scaffold leverages persistent homology to construct a topologically sound summary of a weighted network. However, its crucial dependency on the choice of representative cycles hinders the ability to trace back global features onto individual network components, unless one provides a principled way to make such a choice. In this paper, we apply recent advances in the computation of minimal homology bases to introduce a quasi-canonical version of the scaffold, called minimal, and employ it to analyze data both real and in silico. At the same time, we verify that, statistically, the standard scaffold is a good proxy of the minimal one for sufficiently complex networks.
  47. Persistent Homology of Geospatial Data: A Case Study With Voting (2021)

    Michelle Feng, Mason A. Porter
    Abstract A crucial step in the analysis of persistent homology is the transformation of data into an appropriate topological object (which, in our case, is a simplicial complex). Software packages for computing persistent homology typically construct Vietoris--Rips or other distance-based simplicial complexes on point clouds because they are relatively easy to compute. We investigate alternative methods of constructing simplicial complexes and the effects of making associated choices during simplicial-complex construction on the output of persistent-homology algorithms. We present two new methods for constructing simplicial complexes from two-dimensional geospatial data (such as maps). We apply these methods to a California precinct-level voting data set, and we thereby demonstrate that our new constructions can capture geometric characteristics that are missed by distance-based constructions. Our new constructions can thus yield more interpretable persistence modules and barcodes for geospatial data. In particular, they are able to distinguish short-persistence features that occur only for a narrow range of distance scales (e.g., voting patterns in densely populated cities) from short-persistence noise by incorporating information about other spatial relationships between regions.
  48. Topological Data Analysis of C. Elegans Locomotion and Behavior (2021)

    Ashleigh Thomas, Kathleen Bates, Alex Elchesen, Iryna Hartsock, Hang Lu, Peter Bubenik
    Abstract Video of nematodes/roundworms was analyzed using persistent homology to study locomotion and behavior. In each frame, an organism's body posture was represented by a high-dimensional vector. By concatenating points in fixed-duration segments of this time series, we created a sliding window embedding (sometimes called a time delay embedding) where each point corresponds to a sequence of postures of an organism. Persistent homology on the points in this time series detected behaviors and comparisons of these persistent homology computations detected variation in their corresponding behaviors. We used average persistence landscapes and machine learning techniques to study changes in locomotion and behavior in varying environments.
  49. Topological Data Analysis of Collective and Individual Epithelial Cells Using Persistent Homology of Loops (2021)

    Dhananjay Bhaskar, William Y. Zhang, Ian Y. Wong
    Abstract Interacting, self-propelled particles such as epithelial cells can dynamically self-organize into complex multicellular patterns, which are challenging to classify without a priori information. Classically, different phases and phase transitions have been described based on local ordering, which may not capture structural features at larger length scales. Instead, topological data analysis (TDA) determines the stability of spatial connectivity at varying length scales (i.e. persistent homology), and can compare different particle configurations based on the “cost” of reorganizing one configuration into another. Here, we demonstrate a topology-based machine learning approach for unsupervised profiling of individual and collective phases based on large-scale loops. We show that these topological loops (i.e. dimension 1 homology) are robust to variations in particle number and density, particularly in comparison to connected components (i.e. dimension 0 homology). We use TDA to map out phase diagrams for simulated particles with varying adhesion and propulsion, at constant population size as well as when proliferation is permitted. Next, we use this approach to profile our recent experiments on the clustering of epithelial cells in varying growth factor conditions, which are compared to our simulations. Finally, we characterize the robustness of this approach at varying length scales, with sparse sampling, and over time. Overall, we envision TDA will be broadly applicable as a model-agnostic approach to analyze active systems with varying population size, from cytoskeletal motors to motile cells to flocking or swarming animals.
  50. Persistent Homology Based Graph Convolution Network for Fine-Grained 3D Shape Segmentation (2021)

    Chi-Chong Wong, Chi-Man Vong
    Abstract Fine-grained 3D segmentation is an important task in 3D object understanding, especially in applications such as intelligent manufacturing or parts analysis for 3D objects. However, many challenges involved in such problem are yet to be solved, such as i) interpreting the complex structures located in different regions for 3D objects; ii) capturing fine-grained structures with sufficient topology correctness. Current deep learning and graph machine learning methods fail to tackle such challenges and thus provide inferior performance in fine-grained 3D analysis. In this work, methods in topological data analysis are incorporated with geometric deep learning model for the task of fine-grained segmentation for 3D objects. We propose a novel neural network model called Persistent Homology based Graph Convolution Network (PHGCN), which i) integrates persistent homology into graph convolution network to capture multi-scale structural information that can accurately represent complex structures for 3D objects; ii) applies a novel Persistence Diagram Loss (ℒPD) that provides sufficient topology correctness for segmentation over the fine-grained structures. Extensive experiments on fine-grained 3D segmentation validate the effectiveness of the proposed PHGCN model and show significant improvements over current state-of-the-art methods.
  51. A Multi-Parameter Persistence Framework for Mathematical Morphology (2021)

    Yu-Min Chung, Sarah Day, Chuan-Shen Hu
    Abstract The field of mathematical morphology offers well-studied techniques for image processing. In this work, we view morphological operations through the lens of persistent homology, a tool at the heart of the field of topological data analysis. We demonstrate that morphological operations naturally form a multiparameter filtration and that persistent homology can then be used to extract information about both topology and geometry in the images as well as to automate methods for optimizing the study and rendering of structure in images. For illustration, we apply this framework to analyze noisy binary, grayscale, and color images.
  52. The (Homological) Persistence of Gerrymandering (2021)

    Moon Duchin, Tom Needham, Thomas Weighill
    Abstract \textlessp style='text-indent:20px;'\textgreaterWe apply persistent homology, the dominant tool from the field of topological data analysis, to study electoral redistricting. We begin by combining geographic and electoral data from a districting plan to produce a persistence diagram. Then, to see beyond a particular plan and understand the possibilities afforded by the choices made in redistricting, we build methods to visualize and analyze large ensembles of alternative plans. Our detailed case studies use zero-dimensional homology (persistent components) of filtered graphs constructed from voting data to analyze redistricting in Pennsylvania and North Carolina. We find that, across large ensembles of partitions, the features cluster in the persistence diagrams in a way that corresponds strongly to geographic location, so that we can construct an average diagram for an ensemble, with each point identified with a geographical region. Using this localization lets us produce zonings of each state at Congressional, state Senate, and state House scales, show the regional non-uniformity of election shifts, and identify attributes of partitions that tend to correspond to partisan advantage.\textless/p\textgreater\textlessp style='text-indent:20px;'\textgreaterThe methods here are set up to be broadly applicable to the use of TDA on large ensembles of data. Many studies will benefit from interpretable summaries of large sets of samples or simulations, and the work here on localization and zoning will readily generalize to other partition problems, which are abundant in scientific applications. For the mathematically and politically rich problem of redistricting in particular, TDA provides a powerful and elegant summarization tool whose findings will be useful for practitioners.\textless/p\textgreater
  53. Unsupervised Topological Learning Approach of Crystal Nucleation in Pure Tantalum (2021)

    Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse
    Abstract Nucleation phenomena commonly observed in our every day life are of fundamental, technological and societal importance in many areas, but some of their most intimate mechanisms remain however to be unraveled. Crystal nucleation, the early stages where the liquid-to-solid transition occurs upon undercooling, initiates at the atomic level on nanometer length and sub-picoseconds time scales and involves complex multidimensional mechanisms with local symmetry breaking that can hardly be observed experimentally in the very details. To reveal their structural features in simulations without a priori, an unsupervised learning approach founded on topological descriptors loaned from persistent homology concepts is proposed. Applied here to a monatomic metal, namely Tantalum (Ta), it shows that both translational and orientational ordering always come into play simultaneously when homogeneous nucleation starts in regions with low five-fold symmetry.
  54. Loops Abound in the Cosmic Microwave Background: A \$4\sigma\$ Anomaly on Super-Horizon Scales (2021)

    Pratyush Pranav
    Abstract We present a topological analysis of the temperature fluctuation maps from the \emph\Planck 2020\ Data release 4 (DR4) based on the \texttt\NPIPE\ data processing pipeline. For comparison, we also present the topological characteristics of the maps from \emph\Planck 2018\ Data release 3 (DR3). We perform our analysis in terms of the homology characteristics of the maps, invoking relative homology to account for analysis in the presence of masks. We perform our analysis for a range of smoothing scales spanning sub- and super-horizon scales corresponding to \$FWHM = 5', 10', 20', 40', 80', 160', 320', 640'\$. Our main result indicates a significantly anomalous behavior of the loops in the observed maps compared to simulations that are modeled as isotopic and homogeneous Gaussian random fields. Specifically, we observe a \$4\sigma\$ deviation between the observation and simulations in the number of loops at \$FWHM = 320'\$ and \$FWHM = 640'\$, corresponding to super-horizon scales of \$5\$ degrees and larger. In addition, we also notice a mildly significant deviation at \$2\sigma\$ for all the topological descriptors for almost all the scales analyzed. Our results show a consistency across different data releases, and therefore, the anomalous behavior deserves a careful consideration regarding its origin and ramifications. Disregarding the unlikely source of the anomaly being instrumental systematics, the origin of the anomaly may be genuinely astrophysical -- perhaps due to a yet unresolved foreground, or truly primordial in nature. Given the nature of the topological descriptors, that potentially encodes information of all orders, non-Gaussianities, of either primordial or late-type nature, may be potential candidates. Alternate possibilities include the Universe admitting a non-trivial global topology, including effects induced by large-scale topological defects.
  55. Data-Driven and Automatic Surface Texture Analysis Using Persistent Homology (2021)

    Melih C. Yesilli, Firas A. Khasawneh
    Abstract Surface roughness plays an important role in analyzing engineering surfaces. It quantifies the surface topography and can be used to determine whether the resulting surface finish is acceptable or not. Nevertheless, while several existing tools and standards are available for computing surface roughness, these methods rely heavily on user input thus slowing down the analysis and increasing manufacturing costs. Therefore, fast and automatic determination of the roughness level is essential to avoid costs resulting from surfaces with unacceptable finish, and user-intensive analysis. In this study, we propose a Topological Data Analysis (TDA) based approach to classify the roughness level of synthetic surfaces using both their areal images and profiles. We utilize persistent homology from TDA to generate persistence diagrams that encapsulate information on the shape of the surface. We then obtain feature matrices for each surface or profile using Carlsson coordinates, persistence images, and template functions. We compare our results to two widely used methods in the literature: Fast Fourier Transform (FFT) and Gaussian filtering. The results show that our approach yields mean accuracies as high as 97%. We also show that, in contrast to existing surface analysis tools, our TDA-based approach is fully automatable and provides adaptive feature extraction.
  56. Stable Topological Summaries for Analyzing the Organization of Cells in a Packed Tissue (2021)

    Nieves Atienza, Maria-Jose Jimenez, Manuel Soriano-Trigueros
    Abstract We use topological data analysis tools for studying the inner organization of cells in segmented images of epithelial tissues. More specifically, for each segmented image, we compute different persistence barcodes, which codify the lifetime of homology classes (persistent homology) along different filtrations (increasing nested sequences of simplicial complexes) that are built from the regions representing the cells in the tissue. We use a complete and well-grounded set of numerical variables over those persistence barcodes, also known as topological summaries. A novel combination of normalization methods for both the set of input segmented images and the produced barcodes allows for the proven stability results for those variables with respect to small changes in the input, as well as invariance to image scale. Our study provides new insights to this problem, such as a possible novel indicator for the development of the drosophila wing disc tissue or the importance of centroids’ distribution to differentiate some tissues from their CVT-path counterpart (a mathematical model of epithelia based on Voronoi diagrams). We also show how the use of topological summaries may improve the classification accuracy of epithelial images using a Random Forest algorithm.
  57. Using Persistent Homology as Preprocessing of Early Warning Signals for Critical Transition in Flood (2021)

    Syed Mohamad Sadiq Syed Musa, Mohd Salmi Md Noorani, Fatimah Abdul Razak, Munira Ismail, Mohd Almie Alias, Saiful Izzuan Hussain
    Abstract Flood early warning systems (FLEWSs) contribute remarkably to reducing economic and life losses during a flood. The theory of critical slowing down (CSD) has been successfully used as a generic indicator of early warning signals in various fields. A new tool called persistent homology (PH) was recently introduced for data analysis. PH employs a qualitative approach to assess a data set and provide new information on the topological features of the data set. In the present paper, we propose the use of PH as a preprocessing step to achieve a FLEWS through CSD. We test our proposal on water level data of the Kelantan River, which tends to flood nearly every year. The results suggest that the new information obtained by PH exhibits CSD and, therefore, can be used as a signal for a FLEWS. Further analysis of the signal, we manage to establish an early warning signal for ten of the twelve flood events recorded in the river; the two other events are detected on the first day of the flood. Finally, we compare our results with those of a FLEWS constructed directly from water level data and find that FLEWS via PH creates fewer false alarms than the conventional technique.
  58. Classification of COVID-19 via Homology of CT-SCAN (2021)

    Sohail Iqbal, H. Fareed Ahmed, Talha Qaiser, Muhammad Imran Qureshi, Nasir Rajpoot
    Abstract In this worldwide spread of SARS-CoV-2 (COVID-19) infection, it is of utmost importance to detect the disease at an early stage especially in the hot spots of this epidemic. There are more than 110 Million infected cases on the globe, sofar. Due to its promptness and effective results computed tomography (CT)-scan image is preferred to the reverse-transcription polymerase chain reaction (RT-PCR). Early detection and isolation of the patient is the only possible way of controlling the spread of the disease. Automated analysis of CT-Scans can provide enormous support in this process. In this article, We propose a novel approach to detect SARS-CoV-2 using CT-scan images. Our method is based on a very intuitive and natural idea of analyzing shapes, an attempt to mimic a professional medic. We mainly trace SARS-CoV-2 features by quantifying their topological properties. We primarily use a tool called persistent homology, from Topological Data Analysis (TDA), to compute these topological properties. We train and test our model on the "SARS-CoV-2 CT-scan dataset" i̧tep\soares2020sars\, an open-source dataset, containing 2,481 CT-scans of normal and COVID-19 patients. Our model yielded an overall benchmark F1 score of \$99.42\% \$, accuracy \$99.416\%\$, precision \$99.41\%\$, and recall \$99.42\%\$. The TDA techniques have great potential that can be utilized for efficient and prompt detection of COVID-19. The immense potential of TDA may be exploited in clinics for rapid and safe detection of COVID-19 globally, in particular in the low and middle-income countries where RT-PCR labs and/or kits are in a serious crisis.
  59. HiDeF: Identifying Persistent Structures in Multiscale ‘Omics Data (2021)

    Fan Zheng, She Zhang, Christopher Churas, Dexter Pratt, Ivet Bahar, Trey Ideker
    Abstract In any ‘omics study, the scale of analysis can dramatically affect the outcome. For instance, when clustering single-cell transcriptomes, is the analysis tuned to discover broad or specific cell types? Likewise, protein communities revealed from protein networks can vary widely in sizes depending on the method. Here, we use the concept of persistent homology, drawn from mathematical topology, to identify robust structures in data at all scales simultaneously. Application to mouse single-cell transcriptomes significantly expands the catalog of identified cell types, while analysis of SARS-COV-2 protein interactions suggests hijacking of WNT. The method, HiDeF, is available via Python and Cytoscape.
  60. Topological Regularization for Dense Prediction (2021)

    Deqing Fu, Bradley J. Nelson
    Abstract Dense prediction tasks such as depth perception and semantic segmentation are important applications in computer vision that have a concrete topological description in terms of partitioning an image into connected components or estimating a function with a small number of local extrema corresponding to objects in the image. We develop a form of topological regularization based on persistent homology that can be used in dense prediction tasks with these topological descriptions. Experimental results show that the output topology can also appear in the internal activations of trained neural networks which allows for a novel use of topological regularization to the internal states of neural networks during training, reducing the computational cost of the regularization. We demonstrate that this topological regularization of internal activations leads to improved convergence and test benchmarks on several problems and architectures.
  61. Persistent Homology in Cosmic Shear: Constraining Parameters With Topological Data Analysis (2021)

    Sven Heydenreich, Benjamin Brück, Joachim Harnois-Déraps
    Abstract In recent years, cosmic shear has emerged as a powerful tool for studying the statistical distribution of matter in our Universe. Apart from the standard two-point correlation functions, several alternative methods such as peak count statistics offer competitive results. Here we show that persistent homology, a tool from topological data analysis, can extract more cosmological information than previous methods from the same data set. For this, we use persistent Betti numbers to efficiently summarise the full topological structure of weak lensing aperture mass maps. This method can be seen as an extension of the peak count statistics, in which we additionally capture information about the environment surrounding the maxima. We first demonstrate the performance in a mock analysis of the KiDS+VIKING-450 data: We extract the Betti functions from a suite of \textlessi\textgreaterN\textlessi/\textgreater-body simulations and use these to train a Gaussian process emulator that provides rapid model predictions; we next run a Markov chain Monte Carlo analysis on independent mock data to infer the cosmological parameters and their uncertainties. When comparing our results, we recover the input cosmology and achieve a constraining power on that is 3% tighter than that on peak count statistics. Performing the same analysis on 100 deg\textlesssup\textgreater2\textlesssup/\textgreater of \textlessi\textgreaterEuclid\textlessi/\textgreater-like simulations, we are able to improve the constraints on \textlessi\textgreaterS\textlessi/\textgreater\textlesssub\textgreater8\textlesssub/\textgreater and Ω\textlesssub\textgreaterm\textlesssub/\textgreater by 19% and 12%, respectively, while breaking some of the degeneracy between \textlessi\textgreaterS\textlessi/\textgreater\textlesssub\textgreater8\textlesssub/\textgreater and the dark energy equation of state. To our knowledge, the methods presented here are the most powerful topological tools for constraining cosmological parameters with lensing data.
  62. Topological Attention for Time Series Forecasting (2021)

    Sebastian Zeng, Florian Graf, Christoph Hofer, Roland Kwitt
    Abstract The problem of (point) forecasting univariate time series is considered. Most approaches, ranging from traditional statistical methods to recent learning-based techniques with neural networks, directly operate on raw time series observations. As an extension, we study whether local topological properties, as captured via persistent homology, can serve as a reliable signal that provides complementary information for learning to forecast. To this end, we propose topological attention, which allows attending to local topological features within a time horizon of historical data. Our approach easily integrates into existing end-to-end trainable forecasting models, such as N-BEATS, and, in combination with the latter exhibits state-of-the-art performance on the large-scale M4 benchmark dataset of 100,000 diverse time series from different domains. Ablation experiments, as well as a comparison to recent techniques in a setting where only a single time series is available for training, corroborate the beneficial nature of including local topological information through an attention mechanism.
  63. Reviews: Topological Distances and Losses for Brain Networks (2021)

    Moo K. Chung, Alexander Smith, Gary Shiu
    Abstract Almost all statistical and machine learning methods in analyzing brain networks rely on distances and loss functions, which are mostly Euclidean or matrix norms. The Euclidean or matrix distances may fail to capture underlying subtle topological differences in brain networks. Further, Euclidean distances are sensitive to outliers. A few extreme edge weights may severely affect the distance. Thus it is necessary to use distances and loss functions that recognize topology of data. In this review paper, we survey various topological distance and loss functions from topological data analysis (TDA) and persistent homology that can be used in brain network analysis more effectively. Although there are many recent brain imaging studies that are based on TDA methods, possibly due to the lack of method awareness, TDA has not taken as the mainstream tool in brain imaging field yet. The main purpose of this paper is provide the relevant technical survey of these powerful tools that are immediately applicable to brain network data.
  64. A Topological Perspective on Regimes in Dynamical Systems (2021)

    Kristian Strommen, Matthew Chantry, Joshua Dorrington, Nina Otter
    Abstract The existence and behaviour of so-called `regimes' has been extensively studied in dynamical systems ranging from simple toy models to the atmosphere itself, due to their potential of drastically simplifying complex and chaotic dynamics. Nevertheless, no agreed-upon and clear-cut definition of a `regime' or a `regime system' exists in the literature. We argue here for a definition which equates the existence of regimes in a system with the existence of non-trivial topological structure. We show, using persistent homology, a tool in topological data analysis, that this definition is both computationally tractable, practically informative, and accounts for a variety of different examples. We further show that alternative, more strict definitions based on clustering and/or temporal persistence criteria fail to account for one or more examples of dynamical systems typically thought of as having regimes. We finally discuss how our methodology can shed light on regime behaviour in the atmosphere, and discuss future prospects.
  65. Determining Structural Properties of Artificial Neural Networks Using Algebraic Topology (2021)

    David Pérez Fernández, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marta Villegas
    Abstract Artificial Neural Networks (ANNs) are widely used for approximating complex functions. The process that is usually followed to define the most appropriate architecture for an ANN given a specific function is mostly empirical. Once this architecture has been defined, weights are usually optimized according to the error function. On the other hand, we observe that ANNs can be represented as graphs and their topological 'fingerprints' can be obtained using Persistent Homology (PH). In this paper, we describe a proposal focused on designing more principled architecture search procedures. To do this, different architectures for solving problems related to a heterogeneous set of datasets have been analyzed. The results of the evaluation corroborate that PH effectively characterizes the ANN invariants: when ANN density (layers and neurons) or sample feeding order is the only difference, PH topological invariants appear; in the opposite direction in different sub-problems (i.e. different labels), PH varies. This approach based on topological analysis helps towards the goal of designing more principled architecture search procedures and having a better understanding of ANNs.
  66. Coexistence Holes Characterize the Assembly and Disassembly of Multispecies Systems (2021)

    Marco Tulio Angulo, Aaron Kelley, Luis Montejano, Chuliang Song, Serguei Saavedra
    Abstract A central goal of ecological research has been to understand the limits on the maximum number of species that can coexist under given constraints. However, we know little about the assembly and disassembly processes under which a community can reach such a maximum number, or whether this number is in fact attainable in practice. This limitation is partly due to the challenge of performing experimental work and partly due to the lack of a formalism under which one can systematically study such processes. Here, we introduce a formalism based on algebraic topology and homology theory to study the space of species coexistence formed by a given pool of species. We show that this space is characterized by ubiquitous discontinuities that we call coexistence holes (that is, empty spaces surrounded by filled space). Using theoretical and experimental systems, we provide direct evidence showing that these coexistence holes do not occur arbitrarily—their diversity is constrained by the internal structure of species interactions and their frequency can be explained by the external factors acting on these systems. Our work suggests that the assembly and disassembly of ecological systems is a discontinuous process that tends to obey regularities.
  67. Topology Identifies Emerging Adaptive Mutations in SARS-CoV-2 (2021)

    Michael Bleher, Lukas Hahn, Juan Angel Patino-Galindo, Mathieu Carriere, Ulrich Bauer, Raul Rabadan, Andreas Ott
    Abstract The COVID-19 pandemic has lead to a worldwide effort to characterize its evolution through the mapping of mutations in the genome of the coronavirus SARS-CoV-2. Ideally, one would like to quickly identify new mutations that could confer adaptive advantages (e.g. higher infectivity or immune evasion) by leveraging the large number of genomes. One way of identifying adaptive mutations is by looking at convergent mutations, mutations in the same genomic position that occur independently. However, the large number of currently available genomes precludes the efficient use of phylogeny-based techniques. Here, we establish a fast and scalable Topological Data Analysis approach for the early warning and surveillance of emerging adaptive mutations based on persistent homology. It identifies convergent events merely by their topological footprint and thus overcomes limitations of current phylogenetic inference techniques. This allows for an unbiased and rapid analysis of large viral datasets. We introduce a new topological measure for convergent evolution and apply it to the GISAID dataset as of February 2021, comprising 303,651 high-quality SARS-CoV-2 isolates collected since the beginning of the pandemic. We find that topologically salient mutations on the receptor-binding domain appear in several variants of concern and are linked with an increase in infectivity and immune escape, and for many adaptive mutations the topological signal precedes an increase in prevalence. We show that our method effectively identifies emerging adaptive mutations at an early stage. By localizing topological signals in the dataset, we extract geo-temporal information about the early occurrence of emerging adaptive mutations. The identification of these mutations can help to develop an alert system to monitor mutations of concern and guide experimentalists to focus the study of specific circulating variants.
  68. Topological Graph Neural Networks (2021)

    Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten Borgwardt
    Abstract Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures, such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive in terms of the Weisfeiler--Lehman test of isomorphism. Augmenting GNNs with our layer leads to beneficial predictive performance, both on synthetic data sets, which can be trivially classified by humans but not by ordinary GNNs, and on real-world data.
  69. Euler Characteristic Surfaces (2021)

    Gabriele Beltramo, Rayna Andreeva, Ylenia Giarratano, Miguel O. Bernabeu, Rik Sarkar, Primoz Skraba
    Abstract We study the use of the Euler characteristic for multiparameter topological data analysis. Euler characteristic is a classical, well-understood topological invariant that has appeared in numerous applications, including in the context of random fields. The goal of this paper is to present the extension of using the Euler characteristic in higher-dimensional parameter spaces. While topological data analysis of higher-dimensional parameter spaces using stronger invariants such as homology continues to be the subject of intense research, Euler characteristic is more manageable theoretically and computationally, and this analysis can be seen as an important intermediary step in multi-parameter topological data analysis. We show the usefulness of the techniques using artificially generated examples, and a real-world application of detecting diabetic retinopathy in retinal images.
  70. Topological Data Analysis: Concepts, Computation, and Applications in Chemical Engineering (2021)

    Alexander D. Smith, Paweł Dłotko, Victor M. Zavala
    Abstract A primary hypothesis that drives scientific and engineering studies is that data has structure. The dominant paradigms for describing such structure are statistics (e.g., moments, correlation functions) and signal processing (e.g., convolutional neural nets, Fourier series). Topological Data Analysis (TDA) is a field of mathematics that analyzes data from a fundamentally different perspective. TDA represents datasets as geometric objects and provides dimensionality reduction techniques that project such objects onto low-dimensional descriptors. The key properties of these descriptors (also known as topological features) are that they provide multiscale information and that they are stable under perturbations (e.g., noise, translation, and rotation). In this work, we review the key mathematical concepts and methods of TDA and present different applications in chemical engineering.
  71. Topological Detection of Alzheimer’s Disease Using Betti Curves (2021)

    Ameer Saadat-Yazdi, Rayna Andreeva, Rik Sarkar
    Abstract Alzheimer’s disease is a debilitating disease in the elderly, and is an increasing burden to the society due to an aging population. In this paper, we apply topological data analysis to structural MRI scans of the brain, and show that topological invariants make accurate predictors for Alzheimer’s. Using the construct of Betti Curves, we first show that topology is a good predictor of Age. Then we develop an approach to factor out the topological signature of age from Betti curves, and thus obtain accurate detection of Alzheimer’s disease. Experimental results show that topological features used with standard classifiers perform comparably to recently developed convolutional neural networks. These results imply that topology is a major aspect of structural changes due to aging and Alzheimer’s. We expect this relation will generate further insights for both early detection and better understanding of the disease.
  72. From Topological Analyses to Functional Modeling: The Case of Hippocampus (2021)

    Yuri Dabaghian
    Abstract Topological data analyses are widely used for describing and conceptualizing large volumes of neurobiological data, e.g., for quantifying spiking outputs of large neuronal ensembles and thus understanding the functions of the corresponding networks. Below we discuss an approach in which convergent topological analyses produce insights into how information may be processed in mammalian hippocampus—a brain part that plays a key role in learning and memory. The resulting functional model provides a unifying framework for integrating spiking data at different timescales and following the course of spatial learning at different levels of spatiotemporal granularity. This approach allows accounting for contributions from various physiological phenomena into spatial cognition—the neuronal spiking statistics, the effects of spiking synchronization by different brain waves, the roles played by synaptic efficacies and so forth. In particular, it is possible to demonstrate that networks with plastic and transient synaptic architectures can encode stable cognitive maps, revealing the characteristic timescales of memory processing.
  73. TDAExplore: Quantitative Analysis of Fluorescence Microscopy Images Through Topology-Based Machine Learning (2021)

    Parker Edwards, Kristen Skruber, Nikola Milićević, James B. Heidings, Tracy-Ann Read, Peter Bubenik, Eric A. Vitriol
    Abstract Recent advances in machine learning have greatly enhanced automatic methods to extract information from fluorescence microscopy data. However, current machine-learning-based models can require hundreds to thousands of images to train, and the most readily accessible models classify images without describing which parts of an image contributed to classification. Here, we introduce TDAExplore, a machine learning image analysis pipeline based on topological data analysis. It can classify different types of cellular perturbations after training with only 20–30 high-resolution images and performs robustly on images from multiple subjects and microscopy modes. Using only images and whole-image labels for training, TDAExplore provides quantitative, spatial information, characterizing which image regions contribute to classification. Computational requirements to train TDAExplore models are modest and a standard PC can perform training with minimal user input. TDAExplore is therefore an accessible, powerful option for obtaining quantitative information about imaging data in a wide variety of applications.
  74. TDA-Net: Fusion of Persistent Homology and Deep Learning Features for COVID-19 Detection From Chest X-Ray Images (2021)

    Mustafa Hajij, Ghada Zamzmi, Fawwaz Batayneh
    Abstract Topological Data Analysis (TDA) has emerged recently as a robust tool to extract and compare the structure of datasets. TDA identifies features in data (e.g., connected components and holes) and assigns a quantitative measure to these features. Several studies reported that topological features extracted by TDA tools provide unique information about the data, discover new insights, and determine which feature is more related to the outcome. On the other hand, the overwhelming success of deep neural networks in learning patterns and relationships has been proven on various data applications including images. To capture the characteristics of both worlds, we propose TDA-Net, a novel ensemble network that fuses topological and deep features for the purpose of enhancing model generalizability and accuracy. We apply the proposed TDA-Net to a critical application, which is the automated detection of COVID-19 from CXR images. Experimental results showed that the proposed network achieved excellent performance and suggested the applicability of our method in practice.
  75. Topological Data Analysis Distinguishes Parameter Regimes in the Anderson-Chaplain Model of Angiogenesis (2021)

    John T. Nardini, Bernadette J. Stolz, Kevin B. Flores, Heather A. Harrington, Helen M. Byrne
    Abstract Angiogenesis is the process by which blood vessels form from pre-existing vessels. It plays a key role in many biological processes, including embryonic development and wound healing, and contributes to many diseases including cancer and rheumatoid arthritis. The structure of the resulting vessel networks determines their ability to deliver nutrients and remove waste products from biological tissues. Here we simulate the Anderson-Chaplain model of angiogenesis at different parameter values and quantify the vessel architectures of the resulting synthetic data. Specifically, we propose a topological data analysis (TDA) pipeline for systematic analysis of the model. TDA is a vibrant and relatively new field of computational mathematics for studying the shape of data. We compute topological and standard descriptors of model simulations generated by different parameter values. We show that TDA of model simulation data stratifies parameter space into regions with similar vessel morphology. The methodologies proposed here are widely applicable to other synthetic and experimental data including wound healing, development, and plant biology.
  76. Inferring COVID-19 Biological Pathways From Clinical Phenotypes via Topological Analysis (2021)

    Negin Karisani, Daniel E. Platt, Saugata Basu, Laxmi Parida
    Abstract COVID-19 has caused thousands of deaths around the world and also resulted in a large international economic disruption. Identifying the pathways associated with this illness can help medical researchers to better understand the properties of the condition. This process can be carried out by analyzing the medical records. It is crucial to develop tools and models that can aid researchers with this process in a timely manner. However, medical records are often unstructured clinical notes, and this poses significant challenges to developing the automated systems. In this article, we propose a pipeline to aid practitioners in analyzing clinical notes and revealing the pathways associated with this disease. Our pipeline relies on topological properties and consists of three steps: 1) pre-processing the clinical notes to extract the salient concepts, 2) constructing a feature space of the patients to characterize the extracted concepts, and finally, 3) leveraging the topological properties to distill the available knowledge and visualize the result. Our experiments on a publicly available dataset of COVID-19 clinical notes testify that our pipeline can indeed extract meaningful pathways.
  77. The Shape of Cancer Relapse: Topological Data Analysis Predicts Recurrence in Paediatric Acute Lymphoblastic Leukaemia (2021)

    Salvador Chulián, Bernadette J. Stolz, Álvaro Martínez-Rubio, Cristina Blázquez Goñi, Juan F. Rodríguez Gutiérrez, Teresa Caballero Velázquez, Águeda Molinos Quintana, Manuel Ramírez Orellana, Ana Castillo Robleda, José Luis Fuster Soler, Alfredo Minguela Puras, María Victoria Martínez Sánchez, María Rosa, Víctor M. Pérez-García, Helen Byrne
    Abstract Acute Lymphoblastic Leukaemia (ALL) is the most frequent paediatric cancer. Modern therapies have improved survival rates, but approximately 15-20 % of patients relapse. At present, patients’ risk of relapse are assessed by projecting high-dimensional flow cytometry data onto a subset of biomarkers and manually estimating the shape of this reduced data. Here, we apply methods from topological data analysis (TDA), which quantify shape in data via features such as connected components and loops, to pre-treatment ALL datasets with known outcomes. We combine these fully unsupervised analyses with machine learning to identify features in the pre-treatment data that are prognostic for risk of relapse. We find significant topological differences between relapsing and non-relapsing patients and confirm the predictive power of CD10, CD20, CD38, and CD45. Further, we are able to use the TDA descriptors to predict patients who relapsed. We propose three prognostic pipelines that readily extend to other haematological malignancies. Teaser Topology reveals features in flow cytometry data which predict relapse of patients with acute lymphoblastic leukemia
  78. Persistent Homology of the Cosmic Web. I: Hierarchical Topology in \$\Lambda\$CDM Cosmologies (2021)

    Georg Wilding, Keimpe Nevenzeel, Rien van de Weygaert, Gert Vegter, Pratyush Pranav, Bernard J. T. Jones, Konstantinos Efstathiou, Job Feldbrugge
    Abstract Using a set of \$\Lambda\$CDM simulations of cosmic structure formation, we study the evolving connectivity and changing topological structure of the cosmic web using state-of-the-art tools of multiscale topological data analysis (TDA). We follow the development of the cosmic web topology in terms of the evolution of Betti number curves and feature persistence diagrams of the three (topological) classes of structural features: matter concentrations, filaments and tunnels, and voids. The Betti curves specify the prominence of features as a function of density level, and their evolution with cosmic epoch reflects the changing network connections between these structural features. The persistence diagrams quantify the longevity and stability of topological features. In this study we establish, for the first time, the link between persistence diagrams, the features they show, and the gravitationally driven cosmic structure formation process. By following the diagrams' development over cosmic time, the link between the multiscale topology of the cosmic web and the hierarchical buildup of cosmic structure is established. The sharp apexes in the diagrams are intimately related to key transitions in the structure formation process. The apex in the matter concentration diagrams coincides with the density level at which, typically, they detach from the Hubble expansion and begin to collapse. At that level many individual islands merge to form the network of the cosmic web and a large number of filaments and tunnels emerge to establish its connecting bridges. The location trends of the apex possess a self-similar character that can be related to the cosmic web's hierarchical buildup. We find that persistence diagrams provide a significantly higher and more profound level of information on the structure formation process than more global summary statistics like Euler characteristic or Betti numbers.
  79. Development of the Functional Connectome Topology in Adolescence: Evidence From Topological Data Analysis (2021)

    Zeus Gracia-Tabuenca, Juan Carlos Díaz-Patiño, Isaac Arelio, Martha Beatriz Moreno, Fernando A. Barrios, Sarael Alcauter
    Abstract Adolescence is a crucial developmental period in terms of behavior and mental health. Therefore, understanding how the brain develops during this stage is a fundamental challenge for neuroscience. Recent studies have modelled the brain as a network or connectome, mainly applying measures from graph theory, showing a change in its functional organization such as an increase in its segregation and integration. Topological Data Analysis (TDA) complements such modelling by extracting high-dimensional features across the whole range of connectivity values, instead of exploring a fixed set of connections. This study enquiries into the developmental trajectories of such properties using a longitudinal sample of typically developing participants (N = 98; 53/45 F/M; 6.7-18.1 years), applying TDA into their functional connectomes. In addition, we explore the effect of puberty on the individual developmental trajectories. Results showed that compared to random networks, the adolescent brain is more segregated at the global level, but more densely connected at the local level. Furthermore, developmental effects showed nonlinear trajectories for the integration of the whole brain and fronto-parietal networks, with an inflection point and increasing trajectories after puberty onset. These results add to the insights in the development of the functional organization of the adolescent. Significance Statement Topological Data Analysis may be used to explore the topology of the brain along the whole range of connectivity values instead of selecting only a fixed set of connectivity thresholds. Here, we explored some properties of the topology of the brain functional connectome, and how they develop in adolescence. First, we show that developmental trajectories are nonlinear and better explained by the puberty status than chronological age, with an inflection point around the puberty onset. The greatest effect is the increase in functional integration for the whole brain, and particularly for the Fronto-Parietal Network when exploring functional subnetworks.
  80. Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology (2021)

    Nicole Bussola, Bruno Papa, Ombretta Melaiu, Aurora Castellano, Doriana Fruci, Giuseppe Jurman
    Abstract We introduce here a novel machine learning (ML) framework to address the issue of the quantitative assessment of the immune content in neuroblastoma (NB) specimens. First, the EUNet, a U-Net with an EfficientNet encoder, is trained to detect lymphocytes on tissue digital slides stained with the CD3 T-cell marker. The training set consists of 3782 images extracted from an original collection of 54 whole slide images (WSIs), manually annotated for a total of 73,751 lymphocytes. Resampling strategies, data augmentation, and transfer learning approaches are adopted to warrant reproducibility and to reduce the risk of overfitting and selection bias. Topological data analysis (TDA) is then used to define activation maps from different layers of the neural network at different stages of the training process, described by persistence diagrams (PD) and Betti curves. TDA is further integrated with the uniform manifold approximation and projection (UMAP) dimensionality reduction and the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm for clustering, by the deep features, the relevant subgroups and structures, across different levels of the neural network. Finally, the recent TwoNN approach is leveraged to study the variation of the intrinsic dimensionality of the U-Net model. As the main task, the proposed pipeline is employed to evaluate the density of lymphocytes over the whole tissue area of the WSIs. The model achieves good results with mean absolute error 3.1 on test set, showing significant agreement between densities estimated by our EUNet model and by trained pathologists, thus indicating the potentialities of a promising new strategy in the quantification of the immune content in NB specimens. Moreover, the UMAP algorithm unveiled interesting patterns compatible with pathological characteristics, also highlighting novel insights into the dynamics of the intrinsic dataset dimensionality at different stages of the training process. All the experiments were run on the Microsoft Azure cloud platform.
  81. Hierarchical Clustering and Zeroth Persistent Homology (2020)

    İsmail Güzel, Atabey Kaygun
    Abstract In this article, we show that hierarchical clustering and the zeroth persistent homology do deliver the same topological information about a given data set. We show this fact using cophenetic matrices constructed out of the filtered Vietoris-Rips complex of the data set at hand. As in any cophenetic matrix, one can also display the inter-relations of zeroth homology classes via a rooted tree, also known as a dendogram. Since homological cophenetic matrices can be calculated for higher homologies, one can also sketch similar dendograms for higher persistent homology classes.
  82. Path Homology as a Stronger Analogue of Cyclomatic Complexity (2020)

    Steve Huntsman
    Abstract Cyclomatic complexity is an incompletely specified but mathematically principled software metric that can be usefully applied to both source and binary code. We consider the application of path homology as a stronger analogue of cyclomatic complexity. We have implemented an algorithm to compute path homology in arbitrary dimension and applied it to several classes of relevant flow graphs, including randomly generated flow graphs representing structured and unstructured control flow. We also compared path homology and cyclomatic complexity on a set of disassembled binaries obtained from the grep utility. There exist control flow graphs realizable at the assembly level with nontrivial path homology in arbitrary dimension. We exhibit several classes of examples in this vein while also experimentally demonstrating that path homology gives identicial results to cyclomatic complexity for at least one detailed notion of structured control flow. We also experimentally demonstrate that the two notions differ on disassembled binaries, and we highlight an example of extreme disagreement. Path homology empirically generalizes cyclomatic complexity for an elementary notion of structured code and appears to identify more structurally relevant features of control flow in general. Path homology therefore has the potential to substantially improve upon cyclomatic complexity.
  83. Cubical Ripser: Software for Computing Persistent Homology of Image and Volume Data (2020)

    Shizuo Kaji, Takeki Sudo, Kazushi Ahara
    Abstract We introduce Cubical Ripser for computing persistent homology of image and volume data. To our best knowledge, Cubical Ripser is currently the fastest and the most memory-efficient program for computing persistent homology of image and volume data. We demonstrate our software with an example of image analysis in which persistent homology and convolutional neural networks are successfully combined. Our open source implementation is available at [14].
  84. Weighted Persistent Homology for Osmolyte Molecular Aggregation and Hydrogen-Bonding Network Analysis (2020)

    D. Vijay Anand, Zhenyu Meng, Kelin Xia, Yuguang Mu
    Abstract It has long been observed that trimethylamine N-oxide (TMAO) and urea demonstrate dramatically different properties in a protein folding process. Even with the enormous theoretical and experimental research work on these two osmolytes, various aspects of their underlying mechanisms still remain largely elusive. In this paper, we propose to use the weighted persistent homology to systematically study the osmolytes molecular aggregation and their hydrogen-bonding network from a local topological perspective. We consider two weighted models, i.e., localized persistent homology (LPH) and interactive persistent homology (IPH). Boltzmann persistent entropy (BPE) is proposed to quantitatively characterize the topological features from LPH and IPH, together with persistent Betti number (PBN). More specifically, from the localized persistent homology models, we have found that TMAO and urea have very different local topology. TMAO is found to exhibit a local network structure. With the concentration increase, the circle elements in these networks show a clear increase in their total numbers and a decrease in their relative sizes. In contrast, urea shows two types of local topological patterns, i.e., local clusters around 6 Å and a few global circle elements at around 12 Å. From the interactive persistent homology models, it has been found that our persistent radial distribution function (PRDF) from the global-scale IPH has same physical properties as the traditional radial distribution function. Moreover, PRDFs from the local-scale IPH can also be generated and used to characterize the local interaction information. Other than the clear difference of the first peak value of PRDFs at filtration size 4 Å, TMAO and urea also shows very different behaviors at the second peak region from filtration size 5 Å to 10 Å. These differences are also reflected in the PBNs and BPEs of the local-scale IPH. These localized topological information has never been revealed before. Since graphs can be transferred into simplicial complexes by the clique complex, our weighted persistent homology models can be used in the analysis of various networks and graphs from any molecular structures and aggregation systems.
  85. Weighted Persistent Homology for Biomolecular Data Analysis (2020)

    Zhenyu Meng, D. Vijay Anand, Yunpeng Lu, Jie Wu, Kelin Xia
    Abstract In this paper, we systematically review weighted persistent homology (WPH) models and their applications in biomolecular data analysis. Essentially, the weight value, which reflects physical, chemical and biological properties, can be assigned to vertices (atom centers), edges (bonds), or higher order simplexes (cluster of atoms), depending on the biomolecular structure, function, and dynamics properties. Further, we propose the first localized weighted persistent homology (LWPH). Inspired by the great success of element specific persistent homology (ESPH), we do not treat biomolecules as an inseparable system like all previous weighted models, instead we decompose them into a series of local domains, which may be overlapped with each other. The general persistent homology or weighted persistent homology analysis is then applied on each of these local domains. In this way, functional properties, that are embedded in local structures, can be revealed. Our model has been applied to systematically study DNA structures. It has been found that our LWPH based features can be used to successfully discriminate the A-, B-, and Z-types of DNA. More importantly, our LWPH based principal component analysis (PCA) model can identify two configurational states of DNA structures in ion liquid environment, which can be revealed only by the complicated helical coordinate system. The great consistence with the helical-coordinate model demonstrates that our model captures local structure variations so well that it is comparable with geometric models. Moreover, geometric measurements are usually defined in local regions. For instance, the helical-coordinate system is limited to one or two basepairs. However, our LWPH can quantitatively characterize structure information in regions or domains with arbitrary sizes and shapes, where traditional geometrical measurements fail.
  86. Weighted-Persistent-Homology-Based Machine Learning for RNA Flexibility Analysis (2020)

    Chi Seng Pun, Brandon Yung Sin Yong, Kelin Xia
    Abstract With the great significance of biomolecular flexibility in biomolecular dynamics and functional analysis, various experimental and theoretical models are developed. Experimentally, Debye-Waller factor, also known as B-factor, measures atomic mean-square displacement and is usually considered as an important measurement for flexibility. Theoretically, elastic network models, Gaussian network model, flexibility-rigidity model, and other computational models have been proposed for flexibility analysis by shedding light on the biomolecular inner topological structures. Recently, a topology-based machine learning model has been proposed. By using the features from persistent homology, this model achieves a remarkable high Pearson correlation coefficient (PCC) in protein B-factor prediction. Motivated by its success, we propose weighted-persistent-homology (WPH)-based machine learning (WPHML) models for RNA flexibility analysis. Our WPH is a newly-proposed model, which incorporate physical, chemical and biological information into topological measurements using a weight function. In particular, we use local persistent homology (LPH) to focus on the topological information of local regions. Our WPHML model is validated on a well-established RNA dataset, and numerical experiments show that our model can achieve a PCC of up to 0.5822. The comparison with the previous sequence-information-based learning models shows that a consistent improvement in performance by at least 10% is achieved in our current model.
  87. Representations of Energy Landscapes by Sublevelset Persistent Homology: An Example With N-Alkanes (2020)

    Joshua Mirth, Yanqin Zhai, Johnathan Bush, Enrique G. Alvarado, Howie Jordan, Mark Heim, Bala Krishnamoorthy, Markus Pflaum, Aurora Clark, Y. Z, Henry Adams
    Abstract Encoding the complex features of an energy landscape is a challenging task, and often chemists pursue the most salient features (minima and barriers) along a highly reduced space, i.e. 2- or 3-dimensions. Even though disconnectivity graphs or merge trees summarize the connectivity of the local minima of an energy landscape via the lowest-barrier pathways, there is more information to be gained by also considering the topology of each connected component at different energy thresholds (or sublevelsets). We propose sublevelset persistent homology as an appropriate tool for this purpose. Our computations on the configuration phase space of n-alkanes from butane to octane allow us to conjecture, and then prove, a complete characterization of the sublevelset persistent homology of the alkane \$C_m H_\2m+2\\$ potential energy landscapes, for all \$m\$, and in all homological dimensions. We further compare both the analytical configurational potential energy landscapes and sampled data from molecular dynamics simulation, using the united and all-atom descriptions of the intramolecular interactions. In turn, this supports the application of distance metrics to quantify sampling fidelity and lays the foundation for future work regarding new metrics that quantify differences between the topological features of high-dimensional energy landscapes.
  88. Interpretable Phase Detection and Classification With Persistent Homology (2020)

    Alex Cole, Gregory J. Loges, Gary Shiu
    Abstract We apply persistent homology to the task of discovering and characterizing phase transitions, using lattice spin models from statistical physics for working examples. Persistence images provide a useful representation of the homological data for conducting statistical tasks. To identify the phase transitions, a simple logistic regression on these images is sufficient for the models we consider, and interpretable order parameters are then read from the weights of the regression. Magnetization, frustration and vortex-antivortex structure are identified as relevant features for characterizing phase transitions.
  89. Atom-Specific Persistent Homology and Its Application to Protein Flexibility Analysis (2020)

    David Bramer, Guo-Wei Wei
    Abstract Recently, persistent homology has had tremendous success in biomolecular data analysis. It works by examining the topological relationship or connectivity of a group of atoms in a molecule at a variety of scales, then rendering a family of topological representations of the molecule. However, persistent homology is rarely employed for the analysis of atomic properties, such as biomolecular flexibility analysis or B-factor prediction. This work introduces atom-specific persistent homology to provide a local atomic level representation of a molecule via a global topological tool. This is achieved through the construction of a pair of conjugated sets of atoms and corresponding conjugated simplicial complexes, as well as conjugated topological spaces. The difference between the topological invariants of the pair of conjugated sets is measured by Bottleneck and Wasserstein metrics and leads to an atom-specific topological representation of individual atomic properties in a molecule. Atom-specific topological features are integrated with various machine learning algorithms, including gradient boosting trees and convolutional neural network for protein thermal fluctuation analysis and B-factor prediction. Extensive numerical results indicate the proposed method provides a powerful topological tool for analyzing and predicting localized information in complex macromolecules.
  90. Protein-Folding Analysis Using Features Obtained by Persistent Homology (2020)

    Takashi Ichinomiya, Ippei Obayashi, Yasuaki Hiraoka
    Abstract Understanding the protein-folding process is an outstanding issue in biophysics; recent developments in molecular dynamics simulation have provided insights into this phenomenon. However, the large freedom of atomic motion hinders the understanding of this process. In this study, we applied persistent homology, an emerging method to analyze topological features in a data set, to reveal protein-folding dynamics. We developed a new, to our knowledge, method to characterize the protein structure based on persistent homology and applied this method to molecular dynamics simulations of chignolin. Using principle component analysis or nonnegative matrix factorization, our analysis method revealed two stable states and one saddle state, corresponding to the native, misfolded, and transition states, respectively. We also identified an unfolded state with slow dynamics in the reduced space. Our method serves as a promising tool to understand the protein-folding process.
  91. Finding Universal Structures in Quantum Many-Body Dynamics via Persistent Homology (2020)

    Daniel Spitz, Jürgen Berges, Markus K. Oberthaler, Anna Wienhard
    Abstract Inspired by topological data analysis techniques, we introduce persistent homology observables and apply them in a geometric analysis of the dynamics of quantum field theories. As a prototype application, we consider simulated data of a two-dimensional Bose gas far from equilibrium. We discover a continuous spectrum of dynamical scaling exponents, which provides a refined classification of nonequilibrium universal phenomena. A possible explanation of the underlying processes is provided in terms of mixing wave turbulence and vortex kinetics components in point clouds. We find that the persistent homology scaling exponents are inherently linked to the geometry of the system, as the derivation of a packing relation reveals. The approach opens new ways of analyzing quantum many-body dynamics in terms of robust topological structures beyond standard field theoretic techniques.
  92. Persistent Homology Advances Interpretable Machine Learning for Nanoporous Materials (2020)

    Aditi S. Krishnapriyan, Joseph Montoya, Jens Hummelshøj, Dmitriy Morozov
    Abstract Machine learning for nanoporous materials design and discovery has emerged as a promising alternative to more time-consuming experiments and simulations. The challenge with this approach is the selection of features that enable universal and interpretable materials representations across multiple prediction tasks. We use persistent homology to construct holistic representations of the materials structure. We show that these representations can also be augmented with other generic features such as word embeddings from natural language processing to capture chemical information. We demonstrate our approach on multiple metal-organic framework datasets by predicting a variety of gas adsorption targets. Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from commonly used manually curated features. Persistent homology features allow us to locate the pores that correlate best to adsorption at different pressures, contributing to understanding atomic level structure-property relationships for materials design.
  93. Graph Filtration Learning (2020)

    Christoph Hofer, Florian Graf, Bastian Rieck, Marc Niethammer, Roland Kwitt
    Abstract We propose an approach to learning with graph-structured data in the problem domain of graph classification. In particular, we present a novel type of readout operation to aggregate node features into a graph-level representation. To this end, we leverage persistent homology computed via a real-valued, learnable, filter function. We establish the theoretical foundation for differentiating through the persistent homology computation. Empirically, we show that this type of readout operation compares favorably to previous techniques, especially when the graph connectivity structure is informative for the learning problem.
  94. Can Neural Networks Learn Persistent Homology Features? (2020)

    Guido Montúfar, Nina Otter, Yuguang Wang
    Abstract Topological data analysis uses tools from topology -- the mathematical area that studies shapes -- to create representations of data. In particular, in persistent homology, one studies one-parameter families of spaces associated with data, and persistence diagrams describe the lifetime of topological invariants, such as connected components or holes, across the one-parameter family. In many applications, one is interested in working with features associated with persistence diagrams rather than the diagrams themselves. In our work, we explore the possibility of learning several types of features extracted from persistence diagrams using neural networks.
  95. Using Zigzag Persistent Homology to Detect Hopf Bifurcations in Dynamical Systems (2020)

    Sarah Tymochko, Elizabeth Munch, Firas A. Khasawneh
    Abstract Bifurcations in dynamical systems characterize qualitative changes in the system behavior. Therefore, their detection is important because they can signal the transition from normal system operation to imminent failure. While standard persistent homology has been used in this setting, it usually requires analyzing a collection of persistence diagrams, which in turn drives up the computational cost considerably. Using zigzag persistence, we can capture topological changes in the state space of the dynamical system in only one persistence diagram. Here we present Bifurcations using ZigZag (BuZZ), a one-step method to study and detect bifurcations using zigzag persistence. The BuZZ method is successfully able to detect this type of behavior in two synthetic examples as well as an example dynamical system.
  96. Complexes of Tournaments, Directionality Filtrations and Persistent Homology (2020)

    Dejan Govc, Ran Levi, Jason P. Smith
    Abstract Complete digraphs are referred to in the combinatorics literature as tournaments. We consider a family of semi-simplicial complexes, that we refer to as "tournaplexes", whose simplices are tournaments. In particular, given a digraph \$\mathcal\G\\$, we associate with it a "flag tournaplex" which is a tournaplex containing the directed flag complex of \$\mathcal\G\\$, but also the geometric realisation of cliques that are not directed. We define several types of filtrations on tournaplexes, and exploiting persistent homology, we observe that flag tournaplexes provide finer means of distinguishing graph dynamics than the directed flag complex. We then demonstrate the power of these ideas by applying them to graph data arising from the Blue Brain Project's digital reconstruction of a rat's neocortex.
  97. Model Comparison via Simplicial Complexes and Persistent Homology (2020)

    Sean T. Vittadello, Michael P. H. Stumpf
    Abstract In many scientific and technological contexts we have only a poor understanding of the structure and details of appropriate mathematical models. We often need to compare different models. With available data we can use formal statistical model selection to compare and contrast the ability of different mathematical models to describe such data. But there is a lack of rigorous methods to compare different models \emph\a priori\. Here we develop and illustrate two such approaches that allow us to compare model structures in a systematic way. Using well-developed and understood concepts from simplicial geometry we are able to define a distance based on the persistent homology applied to the simplicial complexes that captures the model structure. In this way we can identify shared topological features of different models. We then expand this, and move from a distance between simplicial complexes to studying equivalences between models in order to determine their functional relatedness.
  98. Topological Data Analysis on Simple English Wikipedia Articles (2020)

    Matthew Wright, Xiaojun Zheng
    Abstract Single-parameter persistent homology, a key tool in topological data analysis, has been widely applied to data problems, with statistical techniques that quantify the significance of the results. In contrast, statistical techniques for two-parameter persistence, while highly desirable for real-world applications, have scarcely been considered. We present three statistical approaches for comparing geometric data using two-parameter persistent homology, and we demonstrate the applicability of these approaches on high-dimensional point-cloud data obtained from Simple English Wikipedia articles. These approaches rely on the Hilbert function, matching distance, and barcodes obtained from two-parameter persistence modules computed from the point-cloud data. We demonstrate the applicability of our methods by distinguishing certain subsets of the Wikipedia data, and by comparison with random data. Results include insights into the construction of null distributions and stability of our methods with respect to noisy data. Our statistical methods are broadly applicable for analysis of geometric data indexed by a real-valued parameter.
  99. Quantitative and Interpretable Order Parameters for Phase Transitions From Persistent Homology (2020)

    Alex Cole, Gregory J. Loges, Gary Shiu
    Abstract We apply modern methods in computational topology to the task of discovering and characterizing phase transitions. As illustrations, we apply our method to four two-dimensional lattice spin models: the Ising, square ice, XY, and fully-frustrated XY models. In particular, we use persistent homology, which computes the births and deaths of individual topological features as a coarse-graining scale or sublevel threshold is increased, to summarize multiscale and high-point correlations in a spin configuration. We employ vector representations of this information called persistence images to formulate and perform the statistical task of distinguishing phases. For the models we consider, a simple logistic regression on these images is sufficient to identify the phase transition. Interpretable order parameters are then read from the weights of the regression. This method suffices to identify magnetization, frustration, and vortex-antivortex structure as relevant features for phase transitions in our models. We also define "persistence" critical exponents and study how they are related to those critical exponents usually considered.
  100. Evolutionary Homology on Coupled Dynamical Systems With Applications to Protein Flexibility Analysis (2020)

    Zixuan Cang, Elizabeth Munch, Guo-Wei Wei
    Abstract While the spatial topological persistence is naturally constructed from a radius-based filtration, it has hardly been derived from a temporal filtration. Most topological models are designed for the global topology of a given object as a whole. There is no method reported in the literature for the topology of an individual component in an object to the best of our knowledge. For many problems in science and engineering, the topology of an individual component is important for describing its properties. We propose evolutionary homology (EH) constructed via a time evolution-based filtration and topological persistence. Our approach couples a set of dynamical systems or chaotic oscillators by the interactions of a physical system, such as a macromolecule. The interactions are approximated by weighted graph Laplacians. Simplices, simplicial complexes, algebraic groups and topological persistence are defined on the coupled trajectories of the chaotic oscillators. The resulting EH gives rise to time-dependent topological invariants or evolutionary barcodes for an individual component of the physical system, revealing its topology-function relationship. In conjunction with Wasserstein metrics, the proposed EH is applied to protein flexibility analysis, an important problem in computational biophysics. Numerical results for the B-factor prediction of a benchmark set of 364 proteins indicate that the proposed EH outperforms all the other state-of-the-art methods in the field.
  101. Topological Autoencoders (2020)

    Michael Moor, Max Horn, Bastian Rieck, Karsten Borgwardt
    Abstract We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we construct this loss in a differentiable manner, such that the encoding learns to retain multi-scale connectivity information. We show that our approach is theoretically well-founded and that it exhibits favourable latent representations on a synthetic manifold as well as on real-world image data sets, while preserving low reconstruction errors.
  102. The Persistence of Large Scale Structures I: Primordial Non-Gaussianity (2020)

    Matteo Biagetti, Alex Cole, Gary Shiu
    Abstract We develop an analysis pipeline for characterizing the topology of large scale structure and extracting cosmological constraints based on persistent homology. Persistent homology is a technique from topological data analysis that quantifies the multiscale topology of a data set, in our context unifying the contributions of clusters, filament loops, and cosmic voids to cosmological constraints. We describe how this method captures the imprint of primordial local non-Gaussianity on the late-time distribution of dark matter halos, using a set of N-body simulations as a proxy for real data analysis. For our best single statistic, running the pipeline on several cubic volumes of size \$40~(\rm\Gpc/h\)\textasciicircum\3\\$, we detect \$f_\\rm NL\\textasciicircum\\rm loc\=10\$ at \$97.5\%\$ confidence on \$\sim 85\%\$ of the volumes. Additionally we test our ability to resolve degeneracies between the topological signature of \$f_\\rm NL\\textasciicircum\\rm loc\\$ and variation of \$\sigma_8\$ and argue that correctly identifying nonzero \$f_\\rm NL\\textasciicircum\\rm loc\\$ in this case is possible via an optimal template method. Our method relies on information living at \$\mathcal\O\(10)\$ Mpc/h, a complementary scale with respect to commonly used methods such as the scale-dependent bias in the halo/galaxy power spectrum. Therefore, while still requiring a large volume, our method does not require sampling long-wavelength modes to constrain primordial non-Gaussianity. Moreover, our statistics are interpretable: we are able to reproduce previous results in certain limits and we make new predictions for unexplored observables, such as filament loops formed by dark matter halos in a simulation box.
  103. Hypothesis Testing for Shapes Using Vectorized Persistence Diagrams (2020)

    Chul Moon, Nicole A. Lazar
    Abstract Topological data analysis involves the statistical characterization of the shape of data. Persistent homology is a primary tool of topological data analysis, which can be used to analyze those topological features and perform statistical inference. In this paper, we present a two-stage hypothesis test for vectorized persistence diagrams. The first stage filters elements in the vectorized persistence diagrams to reduce false positives. The second stage consists of multiple hypothesis tests, with false positives controlled by false discovery rates. We demonstrate applications of the proposed procedure on simulated point clouds and three-dimensional rock image data. Our results show that the proposed hypothesis tests can provide flexible and informative inferences on the shape of data with lower computational cost compared to the permutation test.
  104. A Novel Method of Extracting Topological Features From Word Embeddings (2020)

    Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny
    Abstract In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there are only a few work on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representation of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the embedding high-dimensional space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show our defined topological features may outperform conventional text mining features.
  105. Localization in the Crowd With Topological Constraints (2020)

    Shahira Abousamra, Minh Hoai, Dimitris Samaras, Chao Chen
    Abstract We address the problem of crowd localization, i.e., the prediction of dots corresponding to people in a crowded scene. Due to various challenges, a localization method is prone to spatial semantic errors, i.e., predicting multiple dots within a same person or collapsing multiple dots in a cluttered region. We propose a topological approach targeting these semantic errors. We introduce a topological constraint that teaches the model to reason about the spatial arrangement of dots. To enforce this constraint, we define a persistence loss based on the theory of persistent homology. The loss compares the topographic landscape of the likelihood map and the topology of the ground truth. Topological reasoning improves the quality of the localization algorithm especially near cluttered regions. On multiple public benchmarks, our method outperforms previous localization methods. Additionally, we demonstrate the potential of our method in improving the performance in the crowd counting task.
  106. A Topological Data Analysis Approach On Predicting Phenotypes From Gene Expression Data (2020)

    Sayan Mandal, Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida
    Abstract The goal of this study was to investigate if gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore we devised an approach augmenting machine learning with topological data analysis., We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson’s disease phenotype prediction when measured against standard machine learning methods., This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high dimensional data reveals itself when considering the intricate topological connections between expressed genes.
  107. Topological Echoes of Primordial Physics in the Universe at Large Scales (2020)

    Alex Cole, Matteo Biagetti, Gary Shiu
    Abstract We present a pipeline for characterizing and constraining initial conditions in cosmology via persistent homology. The cosmological observable of interest is the cosmic web of large scale structure, and the initial conditions in question are non-Gaussianities (NG) of primordial density perturbations. We compute persistence diagrams and derived statistics for simulations of dark matter halos with Gaussian and non-Gaussian initial conditions. For computational reasons and to make contact with experimental observations, our pipeline computes persistence in sub-boxes of full simulations and simulations are subsampled to uniform halo number. We use simulations with large NG (\$f_\\rm NL\\textasciicircum\\rm loc\=250\$) as templates for identifying data with mild NG (\$f_\\rm NL\\textasciicircum\\rm loc\=10\$), and running the pipeline on several cubic volumes of size \$40~(\textrm\Gpc/h\)\textasciicircum\3\\$, we detect \$f_\\rm NL\\textasciicircum\\rm loc\=10\$ at \$97.5\%\$ confidence on \$\sim 85\%\$ of the volumes for our best single statistic. Throughout we benefit from the interpretability of topological features as input for statistical inference, which allows us to make contact with previous first-principles calculations and make new predictions.
  108. HERMES: Persistent Spectral Graph Software (2020)

    Rui Wang, Rundong Zhao, Emily Ribando-Gros, Jiahui Chen, Yiying Tong, Guo-Wei Wei
    Abstract Persistent homology (PH) is one of the most popular tools in topological data analysis (TDA), while graph theory has had a significant impact on data science. Our earlier work introduced the persistent spectral graph (PSG) theory as a unified multiscale paradigm to encompass TDA and geometric analysis. In PSG theory, families of persistent Laplacians (PLs) corresponding to various topological dimensions are constructed via a filtration to sample a given dataset at multiple scales. The harmonic spectra from the null spaces of PLs offer the same topological invariants, namely persistent Betti numbers, at various dimensions as those provided by PH, while the non-harmonic spectra of PLs give rise to additional geometric analysis of the shape of the data. In this work, we develop an open-source software package, called highly efficient robust multidimensional evolutionary spectra (HERMES), to enable broad applications of PSGs in science, engineering, and technology. To ensure the reliability and robustness of HERMES, we have validated the software with simple geometric shapes and complex datasets from three-dimensional (3D) protein structures. We found that the smallest non-zero eigenvalues are very sensitive to data abnormality.
  109. A Sheaf and Topology Approach to Generating Local Branch Numbers in Digital Images (2020)

    Chuan-Shen Hu, Yu-Min Chung
    Abstract This paper concerns a theoretical approach that combines topological data analysis (TDA) and sheaf theory. Topological data analysis, a rising field in mathematics and computer science, concerns the shape of the data and has been proven effective in many scientific disciplines. Sheaf theory, a mathematics subject in algebraic geometry, provides a framework for describing the local consistency in geometric objects. Persistent homology (PH) is one of the main driving forces in TDA, and the idea is to track changes of geometric objects at different scales. The persistence diagram (PD) summarizes the information of PH in the form of a multi-set. While PD provides useful information about the underlying objects, it lacks fine relations about the local consistency of specific pairs of generators in PD, such as the merging relation between two connected components in the PH. The sheaf structure provides a novel point of view for describing the merging relation of local objects in PH. It is the goal of this paper to establish a theoretic framework that utilizes the sheaf theory to uncover finer information from the PH. We also show that the proposed theory can be applied to identify the branch numbers of local objects in digital images.
  110. Dynamic State Analysis of a Driven Magnetic Pendulum Using Ordinal Partition Networks and Topological Data Analysis (2020)

    Audun Myers, Firas A. Khasawneh
    Abstract Abstract. The use of complex networks for time series analysis has recently shown to be useful as a tool for detecting dynamic state changes for a wide variety of applications. In this work, we implement the commonly used ordinal partition network to transform a time series into a network for detecting these state changes for the simple magnetic pendulum. The time series that we used are obtained experimentally from a base-excited magnetic pendulum apparatus, and numerically from the corresponding governing equations. The magnetic pendulum provides a relatively simple, non-linear example demonstrating transitions from periodic to chaotic motion with the variation of system parameters. For our method, we implement persistent homology, a shape measuring tool from Topological Data Analysis (TDA), to summarize the shape of the resulting ordinal partition networks as a tool for detecting state changes. We show that this network analysis tool provides a clear distinction between periodic and chaotic time series. Another contribution of this work is the successful application of the networks-TDA pipeline, for the first time, to signals from non-autonomous nonlinear systems. This opens the door for our approach to be used as an automatic design tool for studying the effect of design parameters on the resulting system response. Other uses of this approach include fault detection from sensor signals in a wide variety of engineering operations.
  111. The Growing Topology of the C. Elegans Connectome (2020)

    Alec Helm, Ann S. Blevins, Danielle S. Bassett
    Abstract Probing the developing neural circuitry in Caenorhabditis elegans has enhanced our understanding of nervous systems. The C. elegans connectome, like those of other species, is characterized by a rich club of densely connected neurons embedded within a small-world architecture. This organization of neuronal connections, captured by quantitative network statistics, provides insight into the system's capacity to perform integrative computations. Yet these network measures are limited in their ability to detect weakly connected motifs, such as topological cavities, that may support the systems capacity to perform segregated computations. We address this limitation by using persistent homology to track the evolution of topological cavities in the growing C. elegans connectome throughout neural development, and assess the degree to which the growing connectomes topology is resistant to biological noise. We show that the developing connectome topology is both relatively robust to changes in neuron birth times and not captured by similar growth models. Additionally, we quantify the consequence of a neurons specific birth time and ask if this metric tracks other biological properties of neurons. Our results suggest that the connectomes growing topology is a robust feature of the developing connectome that is distinct from other network properties, and that the growing topology is particularly sensitive to the exact birth times of a small set of predominantly motor neurons. By utilizing novel measurements that track biological features, we anticipate that our study will be helpful in the construction of more accurate models of neuronal development in C. elegans
  112. Topological Portraits of Multiscale Coordination Dynamics (2020)

    Mengsen Zhang, William D. Kalies, J. A. Scott Kelso, Emmanuelle Tognoli
    Abstract Living systems exhibit complex yet organized behavior on multiple spatiotemporal scales. To investigate the nature of multiscale coordination in living systems, one needs a meaningful and systematic way to quantify the complex dynamics, a challenge in both theoretical and empirical realms. The present work shows how integrating approaches from computational algebraic topology and dynamical systems may help us meet this challenge. In particular, we focus on the application of multiscale topological analysis to coordinated rhythmic processes. First, theoretical arguments are introduced as to why certain topological features and their scale-dependency are highly relevant to understanding complex collective dynamics. Second, we propose a method to capture such dynamically relevant topological information using persistent homology, which allows us to effectively construct a multiscale topological portrait of rhythmic coordination. Finally, the method is put to test in detecting transitions in real data from an experiment of rhythmic coordination in ensembles of interacting humans. The recurrence plots of topological portraits highlight collective transitions in coordination patterns that were elusive to more traditional methods. This sensitivity to collective transitions would be lost if the behavioral dynamics of individuals were treated as separate degrees of freedom instead of constituents of the topology that they collectively forge. Such multiscale topological portraits highlight collective aspects of coordination patterns that are irreducible to properties of individual parts. The present work demonstrates how the analysis of multiscale coordination dynamics can benefit from topological methods, thereby paving the way for further systematic quantification of complex, high-dimensional dynamics in living systems.
  113. Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

    Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, Wlodek Zadrozny
    Abstract While the strength of Topological Data Analysis has been explored in many studies on high dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in the text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extraction of topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as high dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe adding topological features to the conventional features in ensemble models improves the classification results (up to 5\%). On the other hand, as expected, topological features by themselves may be not sufficient for effective classification. It is an open problem to see whether TDA features from word embeddings might be sufficient, as they seem to perform within a range of few points from top results obtained with a linear support vector classifier.
  114. Quantifying Genetic Innovation: Mathematical Foundations for the Topological Study of Reticulate Evolution (2020)

    Michael Lesnick, Raúl Rabadán, Daniel I. S. Rosenbloom
    Abstract A topological approach to the study of genetic recombination, based on persistent homology, was introduced by Chan, Carlsson, and Rabadán in 2013. This associates a sequence of signatures called barcodes to genomic data sampled from an evolutionary history. In this paper, we develop theoretical foundations for this approach. First, we present a novel formulation of the underlying inference problem. Specifically, we introduce and study the novelty profile, a simple, stable statistic of an evolutionary history which not only counts recombination events but also quantifies how recombination creates genetic diversity. We propose that the (hitherto implicit) goal of the topological approach to recombination is the estimation of novelty profiles. We then study the problem of obtaining a lower bound on the novelty profile using barcodes. We focus on a low-recombination regime, where the evolutionary history can be described by a directed acyclic graph called a galled tree, which differs from a tree only by isolated topological defects. We show that in this regime, under a complete sampling assumption, the \$1\textasciicircum\mathrm\st\\$ barcode yields a lower bound on the novelty profile, and hence on the number of recombination events. For \$i\textgreater1\$, the \$i\textasciicircum\\mathrm\th\\\$ barcode is empty. In addition, we use a stability principle to strengthen these results to ones which hold for any subsample of an arbitrary evolutionary history. To establish these results, we describe the topology of the Vietoris--Rips filtrations arising from evolutionary histories indexed by galled trees. As a step towards a probabilistic theory, we also show that for a random history indexed by a fixed galled tree and satisfying biologically reasonable conditions, the intervals of the \$1\textasciicircum\\mathrm\st\\\$ barcode are independent random variables. Using simulations, we explore the sensitivity of these intervals to recombination.
  115. Topological Data Analysis Quantifies Biological Nano-Structure From Single Molecule Localization Microscopy (2020)

    Jeremy A. Pike, Abdullah O. Khan, Chiara Pallini, Steven G. Thomas, Markus Mund, Jonas Ries, Natalie S. Poulter, Iain B. Styles
    Abstract AbstractMotivation. Localization microscopy data is represented by a set of spatial coordinates, each corresponding to a single detection, that form a point cl
  116. Topological Data Analysis for Arrhythmia Detection Through Modular Neural Networks (2020)

    Meryll Dindin, Yuhei Umeda, Frederic Chazal
    Abstract This paper presents an innovative and generic deep learning approach to monitor heart conditions from ECG signals. We focus our attention on both the detection and classification of abnormal heartbeats, known as arrhythmia. We strongly insist on generalization throughout the construction of a shallow deep-learning model that turns out to be effective for new unseen patient. The novelty of our approach relies on the use of topological data analysis to deal with individual differences. We show that our structure reaches the performances of the state-of-the-art methods for both arrhythmia detection and classification.
  117. Topologically Densified Distributions (2020)

    Christoph Hofer, Florian Graf, Marc Niethammer, Roland Kwitt
    Abstract We study regularization in the context of small sample-size learning with over-parametrized neural networks. Specifically, we shift focus from architectural properties, such as norms on the network weights, to properties of the internal representations before a linear classifier. Specifically, we impose a topological constraint on samples drawn from the probability measure induced in that space. This provably leads to mass concentration effects around the representations of training instances, i.e., a property beneficial for generalization. By leveraging previous work to impose topological constrains in a neural network setting, we provide empirical evidence (across various vision benchmarks) to support our claim for better generalization.
  118. Classification of Skin Lesions by Topological Data Analysis Alongside With Neural Network (2020)

    Naiereh Elyasi, Mehdi Hosseini Moghadam
    Abstract In this paper we use TDA mapper alongside with deep convolutional neural networks in the classification of 7 major skin diseases. First we apply kepler mapper with neural network as one of its filter steps to classify the dataset HAM10000. Mapper visualizes the classification result by a simplicial complex, where neural network can not do this alone, but as a filter step neural network helps to classify data better. Furthermore we apply TDA mapper and persistent homology to understand the weights of layers of mobilenet network in different training epochs of HAM10000. Also we use persistent diagrams to visualize the results of analysis of layers of mobilenet network.
  119. Contagion Dynamics for Manifold Learning (2020)

    Barbara I. Mahler
    Abstract Contagion maps exploit activation times in threshold contagions to assign vectors in high-dimensional Euclidean space to the nodes of a network. A point cloud that is the image of a contagion map reflects both the structure underlying the network and the spreading behaviour of the contagion on it. Intuitively, such a point cloud exhibits features of the network's underlying structure if the contagion spreads along that structure, an observation which suggests contagion maps as a viable manifold-learning technique. We test contagion maps as a manifold-learning tool on a number of different real-world and synthetic data sets, and we compare their performance to that of Isomap, one of the most well-known manifold-learning algorithms. We find that, under certain conditions, contagion maps are able to reliably detect underlying manifold structure in noisy data, while Isomap fails due to noise-induced error. This consolidates contagion maps as a technique for manifold learning.
  120. Towards a Philological Metric Through a Topological Data Analysis Approach (2020)

    Eduardo Paluzo-Hidalgo, Rocio Gonzalez-Diaz, Miguel A. Gutiérrez-Naranjo
    Abstract The canon of the baroque Spanish literature has been thoroughly studied with philological techniques. The major representatives of the poetry of this epoch are Francisco de Quevedo and Luis de Góngora y Argote. They are commonly classified by the literary experts in two different streams: Quevedo belongs to the Conceptismo and G\ńgora to the Culteranismo. Besides, traditionally, even if Quevedo is considered the most representative of the Conceptismo, Lope de Vega is also considered to be, at least, closely related to this literary trend. In this paper, we use Topological Data Analysis techniques to provide a first approach to a metric distance between the literary style of these poets. As a consequence, we reach results that are under the literary experts' criteria, locating the literary style of Lope de Vega, closer to the one of Quevedo than to the one of G\'ǵora.
  121. Automatic Tree Ring Detection Using Jacobi Sets (2020)

    Kayla Makela, Tim Ophelders, Michelle Quigley, Elizabeth Munch, Daniel Chitwood, Asia Dowtin
    Abstract Tree ring widths are an important source of climatic and historical data, but measuring these widths typically requires extensive manual work. Computer vision techniques provide promising directions towards the automation of tree ring detection, but most automated methods still require a substantial amount of user interaction to obtain high accuracy. We perform analysis on 3D X-ray CT images of a cross-section of a tree trunk, known as a tree disk. We present novel automated methods for locating the pith (center) of a tree disk, and ring boundaries. Our methods use a combination of standard image processing techniques and tools from topological data analysis. We evaluate the efficacy of our method for two different CT scans by comparing its results to manually located rings and centers and show that it is better than current automatic methods in terms of correctly counting each ring and its location. Our methods have several parameters, which we optimize experimentally by minimizing edit distances to the manually obtained locations.
  122. Spatial Applications of Topological Data Analysis: Cities, Snowflakes, Random Structures, and Spiders Spinning Under the Influence (2020)

    Michelle Feng, Mason A. Porter
    Abstract Spatial networks are ubiquitous in social, geographic, physical, and biological applications. To understand their large-scale structure, it is important to develop methods that allow one to directly probe the effects of space on structure and dynamics. Historically, algebraic topology has provided one framework for rigorously and quantitatively describing the global structure of a space, and recent advances in topological data analysis (TDA) have given scholars a new lens for analyzing network data. In this paper, we study a variety of spatial networks --- including both synthetic and natural ones --- using novel topological methods that we recently developed specifically for analyzing spatial networks. We demonstrate that our methods are able to capture meaningful quantities, with specifics that depend on context, in spatial networks and thereby provide useful insights into the structure of those networks, including a novel approach for characterizing them based on their topological structures. We illustrate these ideas with examples of synthetic networks and dynamics on them, street networks in cities, snowflakes, and webs spun by spiders under the influence of various psychotropic substances.
  123. Topological Persistence Machine of Phase Transitions (2020)

    Quoc Hoan Tran, Mark Chen, Yoshihiko Hasegawa
    Abstract The study of phase transitions from experimental data becomes challenging, especially when little prior knowledge of the system is available. Topological data analysis is an emerging framework for characterizing the shape of data and has recently achieved success in detecting structural transitions in material science such as glass-liquid transition. However, data obtained from physical states may not have explicit shapes as structural materials. We propose a general framework called topological persistence machine to construct the shape of data from correlations in states; hence decipher phase transitions via the qualitative changes of the shape. Our framework enables an effective and unified approach in phase transition analysis. We demonstrate the impact in highly precise detection of Berezinskii-Kosterlitz-Thouless phase transitions in the classical XY model, and quantum phase transition in the transverse Ising model and Bose-Hubbard model. Intriguingly, these phase transitions have proven to be notoriously difficult in traditional methods but can be characterized in our framework without requiring prior knowledge about phases. Our approach is thus expected applicable and brings a remarkable perspective for exploring phases of experimental physical systems.
  124. PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction (2020)

    Nicolas Swenson, Aditi S. Krishnapriyan, Aydin Buluc, Dmitriy Morozov, Katherine Yelick
    Abstract Understanding protein structure-function relationships is a key challenge in computational biology, with applications across the biotechnology and pharmaceutical industries. While it is known that protein structure directly impacts protein function, many functional prediction tasks use only protein sequence. In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank in order to study the expressiveness of different structure-based prediction schemes. We present PersGNN - an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis to capture a complex set of both local and global structural features. While variations of these techniques have been successfully applied to proteins before, we demonstrate that our hybridized approach, PersGNN, outperforms either method on its own as well as a baseline neural network that learns from the same information. PersGNN achieves a 9.3% boost in area under the precision recall curve (AUPR) compared to the best individual model, as well as high F1 scores across different gene ontology categories, indicating the transferability of this approach.
  125. Topological Data Analysis of Zebrafish Patterns (2020)

    Melissa R. McGuirl, Alexandria Volkening, Björn Sandstede
    Abstract Self-organized pattern behavior is ubiquitous throughout nature, from fish schooling to collective cell dynamics during organism development. Qualitatively these patterns display impressive consistency, yet variability inevitably exists within pattern-forming systems on both microscopic and macroscopic scales. Quantifying variability and measuring pattern features can inform the underlying agent interactions and allow for predictive analyses. Nevertheless, current methods for analyzing patterns that arise from collective behavior capture only macroscopic features or rely on either manual inspection or smoothing algorithms that lose the underlying agent-based nature of the data. Here we introduce methods based on topological data analysis and interpretable machine learning for quantifying both agent-level features and global pattern attributes on a large scale. Because the zebrafish is a model organism for skin pattern formation, we focus specifically on analyzing its skin patterns as a means of illustrating our approach. Using a recent agent-based model, we simulate thousands of wild-type and mutant zebrafish patterns and apply our methodology to better understand pattern variability in zebrafish. Our methodology is able to quantify the differential impact of stochasticity in cell interactions on wild-type and mutant patterns, and we use our methods to predict stripe and spot statistics as a function of varying cellular communication. Our work provides an approach to automatically quantifying biological patterns and analyzing agent-based dynamics so that we can now answer critical questions in pattern formation at a much larger scale.
  126. The Emergence of Higher-Order Structure in Scientific and Technological Knowledge Networks (2020)

    Thomas Gebhart, Russell J. Funk
    Abstract The growth of science and technology is primarily a recombinative process, wherein new discoveries and inventions are generally built from prior knowledge. While the recent past has seen rapid growth in scientific and technological knowledge, relatively little is known about the manner in which science and technology develop and coalesce knowledge into larger structures that enable or constrain future breakthroughs. Network science has recently emerged as a framework for measuring the structure and dynamics of knowledge. While helpful, these existing approaches struggle to capture the global structural properties of the underlying networks, leading to conflicting observations about the nature of scientific and technological progress. We bridge this methodological gap using tools from algebraic topology to characterize the higher-order structure of knowledge networks in science and technology across scale. We observe rapid and varied growth in the high-dimensional structure in many fields of science and technology, and find this high-dimensional growth coincides with decline in lower-dimensional structure. This higher-order growth in knowledge networks has historically far outpaced the growth in scientific and technological collaboration networks. We also characterize the relationship between higher-order structure and the nature of the science and technology produced within these structural environments and find a positive relationship between the abstractness of language used within fields and increasing high-dimensional structure. We also find a robust relationship between high-dimensional structure and number of metrics for publication success, implying this high-dimensional structure may be linked to discovery and invention.
  127. Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials (2020)

    Aditi S. Krishnapriyan, Maciej Haranczyk, Dmitriy Morozov
    Abstract Machine learning has emerged as an attractive alternative to experiments and simulations for predicting material properties. Usually, such an approach relies on specific domain knowledge for feature design: each learning target requires careful selection of features that an expert recognizes as important for the specific task. The major drawback of this approach is that computation of only a few structural features has been implemented so far, and it is difficult to tell a priori which features are important for a particular application. The latter problem has been empirically observed for predictors of guest uptake in nanoporous materials: local and global porosity features become dominant descriptors at low and high pressures, respectively. We investigate a feature representation of materials using tools from topological data analysis. Specifically, we use persistent homology to describe the geometry of nanoporous materials at various scales. We combine our topological descriptor with traditional structural features and investigate the relative importance of each to the prediction tasks. We demonstrate an application of this feature representation by predicting methane adsorption in zeolites, for pressures in the range of 1-200 bar. Our results not only show a considerable improvement compared to the baseline, but they also highlight that topological features capture information complementary to the structural features: this is especially important for the adsorption at low pressure, a task particularly difficult for the traditional features. Furthermore, by investigation of the importance of individual topological features in the adsorption model, we are able to pinpoint the location of the pores that correlate best to adsorption at different pressure, contributing to our atom-level understanding of structure-property relationships.
  128. Topological Data Analysis Reveals Robust Alterations in the Whole-Brain and Frontal Lobe Functional Connectomes in Attention-Deficit/Hyperactivity Disorder (2020)

    Zeus Gracia-Tabuenca, Juan Carlos Díaz-Patiño, Isaac Arelio, Sarael Alcauter
    Abstract Visual Abstract \textlessimg class="highwire-fragment fragment-image" alt="Figure" src="https://www.eneuro.org/content/eneuro/7/3/ENEURO.0543-19.2020/F1.medium.gif" width="369" height="440"/\textgreaterDownload figureOpen in new tabDownload powerpoint Attention-deficit/hyperactivity disorder (ADHD) is a developmental disorder characterized by difficulty to control the own behavior. Neuroimaging studies have related ADHD with the interplay of fronto-parietal attention systems with the default mode network (DMN; Castellanos and Aoki, 2016). However, some results have been inconsistent, potentially due to methodological differences in the analytical strategies when defining the brain functional network, i.e., the functional connectivity threshold and/or the brain parcellation scheme. Here, we make use of topological data analysis (TDA) to explore the brain connectome as a function of the filtration value (i.e., the connectivity threshold), instead of using a static connectivity threshold. Specifically, we characterized the transition from all nodes being isolated to being connected into a single component as a function of the filtration value. We explored the utility of such a method to identify differences between 81 children with ADHD (45 male, age: 7.26–17.61 years old) and 96 typically developing children (TDC; 59 male, age: 7.17–17.96 years old), using a public dataset of resting state (rs)fMRI in human subjects. Results were highly congruent when using four different brain segmentations (atlases), and exhibited significant differences for the brain topology of children with ADHD, both at the whole-brain network and the functional subnetwork levels, particularly involving the frontal lobe and the DMN. Therefore, this is a solid approach that complements connectomics-related methods and may contribute to identify the neurophysio-pathology of ADHD.
  129. Fast and Accurate Tumor Segmentation of Histology Images Using Persistent Homology and Deep Convolutional Features (2019)

    Talha Qaiser, Yee-Wah Tsang, Daiki Taniyama, Naoya Sakamoto, Kazuaki Nakane, David Epstein, Nasir Rajpoot
    Abstract Tumor segmentation in whole-slide images of histology slides is an important step towards computer-assisted diagnosis. In this work, we propose a tumor segmentation framework based on the novel concept of persistent homology profiles (PHPs). For a given image patch, the homology profiles are derived by efficient computation of persistent homology, which is an algebraic tool from homology theory. We propose an efficient way of computing topological persistence of an image, alternative to simplicial homology. The PHPs are devised to distinguish tumor regions from their normal counterparts by modeling the atypical characteristics of tumor nuclei. We propose two variants of our method for tumor segmentation: one that targets speed without compromising accuracy and the other that targets higher accuracy. The fast version is based on a selection of exemplar image patches from a convolution neural network (CNN) and patch classification by quantifying the divergence between the PHPs of exemplars and the input image patch. Detailed comparative evaluation shows that the proposed algorithm is significantly faster than competing algorithms while achieving comparable results. The accurate version combines the PHPs and high-level CNN features and employs a multi-stage ensemble strategy for image patch labeling. Experimental results demonstrate that the combination of PHPs and CNN features outperform competing algorithms. This study is performed on two independently collected colorectal datasets containing adenoma, adenocarcinoma, signet, and healthy cases. Collectively, the accurate tumor segmentation produces the highest average patch-level F1-score, as compared with competing algorithms, on malignant and healthy cases from both the datasets. Overall the proposed framework highlights the utility of persistent homology for histopathology image analysis.
  130. Persistent Homology Analysis of Brain Transcriptome Data in Autism (2019)

    Daniel Shnier, Mircea A. Voineagu, Irina Voineagu
    Abstract Persistent homology methods have found applications in the analysis of multiple types of biological data, particularly imaging data or data with a spatial and/or temporal component. However, few studies have assessed the use of persistent homology for the analysis of gene expression data. Here we apply persistent homology methods to investigate the global properties of gene expression in post-mortem brain tissue (cerebral cortex) of individuals with autism spectrum disorders (ASD) and matched controls. We observe a significant difference in the geometry of inter-sample relationships between autism and healthy controls as measured by the sum of the death times of zero-dimensional components and the Euler characteristic. This observation is replicated across two distinct datasets, and we interpret it as evidence for an increased heterogeneity of gene expression in autism. We also assessed the topology of gene-level point clouds and did not observe significant differences between ASD and control transcriptomes, suggesting that the overall transcriptome organization is similar in ASD and healthy cerebral cortex. Overall, our study provides a novel framework for persistent homology analyses of gene expression data for genetically complex disorders.
  131. Tuning Cavitation and Crazing in Polymer Nanocomposite Glasses Containing Bimodal Grafted Nanoparticles at the Nanoparticle/Polymer Interface (2019)

    Rui Shi, Hu-Jun Qian, Zhong-Yuan Lu
    Abstract It is widely accepted that adding nanoparticles (NPs) into polymer matrices can dramatically alter the mechanical properties of the material, and that the properties at the NP/polymer interface play a vital role. By performing coarse-grained molecular dynamics simulations, we study the stress–strain behaviour of polymer/NP composites (PNCs) in a glassy state under a triaxial tensile deformation, in which the NPs are well dispersed in the system via bimodal grafting. A ‘HOMO’ system, in which the short grafted chains are chemically identical to the matrix polymer, and a ‘HETERO’ system, in which the short grafted chains interact weakly with the matrix, are investigated. Our simulations demonstrate that the HOMO system behaves very similarly to the pure polymer system, with quick cavitation and a drop in stress after the yielding point, corresponding to a craze deformation process. While in the HETERO system, weak interactions between the short grafts and the matrix polymer induce a low local modulus, therefore, rather homogeneous void formation and consequently a slower cavitation process are observed at the surface of the well dispersed NPs during the tensile deformation. As a result, the depletion effect at the NP surface eventually leads to NP re-assembly at large strains. Moreover, the HETERO system undergoes a shear-deformation-tended tensile process rather than the craze deformation found in the HOMO system. At the same time, the HETERO system is more ductile, with a much slower drop in stress after yielding than the HOMO system. In addition, the homogeneous generation of voids at small strain in the HETERO system can be utilized in the fabrication of polymer films with desirable separation abilities for gases or small molecules. We hope that these simulation results will be helpful for the property regulation of PNC materials containing polymer grafted NPs.
  132. Persistent Homology to Quantify the Quality of Surface-Supported Covalent Networks (2019)

    Abraham Gutierrez, Mickaël Buchet, Sylvain Clair
    Abstract Covalent networks formed by on-surface synthesis usually suffer from the presence of a large number of defects. We report on a methodology to characterize such two-dimensional networks from their experimental images obtained by scanning probe microscopy. The computation is based on a persistent homology approach and provides a quantitative score indicative of the network homogeneity. We compare our scoring method with results previously obtained using minimal spanning tree analyses and we apply it to some molecular systems appearing in the existing literature.
  133. Understanding Diffraction Patterns of Glassy, Liquid and Amorphous Materials via Persistent Homology Analyses (2019)

    Yohei Onodera, Shinji Kohara, Shuta Tahara, Atsunobu Masuno, Hiroyuki Inoue, Motoki Shiga, Akihiko Hirata, Koichi Tsuchiya, Yasuaki Hiraoka, Ippei Obayashi, Koji Ohara, Akitoshi Mizuno, Osami Sakata
    Abstract The structure of glassy, liquid, and amorphous materials is still not well understood, due to the insufficient structural information from diffraction data. In this article, attempts are made to understand the origin of diffraction peaks, particularly of the first sharp diffraction peak (FSDP, Q1), the principal peak (PP, Q2), and the third peak (Q3), observed in the measured diffraction patterns of disordered materials whose structure contains tetrahedral motifs. It is confirmed that the FSDP (Q1) is not a signature of the formation of a network, because an FSDP is observed in tetrahedral molecular liquids. It is found that the PP (Q2) reflects orientational correlations of tetrahedra. Q3, that can be observed in all disordered materials, even in common liquid metals, stems from simple pair correlations. Moreover, information on the topology of disordered materials was revealed by utilizing persistent homology analyses. The persistence diagram of silica (SiO2) glass suggests that the shape of rings in the glass is similar not only to those in the crystalline phase with comparable density (α-cristobalite), but also to rings present in crystalline phases with higher density (α-quartz and coesite); this is thought to be the signature of disorder. Furthermore, we have succeeded in revealing the differences, in terms of persistent homology, between tetrahedral networks and tetrahedral molecular liquids, and the difference/similarity between liquid and amorphous (glassy) states. Our series of analyses demonstrated that a combination of diffraction data and persistent homology analyses is a useful tool for allowing us to uncover structural features hidden in halo pattern of disordered materials.
  134. Chatter Diagnosis in Milling Using Supervised Learning and Topological Features Vector (2019)

    Melih C. Yesilli, Sarah Tymochko, Firas A. Khasawneh, Elizabeth Munch
    Abstract Chatter detection has become a prominent subject of interest due to its effect on cutting tool life, surface finish and spindle of machine tool. Most of the existing methods in chatter detection literature are based on signal processing and signal decomposition. In this study, we use topological features of data simulating cutting tool vibrations, combined with four supervised machine learning algorithms to diagnose chatter in the milling process. Persistence diagrams, a method of representing topological features, are not easily used in the context of machine learning, so they must be transformed into a form that is more amenable. Specifically, we will focus on two different methods for featurizing persistence diagrams, Carlsson coordinates and template functions. In this paper, we provide classification results for simulated data from various cutting configurations, including upmilling and downmilling, in addition to the same data with some added noise. Our results show that Carlsson Coordinates and Template Functions yield accuracies as high as 96% and 95%, respectively. We also provide evidence that these topological methods are noise robust descriptors for chatter detection.
  135. Microscopic Description of Yielding in Glass Based on Persistent Homology (2019)

    Tatsuhiko Shirai, Takenobu Nakamura
    Abstract Persistent homology (PH) was applied to probe the structural changes of glasses under shear. PH associates each local atomistic structure in an atomistic configuration to a geometric object, namely, a hole, and evaluates the robustness of these holes against noise. We found that the microscopic structures were qualitatively different before and after yielding. The structures before yielding contained robust holes, the number of which decreased after yielding. We also observed that the structures after yielding approached those of quickly quenched glass. This work demonstrates the crucial role of robust holes in yielding and provides an interpretation based on geometry.
  136. Persistent Homology Machine Learning for Fingerprint Classification (2019)

    N. Giansiracusa, R. Giansiracusa, C. Moon
    Abstract The fingerprint classification problem is to sort fingerprints into predetermined groups, such as arch, loop, and whorl. It was asserted in the literature that minutiae points, which are commonly used for fingerprint matching, are not useful for classification. We show that, to the contrary, near state-of-the-art classification accuracy rates can be achieved when applying topological data analysis (TDA) to 3-dimensional point clouds of oriented minutiae points. We also apply TDA to fingerprint ink-roll images, which yields a lower accuracy rate but still shows promise; moreover, combining the two approaches outperforms each one individually. These methods use supervised learning applied to persistent homology and allow us to explore feature selection on barcodes, an important topic at the interface between TDA and machine learning. We test our classification algorithms on the NIST fingerprint database SD-27.
  137. Topological Persistence for Relating Microstructure and Capillary Fluid Trapping in Sandstones (2019)

    A. L. Herring, V. Robins, A. P. Sheppard
    Abstract Results from a series of two-phase fluid flow experiments in Leopard, Berea, and Bentheimer sandstones are presented. Fluid configurations are characterized using laboratory-based and synchrotron based 3-D X-ray computed tomography. All flow experiments are conducted under capillary-dominated conditions. We conduct geometry-topology analysis via persistent homology and compare this to standard topological and watershed-partition-based pore-network statistics. Metrics identified as predictors of nonwetting fluid trapping are calculated from the different analytical methods and are compared to levels of trapping measured during drainage-imbibition cycles in the experiments. Metrics calculated from pore networks (i.e., pore body-throat aspect ratio and coordination number) and topological analysis (Euler characteristic) do not correlate well with trapping in these samples. In contrast, a new metric derived from the persistent homology analysis, which incorporates counts of topological features as well as their length scale and spatial distribution, correlates very well (R2 = 0.97) to trapping for all systems. This correlation encompasses a wide range of porous media and initial fluid configurations, and also applies to data sets of different imaging and image processing protocols.
  138. Ultrahigh-Pressure Form of \$\Mathrm\Si\\\mathrm\O\\_\2\\$ Glass With Dense Pyrite-Type Crystalline Homology (2019)

    M. Murakami, S. Kohara, N. Kitamura, J. Akola, H. Inoue, A. Hirata, Y. Hiraoka, Y. Onodera, I. Obayashi, J. Kalikka, N. Hirao, T. Musso, A. S. Foster, Y. Idemoto, O. Sakata, Y. Ohishi
    Abstract High-pressure synthesis of denser glass has been a longstanding interest in condensed-matter physics and materials science because of its potentially broad industrial application. Nevertheless, understanding its nature under extreme pressures has yet to be clarified due to experimental and theoretical challenges. Here we reveal the formation of OSi4 tetraclusters associated with that of SiO7 polyhedra in SiO2 glass under ultrahigh pressures to 200 gigapascal confirmed both experimentally and theoretically. Persistent homology analyses with molecular dynamics simulations found increased packing fraction of atoms whose topological diagram at ultrahigh pressures is similar to a pyrite-type crystalline phase, although the formation of tetraclusters is prohibited in the crystalline phase. This critical difference would be caused by the potential structural tolerance in the glass for distortion of oxygen clusters. Furthermore, an expanded electronic band gap demonstrates that chemical bonds survive at ultrahigh pressure. This opens up the synthesis of topologically disordered dense oxide glasses.
  139. Topology Highlights Mesoscopic Functional Equivalence Between Imagery and Perception: The Case of Hypnotizability (2019)

    Esther Ibáñez-Marcelo, Lisa Campioni, Angkoon Phinyomark, Giovanni Petri, Enrica L. Santarcangelo
    Abstract The functional equivalence (FE) between imagery and perception or motion has been proposed on the basis of neuroimaging evidence of large spatially overlapping activations between real and imagined sensori-motor conditions. However, similar local activation patterns do not imply the same mesoscopic integration of brain regions, which can be described by tools from Topological Data Analysis (TDA). On the basis of behavioral findings, stronger FE has been hypothesized in the individuals with high scores of hypnotizability scores (highs) with respect to low hypnotizable participants (lows) who differ between each other in the proneness to modify memory, perception and behavior according to specific imaginative suggestions. Here we present the first EEG evidence of stronger FE in highs. In fact, persistent homology shows that the highs EEG topological asset during real and imagined sensory conditions is significantly more similar than the lows. As a corollary finding, persistent homology shows lower restructuring of the EEG asset in highs than in lows during both sensory and imagery tasks with respect to basal conditions. Present findings support the view that greater embodiment of mental images may be responsible for the highs greater proneness to respond to sensori-motor suggestions and to report involuntariness in action. In addition, findings indicate hypnotizability-related sensory and cognitive information processing and suggest that the psycho-physiological trait of hypnotizability may modulate more than one aspect of the everyday life.
  140. Nonlinear Dynamic Approaches to Identify Atrial Fibrillation Progression Based on Topological Methods (2019)

    Bahareh Safarbali, Seyed Mohammad Reza Hashemi Golpayegani
    Abstract In recent years, atrial fibrillation (AF) development from paroxysmal to persistent or permanent forms has become an important issue in cardiovascular disorders. Information about AF pattern of presentation (paroxysmal, persistent, or permanent) was useful in the management of algorithms in each category. This management is aimed at reducing symptoms and stopping severe problems associated with AF. AF classification has been based on time duration and episodes until now. In particular, complexity changes in Heart Rate Variation (HRV) may contain clinically relevant signals of imminent systemic dysregulation. A number of nonlinear methods based on phase space and topological properties can give more insight into HRV abnormalities such as fibrillation. Aiming to provide a nonlinear tool to qualitatively classify AF stages, we proposed two geometrical indices (fractal dimension and persistent homology) based on HRV phase space, which can successfully replicate the changes in AF progression. The study population includes 38 lone AF patients and 20 normal subjects, which are collected from the Physio-Bank database. “Time of Life (TOL)” is proposed as a new feature based on the initial and final Čech radius in the persistent homology diagram. A neural network was implemented to prove the effectiveness of both TOL and fractal dimension as classification features. The accuracy of classification performance was 93%. The proposed indices provide a signal representation framework useful to understand the dynamic changes in AF cardiac patterns and to classify normal and pathological rhythms.
  141. Topological Data Analysis for Aviation Applications (2019)

    Max Z. Li, Megan S. Ryerson, Hamsa Balakrishnan
    Abstract Aviation data sets are increasingly high-dimensional and sparse. Consequently, the underlying features and interactions are not easily uncovered by traditional data analysis methods. Recent advancements in applied mathematics introduce topological methods, offering a new approach to obtain these features. This paper applies the fundamental notions underlying topological data analysis and persistent homology (TDA/PH) to aviation data analytics. We review past aviation research that leverage topological methods, and present a new computational case study exploring the topology of airport surface connectivity. In each case, we connect abstract topological features with real-world processes in aviation, and highlight potential operational and managerial insights.
  142. A Persistent Weisfeiler-Lehman Procedure for Graph Classification (2019)

    Bastian Rieck, Christian Bock, Karsten Borgwardt
    Abstract The Weisfeiler–Lehman graph kernel exhibits competitive performance in many graph classification tasks. However, its subtree features are not able to capture connected components and cycles, topological features known for characterising graphs. To extract such features, we leverage propagated node label information and transform unweighted graphs into metric ones. This permits us to augment the subtree features with topological information obtained using persistent homology, a concept from topological data analysis. Our method, which we formalise as a generalisation of Weisfeiler–Lehman subtree features, exhibits favourable classification accuracy and its improvements in predictive performance are mainly driven by including cycle information.
  143. Topological Machine Learning With Persistence Indicator Functions (2019)

    Bastian Rieck, Filip Sadlo, Heike Leitte
    Abstract Techniques from computational topology, in particular persistent homology, are becoming increasingly relevant for data analysis. Their stable metrics permit the use of many distance-based data analysis methods, such as multidimensional scaling, while providing a firm theoretical ground. Many modern machine learning algorithms, however, are based on kernels. This paper presents persistence indicator functions (PIFs), which summarize persistence diagrams, i.e., feature descriptors in topological data analysis. PIFs can be calculated and compared in linear time and have many beneficial properties, such as the availability of a kernel-based similarity measure. We demonstrate their usage in common data analysis scenarios, such as confidence set estimation and classification of complex structured data.
  144. Persistent Homology Analysis of Osmolyte Molecular Aggregation and Their Hydrogen-Bonding Networks (2019)

    Kelin Xia, D. Vijay Anand, Saxena Shikhar, Yuguang Mu
    Abstract Dramatically different properties have been observed for two types of osmolytes, i.e., trimethylamine N-oxide (TMAO) and urea, in a protein folding process. Great progress has been made in revealing the potential underlying mechanism of these two osmolyte systems. However, many problems still remain unsolved. In this paper, we propose to use the persistent homology to systematically study the osmolytes’ molecular aggregation and their hydrogen-bonding network from a global topological perspective. It has been found that, for the first time, TMAO and urea show two extremely different topological behaviors, i.e., an extensive network and local clusters, respectively. In general, TMAO forms highly consistent large loop or circle structures in high concentrations. In contrast, urea is more tightly aggregated locally. Moreover, the resulting hydrogen-bonding networks also demonstrate distinguishable features. With a concentration increase, TMAO hydrogen-bonding networks vary greatly in their total number of loop structures and large-sized loop structures consistently increase. In contrast, urea hydrogen-bonding networks remain relatively stable with slight reduction of the total loop number. Moreover, the persistent entropy (PE) is, for the first time, used in characterization of the topological information of the aggregation and hydrogen-bonding networks. The average PE systematically increases with the concentration for both TMAO and urea, and decreases in their hydrogen-bonding networks. But their PE variances have totally different behaviors. Finally, topological features of the hydrogen-bonding networks are found to be highly consistent with those from the ion aggregation systems, indicating that our topological invariants can characterize intrinsic features of the “structure making” and “structure breaking” systems.
  145. Analyzing Collective Motion With Machine Learning and Topology (2019)

    Dhananjay Bhaskar, Angelika Manhart, Jesse Milzman, John T. Nardini, Kathleen M. Storey, Chad M. Topaz, Lori Ziegelmeier
    Abstract We use topological data analysis and machine learning to study a seminal model of collective motion in biology [M. R. D’Orsogna et al., Phys. Rev. Lett. 96, 104302 (2006)]. This model describes agents interacting nonlinearly via attractive-repulsive social forces and gives rise to collective behaviors such as flocking and milling. To classify the emergent collective motion in a large library of numerical simulations and to recover model parameters from the simulation data, we apply machine learning techniques to two different types of input. First, we input time series of order parameters traditionally used in studies of collective motion. Second, we input measures based on topology that summarize the time-varying persistent homology of simulation data over multiple scales. This topological approach does not require prior knowledge of the expected patterns. For both unsupervised and supervised machine learning methods, the topological approach outperforms the one that is based on traditional order parameters.
  146. Hepatic Tumor Classification Using Texture and Topology Analysis of Non-Contrast-Enhanced Three-Dimensional T1-Weighted MR Images With a Radiomics Approach (2019)

    Asuka Oyama, Yasuaki Hiraoka, Ippei Obayashi, Yusuke Saikawa, Shigeru Furui, Kenshiro Shiraishi, Shinobu Kumagai, Tatsuya Hayashi, Jun’ichi Kotoku
    Abstract The purpose of this study is to evaluate the accuracy for classification of hepatic tumors by characterization of T1-weighted magnetic resonance (MR) images using two radiomics approaches with machine learning models: texture analysis and topological data analysis using persistent homology. This study assessed non-contrast-enhanced fat-suppressed three-dimensional (3D) T1-weighted images of 150 hepatic tumors. The lesions included 50 hepatocellular carcinomas (HCCs), 50 metastatic tumors (MTs), and 50 hepatic hemangiomas (HHs) found respectively in 37, 23, and 33 patients. For classification, texture features were calculated, and also persistence images of three types (degree 0, degree 1 and degree 2) were obtained for each lesion from the 3D MR imaging data. We used three classification models. In the classification of HCC and MT (resp. HCC and HH, HH and MT), we obtained accuracy of 92% (resp. 90%, 73%) by texture analysis, and the highest accuracy of 85% (resp. 84%, 74%) when degree 1 (resp. degree 1, degree 2) persistence images were used. Our methods using texture analysis or topological data analysis allow for classification of the three hepatic tumors with considerable accuracy, and thus might be useful when applied for computer-aided diagnosis with MR images.
  147. Specimen-Based Analysis of Morphology and the Environment in Ecologically Dominant Grasses: The Power of the Herbarium (2019)

    Christine A. McAllister, Michael R. McKain, Mao Li, Bess Bookout, Elizabeth A. Kellogg
    Abstract Herbaria contain a cumulative sample of the world's flora, assembled by thousands of people over centuries. To capitalize on this resource, we conducted a specimen-based analysis of a major clade in the grass tribe Andropogoneae, including the dominant species of the world's grasslands in the genera Andropogon, Schizachyrium, Hyparrhenia and several others. We imaged 186 of the 250 named species of the clade, georeferenced the specimens and extracted climatic variables for each. Using semi- and fully automated image analysis techniques, we extracted spikelet morphological characters and correlated these with environmental variables. We generated chloroplast genome sequences to correct for phylogenetic covariance and here present a new phylogeny for 81 of the species. We confirm and extend earlier studies to show that Andropogon and Schizachyrium are not monophyletic. In addition, we find all morphological and ecological characters are homoplasious but variable among clades. For example, sessile spikelet length is positively correlated with awn length when all accessions are considered, but when separated by clade, the relationship is positive for three sub-clades and negative for three others. Climate variables showed no correlation with morphological variation in the spikelet pair; only very weak effects of temperature and precipitation were detected on macrohair density. This article is part of the theme issue ‘Biological collections for understanding biodiversity in the Anthropocene'.
  148. Signal Enrichment With Strain-Level Resolution in Metagenomes Using Topological Data Analysis (2019)

    Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida
    Abstract Background A metagenome is a collection of genomes, usually in a micro-environment, and sequencing a metagenomic sample en masse is a powerful means for investigating the community of the constituent microorganisms. One of the challenges is in distinguishing between similar organisms due to rampant multiple possible assignments of sequencing reads, resulting in false positive identifications. We map the problem to a topological data analysis (TDA) framework that extracts information from the geometric structure of data. Here the structure is defined by multi-way relationships between the sequencing reads using a reference database. Results Based primarily on the patterns of co-mapping of the reads to multiple organisms in the reference database, we use two models: one a subcomplex of a Barycentric subdivision complex and the other a Čech complex. The Barycentric subcomplex allows a natural mapping of the reads along with their coverage of organisms while the Čech complex takes simply the number of reads into account to map the problem to homology computation. Using simulated genome mixtures we show not just enrichment of signal but also microbe identification with strain-level resolution. Conclusions In particular, in the most refractory of cases where alternative algorithms that exploit unique reads (i.e., mapped to unique organisms) fail, we show that the TDA approach continues to show consistent performance. The Čech model that uses less information is equally effective, suggesting that even partial information when augmented with the appropriate structure is quite powerful.
  149. A Topological Approach to Selecting Models of Biological Experiments (2019)

    M. Ulmer, Lori Ziegelmeier, Chad M. Topaz
    Abstract We use topological data analysis as a tool to analyze the fit of mathematical models to experimental data. This study is built on data obtained from motion tracking groups of aphids in [Nilsen et al., PLOS One, 2013] and two random walk models that were proposed to describe the data. One model incorporates social interactions between the insects via a functional dependence on an aphid’s distance to its nearest neighbor. The second model is a control model that ignores this dependence. We compare data from each model to data from experiment by performing statistical tests based on three different sets of measures. First, we use time series of order parameters commonly used in collective motion studies. These order parameters measure the overall polarization and angular momentum of the group, and do not rely on a priori knowledge of the models that produced the data. Second, we use order parameter time series that do rely on a priori knowledge, namely average distance to nearest neighbor and percentage of aphids moving. Third, we use computational persistent homology to calculate topological signatures of the data. Analysis of the a priori order parameters indicates that the interactive model better describes the experimental data than the control model does. The topological approach performs as well as these a priori order parameters and better than the other order parameters, suggesting the utility of the topological approach in the absence of specific knowledge of mechanisms underlying the data.
  150. Unexpected Topology of the Temperature Fluctuations in the Cosmic Microwave Background (2019)

    Pratyush Pranav, Robert J. Adler, Thomas Buchert, Herbert Edelsbrunner, Bernard J. T. Jones, Armin Schwartzman, Hubert Wagner, Rien van de Weygaert
    Abstract We study the topology generated by the temperature fluctuations of the cosmic microwave background (CMB) radiation, as quantified by the number of components and holes, formally given by the Betti numbers, in the growing excursion sets. We compare CMB maps observed by the \textlessi\textgreaterPlanck\textlessi/\textgreater satellite with a thousand simulated maps generated according to the ΛCDM paradigm with Gaussian distributed fluctuations. The comparison is multi-scale, being performed on a sequence of degraded maps with mean pixel separation ranging from 0.05 to 7.33°. The survey of the CMB over 𝕊\textlesssup\textgreater2\textlesssup/\textgreater is incomplete due to obfuscation effects by bright point sources and other extended foreground objects like our own galaxy. To deal with such situations, where analysis in the presence of “masks” is of importance, we introduce the concept of relative homology. The parametric \textlessi\textgreaterχ\textlessi/\textgreater\textlesssup\textgreater2\textlesssup/\textgreater-test shows differences between observations and simulations, yielding \textlessi\textgreaterp\textlessi/\textgreater-values at percent to less than permil levels roughly between 2 and 7°, with the difference in the number of components and holes peaking at more than 3\textlessi\textgreaterσ\textlessi/\textgreater sporadically at these scales. The highest observed deviation between the observations and simulations for \textlessi\textgreaterb\textlessi/\textgreater\textlesssub\textgreater0\textlesssub/\textgreater and \textlessi\textgreaterb\textlessi/\textgreater\textlesssub\textgreater1\textlesssub/\textgreater is approximately between 3\textlessi\textgreaterσ\textlessi/\textgreater and 4\textlessi\textgreaterσ\textlessi/\textgreater at scales of 3–7°. There are reports of mildly unusual behaviour of the Euler characteristic at 3.66° in the literature, computed from independent measurements of the CMB temperature fluctuations by \textlessi\textgreaterPlanck\textlessi/\textgreater’s predecessor, the \textlessi\textgreaterWilkinson\textlessi/\textgreater Microwave Anisotropy Probe (WMAP) satellite. The mildly anomalous behaviour of the Euler characteristic is phenomenologically related to the strongly anomalous behaviour of components and holes, or the zeroth and first Betti numbers, respectively. Further, since these topological descriptors show consistent anomalous behaviour over independent measurements of \textlessi\textgreaterPlanck\textlessi/\textgreater and WMAP, instrumental and systematic errors may be an unlikely source. These are also the scales at which the observed maps exhibit low variance compared to the simulations, and approximately the range of scales at which the power spectrum exhibits a dip with respect to the theoretical model. Non-parametric tests show even stronger differences at almost all scales. Crucially, Gaussian simulations based on power-spectrum matching the characteristics of the observed dipped power spectrum are not able to resolve the anomaly. Understanding the origin of the anomalies in the CMB, whether cosmological in nature or arising due to late-time effects, is an extremely challenging task. Regardless, beyond the trivial possibility that this may still be a manifestation of an extreme Gaussian case, these observations, along with the super-horizon scales involved, may motivate the study of primordial non-Gaussianity. Alternative scenarios worth exploring may be models with non-trivial topology, including topological defect models.
  151. Learning Representations of Persistence Barcodes (2019)

    Christoph D. Hofer, Roland Kwitt, Marc Niethammer
    Abstract We consider the problem of supervised learning with summary representations of topological features in data. In particular, we focus on persistent homology, the prevalent tool used in topological data analysis. As the summary representations, referred to as barcodes or persistence diagrams, come in the unusual format of multi sets, equipped with computationally expensive metrics, they can not readily be processed with conventional learning techniques. While different approaches to address this problem have been proposed, either in the context of kernel-based learning, or via carefully designed vectorization techniques, it remains an open problem how to leverage advances in representation learning via deep neural networks. Appropriately handling topological summaries as input to neural networks would address the disadvantage of previous strategies which handle this type of data in a task-agnostic manner. In particular, we propose an approach that is designed to learn a task-specific representation of barcodes. In other words, we aim to learn a representation that adapts to the learning problem while, at the same time, preserving theoretical properties (such as stability). This is done by projecting barcodes into a finite dimensional vector space using a collection of parametrized functionals, so called structure elements, for which we provide a generic construction scheme. A theoretical analysis of this approach reveals sufficient conditions to preserve stability, and also shows that different choices of structure elements lead to great differences with respect to their suitability for numerical optimization. When implemented as a neural network input layer, our approach demonstrates compelling performance on various types of problems, including graph classification and eigenvalue prediction, the classification of 2D/3D object shapes and recognizing activities from EEG signals.
  152. The Importance of the Whole: Topological Data Analysis for the Network Neuroscientist (2019)

    Ann E. Sizemore, Jennifer E. Phillips-Cremins, Robert Ghrist, Danielle S. Bassett
    Abstract Data analysis techniques from network science have fundamentally improved our understanding of neural systems and the complex behaviors that they support. Yet the restriction of network techniques to the study of pairwise interactions prevents us from taking into account intrinsic topological features such as cavities that may be crucial for system function. To detect and quantify these topological features, we must turn to algebro-topological methods that encode data as a simplicial complex built from sets of interacting nodes called simplices. We then use the relations between simplices to expose cavities within the complex, thereby summarizing its topological features. Here we provide an introduction to persistent homology, a fundamental method from applied topology that builds a global descriptor of system structure by chronicling the evolution of cavities as we move through a combinatorial object such as a weighted network. We detail the mathematics and perform demonstrative calculations on the mouse structural connectome, synapses in C. elegans, and genomic interaction data. Finally, we suggest avenues for future work and highlight new advances in mathematics ready for use in neural systems., For the network neuroscientist, this exposition aims to communicate both the mathematics and the advantages of using tools from applied topology for the study of neural systems. Using data from the mouse connectome, electrical and chemical synapses in C. elegans, and chromatin interaction data, we offer example computations and applications to further demonstrate the power of topological data analysis in neuroscience. Finally, we expose the reader to novel developments in applied topology and relate these developments to current questions and methodological difficulties in network neuroscience.
  153. Characterising Epithelial Tissues Using Persistent Entropy (2019)

    N. Atienza, L. M. Escudero, M. J. Jimenez, M. Soriano-Trigueros
    Abstract In this paper, we apply persistent entropy, a novel topological statistic, for characterization of images of epithelial tissues. We have found out that persistent entropy is able to summarize topological and geometric information encoded by \$\$\alpha \$\$α-complexes and persistent homology. After using some statistical tests, we can guarantee the existence of significant differences in the studied tissues.
  154. ChainNet: Learning on Blockchain Graphs With Topological Features (2019)

    N. C. Abay, C. G. Akcora, Y. R. Gel, M. Kantarcioglu, U. D. Islambekov, Y. Tian, B. Thuraisingham
    Abstract The following topics are dealt with: learning (artificial intelligence); graph theory; neural nets; pattern classification; data mining; feature extraction; recommender systems; pattern clustering; social networking (online); optimisation.
  155. A Machine-Learning-Based Early Warning System Boosted by Topological Data Analysis (2019)

    Devraj Basu, Tieqiang Li
    Abstract We propose a novel early warning system for detecting financial market crashes that utilizes the information extracted from the shape of financial market movement. Our system incorporates Topological Data Analysis (TDA), a new set of data analytics techniques specialised in profiling the shape of data, into a more traditional machine learning framework. Incorporating TDA leads to substantial improvements in timely detecting the onset of a sharp market decline. Our framework is both able to generate new features and also unlock more value from existing factors. Our results illustrate the importance of understanding the shape of financial market data and suggest that incorporating TDA into a machine learning framework could be beneficial in a number of financial market settings.
  156. Phase-Field Investigation of the Coarsening of Porous Structures by Surface Diffusion (2019)

    Pierre-Antoine Geslin, Mickaël Buchet, Takeshi Wada, Hidemi Kato
    Abstract Nano and microporous connected structures have attracted increasing attention in the past decades due to their high surface area, presenting interesting properties for a number of applications. These structures generally coarsen by surface diffusion, leading to an enlargement of the structure characteristic length scale. We propose to study this coarsening behavior using a phase-field model for surface diffusion. In addition to reproducing the expected scaling law, our simulations enable to investigate precisely the evolution of the topological and morphological characteristics along the coarsening process. In particular, we show that after a transient regime, the coarsening is self-similar as exhibited by the evolution of both morphological and topological features. In addition, the influence of surface anisotropy is discussed and comparisons with experimental tomographic observations are presented.
  157. Topological Data Analysis for Genomics and Evolution: Topology in Biology (2019)

    Raul Rabadan, Andrew J. Blumberg
    Abstract Biology has entered the age of Big Data. A technical revolution has transformed the field, and extracting meaningful information from large biological data sets is now a central methodological challenge. Algebraic topology is a well-established branch of pure mathematics that studies qualitative descriptors of the shape of geometric objects. It aims to reduce comparisons of shape to a comparison of algebraic invariants, such as numbers, which are typically easier to work with. Topological data analysis is a rapidly developing subfield that leverages the tools of algebraic topology to provide robust multiscale analysis of data sets. This book introduces the central ideas and techniques of topological data analysis and its specific applications to biology, including the evolution of viruses, bacteria and humans, genomics of cancer, and single cell characterization of developmental processes. Bridging two disciplines, the book is for researchers and graduate students in genomics and evolutionary biology as well as mathematicians interested in applied topology.
  158. The Accumulated Persistence Function, a New Useful Functional Summary Statistic for Topological Data Analysis, With a View to Brain Artery Trees and Spatial Point Process Applications (2019)

    C.A.N. Biscio, J. Møller
    Abstract We start with a simple introduction to topological data analysis where the most popular tool is called a persistence diagram. Briefly, a persistence diagram is a multiset of points in the plane describing the persistence of topological features of a compact set when a scale parameter varies. Since statistical methods are difficult to apply directly on persistence diagrams, various alternative functional summary statistics have been suggested, but either they do not contain the full information of the persistence diagram or they are two-dimensional functions. We suggest a new functional summary statistic that is one-dimensional and hence easier to handle, and which under mild conditions contains the full information of the persistence diagram. Its usefulness is illustrated in statistical settings concerned with point clouds and brain artery trees. The supplementary materials include additional methods and examples, technical details, and the R code used for all examples. © 2019, © 2019 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
  159. Revealing Key Structural Features Hidden in Liquids and Glasses (2019)

    Hajime Tanaka, Hua Tong, Rui Shi, John Russo
    Abstract A great success of solid state physics comes from the characterization of crystal structures in the reciprocal (wave vector) space. The power of structural characterization in Fourier space originates from the breakdown of translational and rotational symmetries. However, unlike crystals, liquids and amorphous solids possess continuous translational and rotational symmetries on a macroscopic scale, which makes Fourier space analysis much less effective. Lately, several studies have revealed local breakdown of translational and rotational symmetries even for liquids and glasses. Here, we review several mathematical methods used to characterize local structural features of apparently disordered liquids and glasses in real space. We distinguish two types of local ordering in liquids and glasses: energy-driven and entropy-driven. The former, which is favoured energetically by symmetry-selective directional bonding, is responsible for anomalous behaviours commonly observed in water-type liquids such as water, silicon, germanium and silica. The latter, which is often favoured entropically, shows connections with the heterogeneous, slow dynamics found in hard-sphere-like glass-forming liquids. We also discuss the relationship between such local ordering and crystalline structures and its impact on glass-forming ability.
  160. Molecular Phenotyping Using Networks, Diffusion, and Topology: Soft Tissue Sarcoma (2019)

    James C. Mathews, Maryam Pouryahya, Caroline Moosmüller, Yannis G. Kevrekidis, Joseph O. Deasy, Allen Tannenbaum
    Abstract Many biological datasets are high-dimensional yet manifest an underlying order. In this paper, we describe an unsupervised data analysis methodology that operates in the setting of a multivariate dataset and a network which expresses influence between the variables of the given set. The technique involves network geometry employing the Wasserstein distance, global spectral analysis in the form of diffusion maps, and topological data analysis using the Mapper algorithm. The prototypical application is to gene expression profiles obtained from RNA-Seq experiments on a collection of tissue samples, considering only genes whose protein products participate in a known pathway or network of interest. Employing the technique, we discern several coherent states or signatures displayed by the gene expression profiles of the sarcomas in the Cancer Genome Atlas along the TP53 (p53) signaling network. The signatures substantially recover the leiomyosarcoma, dedifferentiated liposarcoma (DDLPS), and synovial sarcoma histological subtype diagnoses, and they also include a new signature defined by activation and inactivation of about a dozen genes, including activation of serine endopeptidase inhibitor SERPINE1 and inactivation of TP53-family tumor suppressor gene TP73.
  161. Ghrist Barcoded Video Frames. Application in Detecting Persistent Visual Scene Surface Shapes Captured in Videos (2019)

    Arjuna P. H. Don, James F. Peters
    Abstract This article introduces an application of Ghrist barcodes in the study of persistent Betti numbers derived from vortex nerve complexes found in triangulations of video frames. A Ghrist barcode (also called a persistence barcode) is a topology of data pic- tograph useful in representing the persistence of the features of changing shapes. The basic approach is to introduce a free Abelian group representation of intersecting filled polygons on the barycenters of the triangles of Alexandroff nerves. An Alexandroff nerve is a maximal collection of triangles of a common vertex in the triangulation of a finite, bounded planar region. In our case, the planar region is a video frame. A Betti number is a count of the number of generators is a finite Abelian group. The focus here is on the persistent Betti numbers across sequences of triangulated video frames. Each Betti number is mapped to an entry in a Ghrist barcode. Two main results are given, namely, vortex nerves are Edelsbrunner-Harer nerve complexes and the Betti number of a vortex nerve equals k + 2 for a vortex nerve containing k edges attached between a pair of vortex cycles in the nerve.
  162. Rootstock Effects on Scion Phenotypes in a ‘Chambourcin’ Experimental Vineyard (2019)

    Zoë Migicovsky, Zachary N Harris, Laura L Klein, Mao Li, Adam McDermaid, Daniel H Chitwood, Anne Fennell, Laszlo G Kovacs, Misha Kwasniewski, Jason P Londo, Qin Ma, Allison J Miller
    Abstract Understanding how root systems modulate shoot system phenotypes is a fundamental question in plant biology and will be useful in developing resilient agricultural crops. Grafting is a common horticultural practice that joins the roots (rootstock) of one plant to the shoot (scion) of another, providing an excellent method for investigating how these two organ systems affect each other. In this study, we used the French-American hybrid grapevine ‘Chambourcin’ (Vitis L.) as a model to explore the rootstock–scion relationship. We examined leaf shape, ion concentrations, and gene expression in ‘Chambourcin’ grown ungrafted as well as grafted to three different rootstocks (‘SO4’, ‘1103P’ and ‘3309C’) across 2 years and three different irrigation treatments. We found that a significant amount of the variation in leaf shape could be explained by the interaction between rootstock and irrigation. For ion concentrations, the primary source of variation identified was the position of a leaf in a shoot, although rootstock and rootstock by irrigation interaction also explained a significant amount of variation for most ions. Lastly, we found rootstock-specific patterns of gene expression in grafted plants when compared to ungrafted vines. Thus, our work reveals the subtle and complex effect of grafting on ‘Chambourcin’ leaf morphology, ionomics, and gene expression.
  163. Fast Estimation of Recombination Rates Using Topological Data Analysis (2019)

    Devon P. Humphreys, Melissa R. McGuirl, Michael Miyagi, Andrew J. Blumberg
    Abstract Accurate estimation of recombination rates is critical for studying the origins and maintenance of genetic diversity. Because the inference of recombination rates under a full evolutionary model is computationally expensive, we developed an alternative approach using topological data analysis (TDA) on genome sequences. We find that this method can analyze datasets larger than what can be handled by any existing recombination inference software, and has accuracy comparable to commonly used model-based methods with significantly less processing time. Previous TDA methods used information contained solely in the first Betti number (\textlessimg class="highwire-embed" alt="Embedded Image" src="http://www.genetics.org/sites/default/files/highwire/genetics/211/4/1191/embed/mml-math-1.gif"/\textgreater) of a set of genomes, which aims to capture the number of loops that can be detected within a genealogy. These explorations have proven difficult to connect to the theory of the underlying biological process of recombination, and, consequently, have unpredictable behavior under perturbations of the data. We introduce a new topological feature, which we call ψ, with a natural connection to coalescent models, and present novel arguments relating \textlessimg class="highwire-embed" alt="Embedded Image" src="http://www.genetics.org/sites/default/files/highwire/genetics/211/4/1191/embed/mml-math-2.gif"/\textgreater to population genetic models. Using simulations, we show that ψ and \textlessimg class="highwire-embed" alt="Embedded Image" src="http://www.genetics.org/sites/default/files/highwire/genetics/211/4/1191/embed/mml-math-3.gif"/\textgreater are differentially affected by missing data, and package our approach as TREE (Topological Recombination Estimator). TREE’s efficiency and accuracy make it well suited as a first-pass estimator of recombination rate heterogeneity or hotspots throughout the genome. Our work empirically and theoretically justifies the use of topological statistics as summaries of genome sequences and describes a new, unintuitive relationship between topological features of the distribution of sequence data and the footprint of recombination on genomes.
  164. Representability of Algebraic Topology for Biomolecules in Machine Learning Based Scoring and Virtual Screening (2018)

    Zixuan Cang, Lin Mu, Guo-Wei Wei
    Abstract This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
  165. Gene Coexpression Network Comparison via Persistent Homology (2018)

    Ali Nabi Duman, Harun Pirim
    Abstract Persistent homology, a topological data analysis (TDA) method, is applied to microarray data sets. Although there are a few papers referring to TDA methods in microarray analysis, the usage of persistent homology in the comparison of several weighted gene coexpression networks (WGCN) was not employed before to the very best of our knowledge. We calculate the persistent homology of weighted networks constructed from 38 Arabidopsis microarray data sets to test the relevance and the success of this approach in distinguishing the stress factors. We quantify multiscale topological features of each network using persistent homology and apply a hierarchical clustering algorithm to the distance matrix whose entries are pairwise bottleneck distance between the networks. The immunoresponses to different stress factors are distinguishable by our method. The networks of similar immunoresponses are found to be close with respect to bottleneck distance indicating the similar topological features of WGCNs. This computationally efficient technique analyzing networks provides a quick test for advanced studies.
  166. The Persistent Homology Mathematical Framework Provides Enhanced Genotype-to-Phenotype Associations for Plant Morphology (2018)

    Mao Li, Margaret H. Frank, Viktoriya Coneva, Washington Mio, Daniel H. Chitwood, Christopher N. Topp
    Abstract Efforts to understand the genetic and environmental conditioning of plant morphology are hindered by the lack of flexible and effective tools for quantifying morphology. Here, we demonstrate that persistent-homology-based topological methods can improve measurement of variation in leaf shape, serrations, and root architecture. We apply these methods to 2D images of leaves and root systems in field-grown plants of a domesticated introgression line population of tomato (Solanum pennellii). We find that compared with some commonly used conventional traits, (1) persistent-homology-based methods can more comprehensively capture morphological variation; (2) these techniques discriminate between genotypes with a larger normalized effect size and detect a greater number of unique quantitative trait loci (QTLs); (3) multivariate traits, whether statistically derived from univariate or persistent-homology-based traits, improve our ability to understand the genetic basis of phenotype; and (4) persistent-homology-based techniques detect unique QTLs compared to conventional traits or their multivariate derivatives, indicating that previously unmeasured aspects of morphology are now detectable. The QTL results further imply that genetic contributions to morphology can affect both the shoot and root, revealing a pleiotropic basis to natural variation in tomato. Persistent homology is a versatile framework to quantify plant morphology and developmental processes that complements and extends existing methods.
  167. Visual Detection of Structural Changes in Time-Varying Graphs Using Persistent Homology (2018)

    Mustafa Hajij, Bei Wang, Carlos Scheidegger, Paul Rosen
    Abstract Topological data analysis is an emerging area in exploratory data analysis and data mining. Its main tool, persistent homology, has become a popular technique to study the structure of complex, high-dimensional data. In this paper, we propose a novel method using persistent homology to quantify structural changes in time-varying graphs. Specifically, we transform each instance of the time-varying graph into a metric space, extract topological features using persistent homology, and compare those features over time. We provide a visualization that assists in time-varying graph exploration and helps to identify patterns of behavior within the data. To validate our approach, we conduct several case studies on real-world datasets and show how our method can find cyclic patterns, deviations from those patterns, and one-time events in time-varying graphs. We also examine whether a persistence-based similarity measure satisfies a set of well-established, desirable properties for graph metrics.
  168. Homological Analysis of Multi-Qubit Entanglement (2018)

    Alessandra di Pierro, Stefano Mancini, Laleh Memarzadeh, Riccardo Mengoni
    Abstract We propose the usage of persistent homologies to characterize multipartite entanglement. On a multi-qubit data set we introduce metric-like measures defined in terms of bipartite entanglement and then we derive barcodes. We show that, depending on the distance, they are able to produce different classifications. In one case, it is possible to obtain the standard separability classes. In the other case, a new classification of entangled states of three and four qubits is provided.
  169. Topological Data Analysis as a Morphometric Method: Using Persistent Homology to Demarcate a Leaf Morphospace (2018)

    Mao Li, Hong An, Ruthie Angelovici, Clement Bagaza, Albert Batushansky, Lynn Clark, Viktoriya Coneva, Michael J. Donoghue, Erika Edwards, Diego Fajardo, Hui Fang, Margaret H. Frank, Timothy Gallaher, Sarah Gebken, Theresa Hill, Shelley Jansky, Baljinder Kaur, Phillip C. Klahs, Laura L. Klein, Vasu Kuraparthy, Jason Londo, Zoë Migicovsky, Allison Miller, Rebekah Mohn, Sean Myles, Wagner C. Otoni, J. C. Pires, Edmond Rieffer, Sam Schmerler, Elizabeth Spriggs, Christopher N. Topp, Allen Van Deynze, Kuang Zhang, Linglong Zhu, Braden M. Zink, Daniel H. Chitwood
    Abstract Current morphometric methods that comprehensively measure shape cannot compare the disparate leaf shapes found in seed plants and are sensitive to processing artifacts. We explore the use of persistent homology, a topological method applied as a filtration across simplicial complexes (or more simply, a method to measure topological features of spaces across different spatial resolutions), to overcome these limitations. The described method isolates subsets of shape features and measures the spatial relationship of neighboring pixel densities in a shape. We apply the method to the analysis of 182,707 leaves, both published and unpublished, representing 141 plant families collected from 75 sites throughout the world. By measuring leaves from throughout the seed plants using persistent homology, a defined morphospace comparing all leaves is demarcated. Clear differences in shape between major phylogenetic groups are detected and estimates of leaf shape diversity within plant families are made. The approach predicts plant family above chance. The application of a persistent homology method, using topological features, to measure leaf shape allows for a unified morphometric framework to measure plant form, including shapes, textures, patterns, and branching architectures.
  170. Mind the Gap: A Study in Global Development Through Persistent Homology (2018)

    Andrew Banman, Lori Ziegelmeier
    Abstract The Gapminder project set out to use statistics to dispel simplistic notions about global development. In the same spirit, we use persistent homology, a technique from computational algebraic topology, to explore the relationship between country development and geography. For each country, four indicators, gross domestic product per capita; average life expectancy; infant mortality; and gross national income per capita, were used to quantify the development. Two analyses were performed. The first considers clusters of the countries based on these indicators, and the second uncovers cycles in the data when combined with geographic border structure. Our analysis is a multi-scale approach that reveals similarities and connections among countries at a variety of levels. We discover localized development patterns that are invisible in standard statistical methods.
  171. Persistent Homology Analysis of Ion Aggregations and Hydrogen-Bonding Networks (2018)

    Kelin Xia
    Abstract Despite the great advancement of experimental tools and theoretical models, a quantitative characterization of the microscopic structures of ion aggregates and their associated water hydrogen-bonding networks still remains a challenging problem. In this paper, a newly-invented mathematical method called persistent homology is introduced, for the first time, to quantitatively analyze the intrinsic topological properties of ion aggregation systems and hydrogen-bonding networks. The two most distinguishable properties of persistent homology analysis of assembly systems are as follows. First, it does not require a predefined bond length to construct the ion or hydrogen-bonding network. Persistent homology results are determined by the morphological structure of the data only. Second, it can directly measure the size of circles or holes in ion aggregates and hydrogen-bonding networks. To validate our model, we consider two well-studied systems, i.e., NaCl and KSCN solutions, generated from molecular dynamics simulations. They are believed to represent two morphological types of aggregation, i.e., local clusters and extended ion networks. It has been found that the two aggregation types have distinguishable topological features and can be characterized by our topological model very well. Further, we construct two types of networks, i.e., O-networks and H2O-networks, for analyzing the topological properties of hydrogen-bonding networks. It is found that for both models, KSCN systems demonstrate much more dramatic variations in their local circle structures with a concentration increase. A consistent increase of large-sized local circle structures is observed and the sizes of these circles become more and more diverse. In contrast, NaCl systems show no obvious increase of large-sized circles. Instead a consistent decline of the average size of the circle structures is observed and the sizes of these circles become more and more uniform with a concentration increase. As far as we know, these unique intrinsic topological features in ion aggregation systems have never been pointed out before. More importantly, our models can be directly used to quantitatively analyze the intrinsic topological invariants, including circles, loops, holes, and cavities, of any network-like structures, such as nanomaterials, colloidal systems, biomolecular assemblies, among others. These topological invariants cannot be described by traditional graph and network models.
  172. Pore Geometry Characterization by Persistent Homology Theory (2018)

    Fei Jiang, Takeshi Tsuji, Tomoyuki Shirai
    Abstract Rock pore geometry has heterogeneous characteristics and is scale dependent. This feature in a geological formation differs significantly from artificial materials and makes it difficult to predict hydrologic and elastic properties. To characterize pore heterogeneity, we propose an evaluation method that exploits the recently developed persistent homology theory. In the proposed method, complex pore geometry is first represented as sphere cloud data using a pore-network extraction method. Then, a persistence diagram (PD) is calculated from the point cloud, which represents the spatial distribution of pore bodies. A new parameter (distance index H) derived from the PD is proposed to characterize the degree of rock heterogeneity. Low H value indicates high heterogeneity. A new empirical equation using this index H is proposed to predict the effective elastic modulus of porous media. The results indicate that the proposed PD analysis is very efficient for extracting topological feature of pore geometry.
  173. Improving Health Care Management Through Persistent Homology of Time-Varying Variability of Emergency Department Patient Flow (2018)

    Mael Dugast, Guillaume Bouleux, Olivier Mory, Eric Marcon
    Abstract Excessive admissions at the Emergency Department (ED) is a phenomenon very closely linked to the propagation of viruses. It is a cause of overcrowding for EDs and a public health problem. The aim of this work is to give EDs’ leaders more time for decision making during this period. Based on the admissions time series associated with specific clinical diagnoses, we will first perform a Detrended Fluctuation Analysis (DFA) to obtain the corresponding variability time series. Next, we will embed this time series on a manifold to obtain a point cloud representation and use Topological Data Analysis (TDA) through persistent homology technic to propose two early realtime indicators. One is the early indicator of abnormal arrivals at the ED whereas the second gives the information on the time index of the maximum number of arrivals. The performance of the detectors is parameter dependent and it can evolve each year. That is why we also propose to solve a bi-objective optimization problem to track the variations of this parameter.
  174. Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining (2018)

    Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny
    Abstract Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although recently, TDA methods have been used in many areas of data mining, it has not been widely applied to text mining tasks. In most text processing algorithms, the order in which different entities appear or co-appear is being lost. Assuming these lost orders are informative features of the data, TDA may play a significant role in the resulted gap on text processing state of the art. Once provided, the topology of different entities through a textual document may reveal some additive information regarding the document that is not reflected in any other features from conventional text processing methods. In this paper, we introduce a novel approach that hires TDA in text processing in order to capture and use the topology of different same-type entities in textual documents. First, we will show how to extract some topological signatures in the text using persistent homology-i.e., a TDA tool that captures topological signature of data cloud. Then we will show how to utilize these signatures for text classification.
  175. Improved Understanding of Aqueous Solubility Modeling Through Topological Data Analysis (2018)

    Mariam Pirashvili, Lee Steinberg, Francisco Belchi Guillamon, Mahesan Niranjan, Jeremy G. Frey, Jacek Brodzki
    Abstract Topological data analysis is a family of recent mathematical techniques seeking to understand the ‘shape’ of data, and has been used to understand the structure of the descriptor space produced from a standard chemical informatics software from the point of view of solubility. We have used the mapper algorithm, a TDA method that creates low-dimensional representations of data, to create a network visualization of the solubility space. While descriptors with clear chemical implications are prominent features in this space, reflecting their importance to the chemical properties, an unexpected and interesting correlation between chlorine content and rings and their implication for solubility prediction is revealed. A parallel representation of the chemical space was generated using persistent homology applied to molecular graphs. Links between this chemical space and the descriptor space were shown to be in agreement with chemical heuristics. The use of persistent homology on molecular graphs, extended by the use of norms on the associated persistence landscapes allow the conversion of discrete shape descriptors to continuous ones, and a perspective of the application of these descriptors to quantitative structure property relations is presented.
  176. Using Persistent Homology as a New Approach for Super-Resolution Localization Microscopy Data Analysis and Classification of γH2AX Foci/Clusters (2018)

    Andreas Hofmann, Matthias Krufczik, Dieter W. Heermann, Michael Hausmann
    Abstract DNA double strand breaks (DSB) are the most severe damages in chromatin induced by ionizing radiation. In response to such environmentally determined stress situations, cells have developed repair mechanisms. Although many investigations have contributed to a detailed understanding of repair processes, e.g., homologous recombination repair or non-homologous end-joining, the question is not sufficiently answered, how a cell decides to apply a certain repair process at a certain damage site, since all different repair pathways could simultaneously occur in the same cell nucleus. One of the first processes after DSB induction is phosphorylation of the histone variant H2AX to γH2AX in the given surroundings of the damaged locus. Since the spatial organization of chromatin is not random, it may be conclusive that the spatial organization of γH2AX foci is also not random, and rather, contributes to accessibility of special repair proteins to the damaged site, and thus, to the following repair pathway at this given site. The aim of this article is to demonstrate a new approach to analyze repair foci by their topology in order to obtain a cell independent method of categorization. During the last decade, novel super-resolution fluorescence light microscopic techniques have enabled new insights into genome structure and spatial organization on the nano-scale in the order of 10 nm. One of these techniques is single molecule localization microscopy (SMLM) with which the spatial coordinates of single fluorescence molecules can precisely be determined and density and distance distributions can be calculated. This method is an appropriate tool to quantify complex changes of chromatin and to describe repair foci on the single molecule level. Based on the pointillist information obtained by SMLM from specifically labeled heterochromatin and γH2AX foci reflecting the chromatin morphology and repair foci topology, we have developed a new analytical methodology of foci or foci cluster characterization, respectively, by means of persistence homology. This method allows, for the first time, a cell independent comparison of two point distributions (here the point distributions of two γH2AX clusters) with each other of a selected ensample and to give a mathematical measure of their similarity. In order to demonstrate the feasibility of this approach, cells were irradiated by low LET (linear energy transfer) radiation with different doses and the heterochromatin and γH2AX foci were fluorescently labeled by antibodies for SMLM. By means of our new analysis method, we were able to show that the topology of clusters of γH2AX foci can be categorized depending on the distance to heterochromatin. This method opens up new possibilities to categorize spatial organization of point patterns by parameterization of topological similarity.
  177. Lung Topology Characteristics in Patients With Chronic Obstructive Pulmonary Disease (2018)

    Francisco Belchi, Mariam Pirashvili, Joy Conway, Michael Bennett, Ratko Djukanovic, Jacek Brodzki
    Abstract Quantitative features that can currently be obtained from medical imaging do not provide a complete picture of Chronic Obstructive Pulmonary Disease (COPD). In this paper, we introduce a novel analytical tool based on persistent homology that extracts quantitative features from chest CT scans to describe the geometric structure of the airways inside the lungs. We show that these new radiomic features stratify COPD patients in agreement with the GOLD guidelines for COPD and can distinguish between inspiratory and expiratory scans. These CT measurements are very different to those currently in use and we demonstrate that they convey significant medical information. The results of this study are a proof of concept that topological methods can enhance the standard methodology to create a finer classification of COPD and increase the possibilities of more personalized treatment.
  178. A Probabilistic Topological Approach to Feature Identification Using a Stochastic Robotic Swarm (2018)

    Ragesh K. Ramachandran, Sean Wilson, Spring Berman
    Abstract This paper presents a novel automated approach to quantifying the topological features of an unknown environment using a swarm of robots with local sensing and limited or no access to global position information. The robots randomly explore the environment and record a time series of their estimated position and the covariance matrix associated with this estimate. After the robots’ deployment, a point cloud indicating the free space of the environment is extracted from their aggregated data. Tools from topological data analysis, in particular the concept of persistent homology, are applied to a subset of the point cloud to construct barcode diagrams, which are used to determine the numbers of different types of features in the domain. We demonstrate that our approach can correctly identify the number of topological features in simulations with zero to four features and in multi-robot experiments with one to three features.
  179. (Quasi)Periodicity Quantification in Video Data, Using Topology (2018)

    Christopher J. Tralie, Jose A. Perea
    Abstract This work introduces a novel framework for quantifying the presence and strength of recurrent dynamics in video data. Specifically, we provide continuous measures of periodicity (perfect repetition) and quasiperiodicity (superposition of periodic modes with noncommensurate periods), in a way which does not require segmentation, training, object tracking, or 1-dimensional surrogate signals. Our methodology operates directly on video data. The approach combines ideas from nonlinear time series analysis (delay embeddings) and computational topology (persistent homology) by translating the problem of finding recurrent dynamics in video data into the problem of determining the circularity or toroidality of an associated geometric space. Through extensive testing, we show the robustness of our scores with respect to several noise models/levels; we show that our periodicity score is superior to other methods when compared to human-generated periodicity rankings; and furthermore, we show that our quasiperiodicity score clearly indicates the presence of biphonation in videos of vibrating vocal folds, which has never before been accomplished quantitatively end to end.
  180. RGB Image-Based Data Analysis via Discrete Morse Theory and Persistent Homology (2018)

    Chuan Du, Christopher Szul, Adarsh Manawa, Nima Rasekh, Rosemary Guzman, Ruth Davidson
    Abstract Understanding and comparing images for the purposes of data analysis is currently a very computationally demanding task. A group at Australian National University (ANU) recently developed open-source code that can detect fundamental topological features of a grayscale image in a computationally feasible manner. This is made possible by the fact that computers store grayscale images as cubical cellular complexes. These complexes can be studied using the techniques of discrete Morse theory. We expand the functionality of the ANU code by introducing methods and software for analyzing images encoded in red, green, and blue (RGB), because this image encoding is very popular for publicly available data. Our methods allow the extraction of key topological information from RGB images via informative persistence diagrams by introducing novel methods for transforming RGB-to-grayscale. This paradigm allows us to perform data analysis directly on RGB images representing water scarcity variability as well as crime variability. We introduce software enabling a a user to predict future image properties, towards the eventual aim of more rapid image-based data behavior prediction.
  181. Connectivity in fMRI: Blind Spots and Breakthroughs (2018)

    Victor Solo, Jean-Baptiste Poline, Martin A. Lindquist, Sean L. Simpson, F. DuBois Bowman, Moo K. Chung, Ben Cassidy
    Abstract In recent years, driven by scientific and clinical concerns, there has been an increased interest in the analysis of functional brain networks. The goal of these analyses is to better understand how brain regions interact, how this depends upon experimental conditions and behavioral measures and how anomalies (disease) can be recognized. In this work we provide, firstly, a brief review of some of the main existing methods of functional brain network analysis. But rather than compare them, as a traditional review would do, instead, we draw attention to their significant limitations and blind spots. Then, secondly, relevant experts, sketch a number of emerging methods, which can break through these limitations. In particular we discuss five such methods. The first two, stochastic block models and exponential random graph models, provide an inferential basis for network analysis lacking in the exploratory graph analysis methods. The other three address: network comparison via persistent homology, time-varying connectivity that distinguishes sample fluctuations from neural fluctuations and, network system identification that draws inferential strength from temporal autocorrelation.
  182. Morphometrics Reveals Complex and Heritable Apple Leaf Shapes (2018)

    Zoë Migicovsky, Mao Li, Daniel H. Chitwood, Sean Myles
    Abstract Apple (Malus spp.) is a widely grown and valuable fruit crop. Leaf shape is important for flowering in apple and may also be an early indicator for other agriculturally valuable traits. We examined 9,000 leaves from 869 unique apple accessions using linear measurements and comprehensive morphometric techniques. We identified allometric variation as the result of differing length-to-width aspect ratios between accessions and species of apple. The allometric variation was due to variation in the width of the leaf blade, not the length. Aspect ratio was highly correlated with the first principal component (PC1) of morphometric variation quantified using elliptical Fourier descriptors (EFDs) and persistent homology (PH). While the primary source of variation was aspect ratio, subsequent PCs corresponded to complex shape variation not captured by linear measurements. After linking the morphometric information with over 122,000 genome-wide single nucleotide polymorphisms (SNPs), we found high SNP heritability values even at later PCs, indicating that comprehensive morphometrics can capture complex, heritable phenotypes. Thus, techniques such as EFDs and PH are capturing heritable biological variation that would be missed using linear measurements alone.
  183. Topological Data Analysis for True Step Detection in Periodic Piecewise Constant Signals (2018)

    Firas A. Khasawneh, Elizabeth Munch
    Abstract This paper introduces a simple yet powerful approach based on topological data analysis for detecting true steps in a periodic, piecewise constant (PWC) signal. The signal is a two-state square wave with randomly varying in-between-pulse spacing, subject to spurious steps at the rising or falling edges which we call digital ringing. We use persistent homology to derive mathematical guarantees for the resulting change detection which enables accurate identification and counting of the true pulses. The approach is tested using both synthetic and experimental data obtained using an engine lathe instrumented with a laser tachometer. The described algorithm enables accurate and automatic calculations of the spindle speed without any choice of parameters. The results are compared with the frequency and sequency methods of the Fourier and Walsh–Hadamard transforms, respectively. Both our approach and the Fourier analysis yield comparable results for pulses with regular spacing and digital ringing while the latter causes large errors using the Walsh–Hadamard method. Further, the described approach significantly outperforms the frequency/sequency analyses when the spacing between the peaks is varied. We discuss generalizing the approach to higher dimensional PWC signals, although using this extension remains an interesting question for future research.
  184. Image-Based Phenotyping for Identification of QTL Determining Fruit Shape and Size in American Cranberry (Vaccinium Macrocarpon L.) (2018)

    Luis Diaz-Garcia, Giovanny Covarrubias-Pazaran, Brandon Schlautman, Edward Grygleski, Juan Zalapa
    Abstract Image-based phenotyping methodologies are powerful tools to determine quality parameters for fruit breeders and processors. The fruit size and shape of American cranberry (Vaccinium macrocarpon L.) are particularly important characteristics that determine the harvests’ processing value and potential end-use products (e.g., juice vs. sweetened dried cranberries). However, cranberry fruit size and shape attributes can be difficult and time consuming for breeders and processors to measure, especially when relying on manual measurements and visual ratings. Therefore, in this study, we implemented image-based phenotyping techniques for gathering data regarding basic cranberry fruit parameters such as length, width, length-to-width ratio, and eccentricity. Additionally, we applied a persistent homology algorithm to better characterize complex shape parameters. Using this high-throughput artificial vision approach, we characterized fruit from 351 progeny from a full-sib cranberry population over three field seasons. Using a covariate analysis to maximize the identification of well-supported quantitative trait loci (QTL), we found 252 single QTL in a 3-year period for cranberry fruit size and shape descriptors from which 20% were consistently found in all years. The present study highlights the potential for the identified QTL and the image-based methods to serve as a basis for future explorations of the genetic architecture of fruit size and shape in cranberry and other fruit crops.
  185. Airway Pathological Heterogeneity in Asthma: Visualization of Disease Microclusters Using Topological Data Analysis (2018)

    Salman Siddiqui, Aarti Shikotra, Matthew Richardson, Emma Doran, David Choy, Alex Bell, Cary D. Austin, Jeffrey Eastham-Anderson, Beverley Hargadon, Joseph R. Arron, Andrew Wardlaw, Christopher E. Brightling, Liam G. Heaney, Peter Bradding
    Abstract Background Asthma is a complex chronic disease underpinned by pathological changes within the airway wall. How variations in structural airway pathology and cellular inflammation contribute to the expression and severity of asthma are poorly understood. Objectives Therefore we evaluated pathological heterogeneity using topological data analysis (TDA) with the aim of visualizing disease clusters and microclusters. Methods A discovery population of 202 adult patients (142 asthmatic patients and 60 healthy subjects) and an external replication population (59 patients with severe asthma) were evaluated. Pathology and gene expression were examined in bronchial biopsy samples. TDA was applied by using pathological variables alone to create pathology-driven visual networks. Results In the discovery cohort TDA identified 4 groups/networks with multiple microclusters/regions of interest that were masked by group-level statistics. Specifically, TDA group 1 consisted of a high proportion of healthy subjects, with a microcluster representing a topological continuum connecting healthy subjects to patients with mild-to-moderate asthma. Three additional TDA groups with moderate-to-severe asthma (Airway Smooth MuscleHigh, Reticular Basement MembraneHigh, and RemodelingLow groups) were identified and contained numerous microclusters with varying pathological and clinical features. Mutually exclusive TH2 and TH17 tissue gene expression signatures were identified in all pathological groups. Discovery and external replication applied to the severe asthma subgroup identified only highly similar “pathological data shapes” through analyses of persistent homology. Conclusions We have identified and replicated novel pathological phenotypes of asthma using TDA. Our methodology is applicable to other complex chronic diseases.
  186. Stable Signatures for Dynamic Graphs and Dynamic Metric Spaces via Zigzag Persistence (2018)

    Woojin Kim, Facundo Memoli
    Abstract When studying flocking/swarming behaviors in animals one is interested in quantifying and comparing the dynamics of the clustering induced by the coalescence and disbanding of animals in different groups. In a similar vein, studying the dynamics of social networks leads to the problem of characterizing groups/communities as they form and disperse throughout time. Motivated by this, we study the problem of obtaining persistent homology based summaries of time-dependent data. Given a finite dynamic graph (DG), we first construct a zigzag persistence module arising from linearizing the dynamic transitive graph naturally induced from the input DG. Based on standard results, we then obtain a persistence diagram or barcode from this zigzag persistence module. We prove that these barcodes are stable under perturbations in the input DG under a suitable distance between DGs that we identify. More precisely, our stability theorem can be interpreted as providing a lower bound for the distance between DGs. Since it relies on barcodes, and their bottleneck distance, this lower bound can be computed in polynomial time from the DG inputs. Since DGs can be given rise by applying the Rips functor (with a fixed threshold) to dynamic metric spaces, we are also able to derive related stable invariants for these richer class of dynamic objects. Along the way, we propose a summarization of dynamic graphs that captures their time-dependent clustering features which we call formigrams. These set-valued functions generalize the notion of dendrogram, a prevalent tool for hierarchical clustering. In order to elucidate the relationship between our distance between two DGs and the bottleneck distance between their associated barcodes, we exploit recent advances in the stability of zigzag persistence due to Botnan and Lesnick, and to Bjerkevik.
  187. Spatial Embedding Imposes Constraints on Neuronal Network Architectures (2018)

    Jennifer Stiso, Danielle S. Bassett
    Abstract Recent progress towards understanding circuit function has capitalized on tools from network science to parsimoniously describe the spatiotemporal architecture of neural systems. Such tools often address systems topology divorced from its physical instantiation. Nevertheless, for embedded systems such as the brain, physical laws directly constrain the processes of network growth, development, and function. We review here the rules imposed by the space and volume of the brain on the development of neuronal networks, and show that these rules give rise to a specific set of complex topologies. These rules also affect the repertoire of neural dynamics that can emerge from the system, and thereby inform our understanding of network dysfunction in disease. We close by discussing new tools and models to delineate the effects of spatial embedding.
  188. Geometry and Topology of the Space of Sonar Target Echos (2018)

    Michael Robinson, Sean Fennell, Brian DiZio, Jennifer Dumiak
    Abstract Successful synthetic aperture sonar target classification depends on the “shape” of the scatterers within a target signature. This article presents a workflow that computes a target-to-target distance from persistence diagrams, since the “shape” of a signature informs its persistence diagram in a structure-preserving way. The target-to-target distances derived from persistence diagrams compare favorably against those derived from spectral features and have the advantage of being substantially more compact. While spectral features produce clusters associated to each target type that are reasonably dense and well formed, the clusters are not well-separated from one another. In rather dramatic contrast, a distance derived from persistence diagrams results in highly separated clusters at the expense of some misclassification of outliers.
  189. The Importance of Forgetting: Limiting Memory Improves Recovery of Topological Characteristics From Neural Data (2018)

    Samir Chowdhury, Bowen Dai, Facundo Mémoli
    Abstract We develop of a line of work initiated by Curto and Itskov towards understanding the amount of information contained in the spike trains of hippocampal place cells via topology considerations. Previously, it was established that simply knowing which groups of place cells fire together in an animal’s hippocampus is sufficient to extract the global topology of the animal’s physical environment. We model a system where collections of place cells group and ungroup according to short-term plasticity rules. In particular, we obtain the surprising result that in experiments with spurious firing, the accuracy of the extracted topological information decreases with the persistence (beyond a certain regime) of the cell groups. This suggests that synaptic transience, or forgetting, is a mechanism by which the brain counteracts the effects of spurious place cell activity.
  190. Revisiting Abnormalities in Brain Network Architecture Underlying Autism Using Topology-Inspired Statistical Inference (2018)

    Sourabh Palande, Vipin Jose, Brandon Zielinski, Jeffrey Anderson, P. Thomas Fletcher, Bei Wang
    Abstract A large body of evidence relates autism with abnormal structural and functional brain connectivity. Structural covariance magnetic resonance imaging (scMRI) is a technique that maps brain regions with covarying gray matter densities across subjects. It provides a way to probe the anatomical structure underlying intrinsic connectivity networks (ICNs) through analysis of gray matter signal covariance. In this article, we apply topological data analysis in conjunction with scMRI to explore network-specific differences in the gray matter structure in subjects with autism versus age-, gender-, and IQ-matched controls. Specifically, we investigate topological differences in gray matter structure captured by structural correlation graphs derived from three ICNs strongly implicated in autism, namely the salience network, default mode network, and executive control network. By combining topological data analysis with statistical inference, our results provide evidence of statistically significant network-specific structural abnormalities in autism.
  191. Knowledge Gaps in the Early Growth of Semantic Feature Networks (2018)

    Ann E. Sizemore, Elisabeth A. Karuza, Chad Giusti, Danielle S. Bassett
    Abstract Understanding language learning and more general knowledge acquisition requires the characterization of inherently qualitative structures. Recent work has applied network science to this task by creating semantic feature networks, in which words correspond to nodes and connections correspond to shared features, and then by characterizing the structure of strongly interrelated groups of words. However, the importance of sparse portions of the semantic network—knowledge gaps—remains unexplored. Using applied topology, we query the prevalence of knowledge gaps, which we propose manifest as cavities in the growing semantic feature network of toddlers. We detect topological cavities of multiple dimensions and find that, despite word order variation, the global organization remains similar. We also show that nodal network measures correlate with filling cavities better than basic lexical properties. Finally, we discuss the importance of semantic feature network topology in language learning and speculate that the progression through knowledge gaps may be a robust feature of knowledge acquisition.
  192. Topological Eulerian Synthesis of Slow Motion Periodic Videos (2018)

    Christopher Tralie, Matthew Berger
    Abstract We consider the problem of taking a video that is comprised of multiple periods of repetitive motion, and reordering the frames of the video into a single period, producing a detailed, single cycle video of motion. This problem is challenging, as such videos often contain noise, drift due to camera motion and from cycle to cycle, and irrelevant background motion/occlusions, and these factors can confound the relevant periodic motion we seek in the video. To address these issues in a simple and efficient manner, we introduce a tracking free Eulerian approach for synthesizing a single cycle of motion. Our approach is geometric: we treat each frame as a point in high-dimensional Euclidean space, and analyze the sliding window embedding formed by this sequence of points, which yields samples along a topological loop regardless of the type of periodic motion. We combine tools from topological data analysis and spectral geometric analysis to estimate the phase of each window, and we exploit the sliding window structure to robustly reorder frames. We show quantitative results that highlight the robustness of our technique to camera shake, noise, and occlusions, and qualitative results of single-cycle motion synthesis across a variety of scenarios.
  193. Chatter Classification in Turning Using Machine Learning and Topological Data Analysis (2018)

    Firas A. Khasawneh, Elizabeth Munch, Jose A. Perea
    Abstract Chatter identification and detection in machining processes has been an active area of research in the past two decades. Part of the challenge in studying chatter is that machining equations that describe its occurrence are often nonlinear delay differential equations. The majority of the available tools for chatter identification rely on defining a metric that captures the characteristics of chatter, and a threshold that signals its occurrence. The difficulty in choosing these parameters can be somewhat alleviated by utilizing machine learning techniques. However, even with a successful classification algorithm, the transferability of typical machine learning methods from one data set to another remains very limited. In this paper we combine supervised machine learning with Topological Data Analysis (TDA) to obtain a descriptor of the process which can detect chatter. The features we use are derived from the persistence diagram of an attractor reconstructed from the time series via Takens embedding. We test the approach using deterministic and stochastic turning models, where the stochasticity is introduced via the cutting coefficient term. Our results show a 97% successful classification rate on the deterministic model labeled by the stability diagram obtained using the spectral element method. The features gleaned from the deterministic model are then utilized for characterization of chatter in a stochastic turning model where there are very limited analysis methods.
  194. Topological Data Analysis for the Characterization of Atomic Scale Morphology From Atom Probe Tomography Images (2018)

    Tianmu Zhang, Scott R. Broderick, Krishna Rajan
    Abstract Atom probe tomography (APT) represents a revolutionary characterization tool for materials that combine atomic imaging with a time-of-flight (TOF) mass spectrometer to provide direct space three-dimensional, atomic scale resolution images of materials with the chemical identities of hundreds of millions of atoms. It involves the controlled removal of atoms from a specimen’s surface by field evaporation and then sequentially analyzing them with a position sensitive detector and TOF mass spectrometer. A paradox in APT is that while on the one hand, it provides an unprecedented level of imaging resolution in three dimensions, it is very difficult to obtain an accurate perspective of morphology or shape outlined by atoms of similar chemistry and microstructure. The origins of this problem are numerous, including incomplete detection of atoms and the complexity of the evaporation fields of atoms at or near interfaces. Hence, unlike scattering techniques such as electron microscopy, interfaces appear diffused, not sharp. This, in turn, makes it challenging to visualize and quantitatively interpret the microstructure at the “meso” scale, where one is interested in the shape and form of the interfaces and their associated chemical gradients. It is here that the application of informatics at the nanoscale and statistical learning methods plays a critical role in both defining the level of uncertainty and helping to make quantitative, statistically objective interpretations where heuristics often dominate. In this chapter, we show how the tools of Topological Data Analysis provide a new and powerful tool in the field of nanoinformatics for materials characterization.
  195. Cliques and Cavities in the Human Connectome (2018)

    Ann E. Sizemore, Chad Giusti, Ari Kahn, Jean M. Vettel, Richard F. Betzel, Danielle S. Bassett
    Abstract Encoding brain regions and their connections as a network of nodes and edges captures many of the possible paths along which information can be transmitted as humans process and perform complex behaviors. Because cognitive processes involve large, distributed networks of brain areas, principled examinations of multi-node routes within larger connection patterns can offer fundamental insights into the complexities of brain function. Here, we investigate both densely connected groups of nodes that could perform local computations as well as larger patterns of interactions that would allow for parallel processing. Finding such structures necessitates that we move from considering exclusively pairwise interactions to capturing higher order relations, concepts naturally expressed in the language of algebraic topology. These tools can be used to study mesoscale network structures that arise from the arrangement of densely connected substructures called cliques in otherwise sparsely connected brain networks. We detect cliques (all-to-all connected sets of brain regions) in the average structural connectomes of 8 healthy adults scanned in triplicate and discover the presence of more large cliques than expected in null networks constructed via wiring minimization, providing architecture through which brain network can perform rapid, local processing. We then locate topological cavities of different dimensions, around which information may flow in either diverging or converging patterns. These cavities exist consistently across subjects, differ from those observed in null model networks, and – importantly – link regions of early and late evolutionary origin in long loops, underscoring their unique role in controlling brain function. These results offer a first demonstration that techniques from algebraic topology offer a novel perspective on structural connectomics, highlighting loop-like paths as crucial features in the human brain’s structural architecture.
  196. The Architecture of the Endoplasmic Reticulum Is Regulated by the Reversible Lipid Modification of the Shaping Protein CLIMP-63 (2018)

    Patrick A. Sandoz, Robin A. Denhardt-Eriksson, Laurence Abrami, Luciano Abriata, Gard Spreemann, Catherine Maclachlan, Sylvia Ho, Béatrice Kunz, Kathryn Hess, Graham Knott, Vassily Hatzimanikatis, F. Gisou van der Goot
    Abstract \textlessh3\textgreaterAbstract\textless/h3\textgreater \textlessp\textgreaterThe endoplasmic reticulum (ER) has a complex morphology generated and maintained by membrane-shaping proteins and membrane energy minimization, though not much is known about how it is regulated. The architecture of this intracellular organelle is balanced between large, thin sheets that are densely packed in the perinuclear region and a connected network of branched, elongated tubules that extend throughout the cytoplasm. Sheet formation is known to involve the cytoskeleton-linking membrane protein 63 (CLIMP-63), though its regulation and the depth of its involvement remain unknown. Here we show that the post-translational modification of CLIMP-63 by the palmitoyltransferase ZDHHC6 controls the relative distribution of CLIMP-63 between the ER and the plasma membrane. By combining data-driven mathematical modeling, predictions, and experimental validation, we found that the attachment of a medium chain fatty acid, so-called S-palmitoylation, to the unique CLIMP-63 cytoplasmic cysteine residue drastically reduces its turnover rate, and thereby controls its abundance. Light microscopy and focused ion beam electron microcopy further revealed that enhanced CLIMP-63 palmitoylation leads to strong ER-sheet proliferation. Altogether, we show that ZDHHC6-mediated S-palmitoylation regulates the cellular localization of CLIMP-63, the morphology of the ER, and the interconversion of ER structural elements in mammalian cells through its action on the CLIMP-63 protein.\textless/p\textgreater\textlessh3\textgreaterSignificance Statement\textless/h3\textgreater \textlessp\textgreaterEukaryotic cells subcompartmentalize their various functions into organelles, the shape of each being specific and necessary for its proper role. However, how these shapes are generated and controlled is poorly understood. The endoplasmic reticulum is the largest membrane-bound intracellular compartment, accounting for more than 50% of all cellular membranes. We found that the shape and quantity of its sheet-like structures are controlled by a specific protein, cytoskeleton-linking membrane protein 63, through the acquisition of a lipid chain attached by an enzyme called ZDHHC6. Thus, by modifying the ZDHHC6 amounts, a cell can control the shape of its ER. The modeling and prediction technique used herein also provides a method for studying the interconnected function of other post-translational modifications in organelles.\textless/p\textgreater
  197. Persistent Homology Index as a Robust Quantitative Measure of Immunohistochemical Scoring (2017)

    Akihiro Takiyama, Takashi Teramoto, Hiroaki Suzuki, Katsushige Yamashiro, Shinya Tanaka
    Abstract Immunohistochemical data (IHC) plays an important role in clinical practice, and is typically gathered in a semi-quantitative fashion that relies on some degree of visual scoring. However, visual scoring by a pathologist is inherently subjective and manifests both intra-observer and inter-observer variability. In this study, we introduce a novel computer-aided quantification methodology for immunohistochemical scoring that uses the algebraic concept of persistent homology. Using 8 bit grayscale image data derived from 90 specimens of invasive ductal carcinoma of the breast, stained for the replicative marker Ki-67, we computed homology classes. These were then compared to nuclear grades and the Ki-67 labeling indices obtained by visual scoring. Three metrics for IHC staining were newly defined: Persistent Homology Index (PHI), center coordinates of positive and negative groups, and the sum of squares within groups (WSS). This study demonstrates that PHI, a novel index for immunohistochemical labeling using persistent homology, can produce highly similar data to that generated by a pathologist using visual evaluation. The potential benefits associated with our novel technology include both improved quantification and reproducibility. Since our method reflects cellularity and nuclear atypia, it carries a greater quantity of biologic data compared to conventional evaluation using Ki-67.
  198. Persistence Images: A Stable Vector Representation of Persistent Homology (2017)

    Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, Lori Ziegelmeier
    Abstract Many data sets can be viewed as a noisy sampling of an underlying space, and tools from topological data analysis can characterize this structure for the purpose of knowledge discovery. One such tool is persistent homology, which provides a multiscale description of the homological features within a data set. A useful representation of this homological information is a persistence diagram (PD). Efforts have been made to map PDs into spaces with additional structure valuable to machine learning tasks. We convert a PD to a finite-dimensional vector representation which we call a persistence image (PI), and prove the stability of this transformation with respect to small perturbations in the inputs. The discriminatory power of PIs is compared against existing methods, showing significant performance gains. We explore the use of PIs with vector-based machine learning tools, such as linear sparse support vector machines, which identify features containing discriminating topological information. Finally, high accuracy inference of parameter values from the dynamic output of a discrete dynamical system (the linked twist map) and a partial differential equation (the anisotropic Kuramoto-Sivashinsky equation) provide a novel application of the discriminatory power of PIs.
  199. Persistent Homology of Time-Dependent Functional Networks Constructed From Coupled Time Series (2017)

    Bernadette J. Stolz, Heather A. Harrington, Mason A. Porter
    Abstract We use topological data analysis to study “functional networks” that we construct from time-series data from both experimental and synthetic sources. We use persistent homology with a weight rank clique filtration to gain insights into these functional networks, and we use persistence landscapes to interpret our results. Our first example uses time-series output from networks of coupled Kuramoto oscillators. Our second example consists of biological data in the form of functional magnetic resonance imaging data that were acquired from human subjects during a simple motor-learning task in which subjects were monitored for three days during a five-day period. With these examples, we demonstrate that (1) using persistent homology to study functional networks provides fascinating insights into their properties and (2) the position of the features in a filtration can sometimes play a more vital role than persistence in the interpretation of topological features, even though conventionally the latter is used to distinguish between signal and noise. We find that persistent homology can detect differences in synchronization patterns in our data sets over time, giving insight both on changes in community structure in the networks and on increased synchronization between brain regions that form loops in a functional network during motor learning. For the motor-learning data, persistence landscapes also reveal that on average the majority of changes in the network loops take place on the second of the three days of the learning process.
  200. Positive Alexander Duality for Pursuit and Evasion (2017)

    Robert Ghrist, Sanjeevi Krishnan
    Abstract Considered is a class of pursuit-evasion games, in which an evader tries to avoid detection. Such games can be formulated as the search for sections to the complement of a coverage region in a Euclidean space over time. Prior results give homological criteria for evasion in the general case that are not necessary and sufficient. This paper provides a necessary and sufficient positive cohomological criterion for evasion in the general case. The principal tools are (1) a refinement of the Čech cohomology of a coverage region with a positive cone encoding spatial orientation, (2) a refinement of the Borel--Moore homology of the coverage gaps with a positive cone encoding time orientation, and (3) a positive variant of Alexander Duality. Positive cohomology decomposes as the global sections of a sheaf of local positive cohomology over the time axis; we show how this decomposition makes positive cohomology computable using techniques of computational polyhedral geometry and linear programming.
  201. Congestion Barcodes: Exploring the Topology of Urban Congestion Using Persistent Homology (2017)

    Yu Wu, Gabriel Shindnes, Vaibhav Karve, Derrek Yager, Daniel B. Work, Arnab Chakraborty, Richard B. Sowers
    Abstract This work presents a new method to quantify connectivity in transportation networks. Inspired by the field of topological data analysis, we propose a novel approach to explore the robustness of road network connectivity in the presence of congestion on the roadway. The robustness of the pattern is summarized in a congestion barcode, which can be constructed directly from traffic datasets commonly used for navigation. As an initial demonstration, we illustrate the main technique on a publicly available traffic dataset in a neighborhood in New York City.
  202. Optimal Topological Cycles and Their Application in Cardiac Trabeculae Restoration (2017)

    Pengxiang Wu, Chao Chen, Yusu Wang, Shaoting Zhang, Changhe Yuan, Zhen Qian, Dimitris Metaxas, Leon Axel
    Abstract In cardiac image analysis, it is important yet challenging to reconstruct the trabeculae, namely, fine muscle columns whose ends are attached to the ventricular walls. To extract these fine structures, traditional image segmentation methods are insufficient. In this paper, we propose a novel method to jointly detect salient topological handles and compute the optimal representations of them. The detected handles are considered hypothetical trabeculae structures. They are further screened using a classifier and are then included in the final segmentation. We show in experiments the significance of our contribution compared with previous standard segmentation methods without topological priors, as well as with previous topological method in which non-optimal representations of topological handles are used.
  203. Modelling Topological Features of Swarm Behaviour in Space and Time With Persistence Landscapes (2017)

    P. Corcoran, C. B. Jones
    Abstract This paper presents a model of swarm behavior that encodes the spatial-temporal characteristics of topological features, such as holes and connected components. Specifically, the persistence of topological features with respect to time is computed using zig-zag persistent homology. This information is in turn modelled as a persistence landscape, which forms a normed vector space and facilitates the application of statistical and data mining techniques. Validation of the proposed model is performed using a real data set corresponding to a swarm of fish. It is demonstrated that the proposed model may be used to perform retrieval and clustering of swarm behavior in terms of topological features. In fact, it is discovered that clustering returns clusters corresponding to the swarm behaviors of flock, torus, and disordered. These are the most frequently occurring types of behavior exhibited by swarms in general.
  204. Shape Terra: Mechanical Feature Recognition Based on a Persistent Heat Signature (2017)

    Ramy Harik, Yang Shi, Stephen Baek
    Abstract This paper presents a novel approach to recognizing mechanical features through a multiscale persistent heat signature similarity identification technique. First, heat signature is computed using a modified Laplacian in the application of the heat kernel. Regularly, matrices tend to include an indicator to the manifold curvature (the cotangent in our case), but we add a mesh uniformity factor to overcome mesh proportionality and skewness. Second, once heat retention values are computed, we apply persistent homology to extract significant subsets of the global mesh at different time intervals. Subsets are computed based on similarity of heat retention levels and/or retention values. Third, we present a multiscale persistence identification approach where we scan the part at different persistence levels to detect the presence of a feature. Once features are recognized and their geometrical descriptors identified, the next stage in future work will be feature matching.
  205. Constructing Shape Spaces From a Topological Perspective (2017)

    Christoph Hofer, Roland Kwitt, Marc Niethammer, Yvonne Höller, Eugen Trinka, Andreas Uhl
    Abstract We consider the task of constructing (metric) shape space(s) from a topological perspective. In particular, we present a generic construction scheme and demonstrate how to apply this scheme when shape is interpreted as the differences that remain after factoring out translation, scaling and rotation. This is achieved by leveraging a recently proposed injective functional transform of 2D/3D (binary) objects, based on persistent homology. The resulting shape space is then equipped with a similarity measure that is (1) by design robust to noise and (2) fulfills all metric axioms. From a practical point of view, analyses of object shape can then be carried out directly on segmented objects obtained from some imaging modality without any preprocessing, such as alignment, smoothing, or landmark selection. We demonstrate the utility of the approach on the problem of distinguishing segmented hippocampi from normal controls vs. patients with Alzheimer’s disease in a challenging setup where volume changes are no longer discriminative.
  206. Raw Material Flow Optimization as a Capacitated Vehicle Routing Problem: A Visual Benchmarking Approach for Sustainable Manufacturing (2017)

    Michele Dassisti, Yasamin Eslami, Matin Mohaghegh
    Abstract Optimisation problem concerning material flows, to increase the efficiency while reducing relative resource consumption is one of the most pressing problems today. The focus point of this study is to propose a new visual benchmarking approach to select the best material-flow path from the depot to the production lines, referring to the well-known Capacitated Vehicle Routing Problem (CVRP). An example industrial case study is considered to this aim. Two different solution techniques were adopted (namely Mixed Integer Linear Programming and the Ant Colony Optimization) in searching optimal solutions to the CVRP. The visual benchmarking proposed, based on the persistent homology approach, allowed to support the comparison of the optimal solutions based on the entropy of the output in different scenarios. Finally, based on the non-standard measurements of Crossing Length Percentage (CLP), the visual benchmarking procedure makes it possible to find the most practical and applicable solution to CVRP by considering the visual attractiveness and the quality of the routes.
  207. Segmentation of Biomedical Images by a Computational Topology Framework (2017)

    Rodrigo Rojas Moraleda, Wei Xiong, Niels Halama, Katja Breitkopf-Heinlein, Steven Steven, Luis Salinas, Dieter W. Heermann, Nektarios A. Valous
    Abstract The segmentation of cell nuclei is an important step towards the automated analysis of histological images. The presence of a large number of nuclei in whole-slide images necessitates methods that are computationally tractable in addition to being effective. In this work, a method is developed for the robust segmentation of cell nuclei in histological images based on the principles of persistent homology. More specifically, an abstract simplicial homology approach for image segmentation is established. Essentially, the approach deals with the persistence of disconnected sets in the image, thus identifying salient regions that express patterns of persistence. By introducing an image representation based on topological features, the task of segmentation is less dependent on variations of color or texture. This results in a novel approach that generalizes well and provides stable performance. The method conceptualizes regions of interest (cell nuclei) pertinent to their topological features in a successful manner. The time cost of the proposed approach is lower-bounded by an almost linear behavior and upper-bounded by O(n2) in a worst-case scenario. Time complexity matches a quasilinear behavior which is O(n1+ɛ) for ε \textless 1. Images acquired from histological sections of liver tissue are used as a case study to demonstrate the effectiveness of the approach. The histological landscape consists of hepatocytes and non-parenchymal cells. The accuracy of the proposed methodology is verified against an automated workflow created by the output of a conventional filter bank (validated by experts) and the supervised training of a random forest classifier. The results are obtained on a per-object basis. The proposed workflow successfully detected both hepatocyte and non-parenchymal cell nuclei with an accuracy of 84.6%, and hepatocyte cell nuclei only with an accuracy of 86.2%. A public histological dataset with supplied ground-truth data is also used for evaluating the performance of the proposed approach (accuracy: 94.5%). Further validations are carried out with a publicly available dataset and ground-truth data from the Gland Segmentation in Colon Histology Images Challenge (GlaS) contest. The proposed method is useful for obtaining unsupervised robust initial segmentations that can be further integrated in image/data processing and management pipelines. The development of a fully automated system supporting a human expert provides tangible benefits in the context of clinical decision-making.
  208. Topological Data Analysis of Financial Time Series: Landscapes of Crashes (2017)

    Marian Gidea, Yuri Katz
    Abstract We explore the evolution of daily returns of four major US stock market indices during the technology crash of 2000, and the financial crisis of 2007-2009. Our methodology is based on topological data analysis (TDA). We use persistence homology to detect and quantify topological patterns that appear in multidimensional time series. Using a sliding window, we extract time-dependent point cloud data sets, to which we associate a topological space. We detect transient loops that appear in this space, and we measure their persistence. This is encoded in real-valued functions referred to as a 'persistence landscapes'. We quantify the temporal changes in persistence landscapes via their \$L\textasciicircump\$-norms. We test this procedure on multidimensional time series generated by various non-linear and non-equilibrium models. We find that, in the vicinity of financial meltdowns, the \$L\textasciicircump\$-norms exhibit strong growth prior to the primary peak, which ascends during a crash. Remarkably, the average spectral density at low frequencies of the time series of \$L\textasciicircump\$-norms of the persistence landscapes demonstrates a strong rising trend for 250 trading days prior to either dotcom crash on 03/10/2000, or to the Lehman bankruptcy on 09/15/2008. Our study suggests that TDA provides a new type of econometric analysis, which goes beyond the standard statistical measures. The method can be used to detect early warning signals of imminent market crashes. We believe that this approach can be used beyond the analysis of financial time series presented here.
  209. Pore Configuration Landscape of Granular Crystallization (2017)

    Mohammad Saadatfar, Hiroshi Takeuchi, Vanessa Robins, Nicolas Francois, Yisuaki Hiraoka
    Abstract Emergence and growth of crystalline domains in granular media remains under-explored. Here, the authors analyse tomographic snapshots from partially recrystallized packings of spheres using persistent homology and find agreement with proposed transitions based on continuous deformation of octahedral and tetrahedral voids.
  210. The Topology of the Cosmic Web in Terms of Persistent Betti Numbers (2017)

    Pratyush Pranav, Herbert Edelsbrunner, Rien van de Weygaert, Gert Vegter, Michael Kerber, Bernard J. T. Jones, Mathijs Wintraecken
    Abstract Abstract. We introduce a multiscale topological description of the Megaparsec web-like cosmic matter distribution. Betti numbers and topological persistence of
  211. What Can Topology Tell Us About the Neural Code? (2017)

    Carina Curto
    Abstract Neuroscience is undergoing a period of rapid experimental progress and expansion. New mathematical tools, previously unknown in the neuroscience community, are now being used to tackle fundamental questions and analyze emerging data sets. Consistent with this trend, the last decade has seen an uptick in the use of topological ideas and methods in neuroscience. In this paper I will survey recent applications of topology in neuroscience, and explain why topology is an especially natural tool for understanding neural codes.
  212. Single-Cell Topological RNA-Seq Analysis Reveals Insights Into Cellular Differentiation and Development (2017)

    Abbas H. Rizvi, Pablo G. Camara, Elena K. Kandror, Thomas J. Roberts, Ira Schieren, Tom Maniatis, Raul Rabadan
    Abstract Transcriptional programs control cellular lineage commitment and differentiation during development. Understanding cell fate has been advanced by studying single-cell RNA-seq, but is limited by the assumptions of current analytic methods regarding the structure of data. We present single-cell topological data analysis (scTDA), an algorithm for topology-based computational analyses to study temporal, unbiased transcriptional regulation. Compared to other methods, scTDA is a non-linear, model-independent, unsupervised statistical framework that can characterize transient cellular states. We applied scTDA to the analysis of murine embryonic stem cell (mESC) differentiation in vitro in response to inducers of motor neuron differentiation. scTDA resolved asynchrony and continuity in cellular identity over time, and identified four transient states (pluripotent, precursor, progenitor, and fully differentiated cells) based on changes in stage-dependent combinations of transcription factors, RNA-binding proteins and long non-coding RNAs. scTDA can be applied to study asynchronous cellular responses to either developmental cues or environmental perturbations.
  213. Uncovering Precision Phenotype-Biomarker Associations in Traumatic Brain Injury Using Topological Data Analysis (2017)

    Jessica L. Nielson, Shelly R. Cooper, John K. Yue, Marco D. Sorani, Tomoo Inoue, Esther L. Yuh, Pratik Mukherjee, Tanya C. Petrossian, Jesse Paquette, Pek Y. Lum, Gunnar E. Carlsson, Mary J. Vassar, Hester F. Lingsma, Wayne A. Gordon, Alex B. Valadka, David O. Okonkwo, Geoffrey T. Manley, Adam R. Ferguson, Track-Tbi Investigators
    Abstract Background Traumatic brain injury (TBI) is a complex disorder that is traditionally stratified based on clinical signs and symptoms. Recent imaging and molecular biomarker innovations provide unprecedented opportunities for improved TBI precision medicine, incorporating patho-anatomical and molecular mechanisms. Complete integration of these diverse data for TBI diagnosis and patient stratification remains an unmet challenge. Methods and findings The Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) Pilot multicenter study enrolled 586 acute TBI patients and collected diverse common data elements (TBI-CDEs) across the study population, including imaging, genetics, and clinical outcomes. We then applied topology-based data-driven discovery to identify natural subgroups of patients, based on the TBI-CDEs collected. Our hypothesis was two-fold: 1) A machine learning tool known as topological data analysis (TDA) would reveal data-driven patterns in patient outcomes to identify candidate biomarkers of recovery, and 2) TDA-identified biomarkers would significantly predict patient outcome recovery after TBI using more traditional methods of univariate statistical tests. TDA algorithms organized and mapped the data of TBI patients in multidimensional space, identifying a subset of mild TBI patients with a specific multivariate phenotype associated with unfavorable outcome at 3 and 6 months after injury. Further analyses revealed that this patient subset had high rates of post-traumatic stress disorder (PTSD), and enrichment in several distinct genetic polymorphisms associated with cellular responses to stress and DNA damage (PARP1), and in striatal dopamine processing (ANKK1, COMT, DRD2). Conclusions TDA identified a unique diagnostic subgroup of patients with unfavorable outcome after mild TBI that were significantly predicted by the presence of specific genetic polymorphisms. Machine learning methods such as TDA may provide a robust method for patient stratification and treatment planning targeting identified biomarkers in future clinical trials in TBI patients. Trial Registration ClinicalTrials.gov Identifier NCT01565551
  214. Object-Oriented Persistent Homology (2016)

    Bao Wang, Guo-Wei Wei
    Abstract Persistent homology provides a new approach for the topological simplification of big data via measuring the life time of intrinsic topological features in a filtration process and has found its success in scientific and engineering applications. However, such a success is essentially limited to qualitative data classification and analysis. Indeed, persistent homology has rarely been employed for quantitative modeling and prediction. Additionally, the present persistent homology is a passive tool, rather than a proactive technique, for classification and analysis. In this work, we outline a general protocol to construct object-oriented persistent homology methods. By means of differential geometry theory of surfaces, we construct an objective functional, namely, a surface free energy defined on the data of interest. The minimization of the objective functional leads to a Laplace-Beltrami operator which generates a multiscale representation of the initial data and offers an objective oriented filtration process. The resulting differential geometry based object-oriented persistent homology is able to preserve desirable geometric features in the evolutionary filtration and enhances the corresponding topological persistence. The cubical complex based homology algorithm is employed in the present work to be compatible with the Cartesian representation of the Laplace-Beltrami flow. The proposed Laplace-Beltrami flow based persistent homology method is extensively validated. The consistence between Laplace-Beltrami flow based filtration and Euclidean distance based filtration is confirmed on the Vietoris-Rips complex for a large amount of numerical tests. The convergence and reliability of the present Laplace-Beltrami flow based cubical complex filtration approach are analyzed over various spatial and temporal mesh sizes. The Laplace-Beltrami flow based persistent homology approach is utilized to study the intrinsic topology of proteins and fullerene molecules. Based on a quantitative model which correlates the topological persistence of fullerene central cavity with the total curvature energy of the fullerene structure, the proposed method is used for the prediction of fullerene isomer stability. The efficiency and robustness of the present method are verified by more than 500 fullerene molecules. It is shown that the proposed persistent homology based quantitative model offers good predictions of total curvature energies for ten types of fullerene isomers. The present work offers the first example to design object-oriented persistent homology to enhance or preserve desirable features in the original data during the filtration process and then automatically detect or extract the corresponding topological traits from the data.
  215. Analysis of Kolmogorov Flow and Rayleigh–Bénard Convection Using Persistent Homology (2016)

    Miroslav Kramár, Rachel Levanger, Jeffrey Tithof, Balachandra Suri, Mu Xu, Mark Paul, Michael F. Schatz, Konstantin Mischaikow
    Abstract We use persistent homology to build a quantitative understanding of large complex systems that are driven far-from-equilibrium. In particular, we analyze image time series of flow field patterns from numerical simulations of two important problems in fluid dynamics: Kolmogorov flow and Rayleigh–Bénard convection. For each image we compute a persistence diagram to yield a reduced description of the flow field; by applying different metrics to the space of persistence diagrams, we relate characteristic features in persistence diagrams to the geometry of the corresponding flow patterns. We also examine the dynamics of the flow patterns by a second application of persistent homology to the time series of persistence diagrams. We demonstrate that persistent homology provides an effective method both for quotienting out symmetries in families of solutions and for identifying multiscale recurrent dynamics. Our approach is quite general and it is anticipated to be applicable to a broad range of open problems exhibiting complex spatio-temporal behavior.
  216. Using Persistent Homology and Dynamical Distances to Analyze Protein Binding (2016)

    Violeta Kovacev-Nikolic, Peter Bubenik, Dragan Nikolić, Giseon Heo
    Abstract Persistent homology captures the evolution of topological features of a model as a parameter changes. The most commonly used summary statistics of persistent homology are the barcode and the persistence diagram. Another summary statistic, the persistence landscape, was recently introduced by Bubenik. It is a functional summary, so it is easy to calculate sample means and variances, and it is straightforward to construct various test statistics. Implementing a permutation test we detect conformational changes between closed and open forms of the maltose-binding protein, a large biomolecule consisting of 370 amino acid residues. Furthermore, persistence landscapes can be applied to machine learning methods. A hyperplane from a support vector machine shows the clear separation between the closed and open proteins conformations. Moreover, because our approach captures dynamical properties of the protein our results may help in identifying residues susceptible to ligand binding; we show that the majority of active site residues and allosteric pathway residues are located in the vicinity of the most persistent loop in the corresponding filtered Vietoris-Rips complex. This finding was not observed in the classical anisotropic network model.
  217. Persistent Homology on Grassmann Manifolds for Analysis of Hyperspectral Movies (2016)

    Sofya Chepushtanova, Michael Kirby, Chris Peterson, Lori Ziegelmeier
    Abstract The existence of characteristic structure, or shape, in complex data sets has been recognized as increasingly important for mathematical data analysis. This realization has motivated the development of new tools such as persistent homology for exploring topological invariants, or features, in large data sets. In this paper, we apply persistent homology to the characterization of gas plumes in time dependent sequences of hyperspectral cubes, i.e. the analysis of 4-way arrays. We investigate hyperspectral movies of Long-Wavelength Infrared data monitoring an experimental release of chemical simulant into the air. Our approach models regions of interest within the hyperspectral data cubes as points on the real Grassmann manifold Gk,ï źn whose points parameterize the k-dimensional subspaces of \$\$\mathbb \R\\textasciicircumn\$\$Rn, contrasting our approach with the more standard framework in Euclidean space. An advantage of this approach is that it allows a sequence of time slices in a hyperspectral movie to be collapsed to a sequence of points in such a way that some of the key structure within and between the slices is encoded by the points on the Grassmann manifold. This motivates the search for topological features, associated with the evolution of the frames of a hyperspectral movie, within the corresponding points on the Grassmann manifold. The proposed mathematical model affords the processing of large data sets while retaining valuable discriminatory information. In this paper, we discuss how embedding our data in the Grassmann manifold, together with topological data analysis, captures dynamical events that occur as the chemical plume is released and evolves.
  218. Chatter Detection in Turning Using Persistent Homology (2016)

    Firas A. Khasawneh, Elizabeth Munch
    Abstract This paper describes a new approach for ascertaining the stability of stochastic dynamical systems in their parameter space by examining their time series using topological data analysis (TDA). We illustrate the approach using a nonlinear delayed model that describes the tool oscillations due to self-excited vibrations in turning. Each time series is generated using the Euler-Maruyama method and a corresponding point cloud is obtained using the Takens embedding. The point cloud can then be analyzed using a tool from TDA known as persistent homology. The results of this study show that the described approach can be used for analyzing datasets of delay dynamical systems generated both from numerical simulation and experimental data. The contributions of this paper include presenting for the first time a topological approach for investigating the stability of a class of nonlinear stochastic delay equations, and introducing a new application of TDA to machining processes.
  219. The Classification of Endoscopy Images With Persistent Homology (2016)

    Olga Dunaeva, Herbert Edelsbrunner, Anton Lukyanov, Michael Machin, Daria Malkova, Roman Kuvaev, Sergey Kashin
    Abstract Aiming at the automatic diagnosis of tumors using narrow band imaging (NBI) magnifying endoscopic (ME) images of the stomach, we combine methods from image processing, topology, geometry, and machine learning to classify patterns into three classes: oval, tubular and irregular. Training the algorithm on a small number of images of each type, we achieve a high rate of correct classifications. The analysis of the learning algorithm reveals that a handful of geometric and topological features are responsible for the overwhelming majority of decisions.
  220. Persistent Homology Analysis of Brain Artery Trees (2016)

    Paul Bendich, J. S. Marron, Ezra Miller, Alex Pieloch, Sean Skwerer
    Abstract New representations of tree-structured data objects, using ideas from topological data analysis, enable improved statistical analyses of a population of brain artery trees. A number of representations of each data tree arise from persistence diagrams that quantify branching and looping of vessels at multiple scales. Novel approaches to the statistical analysis, through various summaries of the persistence diagrams, lead to heightened correlations with covariates such as age and sex, relative to earlier analyses of this data set. The correlation with age continues to be significant even after controlling for correlations from earlier significant summaries.
  221. Hierarchical Structures of Amorphous Solids Characterized by Persistent Homology (2016)

    Yasuaki Hiraoka, Takenobu Nakamura, Akihiko Hirata, Emerson G. Escolar, Kaname Matsue, Yasumasa Nishiura
    Abstract This article proposes a topological method that extracts hierarchical structures of various amorphous solids. The method is based on the persistence diagram (PD), a mathematical tool for capturing shapes of multiscale data. The input to the PDs is given by an atomic configuration and the output is expressed as 2D histograms. Then, specific distributions such as curves and islands in the PDs identify meaningful shape characteristics of the atomic configuration. Although the method can be applied to a wide variety of disordered systems, it is applied here to silica glass, the Lennard-Jones system, and Cu-Zr metallic glass as standard examples of continuous random network and random packing structures. In silica glass, the method classified the atomic rings as short-range and medium-range orders and unveiled hierarchical ring structures among them. These detailed geometric characterizations clarified a real space origin of the first sharp diffraction peak and also indicated that PDs contain information on elastic response. Even in the Lennard-Jones system and Cu-Zr metallic glass, the hierarchical structures in the atomic configurations were derived in a similar way using PDs, although the glass structures and properties substantially differ from silica glass. These results suggest that the PDs provide a unified method that extracts greater depth of geometric information in amorphous solids than conventional methods.
  222. Topological Data Analysis Generates High-Resolution, Genome-Wide Maps of Human Recombination (2016)

    Pablo G. Camara, Daniel I. S. Rosenbloom, Kevin J. Emmett, Arnold J. Levine, Raul Rabadan
    Abstract Meiotic recombination is a fundamental evolutionary process driving diversity in eukaryotes. In mammals, recombination is known to occur preferentially at specific genomic regions. Using topological data analysis (TDA), a branch of applied topology that extracts global features from large data sets, we developed an efficient method for mapping recombination at fine scales. When compared to standard linkage-based methods, TDA can deal with a larger number of SNPs and genomes without incurring prohibitive computational costs. We applied TDA to 1,000 Genomes Project data and constructed high-resolution whole-genome recombination maps of seven human populations. Our analysis shows that recombination is generally under-represented within transcription start sites. However, the binding sites of specific transcription factors are enriched for sites of recombination. These include transcription factors that regulate the expression of meiosis- and gametogenesis-specific genes, cell cycle progression, and differentiation blockage. Additionally, our analysis identifies an enrichment for sites of recombination at repeat-derived loci matched by piwi-interacting RNAs.
  223. WDR76 Co-Localizes With Heterochromatin Related Proteins and Rapidly Responds to DNA Damage (2016)

    Joshua M. Gilmore, Mihaela E. Sardiu, Brad D. Groppe, Janet L. Thornton, Xingyu Liu, Gerald Dayebgadoh, Charles A. Banks, Brian D. Slaughter, Jay R. Unruh, Jerry L. Workman, Laurence Florens, Michael P. Washburn
    Abstract Proteins that respond to DNA damage play critical roles in normal and diseased states in human biology. Studies have suggested that the S. cerevisiae protein CMR1/YDL156w is associated with histones and is possibly associated with DNA repair and replication processes. Through a quantitative proteomic analysis of affinity purifications here we show that the human homologue of this protein, WDR76, shares multiple protein associations with the histones H2A, H2B, and H4. Furthermore, our quantitative proteomic analysis of WDR76 associated proteins demonstrated links to proteins in the DNA damage response like PARP1 and XRCC5 and heterochromatin related proteins like CBX1, CBX3, and CBX5. Co-immunoprecipitation studies validated these interactions. Next, quantitative imaging studies demonstrated that WDR76 was recruited to laser induced DNA damage immediately after induction, and we compared the recruitment of WDR76 to laser induced DNA damage to known DNA damage proteins like PARP1, XRCC5, and RPA1. In addition, WDR76 co-localizes to puncta with the heterochromatin proteins CBX1 and CBX5, which are also recruited to DNA damage but much less intensely than WDR76. This work demonstrates the chromatin and DNA damage protein associations of WDR76 and demonstrates the rapid response of WDR76 to laser induced DNA damage.
  224. Inference of Ancestral Recombination Graphs Through Topological Data Analysis (2016)

    Pablo G. Cámara, Arnold J. Levine, Raúl Rabadán
    Abstract The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands., Evolution occurs through different mechanisms, including point mutations, gene duplication, horizontal gene transfer, and recombinations. Some of these mechanisms cannot be captured by tree graphs. We present a framework, based on the mathematical tools of computational topology, that can explicitly accommodate both recombination and mutation events across the evolutionary history of a sample of genomic sequences. This approach generates a new type of summary graph and algebraic structures that provide quantitative information on the evolutionary scale and frequency of recombination events. The accompanying software, TARGet, is applied to several examples, including migration between sexually-reproducing populations, human recombination, and recombination in Darwin’s finches.
  225. Multiresolution Persistent Homology for Excessively Large Biomolecular Datasets (2015)

    Kelin Xia, Zhixiong Zhao, Guo-Wei Wei
    Abstract Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
  226. Persistent Homology for Path Planning in Uncertain Environments (2015)

    S. Bhattacharya, R. Ghrist, V. Kumar
    Abstract We address the fundamental problem of goal-directed path planning in an uncertain environment represented as a probability (of occupancy) map. Most methods generally use a threshold to reduce the grayscale map to a binary map before applying off-the-shelf techniques to find the best path. This raises the somewhat ill-posed question, what is the right (optimal) value to threshold the map? We instead suggest a persistent homology approach to the problem-a topological approach in which we seek the homology class of trajectories that is most persistent for the given probability map. In other words, we want the class of trajectories that is free of obstacles over the largest range of threshold values. In order to make this problem tractable, we use homology in ℤ2 coefficients (instead of the standard ℤ coefficients), and describe how graph search-based algorithms can be used to find trajectories in different homology classes. Our simulation results demonstrate the efficiency and practical applicability of the algorithm proposed in this paper.paper.
  227. Using Persistent Homology to Reveal Hidden Information in Neural Data (2015)

    Gard Spreemann, Benjamin Dunn, Magnus Bakke Botnan, Nils A. Baas
    Abstract We propose a method, based on persistent homology, to uncover topological properties of a priori unknown covariates of neuron activity. Our input data consist of spike train measurements of a set of neurons of interest, a candidate list of the known stimuli that govern neuron activity, and the corresponding state of the animal throughout the experiment performed. Using a generalized linear model for neuron activity and simple assumptions on the effects of the external stimuli, we infer away any contribution to the observed spike trains by the candidate stimuli. Persistent homology then reveals useful information about any further, unknown, covariates.
  228. Evasion Paths in Mobile Sensor Networks (2015)

    Henry Adams, Gunnar Carlsson
    Abstract Suppose that ball-shaped sensors wander in a bounded domain. A sensor does not know its location but does know when it overlaps a nearby sensor. We say that an evasion path exists in this sensor network if a moving intruder can avoid detection. In ‘Coordinate-free coverage in sensor networks with controlled boundaries via homology', Vin de Silva and Robert Ghrist give a necessary condition, depending only on the time-varying connectivity data of the sensors, for an evasion path to exist. Using zigzag persistent homology, we provide an equivalent condition that moreover can be computed in a streaming fashion. However, no method with time-varying connectivity data as input can give necessary and sufficient conditions for the existence of an evasion path. Indeed, we show that the existence of an evasion path depends not only on the fibrewise homotopy type of the region covered by sensors but also on its embedding in spacetime. For planar sensors that also measure weak rotation and distance information, we provide necessary and sufficient conditions for the existence of an evasion path.
  229. Skeletonization and Partitioning of Digital Images Using Discrete Morse Theory (2015)

    Olaf Delgado-Friedrichs, Vanessa Robins, Adrian Sheppard
    Abstract We show how discrete Morse theory provides a rigorous and unifying foundation for defining skeletons and partitions of grayscale digital images. We model a grayscale image as a cubical complex with a real-valued function defined on its vertices (the voxel values). This function is extended to a discrete gradient vector field using the algorithm presented in Robins, Wood, Sheppard TPAMI 33:1646 (2011). In the current paper we define basins (the building blocks of a partition) and segments of the skeleton using the stable and unstable sets associated with critical cells. The natural connection between Morse theory and homology allows us to prove the topological validity of these constructions; for example, that the skeleton is homotopic to the initial object. We simplify the basins and skeletons via Morse-theoretic cancellation of critical cells in the discrete gradient vector field using a strategy informed by persistent homology. Simple working Python code for our algorithms for efficient vector field traversal is included. Example data are taken from micro-CT images of porous materials, an application area where accurate topological models of pore connectivity are vital for fluid-flow modelling.
  230. Multidimensional Persistence in Biomolecular Data (2015)

    Kelin Xia, Guo-Wei Wei
    Abstract Persistent homology has emerged as a popular technique for the topological simplification of big data, including biomolecular data. Multidimensional persistence bears considerable promise to bridge the gap between geometry and topology. However, its practical and robust construction has been a challenge. We introduce two families of multidimensional persistence, namely pseudo-multidimensional persistence and multiscale multidimensional persistence. The former is generated via the repeated applications of persistent homology filtration to high dimensional data, such as results from molecular dynamics or partial differential equations. The latter is constructed via isotropic and anisotropic scales that create new simiplicial complexes and associated topological spaces. The utility, robustness and efficiency of the proposed topological methods are demonstrated via protein folding, protein flexibility analysis, the topological denoising of cryo-electron microscopy data, and the scale dependence of nano particles. Topological transition between partial folded and unfolded proteins has been observed in multidimensional persistence. The separation between noise topological signatures and molecular topological fingerprints is achieved by the Laplace-Beltrami flow. The multiscale multidimensional persistent homology reveals relative local features in Betti-0 invariants and the relatively global characteristics of Betti-1 and Betti-2 invariants.
  231. Persistent Topology for Cryo-Em Data Analysis (2015)

    Kelin Xia, Guo-Wei Wei
    Abstract SummaryIn this work, we introduce persistent homology for the analysis of cryo-electron microscopy (cryo-EM) density maps. We identify the topological fingerprint or topological signature of noise, which is widespread in cryo-EM data. For low signal-to-noise ratio (SNR) volumetric data, intrinsic topological features of biomolecular structures are indistinguishable from noise. To remove noise, we employ geometric flows that are found to preserve the intrinsic topological fingerprints of cryo-EM structures and diminish the topological signature of noise. In particular, persistent homology enables us to visualize the gradual separation of the topological fingerprints of cryo-EM structures from those of noise during the denoising process, which gives rise to a practical procedure for prescribing a noise threshold to extract cryo-EM structure information from noise contaminated data after certain iterations of the geometric flow equation. To further demonstrate the utility of persistent homology for cryo-EM data analysis, we consider a microtubule intermediate structure Electron Microscopy Data (EMD 1129). Three helix models, an alpha-tubulin monomer model, an alpha-tubulin and beta-tubulin model, and an alpha-tubulin and beta-tubulin dimer model, are constructed to fit the cryo-EM data. The least square fitting leads to similarly high correlation coefficients, which indicates that structure determination via optimization is an ill-posed inverse problem. However, these models have dramatically different topological fingerprints. Especially, linkages or connectivities that discriminate one model from another, play little role in the traditional density fitting or optimization but are very sensitive and crucial to topological fingerprints. The intrinsic topological features of the microtubule data are identified after topological denoising. By a comparison of the topological fingerprints of the original data and those of three models, we found that the third model is topologically favored. The present work offers persistent homology based new strategies for topological denoising and for resolving ill-posed inverse problems. Copyright © 2015 John Wiley & Sons, Ltd.
  232. Fruit Flies and Moduli: Interactions Between Biology and Mathematics (2015)

    Ezra Miller
    Abstract Possibilities for using geometry and topology to analyze statistical problems in biology raise a host of novel questions in geometry, probability, algebra, and combinatorics that demonstrate the power of biology to influence the future of pure mathematics. This expository article is a tour through some biological explorations and their mathematical ramifications. The article starts with evolution of novel topological features in wing veins of fruit flies, which are quantified using the algebraic structure of multiparameter persistent homology. The statistical issues involved highlight mathematical implications of sampling from moduli spaces. These lead to geometric probability on stratified spaces, including the sticky phenomenon for Frechet means and the origin of this mathematical area in the reconstruction of phylogenetic trees.
  233. Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis (2015)

    Jose A. Perea, John Harer
    Abstract We develop in this paper a theoretical framework for the topological study of time series data. Broadly speaking, we describe geometrical and topological properties of sliding window embeddings, as seen through the lens of persistent homology. In particular, we show that maximum persistence at the point-cloud level can be used to quantify periodicity at the signal level, prove structural and convergence theorems for the resulting persistence diagrams, and derive estimates for their dependency on window size and embedding dimension. We apply this methodology to quantifying periodicity in synthetic data sets and compare the results with those obtained using state-of-the-art methods in gene expression analysis. We call this new method SW1PerS, which stands for Sliding Windows and 1-Dimensional Persistence Scoring.
  234. Persistent Homology and Many-Body Atomic Structure for Medium-Range Order in the Glass (2015)

    Takenobu Nakamura, Yasuaki Hiraoka, Akihiko Hirata, Emerson G. Escolar, Yasumasa Nishiura
    Abstract The characterization of the medium-range (MRO) order in amorphous materials and its relation to the short-range order is discussed. A new topological approach to extract a hierarchical structure of amorphous materials is presented, which is robust against small perturbations and allows us to distinguish it from periodic or random configurations. This method is called the persistence diagram (PD) and introduces scales to many-body atomic structures to facilitate size and shape characterization. We first illustrate the representation of perfect crystalline and random structures in PDs. Then, the MRO in amorphous silica is characterized using the appropriate PD. The PD approach compresses the size of the data set significantly, to much smaller geometrical summaries, and has considerable potential for application to a wide range of materials, including complex molecular liquids, granular materials, and metallic glasses.
  235. Statistical Topological Data Analysis - A Kernel Perspective (2015)

    Roland Kwitt, Stefan Huber, Marc Niethammer, Weili Lin, Ulrich Bauer
    Abstract We consider the problem of statistical computations with persistence diagrams, a summary representation of topological features in data. These diagrams encode persistent homology, a widely used invariant in topological data analysis. While several avenues towards a statistical treatment of the diagrams have been explored recently, we follow an alternative route that is motivated by the success of methods based on the embedding of probability measures into reproducing kernel Hilbert spaces. In fact, a positive definite kernel on persistence diagrams has recently been proposed, connecting persistent homology to popular kernel-based learning techniques such as support vector machines. However, important properties of that kernel enabling a principled use in the context of probability measure embeddings remain to be explored. Our contribution is to close this gap by proving universality of a variant of the original kernel, and to demonstrate its effective use in two-sample hypothesis testing on synthetic as well as real-world data.
  236. Conserved Abundance and Topological Features in Chromatin-Remodeling Protein Interaction Networks (2015)

    Mihaela E Sardiu, Joshua M Gilmore, Brad D Groppe, Damir Herman, Sreenivasa R Ramisetty, Yong Cai, Jingji Jin, Ronald C Conaway, Joan W Conaway, Laurence Florens, Michael P Washburn
    Abstract Abstract The study of conserved protein interaction networks seeks to better understand the evolution and regulation of protein interactions. Here, we present a quantitative proteomic analysis of 18 orthologous baits from three distinct chromatin-remodeling complexes in Saccharomyces cerevisiae and Homo sapiens. We demonstrate that abundance levels of orthologous proteins correlate strongly between the two organisms and both networks have highly similar topologies. We therefore used the protein abundances in one species to cross-predict missing protein abundance levels in the other species. Lastly, we identified a novel conserved low-abundance subnetwork further demonstrating the value of quantitative analysis of networks.
  237. A Stable Multi-Scale Kernel for Topological Machine Learning (2015)

    Jan Reininghaus, Stefan Huber, Ulrich Bauer, Roland Kwitt
    Abstract Topological data analysis offers a rich source of valuable information to study vision problems. Yet, so far we lack a theoretically sound connection to popular kernel-based learning techniques, such as kernel SVMs or kernel PCA. In this work, we establish such a connection by designing a multi-scale kernel for persistence diagrams, a stable summary representation of topological features in data. We show that this kernel is positive definite and prove its stability with respect to the 1-Wasserstein distance. Experiments on two benchmark datasets for 3D shape classification/retrieval and texture recognition show considerable performance gains of the proposed method compared to an alternative approach that is based on the recently introduced persistence landscapes.
  238. Topological Data Analysis of Contagion Maps for Examining Spreading Processes on Networks (2015)

    Dane Taylor, Florian Klimm, Heather A. Harrington, Miroslav Kramár, Konstantin Mischaikow, Mason A. Porter, Peter J. Mucha
    Abstract Social and biological contagions are influenced by the spatial embeddedness of networks. Historically, many epidemics spread as a wave across part of the Earth’s surface; however, in modern contagions long-range edges—for example, due to airline transportation or communication media—allow clusters of a contagion to appear in distant locations. Here we study the spread of contagions on networks through a methodology grounded in topological data analysis and nonlinear dimension reduction. We construct ‘contagion maps’ that use multiple contagions on a network to map the nodes as a point cloud. By analysing the topology, geometry and dimensionality of manifold structure in such point clouds, we reveal insights to aid in the modelling, forecast and control of spreading processes. Our approach highlights contagion maps also as a viable tool for inferring low-dimensional structure in networks.
  239. Topological Data Analysis of Biological Aggregation Models (2015)

    Chad M. Topaz, Lori Ziegelmeier, Tom Halverson
    Abstract We apply tools from topological data analysis to two mathematical models inspired by biological aggregations such as bird flocks, fish schools, and insect swarms. Our data consists of numerical simulation output from the models of Vicsek and D'Orsogna. These models are dynamical systems describing the movement of agents who interact via alignment, attraction, and/or repulsion. Each simulation time frame is a point cloud in position-velocity space. We analyze the topological structure of these point clouds, interpreting the persistent homology by calculating the first few Betti numbers. These Betti numbers count connected components, topological circles, and trapped volumes present in the data. To interpret our results, we introduce a visualization that displays Betti numbers over simulation time and topological persistence scale. We compare our topological results to order parameters typically used to quantify the global behavior of aggregations, such as polarization and angular momentum. The topological calculations reveal events and structure not captured by the order parameters.
  240. A Topological Measurement of Protein Compressibility (2015)

    Marcio Gameiro, Yasuaki Hiraoka, Shunsuke Izumi, Miroslav Kramar, Konstantin Mischaikow, Vidit Nanda
    Abstract In this paper we partially clarify the relation between the compressibility of a protein and its molecular geometric structure. To identify and understand the relevant topological features within a given protein, we model its molecule as an alpha filtration and hence obtain multi-scale insight into the structure of its tunnels and cavities. The persistence diagrams of this alpha filtration capture the sizes and robustness of such tunnels and cavities in a compact and meaningful manner. From these persistence diagrams, we extract a measure of compressibility derived from those topological features whose relevance is suggested by physical and chemical properties. Due to recent advances in combinatorial topology, this measure is efficiently and directly computable from information found in the Protein Data Bank (PDB). Our main result establishes a clear linear correlation between the topological measure and the experimentally-determined compressibility of most proteins for which both PDB information and experimental compressibility data are available. Finally, we establish that both the topological measurement and the linear correlation are stable with respect to small perturbations in the input data, such as those arising from experimental errors in compressibility and X-ray crystallography experiments.
  241. Unveiling Patterns of International Communities in a Global City Using Mobile Phone Data (2015)

    Paolo Bajardi, Matteo Delfino, André Panisson, Giovanni Petri, Michele Tizzoni
    Abstract We analyse a large mobile phone activity dataset provided by Telecom Italia for the Telecom Big Data Challenge contest. The dataset reports the international country codes of every call/SMS made and received by mobile phone users in Milan, Italy, between November and December 2013, with a spatial resolution of about 200 meters. We first show that the observed spatial distribution of international codes well matches the distribution of international communities reported by official statistics, confirming the value of mobile phone data for demographic research. Next, we define an entropy function to measure the heterogeneity of the international phone activity in space and time. By comparing the entropy function to empirical data, we show that it can be used to identify the city’s hotspots, defined by the presence of points of interests. Eventually, we use the entropy function to characterize the spatial distribution of international communities in the city. Adopting a topological data analysis approach, we find that international mobile phone users exhibit some robust clustering patterns that correlate with basic socio-economic variables. Our results suggest that mobile phone records can be used in conjunction with topological data analysis tools to study the geography of migrant communities in a global city.
  242. Identification of Copy Number Aberrations in Breast Cancer Subtypes Using Persistence Topology (2015)

    Javier Arsuaga, Tyler Borrman, Raymond Cavalcante, Georgina Gonzalez, Catherine Park
    Abstract DNA copy number aberrations (CNAs) are of biological and medical interest because they help identify regulatory mechanisms underlying tumor initiation and evolution. Identification of tumor-driving CNAs (driver CNAs) however remains a challenging task, because they are frequently hidden by CNAs that are the product of random events that take place during tumor evolution. Experimental detection of CNAs is commonly accomplished through array comparative genomic hybridization (aCGH) assays followed by supervised and/or unsupervised statistical methods that combine the segmented profiles of all patients to identify driver CNAs. Here, we extend a previously-presented supervised algorithm for the identification of CNAs that is based on a topological representation of the data. Our method associates a two-dimensional (2D) point cloud with each aCGH profile and generates a sequence of simplicial complexes, mathematical objects that generalize the concept of a graph. This representation of the data permits segmenting the data at different resolutions and identifying CNAs by interrogating the topological properties of these simplicial complexes. We tested our approach on a published dataset with the goal of identifying specific breast cancer CNAs associated with specific molecular subtypes. Identification of CNAs associated with each subtype was performed by analyzing each subtype separately from the others and by taking the rest of the subtypes as the control. Our results found a new amplification in 11q at the location of the progesterone receptor in the Luminal A subtype. Aberrations in the Luminal B subtype were found only upon removal of the basal-like subtype from the control set. Under those conditions, all regions found in the original publication, except for 17q, were confirmed; all aberrations, except those in chromosome arms 8q and 12q were confirmed in the basal-like subtype. These two chromosome arms, however, were detected only upon removal of three patients with exceedingly large copy number values. More importantly, we detected 10 and 21 additional regions in the Luminal B and basal-like subtypes, respectively. Most of the additional regions were either validated on an independent dataset and/or using GISTIC. Furthermore, we found three new CNAs in the basal-like subtype: a combination of gains and losses in 1p, a gain in 2p and a loss in 14q. Based on these results, we suggest that topological approaches that incorporate multiresolution analyses and that interrogate topological properties of the data can help in the identification of copy number changes in cancer.
  243. Homological Scaffolds of Brain Functional Networks (2014)

    G. Petri, P. Expert, F. Turkheimer, R. Carhart-Harris, D. Nutt, P. J. Hellyer, F. Vaccarino
    Abstract Networks, as efficient representations of complex systems, have appealed to scientists for a long time and now permeate many areas of science, including neuroimaging (Bullmore and Sporns 2009 Nat. Rev. Neurosci.10, 186–198. (doi:10.1038/nrn2618)). Traditionally, the structure of complex networks has been studied through their statistical properties and metrics concerned with node and link properties, e.g. degree-distribution, node centrality and modularity. Here, we study the characteristics of functional brain networks at the mesoscopic level from a novel perspective that highlights the role of inhomogeneities in the fabric of functional connections. This can be done by focusing on the features of a set of topological objects—homological cycles—associated with the weighted functional network. We leverage the detected topological information to define the homological scaffolds, a new set of objects designed to represent compactly the homological features of the correlation network and simultaneously make their homological properties amenable to networks theoretical methods. As a proof of principle, we apply these tools to compare resting-state functional brain activity in 15 healthy volunteers after intravenous infusion of placebo and psilocybin—the main psychoactive component of magic mushrooms. The results show that the homological structure of the brain's functional patterns undergoes a dramatic change post-psilocybin, characterized by the appearance of many transient structures of low stability and of a small number of persistent ones that are not observed in the case of placebo.
  244. Topological Pattern Recognition for Point Cloud Data* (2014)

    Gunnar Carlsson
    Abstract In this paper we discuss the adaptation of the methods of homology from algebraic topology to the problem of pattern recognition in point cloud data sets. The method is referred to as persistent homology, and has numerous applications to scientific problems. We discuss the definition and computation of homology in the standard setting of simplicial complexes and topological spaces, then show how one can obtain useful signatures, called barcodes, from finite metric spaces, thought of as sampled from a continuous object. We present several different cases where persistent homology is used, to illustrate the different ways in which the method can be applied.
  245. Morse Theory and Persistent Homology for Topological Analysis of 3D Images of Complex Materials (2014)

    O. Delgado-Friedrichs, V. Robins, A. Sheppard
    Abstract We develop topologically accurate and compatible definitions for the skeleton and watershed segmentation of a 3D digital object that are computed by a single algorithm. These definitions are based on a discrete gradient vector field derived from a signed distance transform. This gradient vector field is amenable to topological analysis and simplification via For-man's discrete Morse theory and provides a filtration that can be used as input to persistent homology algorithms. Efficient implementations allow us to process large-scale x-ray micro-CT data of rock cores and other materials.
  246. Persistent Homology Analysis of Protein Structure, Flexibility, and Folding (2014)

    Kelin Xia, Guo-Wei Wei
    Abstract SUMMARYProteins are the most important biomolecules for living organisms. The understanding of protein structure, function, dynamics, and transport is one of the most challenging tasks in biological science. In the present work, persistent homology is, for the first time, introduced for extracting molecular topological fingerprints (MTFs) based on the persistence of molecular topological invariants. MTFs are utilized for protein characterization, identification, and classification. The method of slicing is proposed to track the geometric origin of protein topological invariants. Both all-atom and coarse-grained representations of MTFs are constructed. A new cutoff-like filtration is proposed to shed light on the optimal cutoff distance in elastic network models. On the basis of the correlation between protein compactness, rigidity, and connectivity, we propose an accumulated bar length generated from persistent topological invariants for the quantitative modeling of protein flexibility. To this end, a correlation matrix-based filtration is developed. This approach gives rise to an accurate prediction of the optimal characteristic distance used in protein B-factor analysis. Finally, MTFs are employed to characterize protein topological evolution during protein folding and quantitatively predict the protein folding stability. An excellent consistence between our persistent homology prediction and molecular dynamics simulation is found. This work reveals the topology–function relationship of proteins. Copyright © 2014 John Wiley & Sons, Ltd.
  247. Parametric Inference Using Persistence Diagrams: a Case Study in Population Genetics (2014)

    Kevin Emmett, Daniel Rosenbloom, Pablo Camara, Raul Rabadan
    Abstract Persistent homology computes topological invariants from point cloud data. Recent work has focused on developing statistical methods for data analysis in this framework. We show that, in certain models, parametric inference can be performed using statistics defined on the computed invariants. We develop this idea with a model from population genetics, the coalescent with recombination. We apply our model to an influenza dataset, identifying two scales of topological structure which have a distinct biological interpretation.
  248. Coverage Criterion in Sensor Networks Stable Under Perturbation (2014)

    Yasuaki Hiraoka, Genki Kusano
    Abstract To the coverage problem of sensor networks, V. de Silva and R. Ghrist (2007) developed several approaches based on (persistent) homology theory. Their criteria for the coverage are formulated on the Rips complexes constructed by the sensors, in which their locations are supposed to be fixed. However, the sensors are in general affected by perturbations (e.g., natural phenomena), and hence the stability of the coverage criteria should be also discussed. In this paper, we present a coverage theorem stable under perturbation. Furthermore, we also introduce a method of eliminating redundant cover after perturbation. The coverage theorem is derived by extending the Rips interleaving theorem studied by F. Chazal, V. de Silva, and S. Oudot (2013) into an appropriate relative version.
  249. Characterizing Scales of Genetic Recombination and Antibiotic Resistance in Pathogenic Bacteria Using Topological Data Analysis (2014)

    Kevin J. Emmett, Raul Rabadan
    Abstract Pathogenic bacteria present a large disease burden on human health. Control of these pathogens is hampered by rampant lateral gene transfer, whereby pathogenic strains may acquire genes conferring resistance to common antibiotics. Here we introduce tools from topological data analysis to characterize the frequency and scale of lateral gene transfer in bacteria, focusing on a set of pathogens of significant public health relevance. As a case study, we examine the spread of antibiotic resistance in Staphylococcus aureus. Finally, we consider the possible role of the human microbiome as a reservoir for antibiotic resistance genes.
  250. Applications of Persistent Homology to Time Varying Systems (2013)

    Elizabeth Munch
    Abstract \textlessp\textgreaterThis dissertation extends the theory of persistent homology to time varying systems. Most of the previous work has been dedicated to using this powerful tool in topological data analysis to study static point clouds. In particular, given a point cloud, we can construct its persistence diagram. Since the diagram varies continuously as the point cloud varies continuously, we study the space of time varying persistence diagrams, called vineyards when they were introduced by Cohen-Steiner, Edelsbrunner, and Morozov.\textless/p\textgreater\textlessp\textgreaterWe will first show that with a good choice of metric, these vineyards are stable for small perturbations of their associated point clouds. We will also define a new mean for a set of persistence diagrams based on the work of Mileyko et al. which, unlike the previously defined mean, is continuous for geodesic vineyards. \textless/p\textgreater\textlessp\textgreaterNext, we study the sensor network problem posed by Ghrist and de Silva, and their application of persistent homology to understand when a set of sensors covers a given region. Giving each of these sensors a probability of failure over time, we show that an exact computation of the probability of failure of the whole system is NP-hard, but give an algorithm which can predict failure in the case of a monitored system.\textless/p\textgreater\textlessp\textgreaterFinally, we apply these methods to an automated system which can cluster agents moving in aerial images by their behaviors. We build a data structure for storing and querying the information in real-time, and define behavior vectors which quantify behaviors of interest. This clustering by behavior can be used to find groups of interest, for which we can also quantify behaviors in order to determine whether the group is working together to achieve a common goal, and we speculate that this work can be extended to improving tracking algorithms as well as behavioral predictors.\textless/p\textgreater
  251. Grasping Objects With Holes: A Topological Approach (2013)

    F. T. Pokorny, J. A. Stork, D. Kragic
    Abstract This work proposes a topologically inspired approach for generating robot grasps on objects with `holes'. Starting from a noisy point-cloud, we generate a simplicial representation of an object of interest and use a recently developed method for approximating shortest homology generators to identify graspable loops. To control the movement of the robot hand, a topologically motivated coordinate system is used in order to wrap the hand around such loops. Finally, another concept from topology - namely the Gauss linking integral - is adapted to serve as evidence for secure caging grasps after a grasp has been executed. We evaluate our approach in simulation on a Barrett hand using several target objects of different sizes and shapes and present an initial experiment with real sensor data.
  252. A Topology-Based Object Representation for Clasping, Latching and Hooking (2013)

    J. A. Stork, F. T. Pokorny, D. Kragic
    Abstract We present a loop-based topological object representation for objects with holes. The representation is used to model object parts suitable for grasping, e.g. handles, and it incorporates local volume information about these. Furthermore, we present a grasp synthesis framework that utilizes this representation for synthesizing caging grasps that are robust under measurement noise. The approach is complementary to a local contact-based force-closure analysis as it depends on global topological features of the object. We perform an extensive evaluation with four robotic hands on synthetic data. Additionally, we provide real world experiments using a Kinect sensor on two robotic platforms: a Schunk dexterous hand attached to a Kuka robot arm as well as a Nao humanoid robot. In the case of the Nao platform, we provide initial experiments showing that our approach can be used to plan whole arm hooking as well as caging grasps involving only one hand.
  253. Topology of Viral Evolution (2013)

    Joseph Minhow Chan, Gunnar Carlsson, Raul Rabadan
    Abstract The tree structure is currently the accepted paradigm to represent evolutionary relationships between organisms, species or other taxa. However, horizontal, or reticulate, genomic exchanges are pervasive in nature and confound characterization of phylogenetic trees. Drawing from algebraic topology, we present a unique evolutionary framework that comprehensively captures both clonal and reticulate evolution. We show that whereas clonal evolution can be summarized as a tree, reticulate evolution exhibits nontrivial topology of dimension greater than zero. Our method effectively characterizes clonal evolution, reassortment, and recombination in RNA viruses. Beyond detecting reticulate evolution, we succinctly recapitulate the history of complex genetic exchanges involving more than two parental strains, such as the triple reassortment of H7N9 avian influenza and the formation of circulating HIV-1 recombinants. In addition, we identify recurrent, large-scale patterns of reticulate evolution, including frequent PB2-PB1-PA-NP cosegregation during avian influenza reassortment. Finally, we bound the rate of reticulate events (i.e., 20 reassortments per year in avian influenza). Our method provides an evolutionary perspective that not only captures reticulate events precluding phylogeny, but also indicates the evolutionary scales where phylogenetic inference could be accurate.
  254. Persistent Brain Network Homology From the Perspective of Dendrogram (2012)

    Hyekyoung Lee, Hyejin Kang, Moo K. Chung, Bung-Nyun Kim, Dong Soo Lee
    Abstract The brain network is usually constructed by estimating the connectivity matrix and thresholding it at an arbitrary level. The problem with this standard method is that we do not have any generally accepted criteria for determining a proper threshold. Thus, we propose a novel multiscale framework that models all brain networks generated over every possible threshold. Our approach is based on persistent homology and its various representations such as the Rips filtration, barcodes, and dendrograms. This new persistent homological framework enables us to quantify various persistent topological features at different scales in a coherent manner. The barcode is used to quantify and visualize the evolutionary changes of topological features such as the Betti numbers over different scales. By incorporating additional geometric information to the barcode, we obtain a single linkage dendrogram that shows the overall evolution of the network. The difference between the two networks is then measured by the Gromov-Hausdorff distance over the dendrograms. As an illustration, we modeled and differentiated the FDG-PET based functional brain networks of 24 attention-deficit hyperactivity disorder children, 26 autism spectrum disorder children, and 11 pediatric control subjects.