🍩 Database of Original & Non-Theoretical Uses of Topology

  1. Topology-Aware Segmentation Using Discrete Morse Theory (2021)

    Xiaoling Hu, Yusu Wang, Li Fuxin, Dimitris Samaras, Chao Chen
    Abstract In the segmentation of fine-scale structures from natural and biomedical images, per-pixel accuracy is not the only metric of concern. Topological correctness, such as vessel connectivity and membrane closure, is crucial for downstream analysis tasks. In this paper, we propose a new approach to train deep image segmentation networks for better topological accuracy. In particular, leveraging the power of discrete Morse theory (DMT), we identify global structures, including 1D skeletons and 2D patches, which are important for topological accuracy. Trained with a novel loss based on these global structures, the network performance is significantly improved especially near topologically challenging locations (such as weak spots of connections and membranes). On diverse datasets, our method achieves superior performance on both the DICE score and topological metrics.
  2. Branching and Circular Features in High Dimensional Data (2011)

    B. Wang, B. Summa, V. Pascucci, M. Vejdemo-Johansson
    Abstract Large observations and simulations in scientific research give rise to high-dimensional data sets that present many challenges and opportunities in data analysis and visualization. Researchers in application domains such as engineering, computational biology, climate study, imaging and motion capture are faced with the problem of how to discover compact representations of high-dimensional data while preserving their intrinsic structure. In many applications, the original data is projected onto a low-dimensional space via dimensionality reduction techniques prior to modeling. One problem with this approach is that the projection step can fail to preserve structure in the data that is only apparent in high dimensions. Conversely, such techniques may create structural illusions in the projection, implying structure not present in the original high-dimensional data. Our solution is to utilize topological techniques to recover important structures in high-dimensional data that contains non-trivial topology. Specifically, we are interested in high-dimensional branching structures. We construct local circle-valued coordinate functions to represent such features. Subsequently, we perform dimensionality reduction on the data while ensuring such structures are visually preserved. Additionally, we study the effects of global circular structures on visualizations. Our results reveal never-before-seen structures in real-world data sets from a variety of applications.
  3. Weighted Persistent Homology for Osmolyte Molecular Aggregation and Hydrogen-Bonding Network Analysis (2020)

    D. Vijay Anand, Zhenyu Meng, Kelin Xia, Yuguang Mu
    Abstract It has long been observed that trimethylamine N-oxide (TMAO) and urea demonstrate dramatically different properties in a protein folding process. Despite the enormous body of theoretical and experimental research on these two osmolytes, various aspects of their underlying mechanisms remain largely elusive. In this paper, we propose to use weighted persistent homology to systematically study the osmolytes' molecular aggregation and their hydrogen-bonding network from a local topological perspective. We consider two weighted models, i.e., localized persistent homology (LPH) and interactive persistent homology (IPH). Boltzmann persistent entropy (BPE) is proposed to quantitatively characterize the topological features from LPH and IPH, together with persistent Betti number (PBN). More specifically, from the localized persistent homology models, we have found that TMAO and urea have very different local topology. TMAO is found to exhibit a local network structure. As the concentration increases, the circle elements in these networks show a clear increase in their total number and a decrease in their relative sizes. In contrast, urea shows two types of local topological patterns, i.e., local clusters around 6 Å and a few global circle elements at around 12 Å. From the interactive persistent homology models, we find that our persistent radial distribution function (PRDF) from the global-scale IPH has the same physical properties as the traditional radial distribution function. Moreover, PRDFs from the local-scale IPH can also be generated and used to characterize the local interaction information. Apart from the clear difference in the first peak value of PRDFs at filtration size 4 Å, TMAO and urea also show very different behaviors in the second peak region from filtration size 5 Å to 10 Å. These differences are also reflected in the PBNs and BPEs of the local-scale IPH. This localized topological information has never been revealed before. Since graphs can be converted into simplicial complexes via the clique complex, our weighted persistent homology models can be used in the analysis of various networks and graphs from any molecular structures and aggregation systems.
  4. Fast and Accurate Tumor Segmentation of Histology Images Using Persistent Homology and Deep Convolutional Features (2019)

    Talha Qaiser, Yee-Wah Tsang, Daiki Taniyama, Naoya Sakamoto, Kazuaki Nakane, David Epstein, Nasir Rajpoot
    Abstract Tumor segmentation in whole-slide images of histology slides is an important step towards computer-assisted diagnosis. In this work, we propose a tumor segmentation framework based on the novel concept of persistent homology profiles (PHPs). For a given image patch, the homology profiles are derived by efficient computation of persistent homology, which is an algebraic tool from homology theory. We propose an efficient way of computing the topological persistence of an image as an alternative to simplicial homology. The PHPs are devised to distinguish tumor regions from their normal counterparts by modeling the atypical characteristics of tumor nuclei. We propose two variants of our method for tumor segmentation: one that targets speed without compromising accuracy and the other that targets higher accuracy. The fast version is based on a selection of exemplar image patches from a convolutional neural network (CNN) and patch classification by quantifying the divergence between the PHPs of exemplars and the input image patch. Detailed comparative evaluation shows that the proposed algorithm is significantly faster than competing algorithms while achieving comparable results. The accurate version combines the PHPs and high-level CNN features and employs a multi-stage ensemble strategy for image patch labeling. Experimental results demonstrate that the combination of PHPs and CNN features outperforms competing algorithms. This study is performed on two independently collected colorectal datasets containing adenoma, adenocarcinoma, signet, and healthy cases. Collectively, the accurate tumor segmentation produces the highest average patch-level F1-score, as compared with competing algorithms, on malignant and healthy cases from both datasets. Overall, the proposed framework highlights the utility of persistent homology for histopathology image analysis.
  5. Topological Data Analysis of C. Elegans Locomotion and Behavior (2021)

    Ashleigh Thomas, Kathleen Bates, Alex Elchesen, Iryna Hartsock, Hang Lu, Peter Bubenik
    Abstract Video of nematodes/roundworms was analyzed using persistent homology to study locomotion and behavior. In each frame, an organism's body posture was represented by a high-dimensional vector. By concatenating points in fixed-duration segments of this time series, we created a sliding window embedding (sometimes called a time delay embedding) where each point corresponds to a sequence of postures of an organism. Persistent homology on the points in this time series detected behaviors and comparisons of these persistent homology computations detected variation in their corresponding behaviors. We used average persistence landscapes and machine learning techniques to study changes in locomotion and behavior in varying environments.
  6. Persistent Homology Analysis of Osmolyte Molecular Aggregation and Their Hydrogen-Bonding Networks (2019)

    Kelin Xia, D. Vijay Anand, Saxena Shikhar, Yuguang Mu
    Abstract Dramatically different properties have been observed for two types of osmolytes, i.e., trimethylamine N-oxide (TMAO) and urea, in a protein folding process. Great progress has been made in revealing the potential underlying mechanism of these two osmolyte systems. However, many problems still remain unsolved. In this paper, we propose to use persistent homology to systematically study the osmolytes' molecular aggregation and their hydrogen-bonding network from a global topological perspective. We find, for the first time, that TMAO and urea show two extremely different topological behaviors, i.e., an extensive network and local clusters, respectively. In general, TMAO forms highly consistent large loop or circle structures at high concentrations. In contrast, urea is more tightly aggregated locally. Moreover, the resulting hydrogen-bonding networks also demonstrate distinguishable features. As the concentration increases, TMAO hydrogen-bonding networks vary greatly in their total number of loop structures, and large-sized loop structures consistently increase. In contrast, urea hydrogen-bonding networks remain relatively stable, with a slight reduction in the total loop number. Moreover, persistent entropy (PE) is, for the first time, used to characterize the topological information of the aggregation and hydrogen-bonding networks. The average PE systematically increases with concentration for both TMAO and urea, and decreases in their hydrogen-bonding networks. However, their PE variances behave entirely differently. Finally, topological features of the hydrogen-bonding networks are found to be highly consistent with those from the ion aggregation systems, indicating that our topological invariants can characterize intrinsic features of the “structure making” and “structure breaking” systems.
  7. A Topological Measurement of Protein Compressibility (2015)

    Marcio Gameiro, Yasuaki Hiraoka, Shunsuke Izumi, Miroslav Kramar, Konstantin Mischaikow, Vidit Nanda
    Abstract In this paper we partially clarify the relation between the compressibility of a protein and its molecular geometric structure. To identify and understand the relevant topological features within a given protein, we model its molecule as an alpha filtration and hence obtain multi-scale insight into the structure of its tunnels and cavities. The persistence diagrams of this alpha filtration capture the sizes and robustness of such tunnels and cavities in a compact and meaningful manner. From these persistence diagrams, we extract a measure of compressibility derived from those topological features whose relevance is suggested by physical and chemical properties. Due to recent advances in combinatorial topology, this measure is efficiently and directly computable from information found in the Protein Data Bank (PDB). Our main result establishes a clear linear correlation between the topological measure and the experimentally-determined compressibility of most proteins for which both PDB information and experimental compressibility data are available. Finally, we establish that both the topological measurement and the linear correlation are stable with respect to small perturbations in the input data, such as those arising from experimental errors in compressibility and X-ray crystallography experiments.
  8. Topology Based Data Analysis Identifies a Subgroup of Breast Cancers With a Unique Mutational Profile and Excellent Survival (2011)

    Monica Nicolau, Arnold J. Levine, Gunnar Carlsson
    Abstract High-throughput biological data, whether generated by sequencing, transcriptional microarrays, proteomics, or other means, continues to require analytic methods that address its high-dimensional aspects. Because the computational part of data analysis ultimately identifies shape characteristics in the organization of data sets, the mathematics of shape recognition in high dimensions continues to be a crucial part of data analysis. This article introduces a method that extracts information from high-throughput microarray data and, by using topology, provides greater depth of information than current analytic techniques. The method, termed Progression Analysis of Disease (PAD), first identifies robust aspects of cluster analysis, then goes deeper to find a multitude of biologically meaningful shape characteristics in these data. Additionally, because PAD incorporates a visualization tool, it provides a simple picture or graph that can be used to further explore these data. Although PAD can be applied to a wide range of high-throughput data types, it is used here as an example to analyze breast cancer transcriptional data. This identified a unique subgroup of Estrogen Receptor-positive (ER+) breast cancers that express high levels of c-MYB and low levels of innate inflammatory genes. These patients exhibit 100% survival and no metastasis. No supervised step beyond distinction between tumor and healthy patients was used to identify this subtype. The group has a clear, distinct, and statistically significant molecular signature; it highlights coherent biology but is invisible to cluster methods and does not fit into the accepted classification of Luminal A/B and Normal-like subtypes of ER+ breast cancers. We denote the group as c-MYB+ breast cancer.
  9. Efficient Planning of Multi-Robot Collective Transport Using Graph Reinforcement Learning With Higher Order Topological Abstraction (2023)

    Steve Paul, Wenyuan Li, Brian Smyth, Yuzhou Chen, Yulia Gel, Souma Chowdhury
    Abstract Efficient multi-robot task allocation (MRTA) is fundamental to various time-sensitive applications such as disaster response, warehouse operations, and construction. This paper tackles a particular class of these problems that we call MRTA-collective transport or MRTA-CT, in which tasks present varying workloads and deadlines, and robots are subject to flight range, communication range, and payload constraints. For large instances of these problems involving hundreds to thousands of tasks and tens to hundreds of robots, traditional non-learning solvers are often time-inefficient, and emerging learning-based policies do not scale well to larger-sized problems without costly retraining. To address this gap, we use a recently proposed encoder-decoder graph neural network involving Capsule networks and a multi-head attention mechanism, and innovatively add topological descriptors (TD) as new features to improve transferability to unseen problems of similar and larger size. Persistent homology is used to derive the TD, and proximal policy optimization is used to train our TD-augmented graph neural network. The resulting policy model compares favorably to state-of-the-art non-learning baselines while being much faster. The benefit of using TD is readily evident when scaling to test problems larger than those used in training.
  10. Atom-Specific Persistent Homology and Its Application to Protein Flexibility Analysis (2020)

    David Bramer, Guo-Wei Wei
    Abstract Recently, persistent homology has had tremendous success in biomolecular data analysis. It works by examining the topological relationship or connectivity of a group of atoms in a molecule at a variety of scales, then rendering a family of topological representations of the molecule. However, persistent homology is rarely employed for the analysis of atomic properties, such as biomolecular flexibility analysis or B-factor prediction. This work introduces atom-specific persistent homology to provide a local atomic-level representation of a molecule via a global topological tool. This is achieved through the construction of a pair of conjugated sets of atoms and corresponding conjugated simplicial complexes, as well as conjugated topological spaces. The difference between the topological invariants of the pair of conjugated sets is measured by bottleneck and Wasserstein metrics and leads to an atom-specific topological representation of individual atomic properties in a molecule. Atom-specific topological features are integrated with various machine learning algorithms, including gradient boosting trees and convolutional neural networks, for protein thermal fluctuation analysis and B-factor prediction. Extensive numerical results indicate the proposed method provides a powerful topological tool for analyzing and predicting localized information in complex macromolecules.
  11. Topological Regularization for Dense Prediction (2021)

    Deqing Fu, Bradley J. Nelson
    Abstract Dense prediction tasks such as depth perception and semantic segmentation are important applications in computer vision that have a concrete topological description in terms of partitioning an image into connected components or estimating a function with a small number of local extrema corresponding to objects in the image. We develop a form of topological regularization based on persistent homology that can be used in dense prediction tasks with these topological descriptions. Experimental results show that the output topology can also appear in the internal activations of trained neural networks, which allows for a novel use of topological regularization on the internal states of neural networks during training, reducing the computational cost of the regularization. We demonstrate that this topological regularization of internal activations leads to improved convergence and test benchmarks on several problems and architectures.
  12. A Probabilistic Topological Approach to Feature Identification Using a Stochastic Robotic Swarm (2018)

    Ragesh K. Ramachandran, Sean Wilson, Spring Berman
    Abstract This paper presents a novel automated approach to quantifying the topological features of an unknown environment using a swarm of robots with local sensing and limited or no access to global position information. The robots randomly explore the environment and record a time series of their estimated position and the covariance matrix associated with this estimate. After the robots’ deployment, a point cloud indicating the free space of the environment is extracted from their aggregated data. Tools from topological data analysis, in particular the concept of persistent homology, are applied to a subset of the point cloud to construct barcode diagrams, which are used to determine the numbers of different types of features in the domain. We demonstrate that our approach can correctly identify the number of topological features in simulations with zero to four features and in multi-robot experiments with one to three features.
  13. Interpretable Phase Detection and Classification With Persistent Homology (2020)

    Alex Cole, Gregory J. Loges, Gary Shiu
    Abstract We apply persistent homology to the task of discovering and characterizing phase transitions, using lattice spin models from statistical physics for working examples. Persistence images provide a useful representation of the homological data for conducting statistical tasks. To identify the phase transitions, a simple logistic regression on these images is sufficient for the models we consider, and interpretable order parameters are then read from the weights of the regression. Magnetization, frustration and vortex-antivortex structure are identified as relevant features for characterizing phase transitions.
  14. A Topological Data Analysis Approach On Predicting Phenotypes From Gene Expression Data (2020)

    Sayan Mandal, Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida
    Abstract The goal of this study was to investigate whether gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore, we devised an approach augmenting machine learning with topological data analysis. We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson’s disease phenotype prediction when measured against standard machine learning methods. This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high-dimensional data reveals itself when considering the intricate topological connections between expressed genes.
  15. Topological Persistence for Relating Microstructure and Capillary Fluid Trapping in Sandstones (2019)

    A. L. Herring, V. Robins, A. P. Sheppard
    Abstract Results from a series of two-phase fluid flow experiments in Leopard, Berea, and Bentheimer sandstones are presented. Fluid configurations are characterized using laboratory-based and synchrotron-based 3-D X-ray computed tomography. All flow experiments are conducted under capillary-dominated conditions. We conduct geometry-topology analysis via persistent homology and compare this to standard topological and watershed-partition-based pore-network statistics. Metrics identified as predictors of nonwetting fluid trapping are calculated from the different analytical methods and are compared to levels of trapping measured during drainage-imbibition cycles in the experiments. Metrics calculated from pore networks (i.e., pore body-throat aspect ratio and coordination number) and topological analysis (Euler characteristic) do not correlate well with trapping in these samples. In contrast, a new metric derived from the persistent homology analysis, which incorporates counts of topological features as well as their length scale and spatial distribution, correlates very well (R² = 0.97) with trapping for all systems. This correlation encompasses a wide range of porous media and initial fluid configurations, and also applies to data sets from different imaging and image processing protocols.
  16. Unsupervised Topological Learning Approach of Crystal Nucleation in Pure Tantalum (2021)

    Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse
    Abstract Nucleation phenomena, commonly observed in everyday life, are of fundamental, technological, and societal importance in many areas, but some of their most intimate mechanisms remain to be unraveled. Crystal nucleation, the early stage in which the liquid-to-solid transition occurs upon undercooling, initiates at the atomic level on nanometer length and sub-picosecond time scales and involves complex multidimensional mechanisms with local symmetry breaking that can hardly be observed experimentally in detail. To reveal their structural features in simulations without a priori assumptions, an unsupervised learning approach founded on topological descriptors borrowed from persistent homology is proposed. Applied here to a monatomic metal, namely tantalum (Ta), it shows that both translational and orientational ordering always come into play simultaneously when homogeneous nucleation starts in regions with low five-fold symmetry.
  17. Characterising Epithelial Tissues Using Persistent Entropy (2019)

    N. Atienza, L. M. Escudero, M. J. Jimenez, M. Soriano-Trigueros
    Abstract In this paper, we apply persistent entropy, a novel topological statistic, to the characterization of images of epithelial tissues. We have found that persistent entropy is able to summarize topological and geometric information encoded by α-complexes and persistent homology. After applying some statistical tests, we can guarantee the existence of significant differences in the studied tissues.
  18. Reviews: Topological Distances and Losses for Brain Networks (2021)

    Moo K. Chung, Alexander Smith, Gary Shiu
    Abstract Almost all statistical and machine learning methods in analyzing brain networks rely on distances and loss functions, which are mostly Euclidean or matrix norms. The Euclidean or matrix distances may fail to capture underlying subtle topological differences in brain networks. Further, Euclidean distances are sensitive to outliers: a few extreme edge weights may severely affect the distance. Thus it is necessary to use distances and loss functions that recognize the topology of data. In this review paper, we survey various topological distance and loss functions from topological data analysis (TDA) and persistent homology that can be used in brain network analysis more effectively. Although there are many recent brain imaging studies based on TDA methods, TDA has not yet become a mainstream tool in the brain imaging field, possibly due to a lack of awareness of the methods. The main purpose of this paper is to provide a relevant technical survey of these powerful tools that are immediately applicable to brain network data.
  19. Topological Descriptors of Histology Images (2014)

    Nikhil Singh, Heather D. Couture, J. S. Marron, Charles Perou, Marc Niethammer
    Abstract The purpose of this study is to investigate architectural characteristics of cell arrangements in breast cancer histology images. We propose the use of topological data analysis to summarize the geometric information inherent in tumor cell arrangements. Our goal is to use this information as signatures that encode robust summaries of cell arrangements in tumor tissue as captured through histology images. In particular, using ideas from algebraic topology we construct topological descriptors based on cell nucleus segmentations such as persistency charts and Betti sequences. We assess their performance on the task of discriminating the breast cancer subtypes Basal, Luminal A, Luminal B and HER2. We demonstrate that the topological features contain useful complementary information to image-appearance based features that can improve discriminatory performance of classifiers.
  20. Airway Pathological Heterogeneity in Asthma: Visualization of Disease Microclusters Using Topological Data Analysis (2018)

    Salman Siddiqui, Aarti Shikotra, Matthew Richardson, Emma Doran, David Choy, Alex Bell, Cary D. Austin, Jeffrey Eastham-Anderson, Beverley Hargadon, Joseph R. Arron, Andrew Wardlaw, Christopher E. Brightling, Liam G. Heaney, Peter Bradding
    Abstract Background: Asthma is a complex chronic disease underpinned by pathological changes within the airway wall. How variations in structural airway pathology and cellular inflammation contribute to the expression and severity of asthma is poorly understood. Objectives: Therefore, we evaluated pathological heterogeneity using topological data analysis (TDA) with the aim of visualizing disease clusters and microclusters. Methods: A discovery population of 202 adult patients (142 asthmatic patients and 60 healthy subjects) and an external replication population (59 patients with severe asthma) were evaluated. Pathology and gene expression were examined in bronchial biopsy samples. TDA was applied by using pathological variables alone to create pathology-driven visual networks. Results: In the discovery cohort, TDA identified 4 groups/networks with multiple microclusters/regions of interest that were masked by group-level statistics. Specifically, TDA group 1 consisted of a high proportion of healthy subjects, with a microcluster representing a topological continuum connecting healthy subjects to patients with mild-to-moderate asthma. Three additional TDA groups with moderate-to-severe asthma (Airway Smooth Muscle-High, Reticular Basement Membrane-High, and Remodeling-Low groups) were identified and contained numerous microclusters with varying pathological and clinical features. Mutually exclusive TH2 and TH17 tissue gene expression signatures were identified in all pathological groups. Discovery and external replication applied to the severe asthma subgroup identified only highly similar “pathological data shapes” through analyses of persistent homology. Conclusions: We have identified and replicated novel pathological phenotypes of asthma using TDA. Our methodology is applicable to other complex chronic diseases.
  21. Computing Robustness and Persistence for Images (2010)

    P. Bendich, H. Edelsbrunner, M. Kerber
    Abstract We are interested in 3-dimensional images given as arrays of voxels with intensity values. Extending these values to a continuous function, we study the robustness of homology classes in its level and interlevel sets, that is, the amount of perturbation needed to destroy these classes. The structure of the homology classes and their robustness, over all level and interlevel sets, can be visualized by a triangular diagram of dots obtained by computing the extended persistence of the function. We give a fast hierarchical algorithm using the dual complexes of oct-tree approximations of the function. In addition, we show that for balanced oct-trees, the dual complexes are geometrically realized in ℝ³ and can thus be used to construct level and interlevel sets. We apply these tools to study 3-dimensional images of plant root systems.
  22. Topological Electronic Structure and Weyl Points in Nonsymmorphic Hexagonal Materials (2020)

    Rafael González-Hernández, Erick Tuiran, Bernardo Uribe
    Abstract Using topological band theory analysis, we show that the nonsymmorphic symmetry operations in hexagonal lattices enforce Weyl points at the screw-invariant high-symmetry lines of the band structure. The corepresentation theory and connectivity group theory show that Weyl points are generated by band crossings in accordion-like and hourglass-like dispersion relations. These Weyl points are stable against weak perturbations and are protected by the screw rotation symmetry. Based on first-principles calculations, we find complete agreement between the topologically predicted energy dispersion relations and real hexagonal materials. Topological charge (chirality) and Berry curvature calculations show the simultaneous formation of Weyl points and nodal lines in 4d transition-metal trifluorides such as AgF3 and AuF3. Furthermore, a large intrinsic spin-Hall conductivity was found due to the combined strong spin-orbit coupling and multiple Weyl-point crossings in the electronic structure. These materials could be used for spin/charge conversion in more energy-efficient spintronic devices.
  23. A Novel Quality Clustering Methodology on Fab-Wide Wafer Map Images in Semiconductor Manufacturing (2022)

    Yuan-Ming Hsu, Xiaodong Jia, Wenzhe Li, Jay Lee
    Abstract In semiconductor manufacturing, clustering the fab-wide wafer map images is of critical importance for practitioners to understand the subclusters of wafer defects, recognize novel clusters or anomalies, and develop fast reactions to quality issues. However, due to the high-mix manufacturing of diversified wafer products of different sizes and technologies, it is difficult to cluster the wafer map images across the fab. This paper addresses this challenge by proposing a novel methodology for fab-wide wafer map data clustering. In the proposed methodology, a well-known deep learning technique, a vision transformer with multi-head attention, is first trained to convert binary wafer images of different sizes into condensed feature vectors for efficient clustering. Then, Topological Data Analysis (TDA), which is widely used in biomedical applications, is employed to visualize the data clusters and identify the anomalies. TDA yields a topological representation of high-dimensional big data as well as its local clusters by creating a graph whose nodes correspond to the clusters within the data. The effectiveness of the proposed methodology is demonstrated by clustering the public wafer map dataset WM-811k, drawn from a real application, which contains a total of 811,457 wafer map images. We further demonstrate the potential applicability of topological data analysis in the semiconductor area through visualization.
  24. Topological Pattern Recognition for Point Cloud Data (2014)

    Gunnar Carlsson
    Abstract In this paper we discuss the adaptation of the methods of homology from algebraic topology to the problem of pattern recognition in point cloud data sets. The method is referred to as persistent homology, and has numerous applications to scientific problems. We discuss the definition and computation of homology in the standard setting of simplicial complexes and topological spaces, then show how one can obtain useful signatures, called barcodes, from finite metric spaces, thought of as sampled from a continuous object. We present several different cases where persistent homology is used, to illustrate the different ways in which the method can be applied.
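    Carlsson's survey covers homology in all dimensions; as a minimal illustration of how a barcode arises from a finite metric space, the sketch below computes only the dimension-0 barcode of the Vietoris-Rips filtration of a Euclidean point cloud. It is not code from the paper, and the function name `h0_barcode` is hypothetical. It assumes an edge enters at a scale equal to the distance between its endpoints (some texts use half that distance), in which case the finite bars end at the edge lengths of a minimum spanning tree.

```python
import itertools
import math


def h0_barcode(points):
    """Dimension-0 persistence barcode of the Vietoris-Rips filtration.

    Hypothetical helper for illustration, not code from the paper.
    Every point is born at scale 0; the edge (i, j) enters at scale
    dist(points[i], points[j]). Whenever an edge joins two different
    components, one component dies at that scale (Kruskal + union-find).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by the scale at which they enter.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))  # one component dies at this scale
    if n:
        bars.append((0.0, math.inf))  # the last component never dies
    return bars
```

For example, `h0_barcode([(0, 0), (1, 0), (5, 0)])` returns `[(0.0, 1.0), (0.0, 4.0), (0.0, inf)]`: the two nearby points merge at scale 1, the distant point joins at scale 4, and one component persists forever.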
  25. Topological Data Analysis of Biological Aggregation Models (2015)

    Chad M. Topaz, Lori Ziegelmeier, Tom Halverson
    Abstract We apply tools from topological data analysis to two mathematical models inspired by biological aggregations such as bird flocks, fish schools, and insect swarms. Our data consists of numerical simulation output from the models of Vicsek and D'Orsogna. These models are dynamical systems describing the movement of agents who interact via alignment, attraction, and/or repulsion. Each simulation time frame is a point cloud in position-velocity space. We analyze the topological structure of these point clouds, interpreting the persistent homology by calculating the first few Betti numbers. These Betti numbers count connected components, topological circles, and trapped volumes present in the data. To interpret our results, we introduce a visualization that displays Betti numbers over simulation time and topological persistence scale. We compare our topological results to order parameters typically used to quantify the global behavior of aggregations, such as polarization and angular momentum. The topological calculations reveal events and structure not captured by the order parameters.
  26. Gene Coexpression Network Comparison via Persistent Homology (2018)

    Ali Nabi Duman, Harun Pirim
    Abstract Persistent homology, a topological data analysis (TDA) method, is applied to microarray data sets. Although a few papers refer to TDA methods in microarray analysis, to the best of our knowledge persistent homology has not previously been used to compare weighted gene coexpression networks (WGCNs). We calculate the persistent homology of weighted networks constructed from 38 Arabidopsis microarray data sets to test the relevance and success of this approach in distinguishing stress factors. We quantify multiscale topological features of each network using persistent homology and apply a hierarchical clustering algorithm to the distance matrix whose entries are the pairwise bottleneck distances between the networks. The immunoresponses to different stress factors are distinguishable by our method. Networks with similar immunoresponses are found to be close with respect to bottleneck distance, indicating similar topological features of the WGCNs. This computationally efficient technique for analyzing networks provides a quick test ahead of more advanced studies.
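    The clustering step in this entry operates on pairwise bottleneck distances between persistence diagrams. Below is a definition-level sketch, assuming diagrams of finite (birth, death) pairs and the $L^\infty$ ground metric. It enumerates every matching, so it runs in factorial time and only illustrates the definition. It is not the authors' implementation, the function name is hypothetical, and dedicated TDA libraries (for example GUDHI) provide efficient versions.

```python
import itertools
import math


def bottleneck_bruteforce(diag_x, diag_y):
    """Bottleneck distance between two small persistence diagrams.

    Hypothetical helper for illustration only. Each diagram is a list of
    finite (birth, death) pairs. Both diagrams are padded with "diagonal
    slots" (None) so any point may be matched to the diagonal instead of
    to a point of the other diagram. Factorial time: toy inputs only.
    """
    n, m = len(diag_x), len(diag_y)
    xs = list(diag_x) + [None] * m
    ys = list(diag_y) + [None] * n

    def cost(p, q):
        if p is None and q is None:
            return 0.0  # diagonal matched to diagonal is free
        if p is None:
            return (q[1] - q[0]) / 2  # L-infinity distance of q to the diagonal
        if q is None:
            return (p[1] - p[0]) / 2
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))  # L-infinity distance

    best = math.inf
    # Try every perfect matching between the padded diagrams.
    for perm in itertools.permutations(range(len(ys))):
        worst = max((cost(xs[i], ys[perm[i]]) for i in range(len(xs))), default=0.0)
        best = min(best, worst)
    return best
```

For instance, comparing the single bar (0, 2) with the single bar (0, 3) gives 1.0: pairing the two points directly costs 1, which beats sending both to the diagonal (cost 1.5).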
  27. Development of the Functional Connectome Topology in Adolescence: Evidence From Topological Data Analysis (2021)

    Zeus Gracia-Tabuenca, Juan Carlos Díaz-Patiño, Isaac Arelio, Martha Beatriz Moreno, Fernando A. Barrios, Sarael Alcauter
    Abstract Adolescence is a crucial developmental period in terms of behavior and mental health. Therefore, understanding how the brain develops during this stage is a fundamental challenge for neuroscience. Recent studies have modelled the brain as a network or connectome, mainly applying measures from graph theory, and have shown changes in its functional organization, such as increases in segregation and integration. Topological Data Analysis (TDA) complements such modelling by extracting high-dimensional features across the whole range of connectivity values, instead of exploring a fixed set of connections. This study inquires into the developmental trajectories of such properties using a longitudinal sample of typically developing participants (N = 98; 53/45 F/M; 6.7-18.1 years), applying TDA to their functional connectomes. In addition, we explore the effect of puberty on the individual developmental trajectories. Results showed that, compared to random networks, the adolescent brain is more segregated at the global level but more densely connected at the local level. Furthermore, developmental effects showed nonlinear trajectories for the integration of the whole brain and of the fronto-parietal networks, with an inflection point and increasing trajectories after puberty onset. These results add to our understanding of the development of functional organization in the adolescent brain. Significance Statement: Topological Data Analysis may be used to explore the topology of the brain along the whole range of connectivity values instead of selecting only a fixed set of connectivity thresholds. Here, we explored some properties of the topology of the brain functional connectome and how they develop in adolescence. First, we show that developmental trajectories are nonlinear and better explained by puberty status than by chronological age, with an inflection point around puberty onset. The greatest effect is the increase in functional integration for the whole brain, and particularly for the Fronto-Parietal Network when exploring functional subnetworks.
  28. Object-Oriented Persistent Homology (2016)

    Bao Wang, Guo-Wei Wei
    Abstract Persistent homology provides a new approach for the topological simplification of big data by measuring the lifetime of intrinsic topological features in a filtration process, and it has found success in scientific and engineering applications. However, this success is essentially limited to qualitative data classification and analysis. Indeed, persistent homology has rarely been employed for quantitative modeling and prediction. Additionally, present persistent homology is a passive tool, rather than a proactive technique, for classification and analysis. In this work, we outline a general protocol for constructing object-oriented persistent homology methods. By means of the differential geometry theory of surfaces, we construct an objective functional, namely a surface free energy defined on the data of interest. Minimization of the objective functional leads to a Laplace-Beltrami operator, which generates a multiscale representation of the initial data and offers an objective-oriented filtration process. The resulting differential-geometry-based object-oriented persistent homology preserves desirable geometric features in the evolutionary filtration and enhances the corresponding topological persistence. A cubical-complex-based homology algorithm is employed in the present work to be compatible with the Cartesian representation of the Laplace-Beltrami flow. The proposed Laplace-Beltrami flow based persistent homology method is extensively validated. The consistency between the Laplace-Beltrami flow based filtration and the Euclidean distance based filtration is confirmed on the Vietoris-Rips complex across a large number of numerical tests. The convergence and reliability of the present Laplace-Beltrami flow based cubical complex filtration approach are analyzed over various spatial and temporal mesh sizes. The Laplace-Beltrami flow based persistent homology approach is used to study the intrinsic topology of proteins and fullerene molecules. Based on a quantitative model that correlates the topological persistence of the fullerene central cavity with the total curvature energy of the fullerene structure, the proposed method is used to predict fullerene isomer stability. The efficiency and robustness of the method are verified on more than 500 fullerene molecules. The proposed persistent homology based quantitative model is shown to give good predictions of total curvature energies for ten types of fullerene isomers. The present work offers the first example of designing object-oriented persistent homology to enhance or preserve desirable features in the original data during the filtration process and then automatically detect or extract the corresponding topological traits from the data.
  29. Multiresolution Persistent Homology for Excessively Large Biomolecular Datasets (2015)

    Kelin Xia, Zhixiong Zhao, Guo-Wei Wei
    Abstract Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution to the scale of interest so as to represent large-scale datasets at an appropriate resolution. We use the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated on a hexagonal fractal image with three distinct scales. We further demonstrate the proposed method by extracting topological fingerprints from DNA molecules. In particular, we successfully analyze the topological persistence of a virus capsid with 273 780 atoms, which would otherwise be inaccessible to the normal point cloud method and unreliable with coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification, which, to our knowledge, is the first time persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications to arbitrary data sets, such as social networks, biological networks, and graphs.
  30. Topology Highlights Mesoscopic Functional Equivalence Between Imagery and Perception: The Case of Hypnotizability (2019)

    Esther Ibáñez-Marcelo, Lisa Campioni, Angkoon Phinyomark, Giovanni Petri, Enrica L. Santarcangelo
    Abstract The functional equivalence (FE) between imagery and perception or motion has been proposed on the basis of neuroimaging evidence of large, spatially overlapping activations during real and imagined sensorimotor conditions. However, similar local activation patterns do not imply the same mesoscopic integration of brain regions, which can be described with tools from Topological Data Analysis (TDA). On the basis of behavioral findings, stronger FE has been hypothesized in individuals with high hypnotizability scores (highs) than in low hypnotizable participants (lows); the two groups differ in their proneness to modify memory, perception and behavior according to specific imaginative suggestions. Here we present the first EEG evidence of stronger FE in highs. In fact, persistent homology shows that the highs' EEG topological asset during real and imagined sensory conditions is significantly more similar than that of the lows. As a corollary finding, persistent homology shows less restructuring of the EEG asset in highs than in lows during both sensory and imagery tasks with respect to basal conditions. These findings support the view that greater embodiment of mental images may be responsible for the highs' greater proneness to respond to sensorimotor suggestions and to report involuntariness in action. In addition, the findings indicate hypnotizability-related sensory and cognitive information processing and suggest that the psycho-physiological trait of hypnotizability may modulate more than one aspect of everyday life.
  31. Path Homology as a Stronger Analogue of Cyclomatic Complexity (2020)

    Steve Huntsman
    Abstract Cyclomatic complexity is an incompletely specified but mathematically principled software metric that can be usefully applied to both source and binary code. We consider the application of path homology as a stronger analogue of cyclomatic complexity. We have implemented an algorithm to compute path homology in arbitrary dimension and applied it to several classes of relevant flow graphs, including randomly generated flow graphs representing structured and unstructured control flow. We also compared path homology and cyclomatic complexity on a set of disassembled binaries obtained from the grep utility. There exist control flow graphs realizable at the assembly level with nontrivial path homology in arbitrary dimension. We exhibit several classes of examples in this vein while also experimentally demonstrating that path homology gives identical results to cyclomatic complexity for at least one detailed notion of structured control flow. We also experimentally demonstrate that the two notions differ on disassembled binaries, and we highlight an example of extreme disagreement. Path homology empirically generalizes cyclomatic complexity for an elementary notion of structured code and appears to identify more structurally relevant features of control flow in general. Path homology therefore has the potential to substantially improve upon cyclomatic complexity.
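    Path homology is beyond a short snippet, but the baseline it is compared with is easy to state. The sketch below computes McCabe's cyclomatic complexity $M = E - N + 2P$ for a control-flow graph with $E$ edges, $N$ nodes and $P$ weakly connected components. The function name and the list-based graph encoding are illustrative assumptions, not the paper's code. The link to homology: the first Betti number of the underlying undirected graph is $E - N + P$, so $M$ equals that Betti number plus $P$.

```python
def cyclomatic_complexity(nodes, edges):
    """McCabe cyclomatic complexity M = E - N + 2P of a control-flow graph.

    Hypothetical helper for illustration. `edges` are directed (src, dst)
    pairs; P counts weakly connected components (edge direction ignored),
    found here with a simple union-find.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        parent[find(src)] = find(dst)  # merge the two endpoints' components
    components = len({find(v) for v in nodes})
    return len(edges) - len(nodes) + 2 * components
```

An if-then-else block (entry branching to two arms that rejoin at exit) gives $M = 4 - 4 + 2 = 2$, while a straight-line sequence of three blocks gives $M = 2 - 3 + 2 = 1$.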
  32. Pore Configuration Landscape of Granular Crystallization (2017)

    Mohammad Saadatfar, Hiroshi Takeuchi, Vanessa Robins, Nicolas Francois, Yasuaki Hiraoka
    Abstract Emergence and growth of crystalline domains in granular media remains under-explored. Here, the authors analyse tomographic snapshots from partially recrystallized packings of spheres using persistent homology and find agreement with proposed transitions based on continuous deformation of octahedral and tetrahedral voids.
  33. Conserved Abundance and Topological Features in Chromatin-Remodeling Protein Interaction Networks (2015)

    Mihaela E Sardiu, Joshua M Gilmore, Brad D Groppe, Damir Herman, Sreenivasa R Ramisetty, Yong Cai, Jingji Jin, Ronald C Conaway, Joan W Conaway, Laurence Florens, Michael P Washburn
    Abstract The study of conserved protein interaction networks seeks to better understand the evolution and regulation of protein interactions. Here, we present a quantitative proteomic analysis of 18 orthologous baits from three distinct chromatin-remodeling complexes in Saccharomyces cerevisiae and Homo sapiens. We demonstrate that abundance levels of orthologous proteins correlate strongly between the two organisms and both networks have highly similar topologies. We therefore used the protein abundances in one species to cross-predict missing protein abundance levels in the other species. Lastly, we identified a novel conserved low-abundance subnetwork further demonstrating the value of quantitative analysis of networks.
  34. Topological Data Analysis of Financial Time Series: Landscapes of Crashes (2017)

    Marian Gidea, Yuri Katz
    Abstract We explore the evolution of daily returns of four major US stock market indices during the technology crash of 2000 and the financial crisis of 2007-2009. Our methodology is based on topological data analysis (TDA). We use persistent homology to detect and quantify topological patterns that appear in multidimensional time series. Using a sliding window, we extract time-dependent point cloud data sets, to which we associate a topological space. We detect transient loops that appear in this space and measure their persistence. This is encoded in real-valued functions referred to as 'persistence landscapes'. We quantify the temporal changes in persistence landscapes via their $L^p$-norms. We test this procedure on multidimensional time series generated by various non-linear and non-equilibrium models. We find that, in the vicinity of financial meltdowns, the $L^p$-norms exhibit strong growth prior to the primary peak, which ascends during a crash. Remarkably, the average spectral density at low frequencies of the time series of $L^p$-norms of the persistence landscapes demonstrates a strong rising trend for 250 trading days prior to either the dotcom crash on 03/10/2000 or the Lehman bankruptcy on 09/15/2008. Our study suggests that TDA provides a new type of econometric analysis, which goes beyond the standard statistical measures. The method can be used to detect early warning signals of imminent market crashes. We believe that this approach can be used beyond the analysis of financial time series presented here.
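    To make the landscape step concrete, here is a minimal sketch, assuming a persistence diagram given as finite (birth, death) pairs. It evaluates the $k$-th landscape function $\lambda_k(t)$, the $k$-th largest value of $\max(0, \min(t - b, d - t))$ over all pairs, and approximates the norm $\|\lambda\|_p = (\sum_k \int |\lambda_k(t)|^p \, dt)^{1/p}$ with the trapezoidal rule. The function names and grid size are hypothetical choices, not the authors' code; in the paper's setting one such norm is computed for each sliding window to form the time series analyzed above.

```python
def landscape(diagram, k, t):
    """k-th persistence landscape function lambda_k(t), k >= 1.

    Hypothetical helper. Each finite (birth, death) pair contributes the
    tent function max(0, min(t - birth, death - t)); lambda_k(t) is the
    k-th largest tent value at t (0 if there are fewer than k pairs).
    """
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0


def landscape_lp_norm(diagram, p=1, num_points=2001):
    """Approximate (sum_k integral |lambda_k(t)|^p dt)^(1/p).

    Hypothetical helper. Uses the trapezoidal rule on a uniform grid that
    spans all bars; lambda_k vanishes for k > len(diagram).
    """
    if not diagram:
        return 0.0
    lo = min(b for b, _ in diagram)
    hi = max(d for _, d in diagram)
    step = (hi - lo) / (num_points - 1)
    grid = [lo + i * step for i in range(num_points)]
    total = 0.0
    for k in range(1, len(diagram) + 1):
        values = [landscape(diagram, k, t) ** p for t in grid]
        # Trapezoidal rule: endpoints weighted 1/2, interior points weighted 1.
        total += step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return total ** (1.0 / p)
```

A single bar (0, 2) yields a tent of height 1, so its $L^1$ norm is approximately 1.0 and its $L^2$ norm approximately $\sqrt{2/3} \approx 0.816$.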
  35. From Topological Analyses to Functional Modeling: The Case of Hippocampus (2021)

    Yuri Dabaghian
    Abstract Topological data analyses are widely used for describing and conceptualizing large volumes of neurobiological data, e.g., for quantifying spiking outputs of large neuronal ensembles and thus understanding the functions of the corresponding networks. Below we discuss an approach in which convergent topological analyses produce insights into how information may be processed in the mammalian hippocampus, a brain region that plays a key role in learning and memory. The resulting functional model provides a unifying framework for integrating spiking data at different timescales and following the course of spatial learning at different levels of spatiotemporal granularity. This approach allows us to account for the contributions of various physiological phenomena to spatial cognition: the neuronal spiking statistics, the effects of spiking synchronization by different brain waves, the roles played by synaptic efficacies, and so forth. In particular, it is possible to demonstrate that networks with plastic and transient synaptic architectures can encode stable cognitive maps, revealing the characteristic timescales of memory processing.
  36. Loops Abound in the Cosmic Microwave Background: A $4\sigma$ Anomaly on Super-Horizon Scales (2021)

    Pratyush Pranav
    Abstract We present a topological analysis of the temperature fluctuation maps from the Planck 2020 Data Release 4 (DR4), based on the NPIPE data processing pipeline. For comparison, we also present the topological characteristics of the maps from Planck 2018 Data Release 3 (DR3). We perform our analysis in terms of the homology characteristics of the maps, invoking relative homology to account for the presence of masks. We analyze a range of smoothing scales spanning sub- and super-horizon scales, corresponding to $\mathrm{FWHM} = 5', 10', 20', 40', 80', 160', 320', 640'$. Our main result indicates significantly anomalous behavior of the loops in the observed maps compared to simulations modeled as isotropic and homogeneous Gaussian random fields. Specifically, we observe a $4\sigma$ deviation between the observations and simulations in the number of loops at $\mathrm{FWHM} = 320'$ and $\mathrm{FWHM} = 640'$, corresponding to super-horizon scales of 5 degrees and larger. In addition, we notice a mildly significant deviation at $2\sigma$ for all the topological descriptors at almost all the scales analyzed. Our results are consistent across data releases, and therefore the anomalous behavior deserves careful consideration regarding its origin and ramifications. Setting aside the unlikely possibility that the anomaly arises from instrumental systematics, its origin may be genuinely astrophysical, perhaps due to a yet unresolved foreground, or truly primordial in nature. Given that the topological descriptors potentially encode information at all orders, non-Gaussianities of either primordial or late-type nature may be potential candidates. Alternative possibilities include the Universe admitting a non-trivial global topology, including effects induced by large-scale topological defects.
  37. The (Homological) Persistence of Gerrymandering (2021)

    Moon Duchin, Tom Needham, Thomas Weighill
    Abstract We apply persistent homology, the dominant tool from the field of topological data analysis, to study electoral redistricting. We begin by combining geographic and electoral data from a districting plan to produce a persistence diagram. Then, to see beyond a particular plan and understand the possibilities afforded by the choices made in redistricting, we build methods to visualize and analyze large ensembles of alternative plans. Our detailed case studies use zero-dimensional homology (persistent components) of filtered graphs constructed from voting data to analyze redistricting in Pennsylvania and North Carolina. We find that, across large ensembles of partitions, the features cluster in the persistence diagrams in a way that corresponds strongly to geographic location, so that we can construct an average diagram for an ensemble, with each point identified with a geographical region. Using this localization lets us produce zonings of each state at Congressional, state Senate, and state House scales, show the regional non-uniformity of election shifts, and identify attributes of partitions that tend to correspond to partisan advantage.
    The methods here are set up to be broadly applicable to the use of TDA on large ensembles of data. Many studies will benefit from interpretable summaries of large sets of samples or simulations, and the work here on localization and zoning will readily generalize to other partition problems, which are abundant in scientific applications. For the mathematically and politically rich problem of redistricting in particular, TDA provides a powerful and elegant summarization tool whose findings will be useful for practitioners.
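    The zero-dimensional persistence of a filtered graph can be sketched for a graph whose nodes carry a numeric value (for instance, a vote share per precinct). The sketch assumes a sublevel-set filtration, in which a node enters at its value and an edge enters once both endpoints are present, and applies the elder rule, so that when two components merge the younger one dies. This is a generic construction, not necessarily the exact filtration the authors build from voting data, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def sublevel_h0(values, edges):
    """Zero-dimensional persistence of a vertex-filtered graph.

    Hypothetical helper. `values` maps node -> filtration value; `edges`
    are undirected (u, v) pairs. Returns sorted (birth, death) pairs,
    omitting zero-length bars; surviving components have death = inf.
    """
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    parent = {}
    birth = {}  # birth value of each component, keyed by its root

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    for v in sorted(values, key=values.get):  # add nodes in filtration order
        parent[v] = v
        birth[v] = values[v]
        for u in neighbors[v]:
            if u not in parent:
                continue  # neighbor has not entered the filtration yet
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # Elder rule: the component born later dies at the current value.
            elder, younger = (ru, rv) if birth[ru] <= birth[rv] else (rv, ru)
            if values[v] > birth[younger]:
                pairs.append((birth[younger], values[v]))
            parent[younger] = elder
    roots = {find(v) for v in values}
    pairs.extend((birth[r], math.inf) for r in roots)
    return sorted(pairs)
```

On a path a-b-c with values 0, 2 and 1, the two local minima a and c are born at 0 and 1 and merge when b enters at 2, so the result is `[(0, inf), (1, 2)]`.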
  38. Persistent Betti Numbers for a Noise Tolerant Shape-Based Approach to Image Retrieval (2011)

    Patrizio Frosini, Claudia Landi
    Abstract In content-based image retrieval, a major problem is the presence of noisy shapes. It is well known that persistent Betti numbers are a shape descriptor that admits a dissimilarity distance, the matching distance, which is stable under continuous shape deformations. In this paper we focus on the problem of dealing with noise that changes the topology of the studied objects. We present a general method to turn persistent Betti numbers into descriptors that remain stable even in the presence of topological changes. Retrieval tests on the Kimia-99 database show the effectiveness of the method.
  39. Topological Analysis of Population Activity in Visual Cortex (2008)

    Gurjeet Singh, Facundo Memoli, Tigran Ishkhanov, Guillermo Sapiro, Gunnar Carlsson, Dario L. Ringach
    Abstract Information in the cortex is thought to be represented by the joint activity of neurons. Here we describe how fundamental questions about neural representation can be cast in terms of the topological structure of population activity. A new method, based on the concept of persistent homology, is introduced and applied to the study of population activity in primary visual cortex (V1). We found that the topological structure of activity patterns when the cortex is spontaneously active is similar to that of patterns evoked by natural image stimulation and consistent with the topology of a two-sphere. We discuss how this structure could emerge from the functional organization of orientation and spatial frequency maps and their mutual relationship. Our findings extend prior results on the relationship between spontaneous and evoked activity in V1 and illustrate how computational topology can help tackle elementary questions about the representation of information in the nervous system.
  40. Using Multidimensional Topological Data Analysis to Identify Traits of Hip Osteoarthritis (2018)

    Jasmine Rossi‐deVries, Valentina Pedoia, Michael A. Samaan, Adam R. Ferguson, Richard B. Souza, Sharmila Majumdar
    Abstract Background Osteoarthritis (OA) is a multifaceted disease with many variables affecting diagnosis and progression. Topological data analysis (TDA) is a state-of-the-art big data analytics tool that can combine all variables into multidimensional space. TDA is used to simultaneously analyze imaging and gait analysis techniques. Purpose To identify biochemical and biomechanical biomarkers able to classify different disease progression phenotypes in subjects with and without radiographic signs of hip OA. Study Type Longitudinal study for comparison of progressive and nonprogressive subjects. Population In all, 102 subjects with and without radiographic signs of hip osteoarthritis. Field Strength/Sequence 3T, SPGR 3D MAPSS T1ρ/T2, intermediate-weighted fat-suppressed fast spin-echo (FSE). Assessment Multidimensional data analysis including cartilage composition, bone shape, Kellgren–Lawrence (KL) classification of osteoarthritis, scoring hip osteoarthritis with MRI (SHOMRI), hip disability and osteoarthritis outcome score (HOOS). Statistical Tests Analysis done using TDA, Kolmogorov–Smirnov (KS) testing, and Benjamini-Hochberg to rank P-value results to correct for multiple comparisons. Results Subjects in the later stages of the disease had an increased SHOMRI score (P < 0.0001), increased KL (P = 0.0012), and older age (P < 0.0001). Subjects in the healthier group showed intact cartilage and less pain. Subjects found between these two groups had a range of symptoms. Analysis of this subgroup identified knee biomechanics (P < 0.0001) as an initial marker of the disease that is noticeable before the morphological progression and degeneration. Further analysis of an OA subgroup with femoroacetabular impingement (FAI) showed anterior labral tears to be the most significant marker (P = 0.0017) between those FAI subjects with and without OA symptoms. Data Conclusion The data-driven analysis obtained with TDA proposes new phenotypes of these subjects that partially overlap with the radiographic-based classical disease status classification and also shows the potential for further examination of an early onset biomechanical intervention. Level of Evidence: 2 Technical Efficacy: Stage 2 J. Magn. Reson. Imaging 2018;48:1046–1058.
  41. Signal Enrichment With Strain-Level Resolution in Metagenomes Using Topological Data Analysis (2019)

    Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida
    Abstract Background A metagenome is a collection of genomes, usually in a micro-environment, and sequencing a metagenomic sample en masse is a powerful means of investigating the community of constituent microorganisms. One of the challenges is distinguishing between similar organisms, because sequencing reads often have multiple possible assignments, resulting in false positive identifications. We map the problem to a topological data analysis (TDA) framework that extracts information from the geometric structure of data. Here the structure is defined by multi-way relationships between the sequencing reads using a reference database. Results Based primarily on the patterns of co-mapping of the reads to multiple organisms in the reference database, we use two models: one a subcomplex of a barycentric subdivision complex and the other a Čech complex. The barycentric subcomplex allows a natural mapping of the reads along with their coverage of organisms, while the Čech complex simply takes the number of reads into account to map the problem to a homology computation. Using simulated genome mixtures, we show not only enrichment of signal but also microbe identification with strain-level resolution. Conclusions In particular, in the most refractory cases, where alternative algorithms that exploit unique reads (i.e., reads mapped to unique organisms) fail, we show that the TDA approach continues to perform consistently. The Čech model, which uses less information, is equally effective, suggesting that even partial information, when augmented with the appropriate structure, is quite powerful.
  42. From Trees to Barcodes and Back Again: Theoretical and Statistical Perspectives (2020)

    Lida Kanari, Adélie Garin, Kathryn Hess
    Abstract Methods of topological data analysis have been successfully applied in a wide range of fields to provide useful summaries of the structure of complex data sets in terms of topological descriptors, such as persistence diagrams. While there are many powerful techniques for computing topological descriptors, the inverse problem, i.e., recovering the input data from topological descriptors, has proved to be challenging. In this article we study in detail the Topological Morphology Descriptor (TMD), which assigns a persistence diagram to any tree embedded in Euclidean space, and a sort of stochastic inverse to the TMD, the Topological Neuron Synthesis (TNS) algorithm, gaining both theoretical and computational insights into the relation between the two. We propose a new approach to classify barcodes using symmetric groups, which provides a concrete language to formulate our results. We investigate to what extent the TNS recovers a geometric tree from its TMD and describe the effect of different types of noise on the process of tree generation from persistence diagrams. We prove moreover that the TNS algorithm is stable with respect to specific types of noise.
  43. Persistent Homology to Quantify the Quality of Surface-Supported Covalent Networks (2019)

    Abraham Gutierrez, Mickaël Buchet, Sylvain Clair
    Abstract Covalent networks formed by on-surface synthesis usually suffer from the presence of a large number of defects. We report on a methodology to characterize such two-dimensional networks from their experimental images obtained by scanning probe microscopy. The computation is based on a persistent homology approach and provides a quantitative score indicative of the network homogeneity. We compare our scoring method with results previously obtained using minimal spanning tree analyses and we apply it to some molecular systems appearing in the existing literature.
  44. Felix: A Topology Based Framework for Visual Exploration of Cosmic Filaments (2016)

    Nithin Shivshankar, Pratyush Pranav, Vijay Natarajan, Rien van de Weygaert, E. G. Patrick Bos, Steven Rieder
    Abstract The large-scale structure of the universe is composed of virialized blob-like clusters, linear filaments, sheet-like walls and huge, nearly empty three-dimensional voids. Characterizing the large-scale universe is essential to our understanding of the formation and evolution of galaxies. The density ranges of clusters, walls and voids are relatively well separated, whereas filaments span a comparatively larger range. The large-scale filamentary network thus forms an intricate part of the cosmic web. In this paper, we describe Felix, a topology-based framework for visual exploration of filaments in the cosmic web. The filamentary structure is represented by the ascending manifold geometry of the 2-saddles in the Morse-Smale complex of the density field. We generate a hierarchy of Morse-Smale complexes and query for filaments based on the density ranges at the end points of the filaments. The query is processed efficiently over the entire hierarchical Morse-Smale complex, allowing for interactive visualization. We apply Felix to computer simulations based on the heuristic Voronoi kinematic model and the standard $\Lambda$CDM cosmology, and demonstrate its usefulness through two case studies. First, we extract cosmic filaments within and across cluster-like regions in Voronoi kinematic simulation datasets, and we demonstrate that the results are similar to those of existing structure finders. Filaments that form the spine of the cosmic web, which exist in high-density regions in the current epoch, are isolated using Felix. Also, filaments present in void-like regions are isolated and visualized. These filamentary structures are often overshadowed by filaments in higher density ranges and are not easily characterized or extracted using other filament extraction methodologies.
  45. Exploring the Geometry and Topology of Neural Network Loss Landscapes (2022)

    Stefan Horoi, Jessie Huang, Bastian Rieck, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy
    Abstract Recent work has established clear links between the generalization performance of trained neural networks and the geometry of their loss landscape near the local minima to which they converge. This suggests that qualitative and quantitative examination of the loss landscape geometry could yield insights about neural network generalization performance during training. To this end, researchers have proposed visualizing the loss landscape through the use of simple dimensionality reduction techniques. However, such visualization methods have been limited by their linear nature and only capture features in one or two dimensions, thus restricting sampling of the loss landscape to lines or planes. Here, we expand and improve upon these in three ways. First, we present a novel “jump and retrain” procedure for sampling relevant portions of the loss landscape. We show that the resulting sampled data holds more meaningful information about the network’s ability to generalize. Next, we show that non-linear dimensionality reduction of the jump and retrain trajectories via PHATE, a trajectory and manifold-preserving method, allows us to visualize differences between networks that are generalizing well vs poorly. Finally, we combine PHATE trajectories with a computational homology characterization to quantify trajectory differences.
  46. Positive Alexander Duality for Pursuit and Evasion (2017)

    Robert Ghrist, Sanjeevi Krishnan
    Abstract Considered is a class of pursuit-evasion games, in which an evader tries to avoid detection. Such games can be formulated as the search for sections to the complement of a coverage region in a Euclidean space over time. Prior results give homological criteria for evasion in the general case that are not necessary and sufficient. This paper provides a necessary and sufficient positive cohomological criterion for evasion in the general case. The principal tools are (1) a refinement of the Čech cohomology of a coverage region with a positive cone encoding spatial orientation, (2) a refinement of the Borel–Moore homology of the coverage gaps with a positive cone encoding time orientation, and (3) a positive variant of Alexander Duality. Positive cohomology decomposes as the global sections of a sheaf of local positive cohomology over the time axis; we show how this decomposition makes positive cohomology computable using techniques of computational polyhedral geometry and linear programming.
  47. Topological Data Analysis: A Promising Big Data Exploration Tool in Biology, Analytical Chemistry and Physical Chemistry (2016)

    Marc Offroy, Ludovic Duponchel
    Abstract An important feature of experimental science is that data of various kinds are being produced at an unprecedented rate. This is mainly due to the development of new instrumental concepts and experimental methodologies. It is also clear that the nature of acquired data is significantly different. Indeed, in every area of science, data take the form of ever bigger tables, where all but a few of the columns (i.e. variables) turn out to be irrelevant to the questions of interest, and further, we do not necessarily know which coordinates are the interesting ones. Big data in our biology, analytical chemistry and physical chemistry labs is a future that might be closer than any of us suppose. In this sense, new tools have to be developed to explore and valorize such data sets. Topological data analysis (TDA) is one of these. It was developed recently by topologists who discovered that topological concepts could be useful for data analysis. The main objective of this paper is to answer the question of why topology is well suited for the analysis of big data sets in many areas, and even more efficient than conventional data analysis methods. Raman analysis of single bacteria provides a good opportunity to demonstrate the potential of TDA for the exploration of various spectroscopic data sets under different experimental conditions (high noise level, with/without spectral preprocessing, wavelength shift, different spectral resolution, missing data).
  48. Representability of Algebraic Topology for Biomolecules in Machine Learning Based Scoring and Virtual Screening (2018)

    Zixuan Cang, Lin Mu, Guo-Wei Wei
    Abstract This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
  49. Towards a New Approach to Reveal Dynamical Organization of the Brain Using Topological Data Analysis (2018)

    Manish Saggar, Olaf Sporns, Javier Gonzalez-Castillo, Peter A. Bandettini, Gunnar Carlsson, Gary Glover, Allan L. Reiss
    Abstract Approaches describing how the brain changes to accomplish cognitive tasks tend to rely on collapsed data. Here, the authors present a new approach that maintains high dimensionality and use it to describe individual differences in how brain activity is represented and organized across different cognitive tasks.
  50. The Growing Topology of the C. elegans Connectome (2020)

    Alec Helm, Ann S. Blevins, Danielle S. Bassett
    Abstract Probing the developing neural circuitry in Caenorhabditis elegans has enhanced our understanding of nervous systems. The C. elegans connectome, like those of other species, is characterized by a rich club of densely connected neurons embedded within a small-world architecture. This organization of neuronal connections, captured by quantitative network statistics, provides insight into the system's capacity to perform integrative computations. Yet these network measures are limited in their ability to detect weakly connected motifs, such as topological cavities, that may support the system's capacity to perform segregated computations. We address this limitation by using persistent homology to track the evolution of topological cavities in the growing C. elegans connectome throughout neural development, and assess the degree to which the growing connectome's topology is resistant to biological noise. We show that the developing connectome topology is both relatively robust to changes in neuron birth times and not captured by similar growth models. Additionally, we quantify the consequence of a neuron's specific birth time and ask if this metric tracks other biological properties of neurons. Our results suggest that the connectome's growing topology is a robust feature of the developing connectome that is distinct from other network properties, and that the growing topology is particularly sensitive to the exact birth times of a small set of predominantly motor neurons. By utilizing novel measurements that track biological features, we anticipate that our study will be helpful in the construction of more accurate models of neuronal development in C. elegans.
  51. Alpha, Betti and the Megaparsec Universe: On the Topology of the Cosmic Web (2011)

    Rien Van De Weygaert, Gert Vegter, Herbert Edelsbrunner, Bernard J. T. Jones, Pratyush Pranav, Changbom Park, Wojciech A. Hellwing, Bob Eldering, Nico Kruithof, E. G. P. Bos, Johan Hidding, Job Feldbrugge, Eline Ten Have, Matti Van Engelen, Manuel Caroli, Monique Teillaud
    Abstract We study the topology of the Megaparsec Cosmic Web in terms of the scale-dependent Betti numbers, which formalize the topological information content of...
  52. Persistent Topology for Cryo-EM Data Analysis (2015)

    Kelin Xia, Guo-Wei Wei
    Abstract In this work, we introduce persistent homology for the analysis of cryo-electron microscopy (cryo-EM) density maps. We identify the topological fingerprint or topological signature of noise, which is widespread in cryo-EM data. For low signal-to-noise ratio (SNR) volumetric data, intrinsic topological features of biomolecular structures are indistinguishable from noise. To remove noise, we employ geometric flows that are found to preserve the intrinsic topological fingerprints of cryo-EM structures and diminish the topological signature of noise. In particular, persistent homology enables us to visualize the gradual separation of the topological fingerprints of cryo-EM structures from those of noise during the denoising process, which gives rise to a practical procedure for prescribing a noise threshold to extract cryo-EM structure information from noise-contaminated data after a certain number of iterations of the geometric flow equation. To further demonstrate the utility of persistent homology for cryo-EM data analysis, we consider a microtubule intermediate structure, Electron Microscopy Data (EMD 1129). Three helix models, an alpha-tubulin monomer model, an alpha-tubulin and beta-tubulin model, and an alpha-tubulin and beta-tubulin dimer model, are constructed to fit the cryo-EM data. The least-squares fitting leads to similarly high correlation coefficients, which indicates that structure determination via optimization is an ill-posed inverse problem. However, these models have dramatically different topological fingerprints. In particular, linkages or connectivities that discriminate one model from another play little role in the traditional density fitting or optimization but are very sensitive and crucial to topological fingerprints. The intrinsic topological features of the microtubule data are identified after topological denoising.
By comparing the topological fingerprints of the original data with those of the three models, we found that the third model is topologically favored. The present work offers new persistent-homology-based strategies for topological denoising and for resolving ill-posed inverse problems. Copyright © 2015 John Wiley & Sons, Ltd.
  53. Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining (2018)

    Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny
    Abstract Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although TDA methods have recently been used in many areas of data mining, they have not been widely applied to text mining tasks. In most text processing algorithms, the order in which different entities appear or co-appear is lost. Assuming these lost orders are informative features of the data, TDA may play a significant role in filling the resulting gap in state-of-the-art text processing. Once extracted, the topology of different entities throughout a textual document may reveal additional information about the document that is not reflected in any features from conventional text processing methods. In this paper, we introduce a novel approach that employs TDA in text processing in order to capture and use the topology of different same-type entities in textual documents. First, we show how to extract topological signatures from text using persistent homology, i.e., a TDA tool that captures the topological signature of a data cloud. Then we show how to utilize these signatures for text classification.
  54. Coverage Criterion in Sensor Networks Stable Under Perturbation (2014)

    Yasuaki Hiraoka, Genki Kusano
    Abstract For the coverage problem of sensor networks, V. de Silva and R. Ghrist (2007) developed several approaches based on (persistent) homology theory. Their coverage criteria are formulated on the Rips complexes constructed from the sensors, whose locations are assumed to be fixed. However, sensors are in general affected by perturbations (e.g., natural phenomena), and hence the stability of the coverage criteria should also be discussed. In this paper, we present a coverage theorem that is stable under perturbation. Furthermore, we introduce a method of eliminating redundant cover after perturbation. The coverage theorem is derived by extending the Rips interleaving theorem studied by F. Chazal, V. de Silva, and S. Oudot (2013) into an appropriate relative version.
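The Rips complexes on which these coverage criteria are stated are simple to build. The following is a minimal sketch, assuming planar sensor positions and the convention that two sensors are joined when their distance is at most `eps` (papers differ on this, some use 2·eps); `rips_2_skeleton` is a hypothetical name. It builds only the 2-skeleton and does not implement the homological coverage test itself.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rips_2_skeleton(points: Sequence[Point], eps: float
                    ) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int, int]]]:
    """Edges and triangles of the Vietoris-Rips complex at scale eps.

    Assumed convention (hypothetical helper): vertices i < j are joined
    when their Euclidean distance is at most eps, and a triangle is added
    whenever all three of its edges are present."""
    n = len(points)
    edges = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps}
    # A Rips complex is a flag complex: triangles come from 3-cliques of edges.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edges and (i, k) in edges and (j, k) in edges]
    return sorted(edges), triangles
```

For four sensors at the corners of a unit square, `eps = 1.0` yields only the four sides and no triangles, while `eps = 1.5` adds both diagonals and all four triangles.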
  55. A Topological Machine Learning Pipeline for Classification (2022)

    Francesco Conti, Davide Moroni, Maria Antonietta Pascali
    Abstract In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered. Using a grid search approach, this pipeline determines optimal representation methods and parameters. The development of such a topological pipeline for Machine Learning involves two crucial steps that strongly affect its performance: firstly, digital data must be represented as an algebraic object with a proper associated filtration in order to compute its topological summary, the Persistence Diagram. Secondly, the persistence diagram must be transformed with suitable representation methods in order to be introduced in a Machine Learning algorithm. We assess the performance of our pipeline, and in parallel, we compare the different representation methods on popular benchmark datasets. This work is a first step toward both an easy and ready-to-use pipeline for data classification using persistent homology and Machine Learning, and to understand the theoretical reasons why, given a dataset and a task to be performed, a pair (filtration, topological representation) is better than another.
  56. Graph Filtration Learning (2020)

    Christoph Hofer, Florian Graf, Bastian Rieck, Marc Niethammer, Roland Kwitt
    Abstract We propose an approach to learning with graph-structured data in the problem domain of graph classification. In particular, we present a novel type of readout operation to aggregate node features into a graph-level representation. To this end, we leverage persistent homology computed via a real-valued, learnable, filter function. We establish the theoretical foundation for differentiating through the persistent homology computation. Empirically, we show that this type of readout operation compares favorably to previous techniques, especially when the graph connectivity structure is informative for the learning problem.
  57. Topological Data Analysis Reveals Robust Alterations in the Whole-Brain and Frontal Lobe Functional Connectomes in Attention-Deficit/Hyperactivity Disorder (2020)

    Zeus Gracia-Tabuenca, Juan Carlos Díaz-Patiño, Isaac Arelio, Sarael Alcauter
    Abstract Attention-deficit/hyperactivity disorder (ADHD) is a developmental disorder characterized by difficulty controlling one's own behavior. Neuroimaging studies have related ADHD to the interplay of fronto-parietal attention systems with the default mode network (DMN; Castellanos and Aoki, 2016). However, some results have been inconsistent, potentially due to methodological differences in the analytical strategies used to define the brain functional network, i.e., the functional connectivity threshold and/or the brain parcellation scheme. Here, we make use of topological data analysis (TDA) to explore the brain connectome as a function of the filtration value (i.e., the connectivity threshold), instead of using a static connectivity threshold. Specifically, we characterized the transition from all nodes being isolated to being connected into a single component as a function of the filtration value. We explored the utility of such a method to identify differences between 81 children with ADHD (45 male, age: 7.26–17.61 years old) and 96 typically developing children (TDC; 59 male, age: 7.17–17.96 years old), using a public dataset of resting-state fMRI (rs-fMRI) in human subjects. Results were highly congruent when using four different brain segmentations (atlases), and exhibited significant differences in the brain topology of children with ADHD, both at the whole-brain network and the functional subnetwork levels, particularly involving the frontal lobe and the DMN. Therefore, this is a solid approach that complements connectomics-related methods and may help identify the neurophysio-pathology of ADHD.
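The threshold sweep described in this abstract, counting connected components as the connectivity threshold varies instead of fixing a single value, can be sketched as follows. This is a minimal illustration, assuming a symmetric connectivity (e.g. correlation) matrix and that an edge is kept once its weight is at least the current threshold; `component_curve` is a hypothetical name, not code from the paper.

```python
from typing import List, Sequence


def component_curve(conn: Sequence[Sequence[float]],
                    thresholds: Sequence[float]) -> List[int]:
    """For each threshold t, count the connected components of the graph
    that keeps edge (i, j) when conn[i][j] >= t (a Betti-0 curve).

    Hypothetical helper; assumes conn is a symmetric square matrix."""
    n = len(conn)
    counts = []
    for t in thresholds:
        parent = list(range(n))  # union-find forest, one tree per node

        def find(i: int) -> int:
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        components = n
        for i in range(n):
            for j in range(i + 1, n):  # upper triangle only; diagonal ignored
                if conn[i][j] >= t:
                    ri, rj = find(i), find(j)
                    if ri != rj:  # edge merges two components
                        parent[ri] = rj
                        components -= 1
        counts.append(components)
    return counts
```

Sweeping the threshold from high to low traces the transition the abstract describes: for a 4-node matrix with strong pairs (0, 1) and (2, 3) and a weaker link between them, thresholds 1.0, 0.85, 0.5, 0.25, 0.0 give 4, 3, 2, 1, 1 components.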
  58. Nonlinear Dynamic Approaches to Identify Atrial Fibrillation Progression Based on Topological Methods (2019)

    Bahareh Safarbali, Seyed Mohammad Reza Hashemi Golpayegani
    Abstract In recent years, the progression of atrial fibrillation (AF) from paroxysmal to persistent or permanent forms has become an important issue in cardiovascular disorders. Information about the AF pattern of presentation (paroxysmal, persistent, or permanent) is useful for the management algorithms in each category. This management aims to reduce symptoms and prevent severe problems associated with AF. Until now, AF classification has been based on time duration and episodes. In particular, complexity changes in Heart Rate Variation (HRV) may contain clinically relevant signals of imminent systemic dysregulation. A number of nonlinear methods based on phase space and topological properties can give more insight into HRV abnormalities such as fibrillation. Aiming to provide a nonlinear tool to qualitatively classify AF stages, we proposed two geometrical indices (fractal dimension and persistent homology) based on HRV phase space, which can successfully replicate the changes in AF progression. The study population includes 38 lone AF patients and 20 normal subjects, whose data were collected from the PhysioBank database. “Time of Life (TOL)” is proposed as a new feature based on the initial and final Čech radius in the persistent homology diagram. A neural network was implemented to demonstrate the effectiveness of both TOL and fractal dimension as classification features. The classification accuracy was 93%. The proposed indices provide a signal representation framework useful for understanding the dynamic changes in AF cardiac patterns and for classifying normal and pathological rhythms.
  59. ChainNet: Learning on Blockchain Graphs With Topological Features (2019)

    N. C. Abay, C. G. Akcora, Y. R. Gel, M. Kantarcioglu, U. D. Islambekov, Y. Tian, B. Thuraisingham
    Abstract The following topics are dealt with: learning (artificial intelligence); graph theory; neural nets; pattern classification; data mining; feature extraction; recommender systems; pattern clustering; social networking (online); optimisation.
  60. Persistent Brain Network Homology From the Perspective of Dendrogram (2012)

    Hyekyoung Lee, Hyejin Kang, Moo K. Chung, Bung-Nyun Kim, Dong Soo Lee
    Abstract The brain network is usually constructed by estimating the connectivity matrix and thresholding it at an arbitrary level. The problem with this standard method is that we do not have any generally accepted criteria for determining a proper threshold. Thus, we propose a novel multiscale framework that models all brain networks generated over every possible threshold. Our approach is based on persistent homology and its various representations such as the Rips filtration, barcodes, and dendrograms. This new persistent homological framework enables us to quantify various persistent topological features at different scales in a coherent manner. The barcode is used to quantify and visualize the evolutionary changes of topological features such as the Betti numbers over different scales. By incorporating additional geometric information into the barcode, we obtain a single linkage dendrogram that shows the overall evolution of the network. The difference between two networks is then measured by the Gromov-Hausdorff distance between their dendrograms. As an illustration, we modeled and differentiated the FDG-PET based functional brain networks of 24 attention-deficit hyperactivity disorder children, 26 autism spectrum disorder children, and 11 pediatric control subjects.
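In the 0-dimensional part of a Rips filtration, the barcode and the single linkage dendrogram carry the same merge heights: every node is born at 0, and a component dies at the distance where it first merges with another. A minimal Kruskal-style sketch, assuming a symmetric distance matrix as input; `h0_death_times` is a hypothetical name, and the sketch does not compute the Gromov-Hausdorff comparison.

```python
from typing import List, Sequence


def h0_death_times(dist: Sequence[Sequence[float]]) -> List[float]:
    """Death times of the finite 0-dimensional bars of the Rips filtration,
    i.e. the merge heights of the single linkage dendrogram.

    Every bar is born at 0; the one bar that never dies is omitted."""
    n = len(dist)
    parent = list(range(n))  # union-find forest over the nodes

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Kruskal's algorithm: scan edges by increasing length; each edge that
    # joins two different components kills one bar at that length.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths
```

For nodes at positions 0, 1, 3 and 7 on a line, the finite bars die at 1, 2 and 4, which are exactly the heights at which the single linkage dendrogram merges its branches.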
  61. A Barcode Shape Descriptor for Curve Point Cloud Data (2004)

    Anne Collins, Afra Zomorodian, Gunnar Carlsson, Leonidas J. Guibas
    Abstract In this paper, we present a complete computational pipeline for extracting a compact shape descriptor for curve point cloud data (PCD). Our shape descriptor, called a barcode, is based on a blend of techniques from differential geometry and algebraic topology. We also provide a metric over the space of barcodes, enabling fast comparison of PCDs for shape recognition and clustering. To demonstrate the feasibility of our approach, we implement our pipeline and provide experimental evidence in shape classification and parametrization.
  62. Cell Complex Neural Networks (2020)

    Mustafa Hajij, Kyle Istvan, Ghada Zamzami
    Abstract Cell complexes are topological spaces constructed from simple blocks called cells. They generalize graphs, simplicial complexes, and polyhedral complexes that form important domains for practical applications. We propose a general, combinatorial, and unifying construction for performing neural network-type computations on cell complexes. Furthermore, we introduce inter-cellular message passing schemes, message passing schemes on cell complexes that take the topology of the underlying space into account. In particular, our method generalizes many of the most popular types of graph neural networks.
  63. The Weighted Euler Curve Transform for Shape and Image Analysis (2020)

    Qitong Jiang, Sebastian Kurtek, Tom Needham
    Abstract The Euler Curve Transform (ECT) of Turner et al. is a complete invariant of an embedded simplicial complex, which is amenable to statistical analysis. We generalize the ECT to provide a similarly convenient representation for weighted simplicial complexes, objects which arise naturally, for example, in certain medical imaging applications. We leverage work of Ghrist et al. on Euler integral calculus to prove that this invariant—dubbed the Weighted Euler Curve Transform (WECT)—is also complete. We explain how to transform a segmented region of interest in a grayscale image into a weighted simplicial complex and then into a WECT representation. This WECT representation is applied to study Glioblastoma Multiforme brain tumor shape and texture data. We show that the WECT representation is effective at clustering tumors based on qualitative shape and texture features and that this clustering correlates with patient survival time.
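Along a single direction, a weighted Euler curve can be sketched directly as an alternating weighted count: for a unit direction v and height t, sum (-1)^dim(σ) · w(σ) over the simplices σ whose vertices all have height ⟨x, v⟩ ≤ t. This is a minimal sketch under stated assumptions: simplices are vertex-index tuples, a simplex's weight is taken, as an assumption, to be the maximum of its vertex weights, and `weighted_euler_curve` is a hypothetical name. With all weights equal to 1 it reduces to the ordinary Euler characteristic curve.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def weighted_euler_curve(vertices: Sequence[Point],
                         simplices: Sequence[Tuple[int, ...]],
                         vertex_weight: Sequence[float],
                         direction: Point,
                         heights: Sequence[float]) -> List[float]:
    """Weighted Euler curve of a planar simplicial complex in one direction.

    Assumption (hypothetical helper): the weight of a simplex is the
    maximum of its vertex weights."""
    # Height of each vertex along the chosen direction.
    proj = [x * direction[0] + y * direction[1] for x, y in vertices]
    curve = []
    for t in heights:
        total = 0.0
        for s in simplices:
            if max(proj[i] for i in s) <= t:  # simplex lies in the sublevel set
                sign = (-1) ** (len(s) - 1)    # (-1)^dim, dim = #vertices - 1
                total += sign * max(vertex_weight[i] for i in s)
        curve.append(total)
    return curve
```

For a single filled triangle with unit weights, swept along direction (1, 0) at heights -1, 0 and 1, the curve takes the values 0, 1 and 1. Repeating this over many directions gives the full transform.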
  64. Revealing Key Structural Features Hidden in Liquids and Glasses (2019)

    Hajime Tanaka, Hua Tong, Rui Shi, John Russo
    Abstract A great success of solid state physics comes from the characterization of crystal structures in the reciprocal (wave vector) space. The power of structural characterization in Fourier space originates from the breakdown of translational and rotational symmetries. However, unlike crystals, liquids and amorphous solids possess continuous translational and rotational symmetries on a macroscopic scale, which makes Fourier space analysis much less effective. Lately, several studies have revealed local breakdown of translational and rotational symmetries even for liquids and glasses. Here, we review several mathematical methods used to characterize local structural features of apparently disordered liquids and glasses in real space. We distinguish two types of local ordering in liquids and glasses: energy-driven and entropy-driven. The former, which is favoured energetically by symmetry-selective directional bonding, is responsible for anomalous behaviours commonly observed in water-type liquids such as water, silicon, germanium and silica. The latter, which is often favoured entropically, shows connections with the heterogeneous, slow dynamics found in hard-sphere-like glass-forming liquids. We also discuss the relationship between such local ordering and crystalline structures and its impact on glass-forming ability.
  65. Graph Classification via Heat Diffusion on Simplicial Complexes (2020)

    Mehmet Emin Aktas, Esra Akbas
    Abstract In this paper, we study the graph classification problem in vertex-labeled graphs. Our main goal is to classify graphs by comparing their higher-order structures using heat diffusion on their simplices. We first represent vertex-labeled graphs as simplex-weighted super-graphs. We then define the diffusion Fréchet function over their simplices to encode the higher-order network topology and finally reach our goal by combining the function values with machine learning algorithms. Our experiments on real-world bioinformatics networks show that using the diffusion Fréchet function on simplices is promising for graph classification and more effective than the baseline methods. To the best of our knowledge, this paper is the first in the literature to use heat diffusion on higher-dimensional simplices in a graph mining problem. We believe that our method can be extended to other graph mining domains beyond graph classification.
  66. Improving Health Care Management Through Persistent Homology of Time-Varying Variability of Emergency Department Patient Flow (2018)

    Mael Dugast, Guillaume Bouleux, Olivier Mory, Eric Marcon
    Abstract Excessive admissions at the Emergency Department (ED) are a phenomenon very closely linked to the propagation of viruses. They cause overcrowding in EDs and are a public health problem. The aim of this work is to give ED leaders more time for decision making during this period. Based on the admissions time series associated with specific clinical diagnoses, we first perform a Detrended Fluctuation Analysis (DFA) to obtain the corresponding variability time series. Next, we embed this time series on a manifold to obtain a point cloud representation and use Topological Data Analysis (TDA) through persistent homology to propose two early real-time indicators. One is an early indicator of abnormal arrivals at the ED, whereas the second gives information on the time index of the maximum number of arrivals. The performance of the detectors is parameter dependent, and the parameter can evolve each year. That is why we also propose to solve a bi-objective optimization problem to track the variations of this parameter.
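Turning a time series into a point cloud is commonly done with a time-delay (Takens) embedding. The abstract does not say which embedding is used, so the following is only a sketch assuming a delay embedding with dimension `dim` and lag `tau`; the function and parameter names are hypothetical. The resulting points could then be passed to any persistent homology library.

```python
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], dim: int,
                tau: int) -> List[Tuple[float, ...]]:
    """Map x_0, x_1, ... to the points (x_i, x_{i+tau}, ..., x_{i+(dim-1)tau}).

    Hypothetical helper; returns an empty list when the series is too
    short to form a single point."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_points = len(series) - (dim - 1) * tau
    return [tuple(series[i + k * tau] for k in range(dim))
            for i in range(max(n_points, 0))]
```

With `dim=3` and `tau=2`, the series 0, 1, 2, 3, 4, 5 becomes the two points (0, 2, 4) and (1, 3, 5). Choosing `dim` and `tau` is the main design decision; a periodic signal embedded with a lag near a quarter period traces out a loop, which persistent homology detects as a 1-dimensional feature.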
  67. Ghrist Barcoded Video Frames. Application in Detecting Persistent Visual Scene Surface Shapes Captured in Videos (2019)

    Arjuna P. H. Don, James F. Peters
    Abstract This article introduces an application of Ghrist barcodes in the study of persistent Betti numbers derived from vortex nerve complexes found in triangulations of video frames. A Ghrist barcode (also called a persistence barcode) is a pictograph from the topology of data, useful for representing the persistence of the features of changing shapes. The basic approach is to introduce a free Abelian group representation of intersecting filled polygons on the barycenters of the triangles of Alexandroff nerves. An Alexandroff nerve is a maximal collection of triangles with a common vertex in the triangulation of a finite, bounded planar region. In our case, the planar region is a video frame. A Betti number is a count of the number of generators in a finite Abelian group. The focus here is on the persistent Betti numbers across sequences of triangulated video frames. Each Betti number is mapped to an entry in a Ghrist barcode. Two main results are given, namely, vortex nerves are Edelsbrunner-Harer nerve complexes, and the Betti number of a vortex nerve equals k + 2 for a vortex nerve containing k edges attached between a pair of vortex cycles in the nerve.
  68. Using Persistent Homology as a New Approach for Super-Resolution Localization Microscopy Data Analysis and Classification of γH2AX Foci/Clusters (2018)

    Andreas Hofmann, Matthias Krufczik, Dieter W. Heermann, Michael Hausmann
    Abstract DNA double strand breaks (DSB) are the most severe damages in chromatin induced by ionizing radiation. In response to such environmentally determined stress situations, cells have developed repair mechanisms. Although many investigations have contributed to a detailed understanding of repair processes, e.g., homologous recombination repair or non-homologous end-joining, the question of how a cell decides to apply a certain repair process at a certain damage site has not been sufficiently answered, since all different repair pathways could occur simultaneously in the same cell nucleus. One of the first processes after DSB induction is phosphorylation of the histone variant H2AX to γH2AX in the surroundings of the damaged locus. Since the spatial organization of chromatin is not random, it may be concluded that the spatial organization of γH2AX foci is also not random and, rather, contributes to the accessibility of special repair proteins to the damaged site, and thus to the repair pathway that follows at this site. The aim of this article is to demonstrate a new approach to analyzing repair foci by their topology in order to obtain a cell-independent method of categorization. During the last decade, novel super-resolution fluorescence light microscopic techniques have enabled new insights into genome structure and spatial organization on the nano-scale, in the order of 10 nm. One of these techniques is single molecule localization microscopy (SMLM), with which the spatial coordinates of single fluorescence molecules can be precisely determined and density and distance distributions can be calculated. This method is an appropriate tool to quantify complex changes of chromatin and to describe repair foci on the single-molecule level.
Based on the pointillist information obtained by SMLM from specifically labeled heterochromatin and γH2AX foci reflecting the chromatin morphology and repair foci topology, we have developed a new analytical methodology for characterizing foci or foci clusters by means of persistent homology. This method allows, for the first time, a cell-independent comparison of two point distributions (here the point distributions of two γH2AX clusters) within a selected ensemble and gives a mathematical measure of their similarity. To demonstrate the feasibility of this approach, cells were irradiated with low LET (linear energy transfer) radiation at different doses, and the heterochromatin and γH2AX foci were fluorescently labeled with antibodies for SMLM. By means of our new analysis method, we were able to show that the topology of clusters of γH2AX foci can be categorized depending on the distance to heterochromatin. This method opens up new possibilities to categorize the spatial organization of point patterns by parameterization of topological similarity.
  69. Hierarchical Structures of Amorphous Solids Characterized by Persistent Homology (2016)

    Yasuaki Hiraoka, Takenobu Nakamura, Akihiko Hirata, Emerson G. Escolar, Kaname Matsue, Yasumasa Nishiura
    Abstract This article proposes a topological method that extracts hierarchical structures of various amorphous solids. The method is based on the persistence diagram (PD), a mathematical tool for capturing shapes of multiscale data. The input to the PDs is given by an atomic configuration and the output is expressed as 2D histograms. Then, specific distributions such as curves and islands in the PDs identify meaningful shape characteristics of the atomic configuration. Although the method can be applied to a wide variety of disordered systems, it is applied here to silica glass, the Lennard-Jones system, and Cu-Zr metallic glass as standard examples of continuous random network and random packing structures. In silica glass, the method classified the atomic rings as short-range and medium-range orders and unveiled hierarchical ring structures among them. These detailed geometric characterizations clarified a real space origin of the first sharp diffraction peak and also indicated that PDs contain information on elastic response. Even in the Lennard-Jones system and Cu-Zr metallic glass, the hierarchical structures in the atomic configurations were derived in a similar way using PDs, although the glass structures and properties substantially differ from silica glass. These results suggest that the PDs provide a unified method that extracts greater depth of geometric information in amorphous solids than conventional methods.
  70. Using Persistent Homology to Reveal Hidden Information in Neural Data (2015)

    Gard Spreemann, Benjamin Dunn, Magnus Bakke Botnan, Nils A. Baas
    Abstract We propose a method, based on persistent homology, to uncover topological properties of a priori unknown covariates of neuron activity. Our input data consist of spike train measurements of a set of neurons of interest, a candidate list of the known stimuli that govern neuron activity, and the corresponding state of the animal throughout the experiment performed. Using a generalized linear model for neuron activity and simple assumptions on the effects of the external stimuli, we infer away any contribution to the observed spike trains by the candidate stimuli. Persistent homology then reveals useful information about any further, unknown, covariates.
  71. A Framework for Topological Music Analysis (TMA) (2022)

    Alberto Alcalá-Alvarez, Pablo Padilla-Longoria
    Abstract In the present article we describe and discuss a framework for applying different topological data analysis (TDA) techniques to a music fragment given as a score in traditional Western notation. We first consider different sets of points in Euclidean spaces of different dimensions that correspond to musical events in the score, and obtain their persistent homology features. Then we introduce two families of simplicial complexes that can be associated to chord sequences, and calculate their main homological descriptors. These complexes lead us to the definition of dynamical systems modeling harmonic progressions. Finally, we show the results of applying the described methods to the analysis and stylistic comparison of fragments from three Brandenburg Concertos by J.S. Bach and two Graffiti by Mexican composer Armando Luna.
  72. Decoding of Neural Data Using Cohomological Feature Extraction (2019)

    Erik Rybakken, Nils Baas, Benjamin Dunn
    Abstract We introduce a novel data-driven approach to discover and decode features in the neural code coming from large population neural recordings with minimal assumptions, using cohomological feature extraction. We apply our approach to neural recordings of mice moving freely in a box, where we find a circular feature. We then observe that the decoded value corresponds well to the head direction of the mouse. Thus, we capture head direction cells and decode the head direction from the neural population activity without having to process the mouse's behavior. Interestingly, the decoded values convey more information about the neural activity than the tracked head direction does, with differences that have some spatial organization. Finally, we note that the residual population activity, after the head direction has been accounted for, retains some low-dimensional structure that is correlated with the speed of the mouse.
  73. Unifying Immunology With Informatics and Multiscale Biology (2014)

    Brian A Kidd, Lauren A Peters, Eric E Schadt, Joel T Dudley
    Abstract The immune system is a highly complex and dynamic system. Historically, the most common scientific and clinical practice has been to evaluate its individual components. This kind of approach cannot always expose the interconnecting pathways that control immune-system responses and does not reveal how the immune system works across multiple biological systems and scales. High-throughput technologies can be used to measure thousands of parameters of the immune system at a genome-wide scale. These system-wide surveys yield massive amounts of quantitative data that provide a means to monitor and probe immune-system function. New integrative analyses can help synthesize and transform these data into valuable biological insight. Here we review some of the computational analysis tools for high-dimensional data and how they can be applied to immunology.
  74. Feature Detection and Hypothesis Testing for Extremely Noisy Nanoparticle Images Using Topological Data Analysis (2023)

    Andrew M. Thomas, Peter A. Crozier, Yuchen Xu, David S. Matteson
    Abstract We propose a flexible algorithm for feature detection and hypothesis testing in images with ultra-low signal-to-noise ratio using cubical persistent homology. Our main application is in the identification of atomic columns and other features in Transmission Electron Microscopy (TEM). Cubical persistent homology is used to identify local minima and their size in subregions in the frames of nanoparticle videos, which are hypothesized to correspond to relevant atomic features. We compare the performance of our algorithm to other employed methods for the detection of columns and their intensity. Additionally, Monte Carlo goodness-of-fit testing using real-valued summaries of persistence diagrams derived from smoothed images (generated from pixels residing in the vacuum region of an image) is developed and employed to identify whether or not the proposed atomic features generated by our algorithm are due to noise. Using these summaries derived from the generated persistence diagrams, one can produce univariate time series for the nanoparticle videos, thus providing a means for assessing fluxional behavior. A guarantee on the false discovery rate for multiple Monte Carlo testing of identical hypotheses is also established.
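    In dimension 0, cubical persistent homology of an image tracks exactly the local minima the abstract mentions: each minimum starts a component, and its persistence (death minus birth) measures how deep it is. A minimal sketch, assuming a sublevel-set filtration on pixels with 4-connectivity (one common cubical convention; 8-connected variants give different pairs); the function name is hypothetical and this is not the authors' implementation.

```python
def sublevel_h0_pairs(image):
    """0-dimensional sublevel-set persistence of a 2D grayscale image.

    Pixels enter in order of increasing intensity and are joined to
    already-present 4-neighbours. Each local minimum starts a component;
    when two components meet, the one with the higher minimum dies (elder
    rule). Returns (birth, death, (row, col)) triples with positive
    persistence, (row, col) being the dying minimum; the global minimum
    never dies and is not listed.
    """
    rows, cols = len(image), len(image[0])
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent = {}
    root_min = {}  # root pixel -> (value, row, col) of its component's minimum

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    pairs = []
    for value, r, c in order:
        p = (r, c)
        parent[p] = p
        root_min[p] = (value, r, c)
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:
                continue  # outside the image or not yet in the filtration
            rp, rq = find(p), find(q)
            if rp == rq:
                continue
            # Elder rule: the component with the higher minimum dies now.
            young, old = sorted((rp, rq), key=lambda x: root_min[x], reverse=True)
            birth, yr, yc = root_min[young]
            if value > birth:  # skip zero-persistence pairs
                pairs.append((birth, value, (yr, yc)))
            parent[young] = old
    return pairs


img = [[0, 5, 1],
       [5, 5, 5],
       [2, 5, 3]]
print(sublevel_h0_pairs(img))
# [(1, 5, (0, 2)), (2, 5, (2, 0)), (3, 5, (2, 2))]
```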

  75. Automatic Tree Ring Detection Using Jacobi Sets (2020)

    Kayla Makela, Tim Ophelders, Michelle Quigley, Elizabeth Munch, Daniel Chitwood, Asia Dowtin
    Abstract Tree ring widths are an important source of climatic and historical data, but measuring these widths typically requires extensive manual work. Computer vision techniques provide promising directions towards the automation of tree ring detection, but most automated methods still require a substantial amount of user interaction to obtain high accuracy. We perform analysis on 3D X-ray CT images of a cross-section of a tree trunk, known as a tree disk. We present novel automated methods for locating the pith (center) of a tree disk, and ring boundaries. Our methods use a combination of standard image processing techniques and tools from topological data analysis. We evaluate the efficacy of our method for two different CT scans by comparing its results to manually located rings and centers and show that it is better than current automatic methods in terms of correctly counting each ring and its location. Our methods have several parameters, which we optimize experimentally by minimizing edit distances to the manually obtained locations.
  76. HERMES: Persistent Spectral Graph Software (2020)

    Rui Wang, Rundong Zhao, Emily Ribando-Gros, Jiahui Chen, Yiying Tong, Guo-Wei Wei
    Abstract Persistent homology (PH) is one of the most popular tools in topological data analysis (TDA), while graph theory has had a significant impact on data science. Our earlier work introduced the persistent spectral graph (PSG) theory as a unified multiscale paradigm to encompass TDA and geometric analysis. In PSG theory, families of persistent Laplacians (PLs) corresponding to various topological dimensions are constructed via a filtration to sample a given dataset at multiple scales. The harmonic spectra from the null spaces of PLs offer the same topological invariants, namely persistent Betti numbers, at various dimensions as those provided by PH, while the non-harmonic spectra of PLs give rise to additional geometric analysis of the shape of the data. In this work, we develop an open-source software package, called highly efficient robust multidimensional evolutionary spectra (HERMES), to enable broad applications of PSGs in science, engineering, and technology. To ensure the reliability and robustness of HERMES, we have validated the software with simple geometric shapes and complex datasets from three-dimensional (3D) protein structures. We found that the smallest non-zero eigenvalues are very sensitive to data abnormality.
  77. Parametric Inference Using Persistence Diagrams: a Case Study in Population Genetics (2014)

    Kevin Emmett, Daniel Rosenbloom, Pablo Camara, Raul Rabadan
    Abstract Persistent homology computes topological invariants from point cloud data. Recent work has focused on developing statistical methods for data analysis in this framework. We show that, in certain models, parametric inference can be performed using statistics defined on the computed invariants. We develop this idea with a model from population genetics, the coalescent with recombination. We apply our model to an influenza dataset, identifying two scales of topological structure which have a distinct biological interpretation.
  78. Topological Data Analysis for Genomics and Evolution: Topology in Biology (2019)

    Raul Rabadan, Andrew J. Blumberg
    Abstract Biology has entered the age of Big Data. A technical revolution has transformed the field, and extracting meaningful information from large biological data sets is now a central methodological challenge. Algebraic topology is a well-established branch of pure mathematics that studies qualitative descriptors of the shape of geometric objects. It aims to reduce comparisons of shape to a comparison of algebraic invariants, such as numbers, which are typically easier to work with. Topological data analysis is a rapidly developing subfield that leverages the tools of algebraic topology to provide robust multiscale analysis of data sets. This book introduces the central ideas and techniques of topological data analysis and its specific applications to biology, including the evolution of viruses, bacteria and humans, genomics of cancer, and single cell characterization of developmental processes. Bridging two disciplines, the book is for researchers and graduate students in genomics and evolutionary biology as well as mathematicians interested in applied topology.
  79. Model Comparison via Simplicial Complexes and Persistent Homology (2020)

    Sean T. Vittadello, Michael P. H. Stumpf
    Abstract In many scientific and technological contexts we have only a poor understanding of the structure and details of appropriate mathematical models. We often need to compare different models. With available data we can use formal statistical model selection to compare and contrast the ability of different mathematical models to describe such data. But there is a lack of rigorous methods to compare different models a priori. Here we develop and illustrate two such approaches that allow us to compare model structures in a systematic way. Using well-developed and understood concepts from simplicial geometry we are able to define a distance based on the persistent homology applied to the simplicial complexes that captures the model structure. In this way we can identify shared topological features of different models. We then expand this, and move from a distance between simplicial complexes to studying equivalences between models in order to determine their functional relatedness.
  80. Specimen-Based Analysis of Morphology and the Environment in Ecologically Dominant Grasses: The Power of the Herbarium (2019)

    Christine A. McAllister, Michael R. McKain, Mao Li, Bess Bookout, Elizabeth A. Kellogg
    Abstract Herbaria contain a cumulative sample of the world's flora, assembled by thousands of people over centuries. To capitalize on this resource, we conducted a specimen-based analysis of a major clade in the grass tribe Andropogoneae, including the dominant species of the world's grasslands in the genera Andropogon, Schizachyrium, Hyparrhenia and several others. We imaged 186 of the 250 named species of the clade, georeferenced the specimens and extracted climatic variables for each. Using semi- and fully automated image analysis techniques, we extracted spikelet morphological characters and correlated these with environmental variables. We generated chloroplast genome sequences to correct for phylogenetic covariance and here present a new phylogeny for 81 of the species. We confirm and extend earlier studies to show that Andropogon and Schizachyrium are not monophyletic. In addition, we find all morphological and ecological characters are homoplasious but variable among clades. For example, sessile spikelet length is positively correlated with awn length when all accessions are considered, but when separated by clade, the relationship is positive for three sub-clades and negative for three others. Climate variables showed no correlation with morphological variation in the spikelet pair; only very weak effects of temperature and precipitation were detected on macrohair density. This article is part of the theme issue ‘Biological collections for understanding biodiversity in the Anthropocene’.
  81. Morse Theory and Persistent Homology for Topological Analysis of 3D Images of Complex Materials (2014)

    O. Delgado-Friedrichs, V. Robins, A. Sheppard
    Abstract We develop topologically accurate and compatible definitions for the skeleton and watershed segmentation of a 3D digital object that are computed by a single algorithm. These definitions are based on a discrete gradient vector field derived from a signed distance transform. This gradient vector field is amenable to topological analysis and simplification via Forman's discrete Morse theory and provides a filtration that can be used as input to persistent homology algorithms. Efficient implementations allow us to process large-scale x-ray micro-CT data of rock cores and other materials.
  82. Topological Feature Extraction for Comparison of Terascale Combustion Simulation Data (2011)

    Ajith Mascarenhas, Ray W. Grout, Peer-Timo Bremer, Evatt R. Hawkes, Valerio Pascucci, Jacqueline H. Chen
    Abstract We describe a combinatorial streaming algorithm to extract features which identify regions of locally intense mixing rates in two terascale turbulent combustion simulations. Our algorithm allows simulation data consisting of scalar fields represented on 728x896x512 or 2025x1600x400 grids to be processed on a single, relatively lightweight machine. The turbulence-induced mixing governs the rate of reaction and hence is of principal interest in these combustion simulations. We use our feature extraction algorithm to compare two very different simulations and find that in both simulations the thickness of the extracted features grows with decreasing turbulence intensity. Applying the algorithm to the HO2 mass fraction field as well indicates that autoignition kernels near the base of a lifted flame tend not to overlap with the high mixing rate regions.
  83. Visualizing Emergent Identity of Assemblages in the Consumer Internet of Things: A Topological Data Analysis Approach (2016)

    Thomas Novak, Donna L. Hoffman
    Abstract The identity of a consumer Internet of Things (IoT) assemblage emerges through a historical process of ongoing interactions among consumers, smart devices, and digital information. Topological Data Analysis (TDA), consistent with mathematical aspects of assemblage theory, is used to visualize the underlying possibility space from which individual IoT assemblages emerge.
  84. Topological Singularity Detection at Multiple Scales (2023)

    Julius von Rohrscheidt, Bastian Rieck
    Abstract The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the ‘manifoldness’ of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.
  85. Towards a Philological Metric Through a Topological Data Analysis Approach (2020)

    Eduardo Paluzo-Hidalgo, Rocio Gonzalez-Diaz, Miguel A. Gutiérrez-Naranjo
    Abstract The canon of baroque Spanish literature has been thoroughly studied with philological techniques. The major representatives of the poetry of this epoch are Francisco de Quevedo and Luis de Góngora y Argote. Literary experts commonly classify them into two different streams: Quevedo belongs to Conceptismo and Góngora to Culteranismo. Moreover, although Quevedo is traditionally considered the most representative poet of Conceptismo, Lope de Vega is also considered to be at least closely related to this literary trend. In this paper, we use Topological Data Analysis techniques to provide a first approach to a metric distance between the literary styles of these poets. As a result, we obtain findings consistent with the literary experts' criteria, locating the literary style of Lope de Vega closer to that of Quevedo than to that of Góngora.
  86. Delineation of a Conserved Arrestin-Biased Signaling Repertoire in Vivo (2015)

    Stuart Maudsley, Bronwen Martin, Diane Gesty-Palmer, Huey Cheung, Calvin Johnson, Shamit Patel, Kevin G. Becker, William H. Wood, Yongqing Zhang, Elin Lehrmann, Louis M. Luttrell
    Abstract Biased G protein–coupled receptor agonists engender a restricted repertoire of downstream events from their cognate receptors, permitting them to produce mixed agonist-antagonist effects in vivo. While this opens the possibility of novel therapeutics, it complicates rational drug design, since the in vivo response to a biased agonist cannot be reliably predicted from its in cellula efficacy. We have employed novel informatic approaches to characterize the in vivo transcriptomic signature of the arrestin pathway-selective parathyroid hormone analog [d-Trp12, Tyr34]bovine PTH(7-34) in six different murine tissues after chronic drug exposure. We find that [d-Trp12, Tyr34]bovine PTH(7-34) elicits a distinctive arrestin-signaling focused transcriptomic response that is more coherently regulated across tissues than that of the pluripotent agonist, human PTH(1-34). This arrestin-focused network is closely associated with transcriptional control of cell growth and development. Our demonstration of a conserved arrestin-dependent transcriptomic signature suggests a framework within which the in vivo outcomes of arrestin-biased signaling may be generalized.
  87. Quantitative Analysis of Phase Transitions in Two-Dimensional XY Models Using Persistent Homology (2022)

    Nicholas Sale, Jeffrey Giansiracusa, Biagio Lucini
    Abstract We use persistent homology and persistence images as an observable of three different variants of the two-dimensional XY model in order to identify and study their phase transitions. We examine models with the classical XY action, a topological lattice action, and an action with an additional nematic term. In particular, we introduce a new way of computing the persistent homology of lattice spin model configurations and, by considering the fluctuations in the output of logistic regression and k-nearest neighbours models trained on persistence images, we develop a methodology to extract estimates of the critical temperature and the critical exponent of the correlation length. We put particular emphasis on finite-size scaling behaviour and producing estimates with quantifiable error. For each model we successfully identify its phase transition(s) and are able to get an accurate determination of the critical temperatures and critical exponents of the correlation length.
  88. A Novel Approach for Wafer Defect Pattern Classification Based on Topological Data Analysis (2023)

    Seungchan Ko, Dowan Koo
    Abstract In semiconductor manufacturing, wafer map defect patterns provide critical information for facility maintenance and yield management, so the classification of defect patterns is one of the most important tasks in the manufacturing process. In this paper, we propose a novel way to represent the shape of a defect pattern as a finite-dimensional vector, which is used as input to a neural network for classification. The main idea is to extract the topological features of each pattern using the theory of persistent homology from topological data analysis (TDA). Through experiments with a simulated dataset, we show that, compared with the method using convolutional neural networks (CNN), the most common approach for wafer map defect pattern classification, the proposed method trains faster and more efficiently and achieves higher accuracy. Moreover, our method outperforms the CNN-based method when the training data are scarce and imbalanced.
  89. Severe Slugging Flow Identification From Topological Indicators (2022)

    Simone Casolo
    Abstract In this work, topological data analysis is used to identify the onset of severe slug flow in offshore petroleum production systems. Severe slugging is a multiphase flow regime known to be very inefficient and potentially harmful to process equipment, and it is characterized by large oscillations in the production fluid pressure. Time series from pressure sensors in subsea oil wells are processed by means of Takens embedding to produce point clouds of data. Embedded sensor data is then analyzed using persistent homology to obtain topological indicators capable of revealing the occurrence of severe slugging in a condition-based monitoring approach. A large dataset of well events consisting of both real and simulated data is used to demonstrate the possibility of automating severe slugging detection from live data via topological data analysis. Methods based on persistence diagrams are shown to accurately identify severe slugging and to classify different flow regimes from pressure signals of producing wells with supervised machine learning.
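    The first processing step in this abstract turns a pressure time series into a point cloud via Takens (delay) embedding. A minimal sketch follows; the embedding dimension `dim` and delay `tau` are illustrative parameters that the abstract does not specify, and the function name is hypothetical. A strongly oscillating signal traces a closed loop in the embedded space, which persistent homology registers as a long-lived 1-dimensional feature.

```python
import math


def delay_embed(series, dim, tau):
    """Takens delay embedding of a scalar time series.

    Each index t becomes the point
    (x[t], x[t + tau], ..., x[t + (dim - 1) * tau]),
    for every t whose last coordinate still lies inside the series.
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau
    return [tuple(series[t + k * tau] for k in range(dim))
            for t in range(len(series) - span)]


# A sine with a period of 20 samples, embedded with a quarter-period delay,
# lands on the unit circle: (sin θ, sin(θ + π/2)) = (sin θ, cos θ).
signal = [math.sin(2 * math.pi * t / 20) for t in range(60)]
cloud = delay_embed(signal, dim=2, tau=5)
print(len(cloud))  # 55
print(max(abs(x * x + y * y - 1) for x, y in cloud) < 1e-9)  # True
```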
  90. Contagion Dynamics for Manifold Learning (2020)

    Barbara I. Mahler
    Abstract Contagion maps exploit activation times in threshold contagions to assign vectors in high-dimensional Euclidean space to the nodes of a network. A point cloud that is the image of a contagion map reflects both the structure underlying the network and the spreading behaviour of the contagion on it. Intuitively, such a point cloud exhibits features of the network's underlying structure if the contagion spreads along that structure, an observation which suggests contagion maps as a viable manifold-learning technique. We test contagion maps as a manifold-learning tool on a number of different real-world and synthetic data sets, and we compare their performance to that of Isomap, one of the most well-known manifold-learning algorithms. We find that, under certain conditions, contagion maps are able to reliably detect underlying manifold structure in noisy data, while Isomap fails due to noise-induced error. This consolidates contagion maps as a technique for manifold learning.
  91. Crystallographic Interacting Topological Phases and Equivariant Cohomology: To Assume or Not to Assume (2020)

    Daniel Sheinbaum, Omar Antolín Camarena
    Abstract For symmorphic crystalline interacting gapped systems we derive a classification under adiabatic evolution. This classification is complete for non-degenerate ground states. For the degenerate case we discuss some invariants given by equivariant characteristic classes. We assume neither an emergent relativistic field theory nor that phases form a topological spectrum. Nor do we assume short-range entanglement or the existence of quasi-particles, as is done in SPT and SET classifications, respectively. Using a slightly generalized Bloch decomposition and Grassmannians made out of ground state spaces, we show that the P-equivariant cohomology of a d-dimensional torus gives rise to different interacting phases. We compare our results to bosonic symmorphic crystallographic SPT phases and to non-interacting fermionic crystallographic phases in class A. Finally we discuss the relation of our assumptions to those made for crystallographic SPT and SET phases.
  92. Go With the Flow? A Large-Scale Analysis of Health Care Delivery Networks in the United States Using Hodge Theory (2021)

    Thomas Gebhart, Xiaojun Fu, Russell J. Funk
    Abstract Health care delivery is a collaborative process, requiring close coordination among networks of providers with specialized expertise. Yet in the United States, care is often spread across multiple disconnected providers (e.g., primary care physicians, specialists), leading to fragmented care delivery networks, and contributing to higher costs and lower quality. While this problem is well known, there are relatively few quantitative tools available for characterizing the dynamics of care delivery networks at scale, thereby inhibiting deeper understanding of care fragmentation and efforts to address it. In this study, we conduct a large-scale analysis of care delivery networks across the United States using the discrete Hodge decomposition, an emerging method of topological data analysis. Using this technique, we decompose networks of patient flows among physicians into three orthogonal subspaces: gradient (acyclic flow), harmonic (global cyclic flow), and curl (local cyclic flow). We document substantial variation in the relative importance of each subspace, suggesting that there may be systematic differences in the organization of care delivery networks across health care markets. Moreover, we find that the relative importance of each subspace is predictive of local care cost and quality, with outcomes tending to be better with greater curl flow and worse with greater harmonic flow.
  93. Barcodes Distinguish Morphology of Neuronal Tauopathy (2022)

    David Beers, Despoina Goniotaki, Diane P. Hanger, Alain Goriely, Heather A. Harrington
    Abstract The geometry of neurons is known to be important for their functions. Hence, neurons are often classified by their morphology. Two recent methods, persistent homology and the topological morphology descriptor, assign a morphology descriptor called a barcode to a neuron equipped with a given function, such as the Euclidean distance from the root of the neuron. These barcodes can be converted into matrices called persistence images, which can then be averaged across groups. We show that when the defining function is the path length from the root, the topological morphology descriptor and persistent homology are equivalent. We further show that persistence images arising from the path length procedure provide an interpretable summary of neuronal morphology. We introduce “topological morphology functions”, a class of functions similar to Sholl functions, that can be recovered from the associated topological morphology descriptor. To demonstrate this topological approach, we compare healthy cortical and hippocampal mouse neurons to those affected by progressive tauopathy. We find a significant difference in the morphology of healthy neurons and those with a tauopathy at a postsymptomatic age. We use persistence images to conclude that the diseased group tends to have neurons with shorter branches as well as fewer branches far from the soma.
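    A minimal sketch of the barcode construction the abstract refers to, assuming a tree given as parent-to-children lists with edge lengths, and the path length from the root as the defining function. It follows the elder-rule description of the topological morphology descriptor (each leaf starts a bar; at every branch point all incoming bars except the farthest-reaching one end) but is not the authors' implementation, and the names are hypothetical.

```python
def tmd_barcode(children, length, root=0):
    """Barcode of a rooted tree under path distance from the root.

    children: dict node -> list of child nodes (leaves may be omitted).
    length: dict node -> length of the edge from its parent (root: 0).
    Bars are returned as (branch-point distance, leaf distance) pairs.
    """
    bars = []

    def walk(node, dist):
        d = dist + length[node]
        kids = children.get(node, [])
        if not kids:
            return d  # a leaf starts a new bar at distance d
        reach = sorted(walk(k, d) for k in kids)
        # All but the farthest-reaching branch terminate at this node.
        bars.extend((d, r) for r in reach[:-1])
        return reach[-1]

    bars.append((0.0, walk(root, 0.0)))  # the longest bar ends at the root
    return sorted(bars)


# Root 0 -> 1 (length 2); 1 branches to leaf 2 (length 3) and to 3 (length 1);
# 3 branches to leaves 4 (length 4) and 5 (length 1).
children = {0: [1], 1: [2, 3], 3: [4, 5]}
length = {0: 0, 1: 2, 2: 3, 3: 1, 4: 4, 5: 1}
print(tmd_barcode(children, length))
# [(0.0, 7.0), (2.0, 5.0), (3.0, 4.0)]
```

    The tree has three leaves and the barcode has three bars; short bars far from the root correspond to short terminal branches.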
  94. Evasion Paths in Mobile Sensor Networks (2015)

    Henry Adams, Gunnar Carlsson
    Abstract Suppose that ball-shaped sensors wander in a bounded domain. A sensor does not know its location but does know when it overlaps a nearby sensor. We say that an evasion path exists in this sensor network if a moving intruder can avoid detection. In ‘Coordinate-free coverage in sensor networks with controlled boundaries via homology’, Vin de Silva and Robert Ghrist give a necessary condition, depending only on the time-varying connectivity data of the sensors, for an evasion path to exist. Using zigzag persistent homology, we provide an equivalent condition that moreover can be computed in a streaming fashion. However, no method with time-varying connectivity data as input can give necessary and sufficient conditions for the existence of an evasion path. Indeed, we show that the existence of an evasion path depends not only on the fibrewise homotopy type of the region covered by sensors but also on its embedding in spacetime. For planar sensors that also measure weak rotation and distance information, we provide necessary and sufficient conditions for the existence of an evasion path.
  95. Topological Data Analysis of Zebrafish Patterns (2020)

    Melissa R. McGuirl, Alexandria Volkening, Björn Sandstede
    Abstract Self-organized pattern behavior is ubiquitous throughout nature, from fish schooling to collective cell dynamics during organism development. Qualitatively these patterns display impressive consistency, yet variability inevitably exists within pattern-forming systems on both microscopic and macroscopic scales. Quantifying variability and measuring pattern features can inform the underlying agent interactions and allow for predictive analyses. Nevertheless, current methods for analyzing patterns that arise from collective behavior capture only macroscopic features or rely on either manual inspection or smoothing algorithms that lose the underlying agent-based nature of the data. Here we introduce methods based on topological data analysis and interpretable machine learning for quantifying both agent-level features and global pattern attributes on a large scale. Because the zebrafish is a model organism for skin pattern formation, we focus specifically on analyzing its skin patterns as a means of illustrating our approach. Using a recent agent-based model, we simulate thousands of wild-type and mutant zebrafish patterns and apply our methodology to better understand pattern variability in zebrafish. Our methodology is able to quantify the differential impact of stochasticity in cell interactions on wild-type and mutant patterns, and we use our methods to predict stripe and spot statistics as a function of varying cellular communication. Our work provides an approach to automatically quantifying biological patterns and analyzing agent-based dynamics so that we can now answer critical questions in pattern formation at a much larger scale.
  96. On the Local Behavior of Spaces of Natural Images (2008)

    Gunnar Carlsson, Tigran Ishkhanov, Vin de Silva, Afra Zomorodian
    Abstract In this study we concentrate on qualitative topological analysis of the local behavior of the space of natural images. To this end, we use a space of 3 by 3 high-contrast patches ℳ. We develop a theoretical model for the high-density 2-dimensional submanifold of ℳ showing that it has the topology of the Klein bottle. Using our topological software package PLEX we experimentally verify our theoretical conclusions. We use polynomial representation to give coordinatization to various subspaces of ℳ. We find the best-fitting embedding of the Klein bottle into the ambient space of ℳ. Our results are currently being used in developing a compression algorithm based on a Klein bottle dictionary.
  97. Practical Joint Human-Machine Exploration of Industrial Time Series Using the Matrix Profile (2023)

    Felix Nilsson, Mohamed-Rafik Bouguelia, Thorsteinn Rögnvaldsson
    Abstract Technological advancements and widespread adoption of new technology in industry have made industrial time series data more available than ever before. With this development grows the need for versatile methods for mining industrial time series data. This paper introduces a practical approach for joint human-machine exploration of industrial time series data using the Matrix Profile, and presents some challenges involved. The approach is demonstrated on three real-life industrial data sets to show how it enables the user to quickly extract semantic information, detect cycles, find deviating patterns, and gain a deeper understanding of the time series. A benchmark test is also presented on ECG (electrocardiogram) data, showing that the approach works well in comparison to previously suggested methods for extracting relevant time series motifs.
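    The Matrix Profile stores, for every length-m subsequence of a series, the z-normalized Euclidean distance to its nearest non-trivial neighbour. The brute-force sketch below is only meant to show what is being computed; it is not the paper's tooling, practical libraries use much faster algorithms, and the half-window exclusion zone is an assumed choice.

```python
import math


def znorm(seq):
    """Z-normalise a window; a constant window maps to all zeros."""
    n = len(seq)
    mu = sum(seq) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in seq) / n)
    if sd == 0:
        return [0.0] * n
    return [(x - mu) / sd for x in seq]


def matrix_profile(ts, m):
    """Brute-force matrix profile: for each length-m window, the z-normalised
    Euclidean distance to its nearest neighbour outside an exclusion zone,
    plus the index of that neighbour."""
    windows = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    excl = max(1, m // 2)  # assumed exclusion zone against trivial self-matches
    profile, index = [], []
    for i, wi in enumerate(windows):
        best, best_j = math.inf, -1
        for j, wj in enumerate(windows):
            if abs(i - j) < excl:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(wi, wj)))
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index


# A motif [0, 1, 2, 1, 0] occurs twice, separated by a plateau.
series = [0, 1, 2, 1, 0, 5, 5, 5, 0, 1, 2, 1, 0]
profile, index = matrix_profile(series, m=5)
```

    Low profile values mark repeated motifs (windows 0 and 8 match exactly here), while the largest values mark discords, i.e. deviating patterns.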
  98. Dynamic State Analysis of a Driven Magnetic Pendulum Using Ordinal Partition Networks and Topological Data Analysis (2020)

    Audun Myers, Firas A. Khasawneh
    Abstract The use of complex networks for time series analysis has recently been shown to be useful as a tool for detecting dynamic state changes for a wide variety of applications. In this work, we implement the commonly used ordinal partition network to transform a time series into a network for detecting these state changes for the simple magnetic pendulum. The time series that we used are obtained experimentally from a base-excited magnetic pendulum apparatus, and numerically from the corresponding governing equations. The magnetic pendulum provides a relatively simple, non-linear example demonstrating transitions from periodic to chaotic motion with the variation of system parameters. For our method, we implement persistent homology, a shape measuring tool from Topological Data Analysis (TDA), to summarize the shape of the resulting ordinal partition networks as a tool for detecting state changes. We show that this network analysis tool provides a clear distinction between periodic and chaotic time series. Another contribution of this work is the successful application of the networks-TDA pipeline, for the first time, to signals from non-autonomous nonlinear systems. This opens the door for our approach to be used as an automatic design tool for studying the effect of design parameters on the resulting system response. Other uses of this approach include fault detection from sensor signals in a wide variety of engineering operations.
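    An ordinal partition network replaces each delay vector by the permutation that sorts it and links consecutive permutations. The sketch below builds only the network (the persistent homology step is not attempted); the embedding dimension n, lag tau, and the choice to drop self-loops are assumptions for illustration.

```python
from collections import Counter


def ordinal_pattern(window):
    """Ordinal symbol of a window: the permutation that sorts it (argsort)."""
    return tuple(sorted(range(len(window)), key=lambda k: window[k]))


def ordinal_partition_network(ts, n=3, tau=1):
    """Nodes are the ordinal patterns of delay vectors (dimension n, lag tau);
    weighted directed edges count transitions between consecutive patterns."""
    span = (n - 1) * tau
    symbols = [ordinal_pattern([ts[i + k * tau] for k in range(n)])
               for i in range(len(ts) - span)]
    # Self-loops (a pattern followed by itself) are dropped here, an assumed choice.
    edges = Counter((a, b) for a, b in zip(symbols, symbols[1:]) if a != b)
    return set(symbols), edges


# A periodic sawtooth visits three patterns in a fixed cycle.
nodes, edges = ordinal_partition_network([0, 1, 2] * 3, n=3, tau=1)
```

    A periodic signal produces a small network consisting of one loop, whereas a chaotic signal tends to visit many patterns with irregular transitions, which is the kind of structural difference the persistence summary is meant to pick up.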
  99. Single-Cell Topological RNA-Seq Analysis Reveals Insights Into Cellular Differentiation and Development (2017)

    Abbas H. Rizvi, Pablo G. Camara, Elena K. Kandror, Thomas J. Roberts, Ira Schieren, Tom Maniatis, Raul Rabadan
    Abstract Transcriptional programs control cellular lineage commitment and differentiation during development. Understanding cell fate has been advanced by studying single-cell RNA-seq, but is limited by the assumptions of current analytic methods regarding the structure of data. We present single-cell topological data analysis (scTDA), an algorithm for topology-based computational analyses to study temporal, unbiased transcriptional regulation. Compared to other methods, scTDA is a non-linear, model-independent, unsupervised statistical framework that can characterize transient cellular states. We applied scTDA to the analysis of murine embryonic stem cell (mESC) differentiation in vitro in response to inducers of motor neuron differentiation. scTDA resolved asynchrony and continuity in cellular identity over time, and identified four transient states (pluripotent, precursor, progenitor, and fully differentiated cells) based on changes in stage-dependent combinations of transcription factors, RNA-binding proteins and long non-coding RNAs. scTDA can be applied to study asynchronous cellular responses to either developmental cues or environmental perturbations.
  100. Applications of Persistent Homology to Time Varying Systems (2013)

    Elizabeth Munch
    Abstract This dissertation extends the theory of persistent homology to time varying systems. Most of the previous work has been dedicated to using this powerful tool in topological data analysis to study static point clouds. In particular, given a point cloud, we can construct its persistence diagram. Since the diagram varies continuously as the point cloud varies continuously, we study the space of time varying persistence diagrams, called vineyards when they were introduced by Cohen-Steiner, Edelsbrunner, and Morozov.
    We will first show that with a good choice of metric, these vineyards are stable for small perturbations of their associated point clouds. We will also define a new mean for a set of persistence diagrams based on the work of Mileyko et al. which, unlike the previously defined mean, is continuous for geodesic vineyards.
    Next, we study the sensor network problem posed by Ghrist and de Silva, and their application of persistent homology to understand when a set of sensors covers a given region. Giving each of these sensors a probability of failure over time, we show that an exact computation of the probability of failure of the whole system is NP-hard, but give an algorithm which can predict failure in the case of a monitored system.
    Finally, we apply these methods to an automated system which can cluster agents moving in aerial images by their behaviors. We build a data structure for storing and querying the information in real-time, and define behavior vectors which quantify behaviors of interest. This clustering by behavior can be used to find groups of interest, for which we can also quantify behaviors in order to determine whether the group is working together to achieve a common goal, and we speculate that this work can be extended to improving tracking algorithms as well as behavioral predictors.
  101. Persistent Homology Analysis of Brain Transcriptome Data in Autism (2019)

    Daniel Shnier, Mircea A. Voineagu, Irina Voineagu
    Abstract Persistent homology methods have found applications in the analysis of multiple types of biological data, particularly imaging data or data with a spatial and/or temporal component. However, few studies have assessed the use of persistent homology for the analysis of gene expression data. Here we apply persistent homology methods to investigate the global properties of gene expression in post-mortem brain tissue (cerebral cortex) of individuals with autism spectrum disorders (ASD) and matched controls. We observe a significant difference in the geometry of inter-sample relationships between autism and healthy controls as measured by the sum of the death times of zero-dimensional components and the Euler characteristic. This observation is replicated across two distinct datasets, and we interpret it as evidence for an increased heterogeneity of gene expression in autism. We also assessed the topology of gene-level point clouds and did not observe significant differences between ASD and control transcriptomes, suggesting that the overall transcriptome organization is similar in ASD and healthy cerebral cortex. Overall, our study provides a novel framework for persistent homology analyses of gene expression data for genetically complex disorders.
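    For a Vietoris-Rips filtration in which an edge enters at its length, every point is born at 0 and the finite zero-dimensional classes die exactly at the edge lengths of a minimum spanning tree, so the "sum of death times" is the MST weight. The sketch below uses Kruskal's algorithm to obtain those death times; Euclidean distances and this filtration convention are assumptions (some tools index Rips by half the edge length), and the paper's own distance between samples is not specified here.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Finite 0-dim death times of the Vietoris-Rips filtration (edges enter at
    their Euclidean length): Kruskal's algorithm records each merge."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(p, q), i, j)
                   for (i, p), (j, q) in combinations(enumerate(points), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # two components merge, so one class dies at d
            parent[ri] = rj
            deaths.append(d)
    return deaths


deaths = h0_death_times([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)])
total = sum(deaths)
```

    More spread-out samples produce larger merge distances and hence a larger sum, which is one way a heterogeneity difference between groups can surface in this statistic.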
  102. Ultrahigh-Pressure Form of SiO2 Glass With Dense Pyrite-Type Crystalline Homology (2019)

    M. Murakami, S. Kohara, N. Kitamura, J. Akola, H. Inoue, A. Hirata, Y. Hiraoka, Y. Onodera, I. Obayashi, J. Kalikka, N. Hirao, T. Musso, A. S. Foster, Y. Idemoto, O. Sakata, Y. Ohishi
    Abstract High-pressure synthesis of denser glass has been a longstanding interest in condensed-matter physics and materials science because of its potentially broad industrial application. Nevertheless, understanding its nature under extreme pressures has yet to be clarified due to experimental and theoretical challenges. Here we reveal the formation of OSi4 tetraclusters associated with that of SiO7 polyhedra in SiO2 glass under ultrahigh pressures to 200 gigapascal confirmed both experimentally and theoretically. Persistent homology analyses with molecular dynamics simulations found increased packing fraction of atoms whose topological diagram at ultrahigh pressures is similar to a pyrite-type crystalline phase, although the formation of tetraclusters is prohibited in the crystalline phase. This critical difference would be caused by the potential structural tolerance in the glass for distortion of oxygen clusters. Furthermore, an expanded electronic band gap demonstrates that chemical bonds survive at ultrahigh pressure. This opens up the synthesis of topologically disordered dense oxide glasses.
  103. A Topological Data Analysis Based Classification Method for Multiple Measurements (2019)

    Henri Riihimäki, Wojciech Chachólski, Jakob Theorell, Jan Hillert, Ryan Ramanujam
    Abstract Background: Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements which samples from the data space and builds a network graph based on the data topology. When applying this to two case studies, accuracy exceeds alternative models with additional benefits such as reporting data subsets with high purity along with feature values. Results: For 300 examples of 3 tree species, the accuracy reached 80% after 30 datapoints, which was improved to 90% after increased sampling to 400 datapoints. Using data from 100 examples of each of 6 point processes, the classifier achieved 96.8% accuracy. In both datasets, the TDA classifier outperformed an alternative model. Conclusions: This algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool.
  104. Tuning Cavitation and Crazing in Polymer Nanocomposite Glasses Containing Bimodal Grafted Nanoparticles at the Nanoparticle/Polymer Interface (2019)

    Rui Shi, Hu-Jun Qian, Zhong-Yuan Lu
    Abstract It is widely accepted that adding nanoparticles (NPs) into polymer matrices can dramatically alter the mechanical properties of the material, and that the properties at the NP/polymer interface play a vital role. By performing coarse-grained molecular dynamics simulations, we study the stress–strain behaviour of polymer/NP composites (PNCs) in a glassy state under a triaxial tensile deformation, in which the NPs are well dispersed in the system via bimodal grafting. A ‘HOMO’ system, in which the short grafted chains are chemically identical to the matrix polymer, and a ‘HETERO’ system, in which the short grafted chains interact weakly with the matrix, are investigated. Our simulations demonstrate that the HOMO system behaves very similarly to the pure polymer system, with quick cavitation and a drop in stress after the yielding point, corresponding to a craze deformation process. In the HETERO system, by contrast, weak interactions between the short grafts and the matrix polymer induce a low local modulus; therefore, rather homogeneous void formation and consequently a slower cavitation process are observed at the surface of the well dispersed NPs during the tensile deformation. As a result, the depletion effect at the NP surface eventually leads to NP re-assembly at large strains. Moreover, the HETERO system undergoes a shear-deformation-tended tensile process rather than the craze deformation found in the HOMO system. At the same time, the HETERO system is more ductile, with a much slower drop in stress after yielding than the HOMO system. In addition, the homogeneous generation of voids at small strain in the HETERO system can be utilized in the fabrication of polymer films with desirable separation abilities for gases or small molecules. We hope that these simulation results will be helpful for the property regulation of PNC materials containing polymer grafted NPs.
  105. Persistent Homology Analysis of Protein Structure, Flexibility, and Folding (2014)

    Kelin Xia, Guo-Wei Wei
    Abstract Proteins are the most important biomolecules for living organisms. The understanding of protein structure, function, dynamics, and transport is one of the most challenging tasks in biological science. In the present work, persistent homology is, for the first time, introduced for extracting molecular topological fingerprints (MTFs) based on the persistence of molecular topological invariants. MTFs are utilized for protein characterization, identification, and classification. The method of slicing is proposed to track the geometric origin of protein topological invariants. Both all-atom and coarse-grained representations of MTFs are constructed. A new cutoff-like filtration is proposed to shed light on the optimal cutoff distance in elastic network models. On the basis of the correlation between protein compactness, rigidity, and connectivity, we propose an accumulated bar length generated from persistent topological invariants for the quantitative modeling of protein flexibility. To this end, a correlation matrix-based filtration is developed. This approach gives rise to an accurate prediction of the optimal characteristic distance used in protein B-factor analysis. Finally, MTFs are employed to characterize protein topological evolution during protein folding and quantitatively predict the protein folding stability. Excellent consistency between our persistent homology prediction and molecular dynamics simulation is found. This work reveals the topology–function relationship of proteins. Copyright © 2014 John Wiley & Sons, Ltd.
  106. Chatter Detection in Turning Using Persistent Homology (2016)

    Firas A. Khasawneh, Elizabeth Munch
    Abstract This paper describes a new approach for ascertaining the stability of stochastic dynamical systems in their parameter space by examining their time series using topological data analysis (TDA). We illustrate the approach using a nonlinear delayed model that describes the tool oscillations due to self-excited vibrations in turning. Each time series is generated using the Euler-Maruyama method and a corresponding point cloud is obtained using the Takens embedding. The point cloud can then be analyzed using a tool from TDA known as persistent homology. The results of this study show that the described approach can be used for analyzing datasets of delay dynamical systems generated both from numerical simulation and experimental data. The contributions of this paper include presenting for the first time a topological approach for investigating the stability of a class of nonlinear stochastic delay equations, and introducing a new application of TDA to machining processes.
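    The Takens step turns a scalar time series into a point cloud of delay vectors, on which persistent homology can then be run. The sketch below shows only the embedding; the dimension and delay are hypothetical parameters, and the sine example is chosen because a quarter-period delay maps it exactly onto the unit circle, the loop that a periodic (stable) response would produce.

```python
import math


def delay_embed(ts, dim, tau):
    """Takens delay embedding: x_i = (s_i, s_{i+tau}, ..., s_{i+(dim-1)tau})."""
    span = (dim - 1) * tau
    return [tuple(ts[i + k * tau] for k in range(dim)) for i in range(len(ts) - span)]


# With a quarter-period delay, (sin t, sin(t + pi/2)) = (sin t, cos t),
# so every embedded point lies on the unit circle.
N = 40
signal = [math.sin(2 * math.pi * i / N) for i in range(2 * N)]
cloud = delay_embed(signal, dim=2, tau=N // 4)
```

    A single long-lived one-dimensional persistence class in such a cloud indicates a periodic orbit; noisier or chatter-like responses would blur or fragment that loop.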
  107. Using Persistent Homology as Preprocessing of Early Warning Signals for Critical Transition in Flood (2021)

    Syed Mohamad Sadiq Syed Musa, Mohd Salmi Md Noorani, Fatimah Abdul Razak, Munira Ismail, Mohd Almie Alias, Saiful Izzuan Hussain
    Abstract Flood early warning systems (FLEWSs) contribute remarkably to reducing economic and life losses during a flood. The theory of critical slowing down (CSD) has been successfully used as a generic indicator of early warning signals in various fields. A new tool called persistent homology (PH) was recently introduced for data analysis. PH employs a qualitative approach to assess a data set and provide new information on the topological features of the data set. In the present paper, we propose the use of PH as a preprocessing step to achieve a FLEWS through CSD. We test our proposal on water level data of the Kelantan River, which tends to flood nearly every year. The results suggest that the new information obtained by PH exhibits CSD and, therefore, can be used as a signal for a FLEWS. Through further analysis of the signal, we establish an early warning signal for ten of the twelve flood events recorded in the river; the other two events are detected on the first day of the flood. Finally, we compare our results with those of a FLEWS constructed directly from water level data and find that FLEWS via PH creates fewer false alarms than the conventional technique.
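    Critical slowing down is commonly tracked through rising variance and rising lag-1 autocorrelation within a sliding window. The sketch below computes both indicators on a generic series; in the paper's setting the input would be a PH-derived summary of the water-level data, which is not reproduced here, so a plain list stands in for it, and the window length is a hypothetical parameter.

```python
def lag1_autocorr(x):
    """Lag-1 autocorrelation of a window (0.0 for a constant window)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x)
    if var == 0:
        return 0.0
    cov = sum((x[i] - mu) * (x[i + 1] - mu) for i in range(n - 1))
    return cov / var


def rolling_csd(series, window):
    """Rolling (variance, lag-1 autocorrelation) pairs, the two classic
    critical-slowing-down indicators; upward trends in both are read as
    a sign of an approaching transition."""
    out = []
    for end in range(window, len(series) + 1):
        w = series[end - window:end]
        mu = sum(w) / window
        variance = sum((v - mu) ** 2 for v in w) / window
        out.append((variance, lag1_autocorr(w)))
    return out


indicators = rolling_csd([1, 2, 3, 4, 5, 6], window=4)
```

    An alarm rule would then flag sustained increases in these indicators; how the paper sets its thresholds is not stated in the abstract.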
  108. Construction of Symbolic Dynamics From Experimental Time Series (1999)

    K. Mischaikow, M. Mrozek, J. Reiss, A. Szymczak
    Abstract Symbolic dynamics play a central role in the description of the evolution of nonlinear systems. Yet there are few methods for determining symbolic dynamics of chaotic data. One difficulty is that the data contains random fluctuations associated with the experimental process. Using data obtained from a magnetoelastic ribbon experiment we show how a topological approach that allows for experimental error and bounded noise can be used to obtain a description of the dynamics in terms of subshift dynamics on a finite set of symbols.
  109. Topographical Transcriptome Mapping of the Mouse Medial Ganglionic Eminence by Spatially Resolved RNA-seq (2014)

    Sabrina Zechel, Pawel Zajac, Peter Lönnerberg, Carlos F. Ibáñez, Sten Linnarsson
    Abstract Cortical interneurons originating from the medial ganglionic eminence, MGE, are among the most diverse cells within the CNS. Different pools of proliferating progenitor cells are thought to exist in the ventricular zone of the MGE, but whether the underlying subventricular and mantle regions of the MGE are spatially patterned has not yet been addressed. Here, we combined laser-capture microdissection and multiplex RNA-sequencing to map the transcriptome of MGE cells at a spatial resolution of 50 μm.
  110. The Persistence of Large Scale Structures I: Primordial Non-Gaussianity (2020)

    Matteo Biagetti, Alex Cole, Gary Shiu
    Abstract We develop an analysis pipeline for characterizing the topology of large scale structure and extracting cosmological constraints based on persistent homology. Persistent homology is a technique from topological data analysis that quantifies the multiscale topology of a data set, in our context unifying the contributions of clusters, filament loops, and cosmic voids to cosmological constraints. We describe how this method captures the imprint of primordial local non-Gaussianity on the late-time distribution of dark matter halos, using a set of N-body simulations as a proxy for real data analysis. For our best single statistic, running the pipeline on several cubic volumes of size $40~(\mathrm{Gpc}/h)^3$, we detect $f_{\rm NL}^{\rm loc}=10$ at $97.5\%$ confidence on $\sim 85\%$ of the volumes. Additionally we test our ability to resolve degeneracies between the topological signature of $f_{\rm NL}^{\rm loc}$ and variation of $\sigma_8$ and argue that correctly identifying nonzero $f_{\rm NL}^{\rm loc}$ in this case is possible via an optimal template method. Our method relies on information living at $\mathcal{O}(10)$ Mpc/h, a complementary scale with respect to commonly used methods such as the scale-dependent bias in the halo/galaxy power spectrum. Therefore, while still requiring a large volume, our method does not require sampling long-wavelength modes to constrain primordial non-Gaussianity. Moreover, our statistics are interpretable: we are able to reproduce previous results in certain limits and we make new predictions for unexplored observables, such as filament loops formed by dark matter halos in a simulation box.
  111. A Persistent Weisfeiler-Lehman Procedure for Graph Classification (2019)

    Bastian Rieck, Christian Bock, Karsten Borgwardt
    Abstract The Weisfeiler–Lehman graph kernel exhibits competitive performance in many graph classification tasks. However, its subtree features are not able to capture connected components and cycles, topological features known for characterising graphs. To extract such features, we leverage propagated node label information and transform unweighted graphs into metric ones. This permits us to augment the subtree features with topological information obtained using persistent homology, a concept from topological data analysis. Our method, which we formalise as a generalisation of Weisfeiler–Lehman subtree features, exhibits favourable classification accuracy and its improvements in predictive performance are mainly driven by including cycle information.
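    The abstract's point that WL subtree features miss cycles can be seen on a classic pair: a hexagon and two disjoint triangles are both 2-regular, so WL relabelling cannot tell them apart, while the number of independent cycles (the first Betti number of the graph) differs. The sketch below shows plain WL relabelling plus that cycle count; it is not the paper's persistent WL procedure, which additionally turns label differences into edge weights before computing persistence.

```python
def wl_iteration(adj, labels, table):
    """One Weisfeiler-Lehman relabelling step. `table` is shared between graphs
    so that compressed labels stay comparable across them."""
    new = {}
    for v in adj:
        signature = (labels[v], tuple(sorted(labels[u] for u in adj[v])))
        new[v] = table.setdefault(signature, len(table))
    return new


def cycle_rank(adj):
    """Number of independent cycles (Betti-1): |E| - |V| + #components."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return n_edges - len(adj) + components


hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
table = {}
lab_h = {v: 0 for v in hexagon}
lab_t = {v: 0 for v in two_triangles}
for _ in range(3):
    lab_h = wl_iteration(hexagon, lab_h, table)
    lab_t = wl_iteration(two_triangles, lab_t, table)
```

    The WL label histograms of the two graphs coincide at every iteration, yet `cycle_rank` returns 1 for the hexagon and 2 for the triangles, which is the kind of information the persistent variant adds.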
  112. Topological Data Analysis Quantifies Biological Nano-Structure From Single Molecule Localization Microscopy (2020)

    Jeremy A. Pike, Abdullah O. Khan, Chiara Pallini, Steven G. Thomas, Markus Mund, Jonas Ries, Natalie S. Poulter, Iain B. Styles
    Abstract Motivation: Localization microscopy data is represented by a set of spatial coordinates, each corresponding to a single detection, that form a point cloud…
  113. Determining Structural Properties of Artificial Neural Networks Using Algebraic Topology (2021)

    David Pérez Fernández, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marta Villegas
    Abstract Artificial Neural Networks (ANNs) are widely used for approximating complex functions. The process that is usually followed to define the most appropriate architecture for an ANN given a specific function is mostly empirical. Once this architecture has been defined, weights are usually optimized according to the error function. On the other hand, we observe that ANNs can be represented as graphs and their topological 'fingerprints' can be obtained using Persistent Homology (PH). In this paper, we describe a proposal focused on designing more principled architecture search procedures. To do this, different architectures for solving problems related to a heterogeneous set of datasets have been analyzed. The results of the evaluation corroborate that PH effectively characterizes the ANN invariants: when ANN density (layers and neurons) or sample feeding order is the only difference, PH topological invariants appear; conversely, across different sub-problems (i.e. different labels), PH varies. This approach based on topological analysis helps towards the goal of designing more principled architecture search procedures and having a better understanding of ANNs.
  114. Revisiting Abnormalities in Brain Network Architecture Underlying Autism Using Topology-Inspired Statistical Inference (2018)

    Sourabh Palande, Vipin Jose, Brandon Zielinski, Jeffrey Anderson, P. Thomas Fletcher, Bei Wang
    Abstract A large body of evidence relates autism with abnormal structural and functional brain connectivity. Structural covariance magnetic resonance imaging (scMRI) is a technique that maps brain regions with covarying gray matter densities across subjects. It provides a way to probe the anatomical structure underlying intrinsic connectivity networks (ICNs) through analysis of gray matter signal covariance. In this article, we apply topological data analysis in conjunction with scMRI to explore network-specific differences in the gray matter structure in subjects with autism versus age-, gender-, and IQ-matched controls. Specifically, we investigate topological differences in gray matter structure captured by structural correlation graphs derived from three ICNs strongly implicated in autism, namely the salience network, default mode network, and executive control network. By combining topological data analysis with statistical inference, our results provide evidence of statistically significant network-specific structural abnormalities in autism.
  115. Persistence-Based Pooling for Shape Pose Recognition (2016)

    Thomas Bonis, Maks Ovsjanikov, Steve Oudot, Frédéric Chazal
    Abstract In this paper, we propose a novel pooling approach for shape classification and recognition using the bag-of-words pipeline, based on topological persistence, a recent tool from Topological Data Analysis. Our technique extends the standard max-pooling, which summarizes the distribution of a visual feature with a single number, thereby losing any notion of spatiality. Instead, we propose to use topological persistence, and the derived persistence diagrams, to provide significantly more informative and spatially sensitive characterizations of the feature functions, which can lead to better recognition performance. Unfortunately, despite their conceptual appeal, persistence diagrams are difficult to handle, since they are not naturally represented as vectors in Euclidean space and even the standard metric, the bottleneck distance, is not easy to compute. Furthermore, classical distances between diagrams, such as the bottleneck and Wasserstein distances, do not allow one to build positive definite kernels that can be used for learning. To handle this issue, we provide a novel way to transform persistence diagrams into vectors, in which comparisons are trivial. Finally, we demonstrate the performance of our construction on the Non-Rigid 3D Human Models SHREC 2014 dataset, where we show that topological pooling can provide significant improvements over the standard pooling methods for the shape pose recognition within the bag-of-words pipeline.
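    The contrast with max-pooling can be illustrated on a 1D feature signal: max-pooling keeps only the highest value, whereas zero-dimensional superlevel-set persistence records every peak together with its prominence. The sketch below works on a path graph rather than a mesh, and `topk_persistence` is a hypothetical fixed-length vectorization for illustration, not the transform proposed in the paper.

```python
def superlevel_persistence(f):
    """0-dim superlevel-set persistence of a function on a path graph.
    Components are born at local maxima and, by the elder rule, the younger
    one dies when two merge. Returns finite (birth, death) pairs and the
    birth value of the essential class (the global maximum)."""
    order = sorted(range(len(f)), key=lambda i: -f[i])
    parent, birth = {}, {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i], birth[i] = i, f[i]
        for j in (i - 1, i + 1):
            if j in parent:  # neighbour already in the superlevel set
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                young, old = (ri, rj) if birth[ri] < birth[rj] else (rj, ri)
                if birth[young] > f[i]:  # skip zero-persistence pairs
                    pairs.append((birth[young], f[i]))
                parent[young] = old
    return pairs, birth[find(order[0])]


def topk_persistence(f, k=3):
    """Hypothetical pooling vector: the k largest finite persistences, zero-padded."""
    pairs, _ = superlevel_persistence(f)
    pers = sorted((b - d for b, d in pairs), reverse=True)[:k]
    return pers + [0.0] * (k - len(pers))


one_peak = [0, 3, 0, 0, 0]
two_peaks = [0, 3, 0, 2, 0]
```

    Both signals have the same max-pooled value (3), but the persistence vector separates them: `[0.0, 0.0, 0.0]` for the single peak versus a leading entry of 2 for the secondary peak.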
  116. Reconceiving the Hippocampal Map as a Topological Template (2014)

    Yuri Dabaghian, Vicky L. Brandt, Loren M. Frank
    Abstract The role of the hippocampus in spatial cognition is incontrovertible yet controversial. Place cells, initially thought to be location-specifiers, turn out to respond promiscuously to a wide range of stimuli. Here we test the idea, which we have recently demonstrated in a computational model, that the hippocampal place cells may ultimately be interested in a space's topological qualities (its connectivity) more than its geometry (distances and angles); such higher-order functioning would be more consistent with other known hippocampal functions. We recorded place cell activity in rats exploring morphing linear tracks that allowed us to dissociate the geometry of the track from its topology. The resulting place fields preserved the relative sequence of places visited along the track but did not vary with the metrical features of the track or the direction of the rat's movement. These results suggest a reinterpretation of previous studies and new directions for future experiments.
  117. Improved Understanding of Aqueous Solubility Modeling Through Topological Data Analysis (2018)

    Mariam Pirashvili, Lee Steinberg, Francisco Belchi Guillamon, Mahesan Niranjan, Jeremy G. Frey, Jacek Brodzki
    Abstract Topological data analysis is a family of recent mathematical techniques seeking to understand the ‘shape’ of data, and has been used to understand the structure of the descriptor space produced from a standard chemical informatics software from the point of view of solubility. We have used the mapper algorithm, a TDA method that creates low-dimensional representations of data, to create a network visualization of the solubility space. While descriptors with clear chemical implications are prominent features in this space, reflecting their importance to the chemical properties, an unexpected and interesting correlation between chlorine content and rings and their implication for solubility prediction is revealed. A parallel representation of the chemical space was generated using persistent homology applied to molecular graphs. Links between this chemical space and the descriptor space were shown to be in agreement with chemical heuristics. The use of persistent homology on molecular graphs, extended by the use of norms on the associated persistence landscapes, allows the conversion of discrete shape descriptors to continuous ones, and a perspective of the application of these descriptors to quantitative structure property relations is presented.
  118. Molecular Phenotyping Using Networks, Diffusion, and Topology: Soft Tissue Sarcoma (2019)

    James C. Mathews, Maryam Pouryahya, Caroline Moosmüller, Yannis G. Kevrekidis, Joseph O. Deasy, Allen Tannenbaum
    Abstract Many biological datasets are high-dimensional yet manifest an underlying order. In this paper, we describe an unsupervised data analysis methodology that operates in the setting of a multivariate dataset and a network which expresses influence between the variables of the given set. The technique involves network geometry employing the Wasserstein distance, global spectral analysis in the form of diffusion maps, and topological data analysis using the Mapper algorithm. The prototypical application is to gene expression profiles obtained from RNA-Seq experiments on a collection of tissue samples, considering only genes whose protein products participate in a known pathway or network of interest. Employing the technique, we discern several coherent states or signatures displayed by the gene expression profiles of the sarcomas in the Cancer Genome Atlas along the TP53 (p53) signaling network. The signatures substantially recover the leiomyosarcoma, dedifferentiated liposarcoma (DDLPS), and synovial sarcoma histological subtype diagnoses, and they also include a new signature defined by activation and inactivation of about a dozen genes, including activation of serine endopeptidase inhibitor SERPINE1 and inactivation of TP53-family tumor suppressor gene TP73.
  119. Simplicial Neural Networks (2020)

    Stefania Ebli, Michaël Defferrard, Gard Spreemann
    Abstract We present simplicial neural networks (SNNs), a generalization of graph neural networks to data that live on a class of topological spaces called simplicial complexes. These are natural multi-dimensional extensions of graphs that encode not only pairwise relationships but also higher-order interactions between vertices, allowing us to consider richer data, including vector fields and $n$-fold collaboration networks. We define an appropriate notion of convolution that we leverage to construct the desired convolutional neural networks. We test the SNNs on the task of imputing missing data on coauthorship complexes.
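    Convolution on simplicial complexes is typically built from the Hodge Laplacian, e.g. for edge signals L1 = B1ᵀB1 + B2B2ᵀ, where B1 and B2 are signed boundary matrices. The sketch below assembles L1 for a single triangle and applies a polynomial filter y = Σ θ_k L1^k x; the polynomial form is in the spirit of the paper's convolution, but its exact parametrization and the coefficients θ_k here are assumptions.

```python
def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def boundary(faces, simplices):
    """Signed boundary matrix: rows are (k-1)-faces, columns are k-simplices,
    all given as sorted vertex tuples; deleting vertex i carries sign (-1)^i."""
    row = {f: r for r, f in enumerate(faces)}
    mat = [[0] * len(simplices) for _ in faces]
    for c, s in enumerate(simplices):
        for i in range(len(s)):
            mat[row[s[:i] + s[i + 1:]]][c] = (-1) ** i
    return mat


def hodge_laplacian_1(vertices, edges, triangles):
    """Edge Laplacian L1 = B1^T B1 + B2 B2^T (B2 term omitted without triangles)."""
    b1 = boundary(vertices, edges)
    lap = matmul(transpose(b1), b1)
    if triangles:
        b2 = boundary(edges, triangles)
        up = matmul(b2, transpose(b2))
        lap = [[d + u for d, u in zip(rd, ru)] for rd, ru in zip(lap, up)]
    return lap


def simplicial_conv(lap, x, thetas):
    """Polynomial filter y = sum_k thetas[k] * L^k x applied to an edge signal x."""
    y = [0.0] * len(x)
    power = list(x)  # L^0 x
    for theta in thetas:
        y = [yi + theta * pi for yi, pi in zip(y, power)]
        power = [sum(r * p for r, p in zip(row, power)) for row in lap]
    return y


V = [(0,), (1,), (2,)]
E = [(0, 1), (0, 2), (1, 2)]
filled = hodge_laplacian_1(V, E, [(0, 1, 2)])
hollow = hodge_laplacian_1(V, E, [])
```

    For the filled triangle L1 = 3I, so no edge flow is harmonic; for the hollow triangle the circulation (1, -1, 1) lies in the kernel, reflecting the hole. Filters built from L1 therefore respect the complex's higher-order structure, not just its graph.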
  120. Topological Data Analysis of Single-Cell Hi-C Contact Maps (2020)

    Mathieu Carrière, Raúl Rabadán
    Abstract Due to recent breakthroughs in high-throughput sequencing, it is now possible to use chromosome conformation capture (CCC) to understand the three-dimensional conformation of DNA at the whole genome level, and to characterize it with the so-called contact maps. This is very useful since many biological processes are correlated with DNA folding, such as DNA transcription. However, the methods for the analysis of such conformations are still lacking mathematical guarantees and statistical power. To handle this issue, we propose to use the Mapper, which is a standard tool of Topological Data Analysis (TDA) that allows one to efficiently encode the inherent continuity and topology of underlying biological processes in data, in the form of a graph with various features such as branches and loops. In this article, we show how recent statistical techniques developed in TDA for the Mapper algorithm can be extended and leveraged to formally define and statistically quantify the presence of topological structures coming from biological phenomena, such as the cell cycle, in datasets of CCC contact maps.
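    The Mapper graph is built by covering the range of a lens (filter) function with overlapping intervals, clustering each preimage, and linking clusters that share points. The minimal sketch below uses single-linkage clustering at a fixed distance and a uniform interval cover, both assumed choices; it omits the statistical machinery the paper develops. Points sampled from a circle with the x-coordinate as lens produce a single loop, the kind of structure a cyclic process such as the cell cycle would leave.

```python
import math


def single_linkage(points, members, eps):
    """Group member indices whose points are chained by gaps <= eps (union-find)."""
    parent = {i: i for i in members}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in members:
        for b in members:
            if a < b and math.dist(points[a], points[b]) <= eps:
                parent[find(a)] = find(b)
    clusters = {}
    for i in members:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Minimal Mapper: overlapping interval cover of the lens range, single-linkage
    clusters per preimage as nodes, and an edge whenever two clusters share a point."""
    lo, hi = min(lens), max(lens)
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        a = lo + (k - overlap) * length
        b = lo + (k + 1 + overlap) * length
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(single_linkage(points, members, eps))
    edges = {(u, v) for u in range(len(nodes)) for v in range(u + 1, len(nodes))
             if nodes[u] & nodes[v]}
    return nodes, edges


circle = [(math.cos(2 * math.pi * j / 12), math.sin(2 * math.pi * j / 12)) for j in range(12)]
nodes, edges = mapper_graph(circle, [x for x, _ in circle], n_intervals=4, overlap=0.25, eps=0.6)
```

    Here the graph has six nodes, each of degree two, i.e. one cycle. With too coarse a cover or too large `eps` the loop collapses, which is why parameter choice matters and why statistical quantification of such features is useful.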
  121. The Importance of Forgetting: Limiting Memory Improves Recovery of Topological Characteristics From Neural Data (2018)

    Samir Chowdhury, Bowen Dai, Facundo Mémoli
    Abstract We develop a line of work initiated by Curto and Itskov towards understanding the amount of information contained in the spike trains of hippocampal place cells via topology considerations. Previously, it was established that simply knowing which groups of place cells fire together in an animal’s hippocampus is sufficient to extract the global topology of the animal’s physical environment. We model a system where collections of place cells group and ungroup according to short-term plasticity rules. In particular, we obtain the surprising result that in experiments with spurious firing, the accuracy of the extracted topological information decreases with the persistence (beyond a certain regime) of the cell groups. This suggests that synaptic transience, or forgetting, is a mechanism by which the brain counteracts the effects of spurious place cell activity.
  122. Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

    Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, Wlodek Zadrozny
    Abstract While the strength of Topological Data Analysis has been explored in many studies on high-dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extracting topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as a high-dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe that adding topological features to the conventional features in ensemble models improves the classification results (by up to 5%). On the other hand, as expected, topological features by themselves may not be sufficient for effective classification. It is an open problem whether TDA features from word embeddings might be sufficient, as they seem to perform within a few points of the top results obtained with a linear support vector classifier.
  123. Morphometrics Reveals Complex and Heritable Apple Leaf Shapes (2018)

    Zoë Migicovsky, Mao Li, Daniel H. Chitwood, Sean Myles
    Abstract Apple (Malus spp.) is a widely grown and valuable fruit crop. Leaf shape is important for flowering in apple and may also be an early indicator for other agriculturally valuable traits. We examined 9,000 leaves from 869 unique apple accessions using linear measurements and comprehensive morphometric techniques. We identified allometric variation as the result of differing length-to-width aspect ratios between accessions and species of apple. The allometric variation was due to variation in the width of the leaf blade, not the length. Aspect ratio was highly correlated with the first principal component (PC1) of morphometric variation quantified using elliptical Fourier descriptors (EFDs) and persistent homology (PH). While the primary source of variation was aspect ratio, subsequent PCs corresponded to complex shape variation not captured by linear measurements. After linking the morphometric information with over 122,000 genome-wide single nucleotide polymorphisms (SNPs), we found high SNP heritability values even at later PCs, indicating that comprehensive morphometrics can capture complex, heritable phenotypes. Thus, techniques such as EFDs and PH are capturing heritable biological variation that would be missed using linear measurements alone.
  124. Quantitative and Interpretable Order Parameters for Phase Transitions From Persistent Homology (2020)

    Alex Cole, Gregory J. Loges, Gary Shiu
    Abstract We apply modern methods in computational topology to the task of discovering and characterizing phase transitions. As illustrations, we apply our method to four two-dimensional lattice spin models: the Ising, square ice, XY, and fully-frustrated XY models. In particular, we use persistent homology, which computes the births and deaths of individual topological features as a coarse-graining scale or sublevel threshold is increased, to summarize multiscale and high-point correlations in a spin configuration. We employ vector representations of this information called persistence images to formulate and perform the statistical task of distinguishing phases. For the models we consider, a simple logistic regression on these images is sufficient to identify the phase transition. Interpretable order parameters are then read from the weights of the regression. This method suffices to identify magnetization, frustration, and vortex-antivortex structure as relevant features for phase transitions in our models. We also define "persistence" critical exponents and study how they are related to those critical exponents usually considered.
  125. Investigation of Flash Crash via Topological Data Analysis (2020)

    Wonse Kim, Younng-Jin Kim, Gihyun Lee, Woong Kook
    Abstract Topological data analysis has been acknowledged as one of the most successful mathematical data analytic methodologies in various fields including medicine, genetics, and image analysis. In this paper, we explore the potential of this methodology in finance by applying persistence landscape and dynamic time series analysis to analyze an extreme event in the stock market, known as Flash Crash. We will provide results of our empirical investigation to confirm the effectiveness of our new method not only for the characterization of this extreme event but also for its prediction purposes.
  126. When Remote Sensing Meets Topological Data Analysis (2018)

    Ludovic Duponchel
    Abstract Author Summary: Hyperspectral remote sensing plays an increasingly important role in many scientific domains and everyday life problems. Indeed, this imaging concept ends up in applications as varied as catching tax-evaders red-handed by locating new construction and building alterations, searching for aircraft and saving lives after fatal crashes, detecting oil spills for marine life and environmental preservation, spying on enemies with reconnaissance satellites, watching algae grow as an indicator of environmental health, forecasting weather to warn about natural disasters and much more. From an instrumental point of view, current spectrometers have rather good characteristics, even if spatial resolution and spectral range can always be increased. In order to extract ever more information from such experiments and develop new applications, we must, therefore, propose multivariate data analysis tools able to capture the shape of data sets and their specific features. Nevertheless, current methods often impose a data model which implicitly defines the geometry of the data set. The aim of the paper is thus to introduce the concept of topological data analysis in the framework of remote sensing, making no assumptions about the global shape of the data set, while also allowing the capture of its local features.
  127. Topological Data Analysis of Contagion Maps for Examining Spreading Processes on Networks (2015)

    Dane Taylor, Florian Klimm, Heather A. Harrington, Miroslav Kramár, Konstantin Mischaikow, Mason A. Porter, Peter J. Mucha
    Abstract Social and biological contagions are influenced by the spatial embeddedness of networks. Historically, many epidemics spread as a wave across part of the Earth’s surface; however, in modern contagions long-range edges—for example, due to airline transportation or communication media—allow clusters of a contagion to appear in distant locations. Here we study the spread of contagions on networks through a methodology grounded in topological data analysis and nonlinear dimension reduction. We construct ‘contagion maps’ that use multiple contagions on a network to map the nodes as a point cloud. By analysing the topology, geometry and dimensionality of manifold structure in such point clouds, we reveal insights to aid in the modelling, forecast and control of spreading processes. Our approach highlights contagion maps also as a viable tool for inferring low-dimensional structure in networks.
  128. Clique Topology Reveals Intrinsic Geometric Structure in Neural Correlations (2015)

    Chad Giusti, Eva Pastalkova, Carina Curto, Vladimir Itskov
    Abstract Detecting structure in neural activity is critical for understanding the function of neural circuits. The coding properties of neurons are typically investigated by correlating their responses to external stimuli. It is not clear, however, if the structure of neural activity can be inferred intrinsically, without a priori knowledge of the relevant stimuli. We introduce a novel method, called clique topology, that detects intrinsic structure in neural activity that is invariant under nonlinear monotone transformations. Using pairwise correlations of neurons in the hippocampus, we demonstrate that our method is capable of detecting geometric structure from neural activity alone, without appealing to external stimuli or receptive fields. Detecting meaningful structure in neural activity and connectivity data is challenging in the presence of hidden nonlinearities, where traditional eigenvalue-based methods may be misleading. We introduce a novel approach to matrix analysis, called clique topology, that extracts features of the data invariant under nonlinear monotone transformations. These features can be used to detect both random and geometric structure, and depend only on the relative ordering of matrix entries. We then analyzed the activity of pyramidal neurons in rat hippocampus, recorded while the animal was exploring a 2D environment, and confirmed that our method is able to detect geometric organization using only the intrinsic pattern of neural correlations. Remarkably, we found similar results during nonspatial behaviors such as wheel running and rapid eye movement (REM) sleep. This suggests that the geometric structure of correlations is shaped by the underlying hippocampal circuits and is not merely a consequence of position coding. We propose that clique topology is a powerful new tool for matrix analysis in biological settings, where the relationship of observed quantities to more meaningful variables is often nonlinear and unknown.
  129. Hyperparameter Optimization of Topological Features for Machine Learning Applications (2019)

    Francis Motta, Christopher Tralie, Rossella Bedini, Fabiano Bini, Gilberto Bini, Hamed Eramian, Marcio Gameiro, Steve Haase, Hugh Haddox, John Harer, Nick Leiby, Franco Marinozzi, Scott Novotney, Gabe Rocklin, Jed Singer, Devin Strickland, Matt Vaughn
    Abstract This paper describes a general pipeline for generating optimal vector representations of topological features of data for use with machine learning algorithms. This pipeline can be viewed as a costly black-box function defined over a complex configuration space, each point of which specifies both how features are generated and how predictive models are trained on those features. We propose using state-of-the-art Bayesian optimization algorithms to inform the choice of topological vectorization hyperparameters while simultaneously choosing learning model parameters. We demonstrate the need for and effectiveness of this pipeline using two difficult biological learning problems, and illustrate the nontrivial interactions between topological feature generation and learning model hyperparameters.
  130. Topological Eulerian Synthesis of Slow Motion Periodic Videos (2018)

    Christopher Tralie, Matthew Berger
    Abstract We consider the problem of taking a video that comprises multiple periods of repetitive motion, and reordering the frames of the video into a single period, producing a detailed, single-cycle video of motion. This problem is challenging, as such videos often contain noise, drift due to camera motion and from cycle to cycle, and irrelevant background motion/occlusions, and these factors can confound the relevant periodic motion we seek in the video. To address these issues in a simple and efficient manner, we introduce a tracking-free Eulerian approach for synthesizing a single cycle of motion. Our approach is geometric: we treat each frame as a point in high-dimensional Euclidean space, and analyze the sliding window embedding formed by this sequence of points, which yields samples along a topological loop regardless of the type of periodic motion. We combine tools from topological data analysis and spectral geometric analysis to estimate the phase of each window, and we exploit the sliding window structure to robustly reorder frames. We show quantitative results that highlight the robustness of our technique to camera shake, noise, and occlusions, and qualitative results of single-cycle motion synthesis across a variety of scenarios.
  131. A Mayer–Vietoris Formula for Persistent Homology With an Application to Shape Recognition in the Presence of Occlusions (2011)

    Barbara Di Fabio, Claudia Landi
    Abstract In algebraic topology it is well known that, using the Mayer–Vietoris sequence, the homology of a space X can be studied by splitting X into subspaces A and B and computing the homology of A, B, and A∩B. A natural question is: To what extent does persistent homology benefit from a similar property? In this paper we show that persistent homology has a Mayer–Vietoris sequence that is generally not exact but only of order 2. However, we obtain a Mayer–Vietoris formula involving the ranks of the persistent homology groups of X, A, B, and A∩B plus three extra terms. This implies that persistent homological features of A and B can be found either as persistent homological features of X or of A∩B. As an application of this result, we show that persistence diagrams are able to recognize an occluded shape by showing a common subset of points.
  132. Unexpected Topology of the Temperature Fluctuations in the Cosmic Microwave Background (2019)

    Pratyush Pranav, Robert J. Adler, Thomas Buchert, Herbert Edelsbrunner, Bernard J. T. Jones, Armin Schwartzman, Hubert Wagner, Rien van de Weygaert
    Abstract We study the topology generated by the temperature fluctuations of the cosmic microwave background (CMB) radiation, as quantified by the number of components and holes, formally given by the Betti numbers, in the growing excursion sets. We compare CMB maps observed by the Planck satellite with a thousand simulated maps generated according to the ΛCDM paradigm with Gaussian distributed fluctuations. The comparison is multi-scale, being performed on a sequence of degraded maps with mean pixel separation ranging from 0.05 to 7.33°. The survey of the CMB over 𝕊² is incomplete due to obfuscation effects by bright point sources and other extended foreground objects like our own galaxy. To deal with such situations, where analysis in the presence of “masks” is of importance, we introduce the concept of relative homology. The parametric χ²-test shows differences between observations and simulations, yielding p-values at percent to less than permil levels roughly between 2 and 7°, with the difference in the number of components and holes peaking at more than 3σ sporadically at these scales. The highest observed deviation between the observations and simulations for b₀ and b₁ is approximately between 3σ and 4σ at scales of 3–7°.
    There are reports of mildly unusual behaviour of the Euler characteristic at 3.66° in the literature, computed from independent measurements of the CMB temperature fluctuations by Planck’s predecessor, the Wilkinson Microwave Anisotropy Probe (WMAP) satellite. The mildly anomalous behaviour of the Euler characteristic is phenomenologically related to the strongly anomalous behaviour of components and holes, or the zeroth and first Betti numbers, respectively. Further, since these topological descriptors show consistent anomalous behaviour over independent measurements of Planck and WMAP, instrumental and systematic errors may be an unlikely source. These are also the scales at which the observed maps exhibit low variance compared to the simulations, and approximately the range of scales at which the power spectrum exhibits a dip with respect to the theoretical model. Non-parametric tests show even stronger differences at almost all scales. Crucially, Gaussian simulations based on power-spectrum matching the characteristics of the observed dipped power spectrum are not able to resolve the anomaly. Understanding the origin of the anomalies in the CMB, whether cosmological in nature or arising due to late-time effects, is an extremely challenging task. Regardless, beyond the trivial possibility that this may still be a manifestation of an extreme Gaussian case, these observations, along with the super-horizon scales involved, may motivate the study of primordial non-Gaussianity. Alternative scenarios worth exploring may be models with non-trivial topology, including topological defect models.
  133. Optimal Topological Cycles and Their Application in Cardiac Trabeculae Restoration (2017)

    Pengxiang Wu, Chao Chen, Yusu Wang, Shaoting Zhang, Changhe Yuan, Zhen Qian, Dimitris Metaxas, Leon Axel
    Abstract In cardiac image analysis, it is important yet challenging to reconstruct the trabeculae, namely, fine muscle columns whose ends are attached to the ventricular walls. To extract these fine structures, traditional image segmentation methods are insufficient. In this paper, we propose a novel method to jointly detect salient topological handles and compute their optimal representations. The detected handles are considered hypothetical trabeculae structures. They are further screened using a classifier and are then included in the final segmentation. We show in experiments the significance of our contribution compared with previous standard segmentation methods without topological priors, as well as with a previous topological method in which non-optimal representations of topological handles are used.
  134. Topological Echoes of Primordial Physics in the Universe at Large Scales (2020)

    Alex Cole, Matteo Biagetti, Gary Shiu
    Abstract We present a pipeline for characterizing and constraining initial conditions in cosmology via persistent homology. The cosmological observable of interest is the cosmic web of large scale structure, and the initial conditions in question are non-Gaussianities (NG) of primordial density perturbations. We compute persistence diagrams and derived statistics for simulations of dark matter halos with Gaussian and non-Gaussian initial conditions. For computational reasons and to make contact with experimental observations, our pipeline computes persistence in sub-boxes of full simulations and simulations are subsampled to uniform halo number. We use simulations with large NG ($f_{\rm NL}^{\rm loc}=250$) as templates for identifying data with mild NG ($f_{\rm NL}^{\rm loc}=10$), and running the pipeline on several cubic volumes of size $40~(\mathrm{Gpc}/h)^3$, we detect $f_{\rm NL}^{\rm loc}=10$ at $97.5\%$ confidence on $\sim 85\%$ of the volumes for our best single statistic. Throughout we benefit from the interpretability of topological features as input for statistical inference, which allows us to make contact with previous first-principles calculations and make new predictions.
  135. The Persistent Homology Mathematical Framework Provides Enhanced Genotype-to-Phenotype Associations for Plant Morphology (2018)

    Mao Li, Margaret H. Frank, Viktoriya Coneva, Washington Mio, Daniel H. Chitwood, Christopher N. Topp
    Abstract Efforts to understand the genetic and environmental conditioning of plant morphology are hindered by the lack of flexible and effective tools for quantifying morphology. Here, we demonstrate that persistent-homology-based topological methods can improve measurement of variation in leaf shape, serrations, and root architecture. We apply these methods to 2D images of leaves and root systems in field-grown plants of a domesticated introgression line population of tomato (Solanum pennellii). We find that compared with some commonly used conventional traits, (1) persistent-homology-based methods can more comprehensively capture morphological variation; (2) these techniques discriminate between genotypes with a larger normalized effect size and detect a greater number of unique quantitative trait loci (QTLs); (3) multivariate traits, whether statistically derived from univariate or persistent-homology-based traits, improve our ability to understand the genetic basis of phenotype; and (4) persistent-homology-based techniques detect unique QTLs compared to conventional traits or their multivariate derivatives, indicating that previously unmeasured aspects of morphology are now detectable. The QTL results further imply that genetic contributions to morphology can affect both the shoot and root, revealing a pleiotropic basis to natural variation in tomato. Persistent homology is a versatile framework to quantify plant morphology and developmental processes that complements and extends existing methods.
  136. Unsupervised Topological Learning for Identification of Atomic Structures (2022)

    Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse
    Abstract We propose an unsupervised learning methodology with descriptors based on topological data analysis (TDA) concepts to describe the local structural properties of materials at the atomic scale. Based only on atomic positions and without a priori knowledge, our method allows for an autonomous identification of clusters of atomic structures through a Gaussian mixture model. We successfully apply this approach to the analysis of elemental Zr in the crystalline and liquid states as well as homogeneous nucleation events under deep undercooling conditions. This opens the way to deeper and autonomous study of complex phenomena in materials at the atomic scale.
  137. A Multi-Parameter Persistence Framework for Mathematical Morphology (2021)

    Yu-Min Chung, Sarah Day, Chuan-Shen Hu
    Abstract The field of mathematical morphology offers well-studied techniques for image processing. In this work, we view morphological operations through the lens of persistent homology, a tool at the heart of the field of topological data analysis. We demonstrate that morphological operations naturally form a multiparameter filtration and that persistent homology can then be used to extract information about both topology and geometry in the images as well as to automate methods for optimizing the study and rendering of structure in images. For illustration, we apply this framework to analyze noisy binary, grayscale, and color images.
  138. Persistent Homology Analysis of Brain Artery Trees (2016)

    Paul Bendich, J. S. Marron, Ezra Miller, Alex Pieloch, Sean Skwerer
    Abstract New representations of tree-structured data objects, using ideas from topological data analysis, enable improved statistical analyses of a population of brain artery trees. A number of representations of each data tree arise from persistence diagrams that quantify branching and looping of vessels at multiple scales. Novel approaches to the statistical analysis, through various summaries of the persistence diagrams, lead to heightened correlations with covariates such as age and sex, relative to earlier analyses of this data set. The correlation with age continues to be significant even after controlling for correlations from earlier significant summaries.
  139. Text Classification via Network Topology: A Case Study on the Holy Quran (2019)

    Mehmet Emin Aktas, Esra Akbas
    Abstract Due to the growth in the number of texts and documents available online, machine-learning-based text classification systems have become increasingly popular. Feature extraction, converting unstructured text into a structured feature space, is one of the essential tasks for text classification. In this paper, we propose a novel feature extraction approach for text classification using the network representation of text, network topology, and machine learning techniques. We present experimental results on classifying the Holy Quran chapters based on the place each chapter was revealed to illustrate the effectiveness of the approach.
  140. The Geometry of Synchronization Problems and Learning Group Actions (2019)

    Tingran Gao, Jacek Brodzki, Sayan Mukherjee
    Abstract We develop a geometric framework, based on the classical theory of fibre bundles, to characterize the cohomological nature of a large class of synchronization-type problems in the context of graph inference and combinatorial optimization. We identify each synchronization problem in a topological group G on a connected graph Γ with a flat principal G-bundle over Γ, thus establishing a classification result for synchronization problems using the representation variety of the fundamental group of Γ into G. We then develop a twisted Hodge theory on flat vector bundles associated with these flat principal G-bundles, and provide a geometric realization of the graph connection Laplacian as the lowest-degree Hodge Laplacian in the twisted de Rham–Hodge cochain complex. Motivated by these geometric intuitions, we propose to study the problem of learning group actions (partitioning a collection of objects based on the local synchronizability of pairwise correspondence relations) and provide a heuristic synchronization-based algorithm for solving this type of problem. We demonstrate the efficacy of this algorithm on simulated and real datasets.
  141. Toward Automated Prediction of Manufacturing Productivity Based on Feature Selection Using Topological Data Analysis (2016)

    Wei Guo, Ashis G. Banerjee
    Abstract In this paper, we extend the application of topological data analysis (TDA) to the field of manufacturing for the first time to the best of our knowledge. We apply a particular TDA method, known as the Mapper algorithm, on a benchmark chemical processing data set. The algorithm yields a topological network that captures the intrinsic clusters and connections among the clusters present in the high-dimensional data set, which are difficult to detect using traditional methods. We select key process variables or features that impact the final product yield by analyzing the shape of this network. We then use three prediction models to evaluate the impact of the selected features. Results show that the models achieve the same level of high prediction accuracy as with all the process variables, thereby providing a way to carry out process monitoring and control in a more cost-effective manner.
  142. Cybersecurity Challenges in Downstream Steel Production Processes (2022)

    Joaquín Ordieres-Meré, Andreas Wolff, Antonia Pacios-Álvarez, Antonio Bello-García
    Abstract The goal of this paper is to explore proposals coming from different EU-RFCS-funded research projects, in such a way that cybersecurity inside the steel industry can be increased from the Operational Technology area, with the current level of adopted Information Technology solutions. The dissemination project Control In Steel has reviewed different projects with different strategies, including ideas to be developed inside the Auto Surveillance project. An advanced control process strategy is considered and cloud-based solutions are the main analysed alternatives. The different steps in the model lifecycle are considered where different cloud configurations provide different solutions. Advanced techniques such as UMAP projection are proposed to be used as detectors for anomalous behaviour in the continuous development / continuous implementation strategy, suitable for integration in processing workflows.
  143. Can Neural Networks Learn Persistent Homology Features? (2020)

    Guido Montúfar, Nina Otter, Yuguang Wang
    Abstract Topological data analysis uses tools from topology (the mathematical area that studies shapes) to create representations of data. In particular, in persistent homology, one studies one-parameter families of spaces associated with data, and persistence diagrams describe the lifetime of topological invariants, such as connected components or holes, across the one-parameter family. In many applications, one is interested in working with features associated with persistence diagrams rather than the diagrams themselves. In our work, we explore the possibility of learning several types of features extracted from persistence diagrams using neural networks.
  144. Learning Representations of Persistence Barcodes (2019)

    Christoph D. Hofer, Roland Kwitt, Marc Niethammer
    Abstract We consider the problem of supervised learning with summary representations of topological features in data. In particular, we focus on persistent homology, the prevalent tool used in topological data analysis. As the summary representations, referred to as barcodes or persistence diagrams, come in the unusual format of multisets, equipped with computationally expensive metrics, they cannot readily be processed with conventional learning techniques. While different approaches to address this problem have been proposed, either in the context of kernel-based learning, or via carefully designed vectorization techniques, it remains an open problem how to leverage advances in representation learning via deep neural networks. Appropriately handling topological summaries as input to neural networks would address the disadvantage of previous strategies which handle this type of data in a task-agnostic manner. In particular, we propose an approach that is designed to learn a task-specific representation of barcodes. In other words, we aim to learn a representation that adapts to the learning problem while, at the same time, preserving theoretical properties (such as stability). This is done by projecting barcodes into a finite-dimensional vector space using a collection of parametrized functionals, so-called structure elements, for which we provide a generic construction scheme. A theoretical analysis of this approach reveals sufficient conditions to preserve stability, and also shows that different choices of structure elements lead to great differences with respect to their suitability for numerical optimization. When implemented as a neural network input layer, our approach demonstrates compelling performance on various types of problems, including graph classification and eigenvalue prediction, the classification of 2D/3D object shapes and recognizing activities from EEG signals.
  145. Topological Data Analysis of Collective and Individual Epithelial Cells Using Persistent Homology of Loops (2021)

    Dhananjay Bhaskar, William Y. Zhang, Ian Y. Wong
    Abstract Interacting, self-propelled particles such as epithelial cells can dynamically self-organize into complex multicellular patterns, which are challenging to classify without a priori information. Classically, different phases and phase transitions have been described based on local ordering, which may not capture structural features at larger length scales. Instead, topological data analysis (TDA) determines the stability of spatial connectivity at varying length scales (i.e. persistent homology), and can compare different particle configurations based on the “cost” of reorganizing one configuration into another. Here, we demonstrate a topology-based machine learning approach for unsupervised profiling of individual and collective phases based on large-scale loops. We show that these topological loops (i.e. dimension 1 homology) are robust to variations in particle number and density, particularly in comparison to connected components (i.e. dimension 0 homology). We use TDA to map out phase diagrams for simulated particles with varying adhesion and propulsion, at constant population size as well as when proliferation is permitted. Next, we use this approach to profile our recent experiments on the clustering of epithelial cells in varying growth factor conditions, which are compared to our simulations. Finally, we characterize the robustness of this approach at varying length scales, with sparse sampling, and over time. Overall, we envision TDA will be broadly applicable as a model-agnostic approach to analyze active systems with varying population size, from cytoskeletal motors to motile cells to flocking or swarming animals.
  146. Congestion Barcodes: Exploring the Topology of Urban Congestion Using Persistent Homology (2017)

    Yu Wu, Gabriel Shindnes, Vaibhav Karve, Derrek Yager, Daniel B. Work, Arnab Chakraborty, Richard B. Sowers
    Abstract This work presents a new method to quantify connectivity in transportation networks. Inspired by the field of topological data analysis, we propose a novel approach to explore the robustness of road network connectivity in the presence of congestion on the roadway. The robustness of the pattern is summarized in a congestion barcode, which can be constructed directly from traffic datasets commonly used for navigation. As an initial demonstration, we illustrate the main technique on a publicly available traffic dataset in a neighborhood in New York City.
  147. Topology Identifies Emerging Adaptive Mutations in SARS-CoV-2 (2021)

    Michael Bleher, Lukas Hahn, Juan Angel Patino-Galindo, Mathieu Carriere, Ulrich Bauer, Raul Rabadan, Andreas Ott
    Abstract The COVID-19 pandemic has led to a worldwide effort to characterize its evolution through the mapping of mutations in the genome of the coronavirus SARS-CoV-2. Ideally, one would like to quickly identify new mutations that could confer adaptive advantages (e.g. higher infectivity or immune evasion) by leveraging the large number of genomes. One way of identifying adaptive mutations is by looking at convergent mutations, mutations in the same genomic position that occur independently. However, the large number of currently available genomes precludes the efficient use of phylogeny-based techniques. Here, we establish a fast and scalable Topological Data Analysis approach for the early warning and surveillance of emerging adaptive mutations based on persistent homology. It identifies convergent events merely by their topological footprint and thus overcomes limitations of current phylogenetic inference techniques. This allows for an unbiased and rapid analysis of large viral datasets. We introduce a new topological measure for convergent evolution and apply it to the GISAID dataset as of February 2021, comprising 303,651 high-quality SARS-CoV-2 isolates collected since the beginning of the pandemic. We find that topologically salient mutations on the receptor-binding domain appear in several variants of concern and are linked with an increase in infectivity and immune escape, and for many adaptive mutations the topological signal precedes an increase in prevalence. We show that our method effectively identifies emerging adaptive mutations at an early stage. By localizing topological signals in the dataset, we extract geo-temporal information about the early occurrence of emerging adaptive mutations. The identification of these mutations can help to develop an alert system to monitor mutations of concern and guide experimentalists to focus the study of specific circulating variants.
  148. Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome (2014)

    David Romano, Monica Nicolau, Eve-Marie Quintin, Paul K. Mazaika, Amy A. Lightbody, Heather Cody Hazlett, Joseph Piven, Gunnar Carlsson, Allan L. Reiss
    Abstract Fragile X syndrome (FXS), due to mutations of the FMR1 gene, is the most common known inherited cause of developmental disability as well as the most common single-gene risk factor for autism. Our goal was to examine variation in brain structure in FXS with topological data analysis (TDA), and to assess how such variation is associated with measures of IQ and autism-related behaviors. To this end, we analyzed imaging and behavioral data from young boys (n = 52; aged 1.57–4.15 years) diagnosed with FXS. Application of topological methods to structural MRI data revealed two large subgroups within the study population. Comparison of these subgroups showed significant between-subgroup neuroanatomical differences similar to those previously reported to distinguish children with FXS from typically developing controls (e.g., enlarged caudate). In addition to neuroanatomy, the groups showed significant differences in IQ and autism severity scores. These results suggest that despite arising from a single gene mutation, FXS may encompass two biologically and clinically separable phenotypes. In addition, these findings underscore the potential of TDA as a powerful tool in the search for biological phenotypes of neuropsychiatric disorders. Hum Brain Mapp 35:4904–4915, 2014. © 2014 Wiley Periodicals, Inc.
  149. Topological Data Analysis for Aviation Applications (2019)

    Max Z. Li, Megan S. Ryerson, Hamsa Balakrishnan
    Abstract Aviation data sets are increasingly high-dimensional and sparse. Consequently, the underlying features and interactions are not easily uncovered by traditional data analysis methods. Recent advancements in applied mathematics introduce topological methods, offering a new approach to obtain these features. This paper applies the fundamental notions underlying topological data analysis and persistent homology (TDA/PH) to aviation data analytics. We review past aviation research that leverages topological methods, and present a new computational case study exploring the topology of airport surface connectivity. In each case, we connect abstract topological features with real-world processes in aviation, and highlight potential operational and managerial insights.
  150. Complexes of Tournaments, Directionality Filtrations and Persistent Homology (2020)

    Dejan Govc, Ran Levi, Jason P. Smith
    Abstract Complete digraphs are referred to in the combinatorics literature as tournaments. We consider a family of semi-simplicial complexes, that we refer to as "tournaplexes", whose simplices are tournaments. In particular, given a digraph $\mathcal{G}$, we associate with it a "flag tournaplex" which is a tournaplex containing the directed flag complex of $\mathcal{G}$, but also the geometric realisation of cliques that are not directed. We define several types of filtrations on tournaplexes, and exploiting persistent homology, we observe that flag tournaplexes provide finer means of distinguishing graph dynamics than the directed flag complex. We then demonstrate the power of these ideas by applying them to graph data arising from the Blue Brain Project's digital reconstruction of a rat's neocortex.
  151. Coexistence Holes Characterize the Assembly and Disassembly of Multispecies Systems (2021)

    Marco Tulio Angulo, Aaron Kelley, Luis Montejano, Chuliang Song, Serguei Saavedra
    Abstract A central goal of ecological research has been to understand the limits on the maximum number of species that can coexist under given constraints. However, we know little about the assembly and disassembly processes under which a community can reach such a maximum number, or whether this number is in fact attainable in practice. This limitation is partly due to the challenge of performing experimental work and partly due to the lack of a formalism under which one can systematically study such processes. Here, we introduce a formalism based on algebraic topology and homology theory to study the space of species coexistence formed by a given pool of species. We show that this space is characterized by ubiquitous discontinuities that we call coexistence holes (that is, empty spaces surrounded by filled space). Using theoretical and experimental systems, we provide direct evidence showing that these coexistence holes do not occur arbitrarily—their diversity is constrained by the internal structure of species interactions and their frequency can be explained by the external factors acting on these systems. Our work suggests that the assembly and disassembly of ecological systems is a discontinuous process that tends to obey regularities.
  152. Persistent Homology Index as a Robust Quantitative Measure of Immunohistochemical Scoring (2017)

    Akihiro Takiyama, Takashi Teramoto, Hiroaki Suzuki, Katsushige Yamashiro, Shinya Tanaka
    Abstract Immunohistochemical data (IHC) plays an important role in clinical practice, and is typically gathered in a semi-quantitative fashion that relies on some degree of visual scoring. However, visual scoring by a pathologist is inherently subjective and manifests both intra-observer and inter-observer variability. In this study, we introduce a novel computer-aided quantification methodology for immunohistochemical scoring that uses the algebraic concept of persistent homology. Using 8-bit grayscale image data derived from 90 specimens of invasive ductal carcinoma of the breast, stained for the replicative marker Ki-67, we computed homology classes. These were then compared to nuclear grades and the Ki-67 labeling indices obtained by visual scoring. Three metrics for IHC staining were newly defined: Persistent Homology Index (PHI), center coordinates of positive and negative groups, and the sum of squares within groups (WSS). This study demonstrates that PHI, a novel index for immunohistochemical labeling using persistent homology, can produce highly similar data to that generated by a pathologist using visual evaluation. The potential benefits associated with our novel technology include both improved quantification and reproducibility. Since our method reflects cellularity and nuclear atypia, it carries a greater quantity of biologic data compared to conventional evaluation using Ki-67.
  153. Segmentation of Biomedical Images by a Computational Topology Framework (2017)

    Rodrigo Rojas Moraleda, Wei Xiong, Niels Halama, Katja Breitkopf-Heinlein, Steven Steven, Luis Salinas, Dieter W. Heermann, Nektarios A. Valous
    Abstract The segmentation of cell nuclei is an important step towards the automated analysis of histological images. The presence of a large number of nuclei in whole-slide images necessitates methods that are computationally tractable in addition to being effective. In this work, a method is developed for the robust segmentation of cell nuclei in histological images based on the principles of persistent homology. More specifically, an abstract simplicial homology approach for image segmentation is established. Essentially, the approach deals with the persistence of disconnected sets in the image, thus identifying salient regions that express patterns of persistence. By introducing an image representation based on topological features, the task of segmentation is less dependent on variations of color or texture. This results in a novel approach that generalizes well and provides stable performance. The method conceptualizes regions of interest (cell nuclei) pertinent to their topological features in a successful manner. The time cost of the proposed approach is lower-bounded by an almost linear behavior and upper-bounded by O(n²) in a worst-case scenario. Time complexity matches a quasilinear behavior which is O(n^(1+ε)) for ε < 1. Images acquired from histological sections of liver tissue are used as a case study to demonstrate the effectiveness of the approach. The histological landscape consists of hepatocytes and non-parenchymal cells. The accuracy of the proposed methodology is verified against an automated workflow created by the output of a conventional filter bank (validated by experts) and the supervised training of a random forest classifier. The results are obtained on a per-object basis. The proposed workflow successfully detected both hepatocyte and non-parenchymal cell nuclei with an accuracy of 84.6%, and hepatocyte cell nuclei only with an accuracy of 86.2%. A public histological dataset with supplied ground-truth data is also used for evaluating the performance of the proposed approach (accuracy: 94.5%). Further validations are carried out with a publicly available dataset and ground-truth data from the Gland Segmentation in Colon Histology Images Challenge (GlaS) contest. The proposed method is useful for obtaining unsupervised robust initial segmentations that can be further integrated in image/data processing and management pipelines. The development of a fully automated system supporting a human expert provides tangible benefits in the context of clinical decision-making.
  154. Inferring COVID-19 Biological Pathways From Clinical Phenotypes via Topological Analysis (2021)

    Negin Karisani, Daniel E. Platt, Saugata Basu, Laxmi Parida
    Abstract COVID-19 has caused thousands of deaths around the world and also resulted in a large international economic disruption. Identifying the pathways associated with this illness can help medical researchers to better understand the properties of the condition. This process can be carried out by analyzing the medical records. It is crucial to develop tools and models that can aid researchers with this process in a timely manner. However, medical records are often unstructured clinical notes, and this poses significant challenges to developing the automated systems. In this article, we propose a pipeline to aid practitioners in analyzing clinical notes and revealing the pathways associated with this disease. Our pipeline relies on topological properties and consists of three steps: 1) pre-processing the clinical notes to extract the salient concepts, 2) constructing a feature space of the patients to characterize the extracted concepts, and finally, 3) leveraging the topological properties to distill the available knowledge and visualize the result. Our experiments on a publicly available dataset of COVID-19 clinical notes demonstrate that our pipeline can indeed extract meaningful pathways.
  155. Understanding Diffraction Patterns of Glassy, Liquid and Amorphous Materials via Persistent Homology Analyses (2019)

    Yohei Onodera, Shinji Kohara, Shuta Tahara, Atsunobu Masuno, Hiroyuki Inoue, Motoki Shiga, Akihiko Hirata, Koichi Tsuchiya, Yasuaki Hiraoka, Ippei Obayashi, Koji Ohara, Akitoshi Mizuno, Osami Sakata
    Abstract The structure of glassy, liquid, and amorphous materials is still not well understood, due to the insufficient structural information from diffraction data. In this article, attempts are made to understand the origin of diffraction peaks, particularly of the first sharp diffraction peak (FSDP, Q1), the principal peak (PP, Q2), and the third peak (Q3), observed in the measured diffraction patterns of disordered materials whose structure contains tetrahedral motifs. It is confirmed that the FSDP (Q1) is not a signature of the formation of a network, because an FSDP is observed in tetrahedral molecular liquids. It is found that the PP (Q2) reflects orientational correlations of tetrahedra. Q3, which can be observed in all disordered materials, even in common liquid metals, stems from simple pair correlations. Moreover, information on the topology of disordered materials was revealed by utilizing persistent homology analyses. The persistence diagram of silica (SiO2) glass suggests that the shape of rings in the glass is similar not only to those in the crystalline phase with comparable density (α-cristobalite), but also to rings present in crystalline phases with higher density (α-quartz and coesite); this is thought to be the signature of disorder. Furthermore, we have succeeded in revealing the differences, in terms of persistent homology, between tetrahedral networks and tetrahedral molecular liquids, and the difference/similarity between liquid and amorphous (glassy) states. Our series of analyses demonstrated that a combination of diffraction data and persistent homology analyses is a useful tool for allowing us to uncover structural features hidden in the halo patterns of disordered materials.
  156. CD8 T-Cell Reactivity to Islet Antigens Is Unique to Type 1 While CD4 T-Cell Reactivity Exists in Both Type 1 and Type 2 Diabetes (2014)

    Ghanashyam Sarikonda, Jeremy Pettus, Sonal Phatak, Sowbarnika Sachithanantham, Jacqueline F. Miller, Johnna D. Wesley, Eithon Cadag, Ji Chae, Lakshmi Ganesan, Ronna Mallios, Steve Edelman, Bjoern Peters, Matthias von Herrath
    Abstract Previous cross-sectional analyses demonstrated that CD8+ and CD4+ T-cell reactivity to islet-specific antigens was more prevalent in T1D subjects than in healthy donors (HD). Here, we examined T1D-associated epitope-specific CD4+ T-cell cytokine production and autoreactive CD8+ T-cell frequency on a monthly basis for one year in 10 HD, 33 subjects with T1D, and 15 subjects with T2D. Autoreactive CD4+ T-cells from both T1D and T2D subjects produced more IFN-γ when stimulated than cells from HD. In contrast, higher frequencies of islet antigen-specific CD8+ T-cells were detected only in T1D. These observations support the hypothesis that general beta-cell stress drives autoreactive CD4+ T-cell activity while islet over-expression of MHC class I commonly seen in T1D mediates amplification of CD8+ T-cells and more rapid beta-cell loss. In conclusion, CD4+ T-cell autoreactivity appears to be present in both T1D and T2D while autoreactive CD8+ T-cells are unique to T1D. Thus, autoreactive CD8+ cells may serve as a more T1D-specific biomarker.
  157. Imaging-Based Representation and Stratification of Intra-Tumor Heterogeneity via Tree-Edit Distance (2022)

    Lara Cavinato, Matteo Pegoraro, Alessandra Ragni, Francesca Ieva
    Abstract Personalized medicine is the future of medical practice. In oncology, tumor heterogeneity assessment represents a pivotal step for effective treatment planning and prognosis prediction. Despite new procedures for DNA sequencing and analysis, non-invasive methods for tumor characterization are needed to have an impact on daily routine. To this end, imaging texture analysis is rapidly scaling, holding the promise to surrogate histopathological assessment of tumor lesions. In this work, we propose a tree-based representation strategy for describing intra-tumor heterogeneity of patients affected by metastatic cancer. We leverage radiomics information extracted from PET/CT imaging and we provide an exhaustive and easily readable summary of the disease spreading. We exploit this novel patient representation to perform cancer subtyping using a hierarchical clustering technique. For this purpose, a new heterogeneity-based distance between trees is defined and applied to a case study of prostate cancer. Cluster interpretation is explored in terms of concordance with severity status, tumor burden and biological characteristics. Results are promising, as the proposed method outperforms current literature approaches. Ultimately, the proposed method draws a general analysis framework that would allow one to extract knowledge from daily acquired imaging data of patients and provide insights for effective treatment planning.
  158. MRI and Biomechanics Multidimensional Data Analysis Reveals R2-R1ρ as an Early Predictor of Cartilage Lesion Progression in Knee Osteoarthritis (2017)

    Valentina Pedoia, Jenny Haefeli, Kazuhito Morioka, Hsiang-Ling Teng, Lorenzo Nardo, Richard B. Souza, Adam R. Ferguson, Sharmila Majumdar
    Abstract PURPOSE: To couple quantitative compositional MRI, gait analysis, and machine learning multidimensional data analysis to study osteoarthritis (OA). OA is a multifactorial disorder accompanied by biochemical and morphological changes in the articular cartilage, modulated by skeletal biomechanics and gait. While we can now acquire detailed information about the knee joint structure and function, we are not yet able to leverage the multifactorial factors for diagnosis and disease management of knee OA. MATERIALS AND METHODS: We mapped 178 subjects in a multidimensional space integrating: demographic, clinical information, gait kinematics and kinetics, cartilage compositional T1ρ and T2 and R2-R1ρ (1/T2-1/T1ρ) acquired at 3T and whole-organ magnetic resonance imaging score morphological grading. Topological data analysis (TDA) and Kolmogorov-Smirnov test were adopted for data integration, analysis, and hypothesis generation. Regression models were used for hypothesis testing. RESULTS: The results of the TDA showed a network composed of three main patient subpopulations, thus potentially identifying new phenotypes. T2 and T1ρ values (T2 lateral femur P = 1.45×10⁻⁸, T1ρ medial tibia P = 1.05×10⁻⁵), the presence of femoral cartilage defects (P = 0.0013), lesions in the meniscus body (P = 0.0035), and race (P = 2.44×10⁻⁴) were key markers in the subpopulation classification. Within one of the subpopulations we observed an association between the composite metric R2-R1ρ and the longitudinal progression of cartilage lesions. CONCLUSION: The analysis presented demonstrates some of the complex multitissue biochemical and biomechanical interactions that define joint degeneration and OA using a multidimensional approach, and potentially indicates that R2-R1ρ may be an imaging biomarker for early OA. LEVEL OF EVIDENCE: 3. TECHNICAL EFFICACY: Stage 2. J. Magn. Reson. Imaging 2018;47:78-90.
  159. Induction Motor Eccentricity Fault Detection and Quantification Using Topological Data Analysis (2024)

    Bingnan Wang, Chungwei Lin, Hiroshi Inoue, Makoto Kanemaru
    Abstract In this paper, we propose a topological data analysis (TDA) method for the processing of induction motor stator current data, and apply it to the detection and quantification of eccentricity faults. Traditionally, physics-based models and involved signal processing techniques are required to identify and extract the subtle frequency components in current data related to a particular fault. We show that TDA offers an alternative way to extract fault related features, and effectively distinguish data from different fault conditions. We will introduce TDA method and the procedure of extracting topological features from time-domain data, and apply it to induction motor current data measured under different eccentricity fault conditions. We show that while the raw time-domain data are very challenging to distinguish, the extracted topological features from these data are distinct and highly associated with eccentricity fault level. With TDA processed data, we can effectively train machine learning models to predict fault levels with good accuracy, even for new data from eccentricity levels that are not seen in the training data. The proposed method is model-free, and only requires a small segment of time-domain data to make prediction. These advantages make it attractive for a wide range of data-driven fault detection applications.
  160. A Sheaf and Topology Approach to Generating Local Branch Numbers in Digital Images (2020)

    Chuan-Shen Hu, Yu-Min Chung
    Abstract This paper concerns a theoretical approach that combines topological data analysis (TDA) and sheaf theory. Topological data analysis, a rising field in mathematics and computer science, concerns the shape of the data and has been proven effective in many scientific disciplines. Sheaf theory, a mathematics subject in algebraic geometry, provides a framework for describing the local consistency in geometric objects. Persistent homology (PH) is one of the main driving forces in TDA, and the idea is to track changes of geometric objects at different scales. The persistence diagram (PD) summarizes the information of PH in the form of a multiset. While PD provides useful information about the underlying objects, it lacks fine relations about the local consistency of specific pairs of generators in PD, such as the merging relation between two connected components in the PH. The sheaf structure provides a novel point of view for describing the merging relation of local objects in PH. It is the goal of this paper to establish a theoretical framework that utilizes sheaf theory to uncover finer information from the PH. We also show that the proposed theory can be applied to identify the branch numbers of local objects in digital images.
  161. Microscopic Description of Yielding in Glass Based on Persistent Homology (2019)

    Tatsuhiko Shirai, Takenobu Nakamura
    Abstract Persistent homology (PH) was applied to probe the structural changes of glasses under shear. PH associates each local atomistic structure in an atomistic configuration to a geometric object, namely, a hole, and evaluates the robustness of these holes against noise. We found that the microscopic structures were qualitatively different before and after yielding. The structures before yielding contained robust holes, the number of which decreased after yielding. We also observed that the structures after yielding approached those of quickly quenched glass. This work demonstrates the crucial role of robust holes in yielding and provides an interpretation based on geometry.
  162. A Topological Paradigm for Hippocampal Spatial Map Formation Using Persistent Homology (2012)

    Y. Dabaghian, F. Mémoli, L. Frank, G. Carlsson
    Abstract An animal's ability to navigate through space rests on its ability to create a mental map of its environment. The hippocampus is the brain region centrally responsible for such maps, and it has been assumed to encode geometric information (distances, angles). Given, however, that hippocampal output consists of patterns of spiking across many neurons, and downstream regions must be able to translate those patterns into accurate information about an animal's spatial environment, we hypothesized that 1) the temporal pattern of neuronal firing, particularly co-firing, is key to decoding spatial information, and 2) since co-firing implies spatial overlap of place fields, a map encoded by co-firing will be based on connectivity and adjacency, i.e., it will be a topological map. Here we test this topological hypothesis with a simple model of hippocampal activity, varying three parameters (firing rate, place field size, and number of neurons) in computer simulations of rat trajectories in three topologically and geometrically distinct test environments. Using a computational algorithm based on recently developed tools from Persistent Homology theory in the field of algebraic topology, we find that the patterns of neuronal co-firing can, in fact, convey topological information about the environment in a biologically realistic length of time. Furthermore, our simulations reveal a “learning region” that highlights the interplay between the parameters in combining to produce hippocampal states that are more or less adept at map formation. For example, within the learning region a lower number of neurons firing can be compensated by adjustments in firing rate or place field size, but beyond a certain point map formation begins to fail. We propose that this learning region provides a coherent theoretical lens through which to view conditions that impair spatial learning by altering place cell firing rates or spatial specificity. Our ability to navigate our environments relies on the ability of our brains to form an internal representation of the spaces we're in. The hippocampus plays a central role in forming this internal spatial map, and it is thought that the ensemble of active “place cells” (neurons that are sensitive to location) somehow encode metrical information about the environment, akin to a street map. Several considerations suggested to us, however, that the brain might be more interested in topological information, i.e., connectivity, containment, and adjacency, more akin to a subway map, so we employed new methods in computational topology to estimate how basic properties of neuronal firing affect the time required to form a hippocampal spatial map of three test environments. Our analysis suggests that, in order to encode topological information correctly and in a biologically reasonable amount of time, the hippocampal place cells must operate within certain parameters of neuronal activity that vary with both the geometric and topological properties of the environment. The interplay of these parameters forms a “learning region” in which changes in one parameter can successfully compensate for changes in the others; values beyond the limits of this region, however, impair map formation.
  163. Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury (2015)

    Jessica L. Nielson, Jesse Paquette, Aiwen W. Liu, Cristian F. Guandique, C. Amy Tovar, Tomoo Inoue, Karen-Amanda Irvine, John C. Gensel, Jennifer Kloke, Tanya C. Petrossian, Pek Y. Lum, Gunnar E. Carlsson, Geoffrey T. Manley, Wise Young, Michael S. Beattie, Jacqueline C. Bresnahan, Adam R. Ferguson
    Abstract Data-driven discovery in complex neurological disorders has potential to extract meaningful knowledge from large, heterogeneous datasets. Here the authors apply topological data analysis to assess therapeutic effects in preclinical traumatic brain injury and spinal cord injury research studies.
  164. Fruit Flies and Moduli: Interactions Between Biology and Mathematics (2015)

    Ezra Miller
    Abstract Possibilities for using geometry and topology to analyze statistical problems in biology raise a host of novel questions in geometry, probability, algebra, and combinatorics that demonstrate the power of biology to influence the future of pure mathematics. This expository article is a tour through some biological explorations and their mathematical ramifications. The article starts with evolution of novel topological features in wing veins of fruit flies, which are quantified using the algebraic structure of multiparameter persistent homology. The statistical issues involved highlight mathematical implications of sampling from moduli spaces. These lead to geometric probability on stratified spaces, including the sticky phenomenon for Fréchet means and the origin of this mathematical area in the reconstruction of phylogenetic trees.
  165. Diverse 3D Cellular Patterns Underlie the Development of Cardamine Hirsuta and Arabidopsis Thaliana Ovules (2023)

    Tejasvinee Atul Mody, Alexander Rolle, Nico Stucki, Fabian Roll, Ulrich Bauer, Kay Schneitz
    Abstract A fundamental question in biology is how organ morphogenesis comes about. The ovules of Arabidopsis thaliana have been established as a successful model to study numerous aspects of tissue morphogenesis; however, little is known regarding the relative contributions and dynamics of differential tissue and cellular growth and architecture in establishing ovule morphogenesis in different species. To address this issue, we generated a 3D digital atlas of Cardamine hirsuta ovule development with full cellular resolution. We combined quantitative comparative morphometrics and topological analysis to explore similarities and differences in the 3D cellular architectures underlying ovule development of the two species. We discovered that they show diversity in the way the three radial cell layers of the primordium contribute to its growth, in the formation of a new cell layer in the inner integument and, in certain cases, in the topological properties of the 3D cell architectures of homologous tissues despite their similar shape. Our work demonstrates the power of comparative 3D cellular morphometry and the importance of internal tissues and their cellular architecture in organ morphogenesis. Summary Statement: Quantitative morphometric comparison of 3D digital ovules at full cellular resolution reveals diversity in internal 3D cellular architectures between similarly shaped ovules of Cardamine hirsuta and Arabidopsis thaliana.
  166. Gene Expression Data Classification Using Topology and Machine Learning Models (2022)

    Tamal K. Dey, Sayan Mandal, Soham Mukherjee
    Abstract Interpretation of high-throughput gene expression data continues to require mathematical tools for data analysis that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high-dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors which are used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. These representatives of the entire dataset facilitate better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.
  167. The Classification of Endoscopy Images With Persistent Homology (2016)

    Olga Dunaeva, Herbert Edelsbrunner, Anton Lukyanov, Michael Machin, Daria Malkova, Roman Kuvaev, Sergey Kashin
    Abstract Aiming at the automatic diagnosis of tumors using narrow band imaging (NBI) magnifying endoscopic (ME) images of the stomach, we combine methods from image processing, topology, geometry, and machine learning to classify patterns into three classes: oval, tubular and irregular. Training the algorithm on a small number of images of each type, we achieve a high rate of correct classifications. The analysis of the learning algorithm reveals that a handful of geometric and topological features are responsible for the overwhelming majority of decisions.
  168. The Emergence of Higher-Order Structure in Scientific and Technological Knowledge Networks (2020)

    Thomas Gebhart, Russell J. Funk
    Abstract The growth of science and technology is primarily a recombinative process, wherein new discoveries and inventions are generally built from prior knowledge. While the recent past has seen rapid growth in scientific and technological knowledge, relatively little is known about the manner in which science and technology develop and coalesce knowledge into larger structures that enable or constrain future breakthroughs. Network science has recently emerged as a framework for measuring the structure and dynamics of knowledge. While helpful, these existing approaches struggle to capture the global structural properties of the underlying networks, leading to conflicting observations about the nature of scientific and technological progress. We bridge this methodological gap using tools from algebraic topology to characterize the higher-order structure of knowledge networks in science and technology across scales. We observe rapid and varied growth in the high-dimensional structure in many fields of science and technology, and find this high-dimensional growth coincides with decline in lower-dimensional structure. This higher-order growth in knowledge networks has historically far outpaced the growth in scientific and technological collaboration networks. We also characterize the relationship between higher-order structure and the nature of the science and technology produced within these structural environments and find a positive relationship between the abstractness of language used within fields and increasing high-dimensional structure. We also find a robust relationship between high-dimensional structure and a number of metrics for publication success, implying this high-dimensional structure may be linked to discovery and invention.
  169. Stable Signatures for Dynamic Graphs and Dynamic Metric Spaces via Zigzag Persistence (2018)

    Woojin Kim, Facundo Mémoli
    Abstract When studying flocking/swarming behaviors in animals one is interested in quantifying and comparing the dynamics of the clustering induced by the coalescence and disbanding of animals in different groups. In a similar vein, studying the dynamics of social networks leads to the problem of characterizing groups/communities as they form and disperse throughout time. Motivated by this, we study the problem of obtaining persistent homology based summaries of time-dependent data. Given a finite dynamic graph (DG), we first construct a zigzag persistence module arising from linearizing the dynamic transitive graph naturally induced from the input DG. Based on standard results, we then obtain a persistence diagram or barcode from this zigzag persistence module. We prove that these barcodes are stable under perturbations in the input DG under a suitable distance between DGs that we identify. More precisely, our stability theorem can be interpreted as providing a lower bound for the distance between DGs. Since it relies on barcodes, and their bottleneck distance, this lower bound can be computed in polynomial time from the DG inputs. Since DGs can arise from applying the Rips functor (with a fixed threshold) to dynamic metric spaces, we are also able to derive related stable invariants for this richer class of dynamic objects. Along the way, we propose a summarization of dynamic graphs, which we call formigrams, that captures their time-dependent clustering features. These set-valued functions generalize the notion of dendrogram, a prevalent tool for hierarchical clustering. In order to elucidate the relationship between our distance between two DGs and the bottleneck distance between their associated barcodes, we exploit recent advances in the stability of zigzag persistence due to Botnan and Lesnick, and to Bjerkevik.
  170. Multiscale Projective Coordinates via Persistent Cohomology of Sparse Filtrations (2018)

    Jose A. Perea
    Abstract We present a framework which leverages the underlying topology of a data set, in order to produce appropriate coordinate representations. In particular, we show how to construct maps to real and complex projective spaces, given appropriate persistent cohomology classes. An initial map is obtained in two steps: First, the persistent cohomology of a sparse filtration is used to compute systems of transition functions for (real and complex) line bundles over neighborhoods of the data. Next, the transition functions are used to produce explicit classifying maps for the induced bundles. A framework for dimensionality reduction in projective space (Principal Projective Components) is also developed, aimed at decreasing the target dimension of the original map. Several examples are provided as well as theorems addressing choices in the construction.
  171. Relational Persistent Homology for Multispecies Data With Application to the Tumor Microenvironment (2023)

    Bernadette J. Stolz, Jagdeep Dhesi, Joshua A. Bull, Heather A. Harrington, Helen M. Byrne, Iris H. R. Yoon
    Abstract Topological data analysis (TDA) is an active field of mathematics for quantifying shape in complex data. Standard methods in TDA such as persistent homology (PH) are typically focused on the analysis of data consisting of a single entity (e.g., cells or molecular species). However, state-of-the-art data collection techniques now generate exquisitely detailed multispecies data, prompting a need for methods that can examine and quantify the relations among them. Such heterogeneous data types arise in many contexts, ranging from biomedical imaging and geospatial analysis to species ecology. Here, we propose two methods for encoding spatial relations among different data types that are based on Dowker complexes and witness complexes. We apply the methods to synthetic multispecies data of a tumor microenvironment and analyze topological features that capture relations between different cell types, e.g., blood vessels, macrophages, tumor cells, and necrotic cells. We demonstrate that relational topological features can extract biological insight, including the dominant immune cell phenotype (an important predictor of patient prognosis) and the parameter regimes of a data-generating model. The methods provide a quantitative perspective on the relational analysis of multispecies spatial data, overcome the limits of traditional PH, and are readily computable.
  172. The Shape of Cancer Relapse: Topological Data Analysis Predicts Recurrence in Paediatric Acute Lymphoblastic Leukaemia (2021)

    Salvador Chulián, Bernadette J. Stolz, Álvaro Martínez-Rubio, Cristina Blázquez Goñi, Juan F. Rodríguez Gutiérrez, Teresa Caballero Velázquez, Águeda Molinos Quintana, Manuel Ramírez Orellana, Ana Castillo Robleda, José Luis Fuster Soler, Alfredo Minguela Puras, María Victoria Martínez Sánchez, María Rosa, Víctor M. Pérez-García, Helen Byrne
    Abstract Acute Lymphoblastic Leukaemia (ALL) is the most frequent paediatric cancer. Modern therapies have improved survival rates, but approximately 15-20% of patients relapse. At present, patients’ risk of relapse is assessed by projecting high-dimensional flow cytometry data onto a subset of biomarkers and manually estimating the shape of this reduced data. Here, we apply methods from topological data analysis (TDA), which quantify shape in data via features such as connected components and loops, to pre-treatment ALL datasets with known outcomes. We combine these fully unsupervised analyses with machine learning to identify features in the pre-treatment data that are prognostic for risk of relapse. We find significant topological differences between relapsing and non-relapsing patients and confirm the predictive power of CD10, CD20, CD38, and CD45. Further, we are able to use the TDA descriptors to predict patients who relapsed. We propose three prognostic pipelines that readily extend to other haematological malignancies. Teaser: Topology reveals features in flow cytometry data that predict relapse in patients with acute lymphoblastic leukemia.
  173. HiDeF: Identifying Persistent Structures in Multiscale ‘Omics Data (2021)

    Fan Zheng, She Zhang, Christopher Churas, Dexter Pratt, Ivet Bahar, Trey Ideker
    Abstract In any ‘omics study, the scale of analysis can dramatically affect the outcome. For instance, when clustering single-cell transcriptomes, is the analysis tuned to discover broad or specific cell types? Likewise, protein communities revealed from protein networks can vary widely in sizes depending on the method. Here, we use the concept of persistent homology, drawn from mathematical topology, to identify robust structures in data at all scales simultaneously. Application to mouse single-cell transcriptomes significantly expands the catalog of identified cell types, while analysis of SARS-CoV-2 protein interactions suggests hijacking of WNT. The method, HiDeF, is available via Python and Cytoscape.
  174. Topological Edge Modes by Smart Patterning (2018)

    David J. Apigo, Kai Qian, Camelia Prodan, Emil Prodan
    Abstract We study identical coupled mechanical resonators whose collective dynamics are fully determined by the patterns in which they are arranged. In this work, we call a system topological if (1) boundary resonant modes fully fill all existing spectral gaps whenever the system is halved, and (2) the boundary spectrum cannot be removed or gapped by any boundary condition. We demonstrate that such topological characteristics can be induced solely through patterning, in a manner entirely independent of the structure of the resonators and the details of the couplings. The existence of such patterns is proven using K-theory and exemplified using an experimental platform based on magnetically coupled spinners. Topological metamaterials built on these principles can be easily engineered at any scale, providing a practical platform for applications and devices.
  175. Weighted-Persistent-Homology-Based Machine Learning for RNA Flexibility Analysis (2020)

    Chi Seng Pun, Brandon Yung Sin Yong, Kelin Xia
    Abstract Given the great significance of biomolecular flexibility in biomolecular dynamics and functional analysis, various experimental and theoretical models have been developed. Experimentally, the Debye-Waller factor, also known as the B-factor, measures atomic mean-square displacement and is usually considered an important measure of flexibility. Theoretically, elastic network models, the Gaussian network model, the flexibility-rigidity model, and other computational models have been proposed for flexibility analysis, shedding light on the inner topological structures of biomolecules. Recently, a topology-based machine learning model has been proposed. By using features from persistent homology, this model achieves a remarkably high Pearson correlation coefficient (PCC) in protein B-factor prediction. Motivated by its success, we propose weighted-persistent-homology (WPH)-based machine learning (WPHML) models for RNA flexibility analysis. Our WPH is a newly proposed model, which incorporates physical, chemical and biological information into topological measurements using a weight function. In particular, we use local persistent homology (LPH) to focus on the topological information of local regions. Our WPHML model is validated on a well-established RNA dataset, and numerical experiments show that our model can achieve a PCC of up to 0.5822. Compared with previous sequence-information-based learning models, our current model achieves a consistent improvement in performance of at least 10%.
  176. Classification of Skin Lesions by Topological Data Analysis Alongside With Neural Network (2020)

    Naiereh Elyasi, Mehdi Hosseini Moghadam
    Abstract In this paper we use TDA Mapper alongside deep convolutional neural networks to classify 7 major skin diseases. First, we apply KeplerMapper, with a neural network as one of its filter steps, to classify the HAM10000 dataset. Mapper visualizes the classification result as a simplicial complex, which a neural network cannot do alone, while the neural network, used as a filter step, helps classify the data better. Furthermore, we apply TDA Mapper and persistent homology to understand the layer weights of a MobileNet network at different training epochs on HAM10000. We also use persistence diagrams to visualize the results of the analysis of the MobileNet layers.
  177. A Machine-Learning-Based Early Warning System Boosted by Topological Data Analysis (2019)

    Devraj Basu, Tieqiang Li
    Abstract We propose a novel early warning system for detecting financial market crashes that utilizes the information extracted from the shape of financial market movement. Our system incorporates Topological Data Analysis (TDA), a new set of data analytics techniques specialised in profiling the shape of data, into a more traditional machine learning framework. Incorporating TDA leads to substantial improvements in the timely detection of the onset of a sharp market decline. Our framework is both able to generate new features and also unlock more value from existing factors. Our results illustrate the importance of understanding the shape of financial market data and suggest that incorporating TDA into a machine learning framework could be beneficial in a number of financial market settings.
  178. Persistent Homology in Cosmic Shear - II. A Tomographic Analysis of DES-Y1 (2022)

    Sven Heydenreich, Benjamin Brück, Pierre Burger, Joachim Harnois-Déraps, Sandra Unruh, Tiago Castro, Klaus Dolag, Nicolas Martinet
    Abstract We demonstrate how to use persistent homology for cosmological parameter inference in a tomographic cosmic shear survey. We obtain the first cosmological parameter constraints from persistent homology by applying our method to the first-year data of the Dark Energy Survey. To obtain these constraints, we analyse the topological structure of the matter distribution by extracting persistence diagrams from signal-to-noise maps of aperture masses. This presents a natural extension to the widely used peak count statistics. Extracting the persistence diagrams from the cosmo-SLICS, a suite of N-body simulations with variable cosmological parameters, we interpolate the signal using Gaussian processes and marginalise over the most relevant systematic effects, including intrinsic alignments and baryonic effects. For the structure growth parameter, we find , which is in full agreement with other late-time probes. We also constrain the intrinsic alignment parameter to A = 1.54 ± 0.52, which constitutes a detection of the intrinsic alignment effect at almost 3σ.
  179. Cliques of Neurons Bound Into Cavities Provide a Missing Link Between Structure and Function (2017)

    Michael W. Reimann, Max Nolte, Martina Scolamiero, Katharine Turner, Rodrigo Perin, Giuseppe Chindemi, Paweł Dłotko, Ran Levi, Kathryn Hess, Henry Markram
    Abstract The lack of a formal link between neural network structure and its emergent function has hampered our understanding of how the brain processes information. We have now come closer to describing such a link by taking the direction of synaptic transmission into account, constructing graphs of a network that reflect the direction of information flow, and analyzing these directed graphs using algebraic topology. Applying this approach to a local network of neurons in the neocortex revealed a remarkably intricate and previously unseen topology of synaptic connectivity. The synaptic network contains an abundance of cliques of neurons bound into cavities that guide the emergence of correlated activity. In response to stimuli, correlated activity binds synaptically connected neurons into functional cliques and cavities that evolve in a stereotypical sequence towards peak complexity. We propose that the brain processes stimuli by forming increasingly complex functional cliques and cavities.
  180. Topology of Force Networks in Granular Media Under Impact (2017)

    M. X. Lim, R. P. Behringer
    Abstract We investigate the evolution of the force network in experimental systems of two-dimensional granular materials under impact. We use the first Betti number, β₁, and persistence diagrams as measures of the topological properties of the force network. We show that the structure of the network has a complex, hysteretic dependence on both the intruder acceleration and the total force response of the granular material. β₁ can also distinguish between the nonlinear formation and relaxation of the force network. In addition, using the persistence diagram of the force network, we show that the size of the loops in the force network has a Poisson-like distribution, the characteristic size of which changes over the course of the impact.
  181. PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction (2020)

    Nicolas Swenson, Aditi S. Krishnapriyan, Aydin Buluc, Dmitriy Morozov, Katherine Yelick
    Abstract Understanding protein structure-function relationships is a key challenge in computational biology, with applications across the biotechnology and pharmaceutical industries. While it is known that protein structure directly impacts protein function, many functional prediction tasks use only protein sequence. In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank in order to study the expressiveness of different structure-based prediction schemes. We present PersGNN - an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis to capture a complex set of both local and global structural features. While variations of these techniques have been successfully applied to proteins before, we demonstrate that our hybridized approach, PersGNN, outperforms either method on its own as well as a baseline neural network that learns from the same information. PersGNN achieves a 9.3% boost in area under the precision recall curve (AUPR) compared to the best individual model, as well as high F1 scores across different gene ontology categories, indicating the transferability of this approach.
  182. Steinhaus Filtration and Stable Paths in the Mapper (2020)

    Dustin L. Arendt, Matthew Broussard, Bala Krishnamoorthy, Nathaniel Saul
    Abstract Two central concepts from topological data analysis are persistence and the Mapper construction. Persistence employs a sequence of objects built on data called a filtration. A Mapper produces insightful summaries of data, and has found widespread applications in diverse areas. We define a new filtration called the cover filtration built from a single cover based on a generalized Steinhaus distance, which is a generalization of Jaccard distance. We prove a stability result: the cover filtrations of two covers are α/m interleaved, where α is a bound on bottleneck distance between covers and m is the size of the smallest set in either cover. We also show our construction is equivalent to the Čech filtration under certain settings, and the Vietoris-Rips filtration completely determines the cover filtration in all cases. We then develop a theory for stable paths within this filtration. Unlike standard results on stability in topological persistence, our definition of path stability aligns exactly with the above result on stability of cover filtration. We demonstrate how our framework can be employed in a variety of applications where a metric is not obvious but a cover is readily available. First we present a new model for recommendation systems using cover filtration. For an explicit example, stable paths identified on a movies data set represent sequences of movies constituting gentle transitions from one genre to another. As a second application in explainable machine learning, we apply the Mapper for model induction, providing explanations in the form of paths between subpopulations. Stable paths in the Mapper from a supervised machine learning model trained on the FashionMNIST data set provide improved explanations of relationships between subpopulations of images.
  183. Motor Eccentricity Fault Detection: Physics-Based and Data-Driven Approaches (2023)

    Bingnan Wang, Hiroshi Inoue, Makoto Kanemaru
    Abstract Fault detection using motor current signature analysis (MCSA) is attractive for industrial applications due to its simplicity, with no additional sensor installation required. However, current components associated with faults are often very subtle and much smaller than the supply frequency component, making it challenging to detect and quantify fault levels. In this paper, we present our work on quantitative eccentricity fault diagnosis technologies for electric motors, including a physical-model approach using improved winding function theory, which can simulate motor dynamics under faulty conditions and agrees well with experimental data, and a data-driven approach using topological data analysis (TDA), which can effectively differentiate signals measured at different eccentricity levels. The advantages and limitations of each approach are discussed. Both methods can be extended to the detection and quantification of other types of electric motor faults.
  184. Continuous Indexing of Fibrosis (CIF): Improving the Assessment and Classification of MPN Patients (2022)

    Hosuk Ryou, Korsuk Sirinukunwattana, Alan Aberdeen, Gillian Grindstaff, Bernadette Stolz, Helen Byrne, Heather A. Harrington, Nikolaos Sousos, Anna L. Godfrey, Claire N. Harrison, Bethan Psaila, Adam J. Mead, Gabrielle Rees, Gareth D. H. Turner, Jens Rittscher, Daniel Royston
    Abstract The detection and grading of fibrosis in myeloproliferative neoplasms (MPN) is an important component of disease classification, prognostication and disease monitoring. However, current fibrosis grading systems are only semi-quantitative and fail to capture sample heterogeneity. To improve the detection, quantitation and representation of reticulin fibrosis, we developed a machine learning (ML) approach using bone marrow trephine (BMT) samples (n = 107) from patients diagnosed with MPN or a reactive / nonneoplastic marrow. The resulting Continuous Indexing of Fibrosis (CIF) enhances the detection and monitoring of fibrosis within BMTs, and aids the discrimination of MPN subtypes. When combined with megakaryocyte feature analysis, CIF discriminates between essential thrombocythemia (ET) and pre-fibrotic myelofibrosis (pre-PMF), a frequently challenging differential diagnosis, with high predictive accuracy [area under the curve = 0.94]. CIF also shows significant promise in the identification of MPN patients at risk of disease progression; analysis of samples from 35 patients diagnosed with ET and enrolled in the Primary Thrombocythemia-1 (PT-1) trial identified features predictive of post-ET myelofibrosis (area under the curve = 0.77). In addition to these clinical applications, automated analysis of fibrosis has clear potential to further refine disease classification boundaries and inform future studies of the micro-environmental factors driving disease initiation and progression in MPN and other stem cell disorders. The image analysis methods used to generate CIF can be readily integrated with those of other key morphological features in MPNs, including megakaryocyte morphology, that lie beyond the scope of conventional histological assessment.
    Key Points: (1) Machine learning enables an objective and quantitative description of reticulin fibrosis within the bone marrow of patients with myeloproliferative neoplasms (MPN). (2) Automated analysis and Continuous Indexing of Fibrosis (CIF) capture heterogeneity within MPN samples and have utility in refined classification and disease monitoring. (3) Quantitative fibrosis assessment combined with topological data analysis may help to predict patients at increased risk of progression to post-ET myelofibrosis, and assist in the discrimination of ET and pre-fibrotic PMF (pre-PMF).
  185. Persistent Homology for Path Planning in Uncertain Environments (2015)

    S. Bhattacharya, R. Ghrist, V. Kumar
    Abstract We address the fundamental problem of goal-directed path planning in an uncertain environment represented as a probability (of occupancy) map. Most methods generally use a threshold to reduce the grayscale map to a binary map before applying off-the-shelf techniques to find the best path. This raises the somewhat ill-posed question: what is the right (optimal) value at which to threshold the map? We instead suggest a persistent homology approach to the problem, a topological approach in which we seek the homology class of trajectories that is most persistent for the given probability map. In other words, we want the class of trajectories that is free of obstacles over the largest range of threshold values. In order to make this problem tractable, we use homology with ℤ2 coefficients (instead of the standard ℤ coefficients), and describe how graph search-based algorithms can be used to find trajectories in different homology classes. Our simulation results demonstrate the efficiency and practical applicability of the algorithm proposed in this paper.
  186. Weighted Persistent Homology for Biomolecular Data Analysis (2020)

    Zhenyu Meng, D. Vijay Anand, Yunpeng Lu, Jie Wu, Kelin Xia
    Abstract In this paper, we systematically review weighted persistent homology (WPH) models and their applications in biomolecular data analysis. Essentially, the weight value, which reflects physical, chemical and biological properties, can be assigned to vertices (atom centers), edges (bonds), or higher-order simplices (clusters of atoms), depending on the biomolecular structure, function, and dynamics properties. Further, we propose the first localized weighted persistent homology (LWPH). Inspired by the great success of element specific persistent homology (ESPH), we do not treat biomolecules as an inseparable system like all previous weighted models; instead, we decompose them into a series of local domains, which may overlap with each other. The general persistent homology or weighted persistent homology analysis is then applied on each of these local domains. In this way, functional properties that are embedded in local structures can be revealed. Our model has been applied to systematically study DNA structures. It has been found that our LWPH based features can be used to successfully discriminate the A-, B-, and Z-types of DNA. More importantly, our LWPH based principal component analysis (PCA) model can identify two configurational states of DNA structures in ion liquid environment, which can be revealed only by the complicated helical coordinate system. The close agreement with the helical-coordinate model demonstrates that our model captures local structure variations so well that it is comparable with geometric models. Moreover, geometric measurements are usually defined in local regions. For instance, the helical-coordinate system is limited to one or two basepairs. However, our LWPH can quantitatively characterize structure information in regions or domains with arbitrary sizes and shapes, where traditional geometrical measurements fail.
  187. Substructure Topology Preserving Simplification of Tetrahedral Meshes (2011)

    Fabien Vivodtzev, Georges-Pierre Bonneau, Stefanie Hahmann, Hans Hagen
    Abstract Interdisciplinary efforts in modeling and simulating phenomena have led to complex multi-physics models involving different physical properties and materials in the same system. Within a 3d domain, substructures of lower dimensions appear at the interface between different materials. Correspondingly, an unstructured tetrahedral mesh used for such a simulation includes 2d and 1d substructures embedded in the vertices, edges and faces of the mesh. The simplification of such tetrahedral meshes must preserve (1) the geometry and the topology of the 3d domain, (2) the simulated data and (3) the geometry and topology of the embedded substructures. Although intensive research has been conducted on the first two goals, the third objective has received little attention. This paper focuses on the preservation of the topology of 1d and 2d substructures embedded in an unstructured tetrahedral mesh, during edge collapse simplification. We define these substructures as simplicial sub-complexes of the mesh, which is modeled as an extended simplicial complex. We derive a robust algorithm, based on combinatorial topology results, in order to determine if an edge can be collapsed without changing the topology of both the mesh and all embedded substructures. Based on this algorithm we have developed a system for simplifying scientific datasets defined on irregular tetrahedral meshes with substructures. The implementation of our system is discussed in detail. We demonstrate the power of our system with real world scientific datasets from electromagnetism simulations.
  188. Topological Early Warning Signals: Quantifying Varying Routes to Extinction in a Spatially Distributed Population Model (2022)

    Laura S. Storch, Sarah L. Day
    Abstract Understanding and predicting critical transitions in spatially explicit ecological systems is particularly challenging due to their complex spatial and temporal dynamics and high dimensionality. Here, we explore changes in population distribution patterns during a critical transition (an extinction event) using computational topology. Computational topology allows us to quantify certain features of a population distribution pattern, such as the level of fragmentation. We create population distribution patterns via a simple coupled patch model with Ricker map growth and nearest-neighbor dispersal on a two-dimensional lattice. We observe two dominant paths to extinction within the explored parameter space that depend critically on the dispersal rate d and the rate of parameter drift, Δϵ. These paths to extinction are easily topologically distinguishable, so categorization can be automated. We use this population model as a theoretical proof-of-concept for the methodology, and argue that computational topology is a powerful tool for analyzing dynamical changes in systems with noisy data that are coarsely resolved in space and/or time. In addition, computational topology can provide early warning signals for chaotic dynamical systems where traditional statistical early warning signals would fail. For these reasons, we envision this work as a helpful addition to the critical transitions prediction toolbox.
  189. Persistent Homology Advances Interpretable Machine Learning for Nanoporous Materials (2020)

    Aditi S. Krishnapriyan, Joseph Montoya, Jens Hummelshøj, Dmitriy Morozov
    Abstract Machine learning for nanoporous materials design and discovery has emerged as a promising alternative to more time-consuming experiments and simulations. The challenge with this approach is the selection of features that enable universal and interpretable materials representations across multiple prediction tasks. We use persistent homology to construct holistic representations of the materials structure. We show that these representations can also be augmented with other generic features such as word embeddings from natural language processing to capture chemical information. We demonstrate our approach on multiple metal-organic framework datasets by predicting a variety of gas adsorption targets. Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from commonly used manually curated features. Persistent homology features allow us to locate the pores that correlate best to adsorption at different pressures, contributing to understanding atomic level structure-property relationships for materials design.
  190. Topology in Cyber Research (2022)

    Steve Huntsman, Jimmy Palladino, Michael Robinson
    Abstract We give an idiosyncratic overview of applications of topology to cyber research, spanning the analysis of variables/assignments and control flow in computer programs, a brief sketch of topological data analysis in one dimension, and the use of sheaves to analyze wireless networks. The text is from a chapter in the forthcoming book Mathematics in Cyber Research, to be published by Taylor and Francis.
  191. Topological Machine Learning With Persistence Indicator Functions (2019)

    Bastian Rieck, Filip Sadlo, Heike Leitte
    Abstract Techniques from computational topology, in particular persistent homology, are becoming increasingly relevant for data analysis. Their stable metrics permit the use of many distance-based data analysis methods, such as multidimensional scaling, while providing a firm theoretical ground. Many modern machine learning algorithms, however, are based on kernels. This paper presents persistence indicator functions (PIFs), which summarize persistence diagrams, i.e., feature descriptors in topological data analysis. PIFs can be calculated and compared in linear time and have many beneficial properties, such as the availability of a kernel-based similarity measure. We demonstrate their usage in common data analysis scenarios, such as confidence set estimation and classification of complex structured data.
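    A minimal sketch of a persistence indicator function, assuming it counts the persistence pairs (birth, death) whose interval contains ε (we use half-open intervals [birth, death) and finite deaths). Because the function is piecewise constant, the L1 distance between two PIFs can be computed with one sweep over the interval endpoints, i.e. in linear time once the endpoints are sorted. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Diagram = Sequence[Tuple[float, float]]


def pif(diagram: Diagram, eps: float) -> int:
    """Number of pairs (birth, death) with birth <= eps < death."""
    return sum(1 for b, d in diagram if b <= eps < d)


def _events(diagram: Diagram, sign: int) -> List[Tuple[float, int]]:
    # +sign where an interval starts, -sign where it ends.
    return [(b, sign) for b, _ in diagram] + [(d, -sign) for _, d in diagram]


def pif_l1_distance(d1: Diagram, d2: Diagram) -> float:
    """Integral of |PIF_1 - PIF_2| over the real line (finite deaths only)."""
    events = sorted(_events(d1, 1) + _events(d2, -1))
    total, level, prev = 0.0, 0, None
    for t, step in events:
        if prev is not None:
            total += abs(level) * (t - prev)  # level is constant on [prev, t)
        level += step
        prev = t
    return total
```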
  192. Topological Portraits of Multiscale Coordination Dynamics (2020)

    Mengsen Zhang, William D. Kalies, J. A. Scott Kelso, Emmanuelle Tognoli
    Abstract Living systems exhibit complex yet organized behavior on multiple spatiotemporal scales. To investigate the nature of multiscale coordination in living systems, one needs a meaningful and systematic way to quantify the complex dynamics, a challenge in both theoretical and empirical realms. The present work shows how integrating approaches from computational algebraic topology and dynamical systems may help us meet this challenge. In particular, we focus on the application of multiscale topological analysis to coordinated rhythmic processes. First, theoretical arguments are introduced as to why certain topological features and their scale-dependency are highly relevant to understanding complex collective dynamics. Second, we propose a method to capture such dynamically relevant topological information using persistent homology, which allows us to effectively construct a multiscale topological portrait of rhythmic coordination. Finally, the method is put to test in detecting transitions in real data from an experiment of rhythmic coordination in ensembles of interacting humans. The recurrence plots of topological portraits highlight collective transitions in coordination patterns that were elusive to more traditional methods. This sensitivity to collective transitions would be lost if the behavioral dynamics of individuals were treated as separate degrees of freedom instead of constituents of the topology that they collectively forge. Such multiscale topological portraits highlight collective aspects of coordination patterns that are irreducible to properties of individual parts. The present work demonstrates how the analysis of multiscale coordination dynamics can benefit from topological methods, thereby paving the way for further systematic quantification of complex, high-dimensional dynamics in living systems.
  193. Topological Graph Neural Networks (2021)

    Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten Borgwardt
    Abstract Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to prominent substructures, such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive in terms of the Weisfeiler–Lehman test of isomorphism. Augmenting GNNs with our layer leads to beneficial predictive performance, both on synthetic data sets, which can be trivially classified by humans but not by ordinary GNNs, and on real-world data.
  194. Topological Data Analysis Distinguishes Parameter Regimes in the Anderson-Chaplain Model of Angiogenesis (2021)

    John T. Nardini, Bernadette J. Stolz, Kevin B. Flores, Heather A. Harrington, Helen M. Byrne
    Abstract Angiogenesis is the process by which blood vessels form from pre-existing vessels. It plays a key role in many biological processes, including embryonic development and wound healing, and contributes to many diseases including cancer and rheumatoid arthritis. The structure of the resulting vessel networks determines their ability to deliver nutrients and remove waste products from biological tissues. Here we simulate the Anderson-Chaplain model of angiogenesis at different parameter values and quantify the vessel architectures of the resulting synthetic data. Specifically, we propose a topological data analysis (TDA) pipeline for systematic analysis of the model. TDA is a vibrant and relatively new field of computational mathematics for studying the shape of data. We compute topological and standard descriptors of model simulations generated by different parameter values. We show that TDA of model simulation data stratifies parameter space into regions with similar vessel morphology. The methodologies proposed here are widely applicable to other synthetic and experimental data including wound healing, development, and plant biology.
  195. Musical Stylistic Analysis: A Study of Intervallic Transition Graphs via Persistent Homology (2022)

    Martín Mijangos, Alessandro Bravetti, Pablo Padilla
    Abstract Topological data analysis has been recently applied to investigate stylistic signatures and trends in musical compositions. A useful tool in this area is persistent homology. In this paper, we develop a novel method to represent a weighted directed graph as a finite metric space and then use persistent homology to extract useful features. We apply this method to weighted directed graphs obtained from pitch transition information of a given musical fragment and apply these techniques to the study of stylistic trends. In particular, we are interested in using these tools to make quantitative stylistic comparisons. As a first illustration, we analyze a selection of string quartets by Haydn, Mozart and Beethoven and discuss possible implications of our results in terms of different approaches by these composers to stylistic exploration and variety. We observe that Haydn is stylistically the most conservative, followed by Mozart, while Beethoven is the most innovative, expanding and modifying the string quartet as a musical form. Finally, we also compare the variability of different genres, namely minuets, allegros, prestos and adagios, by a given composer and conclude that the minuet is the most stable form of the string quartet movements.
  196. Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis (2019)

    Lorin Crawford, Anthea Monod, Andrew X. Chen, Sayan Mukherjee, Raúl Rabadán
    Abstract Glioblastoma multiforme (GBM) is an aggressive form of human brain cancer that is under active study in the field of cancer biology. Its rapid progression and the relative time cost of obtaining molecular data make other readily available forms of data, such as images, an important resource for actionable measures in patients. Our goal is to use information given by medical images taken from GBM patients in statistical settings. To do this, we design a novel statistic—the smooth Euler characteristic transform (SECT)—that quantifies magnetic resonance images of tumors. Due to its well-defined inner product structure, the SECT can be used in a wider range of functional and nonparametric modeling approaches than other previously proposed topological summary statistics. When applied to a cohort of GBM patients, we find that the SECT is a better predictor of clinical outcomes than both existing tumor shape quantifications and common molecular assays. Specifically, we demonstrate that SECT features alone explain more of the variance in GBM patient survival than gene expression, volumetric features, and morphometric features. The main takeaways from our findings are thus 2-fold. First, they suggest that images contain valuable information that can play an important role in clinical prognosis and other medical decisions. Second, they show that the SECT is a viable tool for the broader study of medical imaging informatics. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
  197. Topological Autoencoders (2020)

    Michael Moor, Max Horn, Bastian Rieck, Karsten Borgwardt
    Abstract We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we construct this loss in a differentiable manner, such that the encoding learns to retain multi-scale connectivity information. We show that our approach is theoretically well-founded and that it exhibits favourable latent representations on a synthetic manifold as well as on real-world image data sets, while preserving low reconstruction errors.
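    A NumPy sketch of a topological loss in the spirit described, assuming (our reading, not the authors' code) that for 0-dimensional persistent homology of a Vietoris–Rips filtration the persistence pairing is given by the edges of a minimum spanning tree, and that the loss compares the lengths of those edges in input and latent space in both directions. A real implementation would compute this inside an autodiff framework so gradients reach the encoder; this version only evaluates the loss, and all names are hypothetical.

```python
import numpy as np


def pairwise_distances(x: np.ndarray) -> np.ndarray:
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mst_edges(dist: np.ndarray) -> np.ndarray:
    """Edges (i, j) of a minimum spanning tree via Prim's algorithm, O(n^2).

    For a Vietoris-Rips filtration these edges are the 0-dimensional
    persistence pairs: each one merges two connected components.
    Requires at least two points.
    """
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known link from the tree to each vertex
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return np.array(edges, dtype=int)


def topological_loss(x: np.ndarray, z: np.ndarray) -> float:
    """0.5*||A_X[pi_X] - A_Z[pi_X]||^2 + 0.5*||A_Z[pi_Z] - A_X[pi_Z]||^2."""
    ax, az = pairwise_distances(x), pairwise_distances(z)
    loss = 0.0
    for edges in (mst_edges(ax), mst_edges(az)):  # pairings from X, then from Z
        i, j = edges[:, 0], edges[:, 1]
        loss += 0.5 * float(np.sum((ax[i, j] - az[i, j]) ** 2))
    return loss
```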
  198. Cubical Ripser: Software for Computing Persistent Homology of Image and Volume Data (2020)

    Shizuo Kaji, Takeki Sudo, Kazushi Ahara
    Abstract We introduce Cubical Ripser for computing persistent homology of image and volume data. To the best of our knowledge, Cubical Ripser is currently the fastest and the most memory-efficient program for computing persistent homology of image and volume data. We demonstrate our software with an example of image analysis in which persistent homology and convolutional neural networks are successfully combined. Our open-source implementation is available at [14].
  199. Rule Generation for Classifying SLT Failed Parts (2022)

    Ho-Chieh Hsu, Cheng-Che Lu, Shih-Wei Wang, Kelly Jones, Kai-Chiang Wu, Mango C.-T. Chao
    Abstract System-level test (SLT) has recently gained visibility as integrated circuits become increasingly difficult to test fully due to increasing transistor density and circuit design complexity. Although SLT is effective for reducing test escapes, little diagnostic information can be obtained from it for product improvement. In this paper, we propose an unsupervised learning (UL) method to resolve this issue by discovering correlative, potentially systematic defects during the SLT phase. Toward this end, HDBSCAN [1] is used for clustering SLT failed devices in a low-dimensional space created by UMAP [2]. Decision trees are subsequently applied to explain the HDBSCAN results based on generating explainable quantitative rules, e.g., inequality constraints, providing domain experts additional information for advanced diagnosis. Experiments on industrial data demonstrate that the proposed methodology can effectively cluster SLT failed devices and then explain the clustering results with a promising accuracy of above 90%. Our methodology is also scalable and fast, requiring two to five orders of magnitude less runtime than the method presented in [3].
  200. Persistent Homology in Cosmic Shear: Constraining Parameters With Topological Data Analysis (2021)

    Sven Heydenreich, Benjamin Brück, Joachim Harnois-Déraps
    Abstract In recent years, cosmic shear has emerged as a powerful tool for studying the statistical distribution of matter in our Universe. Apart from the standard two-point correlation functions, several alternative methods such as peak count statistics offer competitive results. Here we show that persistent homology, a tool from topological data analysis, can extract more cosmological information than previous methods from the same data set. For this, we use persistent Betti numbers to efficiently summarise the full topological structure of weak lensing aperture mass maps. This method can be seen as an extension of the peak count statistics, in which we additionally capture information about the environment surrounding the maxima. We first demonstrate the performance in a mock analysis of the KiDS+VIKING-450 data: we extract the Betti functions from a suite of N-body simulations and use these to train a Gaussian process emulator that provides rapid model predictions; we next run a Markov chain Monte Carlo analysis on independent mock data to infer the cosmological parameters and their uncertainties. When comparing our results, we recover the input cosmology and achieve a constraining power on that is 3% tighter than that on peak count statistics. Performing the same analysis on 100 deg² of Euclid-like simulations, we are able to improve the constraints on S₈ and Ωₘ by 19% and 12%, respectively, while breaking some of the degeneracy between S₈ and the dark energy equation of state. To our knowledge, the methods presented here are the most powerful topological tools for constraining cosmological parameters with lensing data.
  201. Statistical Topological Data Analysis - A Kernel Perspective (2015)

    Roland Kwitt, Stefan Huber, Marc Niethammer, Weili Lin, Ulrich Bauer
    Abstract We consider the problem of statistical computations with persistence diagrams, a summary representation of topological features in data. These diagrams encode persistent homology, a widely used invariant in topological data analysis. While several avenues towards a statistical treatment of the diagrams have been explored recently, we follow an alternative route that is motivated by the success of methods based on the embedding of probability measures into reproducing kernel Hilbert spaces. In fact, a positive definite kernel on persistence diagrams has recently been proposed, connecting persistent homology to popular kernel-based learning techniques such as support vector machines. However, important properties of that kernel enabling a principled use in the context of probability measure embeddings remain to be explored. Our contribution is to close this gap by proving universality of a variant of the original kernel, and to demonstrate its effective use in two-sample hypothesis testing on synthetic as well as real-world data.
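    For orientation, a sketch of the persistence scale-space kernel, which we take (as an assumption) to be the previously proposed kernel the abstract refers to, in the closed form k_σ(F, G) = 1/(8πσ) Σ_{p∈F} Σ_{q∈G} [exp(−‖p − q‖²/(8σ)) − exp(−‖p − q̄‖²/(8σ))], with q̄ the point q mirrored across the diagonal. The universal variant proved in the paper is not reproduced. A biased estimate of the squared maximum mean discrepancy (MMD), the statistic behind kernel two-sample tests, is included; function names are hypothetical.

```python
import numpy as np


def pssk(F, G, sigma: float) -> float:
    """Scale-space kernel between two diagrams of (birth, death) pairs."""
    F = np.asarray(F, dtype=float).reshape(-1, 2)
    G = np.asarray(G, dtype=float).reshape(-1, 2)
    if len(F) == 0 or len(G) == 0:
        return 0.0
    G_mirror = G[:, ::-1]  # (death, birth): reflection across the diagonal
    d2 = ((F[:, None, :] - G[None, :, :]) ** 2).sum(axis=-1)
    d2_mirror = ((F[:, None, :] - G_mirror[None, :, :]) ** 2).sum(axis=-1)
    terms = np.exp(-d2 / (8 * sigma)) - np.exp(-d2_mirror / (8 * sigma))
    return float(terms.sum() / (8 * np.pi * sigma))


def mmd2_biased(sample_a, sample_b, sigma: float) -> float:
    """Biased MMD^2 between two lists of persistence diagrams."""
    def mean_k(xs, ys):
        return np.mean([[pssk(x, y, sigma) for y in ys] for x in xs])
    return float(mean_k(sample_a, sample_a) + mean_k(sample_b, sample_b)
                 - 2 * mean_k(sample_a, sample_b))
```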
  202. Euler Characteristic Surfaces (2021)

    Gabriele Beltramo, Rayna Andreeva, Ylenia Giarratano, Miguel O. Bernabeu, Rik Sarkar, Primoz Skraba
    Abstract We study the use of the Euler characteristic for multiparameter topological data analysis. The Euler characteristic is a classical, well-understood topological invariant that has appeared in numerous applications, including in the context of random fields. The goal of this paper is to present an extension of the Euler characteristic to higher-dimensional parameter spaces. While topological data analysis of higher-dimensional parameter spaces using stronger invariants such as homology continues to be the subject of intense research, the Euler characteristic is more manageable theoretically and computationally, and this analysis can be seen as an important intermediary step in multi-parameter topological data analysis. We show the usefulness of the techniques using artificially generated examples, and a real-world application of detecting diabetic retinopathy in retinal images.
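    To make the two-parameter construction concrete, a sketch that evaluates the Euler characteristic χ = Σ_σ (−1)^dim σ over the subcomplex {f ≤ s, g ≤ t} on a grid of (s, t) values. We assume a simplicial complex given as vertex tuples closed under faces, and two vertex functions extended to simplices by their maximum (a lower-star bifiltration); the paper's own constructions and data are not reproduced.

```python
from itertools import product
from typing import Dict, Hashable, Sequence, Tuple

import numpy as np


def euler_characteristic_surface(
    simplices: Sequence[Tuple[Hashable, ...]],
    f: Dict[Hashable, float],
    g: Dict[Hashable, float],
    s_grid: Sequence[float],
    t_grid: Sequence[float],
) -> np.ndarray:
    """Return ecs[i, j] = chi of the subcomplex with f <= s_grid[i], g <= t_grid[j]."""
    f_val = np.array([max(f[v] for v in s) for s in simplices])  # lower-star extension
    g_val = np.array([max(g[v] for v in s) for s in simplices])
    sign = np.array([(-1) ** (len(s) - 1) for s in simplices])   # (-1)^dim
    ecs = np.zeros((len(s_grid), len(t_grid)), dtype=int)
    for i, j in product(range(len(s_grid)), range(len(t_grid))):
        present = (f_val <= s_grid[i]) & (g_val <= t_grid[j])
        ecs[i, j] = int(sign[present].sum())
    return ecs
```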
  203. Skeletonization and Partitioning of Digital Images Using Discrete Morse Theory (2015)

    Olaf Delgado-Friedrichs, Vanessa Robins, Adrian Sheppard
    Abstract We show how discrete Morse theory provides a rigorous and unifying foundation for defining skeletons and partitions of grayscale digital images. We model a grayscale image as a cubical complex with a real-valued function defined on its vertices (the voxel values). This function is extended to a discrete gradient vector field using the algorithm presented in Robins, Wood, Sheppard TPAMI 33:1646 (2011). In the current paper we define basins (the building blocks of a partition) and segments of the skeleton using the stable and unstable sets associated with critical cells. The natural connection between Morse theory and homology allows us to prove the topological validity of these constructions; for example, that the skeleton is homotopic to the initial object. We simplify the basins and skeletons via Morse-theoretic cancellation of critical cells in the discrete gradient vector field using a strategy informed by persistent homology. Simple working Python code for our algorithms for efficient vector field traversal is included. Example data are taken from micro-CT images of porous materials, an application area where accurate topological models of pore connectivity are vital for fluid-flow modelling.
  204. Spatial Embedding Imposes Constraints on Neuronal Network Architectures (2018)

    Jennifer Stiso, Danielle S. Bassett
    Abstract Recent progress towards understanding circuit function has capitalized on tools from network science to parsimoniously describe the spatiotemporal architecture of neural systems. Such tools often address systems topology divorced from its physical instantiation. Nevertheless, for embedded systems such as the brain, physical laws directly constrain the processes of network growth, development, and function. We review here the rules imposed by the space and volume of the brain on the development of neuronal networks, and show that these rules give rise to a specific set of complex topologies. These rules also affect the repertoire of neural dynamics that can emerge from the system, and thereby inform our understanding of network dysfunction in disease. We close by discussing new tools and models to delineate the effects of spatial embedding.
  205. Persistent Homology Analysis of Ion Aggregations and Hydrogen-Bonding Networks (2018)

    Kelin Xia
    Abstract Despite the great advancement of experimental tools and theoretical models, a quantitative characterization of the microscopic structures of ion aggregates and their associated water hydrogen-bonding networks still remains a challenging problem. In this paper, a newly invented mathematical method called persistent homology is introduced, for the first time, to quantitatively analyze the intrinsic topological properties of ion aggregation systems and hydrogen-bonding networks. The two most distinctive properties of persistent homology analysis of assembly systems are as follows. First, it does not require a predefined bond length to construct the ion or hydrogen-bonding network. Persistent homology results are determined by the morphological structure of the data only. Second, it can directly measure the size of circles or holes in ion aggregates and hydrogen-bonding networks. To validate our model, we consider two well-studied systems, i.e., NaCl and KSCN solutions, generated from molecular dynamics simulations. They are believed to represent two morphological types of aggregation, i.e., local clusters and extended ion networks. It has been found that the two aggregation types have distinguishable topological features and can be characterized by our topological model very well. Further, we construct two types of networks, i.e., O-networks and H2O-networks, for analyzing the topological properties of hydrogen-bonding networks. It is found that for both models, KSCN systems demonstrate much more dramatic variations in their local circle structures as concentration increases. A consistent increase of large-sized local circle structures is observed and the sizes of these circles become more and more diverse. In contrast, NaCl systems show no obvious increase of large-sized circles. Instead, a consistent decline of the average size of the circle structures is observed and the sizes of these circles become more and more uniform as concentration increases. As far as we know, these unique intrinsic topological features in ion aggregation systems have never been pointed out before. More importantly, our models can be directly used to quantitatively analyze the intrinsic topological invariants, including circles, loops, holes, and cavities, of any network-like structures, such as nanomaterials, colloidal systems and biomolecular assemblies. These topological invariants cannot be described by traditional graph and network models.
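    The first property, that no predefined bond length is needed, can be seen in the simplest case of 0-dimensional persistence: instead of fixing a cutoff, every distance scale is swept, and each merge of two clusters records the scale at which a component dies. A minimal union-find sketch for a point cloud under Euclidean distance follows; the function name is hypothetical, and the circles and cavities discussed in the abstract correspond to 1- and 2-dimensional homology, which this sketch does not compute.

```python
from typing import List

import numpy as np


def h0_death_times(points) -> List[float]:
    """Death scales of 0-dimensional classes in a Vietoris-Rips filtration.

    All components are born at scale 0; one component never dies and is not
    listed. Uses Kruskal's algorithm with union-find, O(n^2 log n).
    """
    pts = np.asarray(points, dtype=float)
    if pts.ndim == 1:
        pts = pts[:, None]  # treat a flat list as 1-D coordinates
    n = len(pts)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (float(np.linalg.norm(pts[i] - pts[j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge merges two clusters: one class dies
            parent[ri] = rj
            deaths.append(length)
    return deaths
```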
  206. Rootstock Effects on Scion Phenotypes in a ‘Chambourcin’ Experimental Vineyard (2019)

    Zoë Migicovsky, Zachary N Harris, Laura L Klein, Mao Li, Adam McDermaid, Daniel H Chitwood, Anne Fennell, Laszlo G Kovacs, Misha Kwasniewski, Jason P Londo, Qin Ma, Allison J Miller
    Abstract Understanding how root systems modulate shoot system phenotypes is a fundamental question in plant biology and will be useful in developing resilient agricultural crops. Grafting is a common horticultural practice that joins the roots (rootstock) of one plant to the shoot (scion) of another, providing an excellent method for investigating how these two organ systems affect each other. In this study, we used the French-American hybrid grapevine ‘Chambourcin’ (Vitis L.) as a model to explore the rootstock–scion relationship. We examined leaf shape, ion concentrations, and gene expression in ‘Chambourcin’ grown ungrafted as well as grafted to three different rootstocks (‘SO4’, ‘1103P’ and ‘3309C’) across 2 years and three different irrigation treatments. We found that a significant amount of the variation in leaf shape could be explained by the interaction between rootstock and irrigation. For ion concentrations, the primary source of variation identified was the position of a leaf in a shoot, although rootstock and rootstock by irrigation interaction also explained a significant amount of variation for most ions. Lastly, we found rootstock-specific patterns of gene expression in grafted plants when compared to ungrafted vines. Thus, our work reveals the subtle and complex effect of grafting on ‘Chambourcin’ leaf morphology, ionomics, and gene expression.
  207. Genomics Data Analysis via Spectral Shape and Topology (2022)

    Erik J. Amézquita, Farzana Nasrin, Kathleen M. Storey, Masato Yoshizawa
    Abstract Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can help to gain a better understanding of the intrinsic shape of high-dimensional genomic data and to retain information that may be lost using standard dimension-reduction algorithms. We propose a novel workflow to process and analyze RNA-seq data from tumor and healthy subjects integrating Mapper and differential gene expression. Specifically, we show that a Gaussian mixture approximation method can be used to produce graphical structures that successfully separate tumor and healthy subjects, and produce two subgroups of tumor subjects. A further analysis using DESeq2, a popular tool for the detection of differentially expressed genes, shows that these two subgroups of tumor cells bear two distinct gene regulations, suggesting two discrete paths for forming lung cancer, which could not be highlighted by other popular clustering methods, including t-SNE. Although Mapper shows promise in analyzing high-dimensional data, tools to statistically analyze Mapper graphical structures remain limited in the existing literature. In this paper, we develop a scoring method using heat kernel signatures that provides an empirical setting for statistical inferences such as hypothesis testing, sensitivity analysis, and correlation analysis.
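    As a reference point for the heat kernel signature, a sketch of HKS(x, t) = Σ_k exp(−λ_k t) φ_k(x)² computed from the eigenpairs (λ_k, φ_k) of a graph Laplacian. We assume the unnormalized Laplacian L = D − A of an undirected graph such as a Mapper output; the paper's scoring method built on top of the signature is not shown, and the function name is hypothetical.

```python
import numpy as np


def heat_kernel_signature(adjacency, times) -> np.ndarray:
    """HKS for every node at every time; result has shape (n_nodes, n_times)."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A            # unnormalized graph Laplacian
    lam, phi = np.linalg.eigh(L)              # columns of phi are eigenvectors
    times = np.atleast_1d(np.asarray(times, dtype=float))
    # HKS[x, t] = sum_k phi[x, k]^2 * exp(-lam[k] * t)
    return (phi ** 2) @ np.exp(-np.outer(lam, times))
```

    At t = 0 every node has signature 1 (the eigenvectors form an orthonormal basis), and on a connected graph the signature tends to 1/n as t grows; intermediate values of t reflect local structure around each node.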
  208. Chatter Classification in Turning Using Machine Learning and Topological Data Analysis (2018)

    Firas A. Khasawneh, Elizabeth Munch, Jose A. Perea
    Abstract Chatter identification and detection in machining processes has been an active area of research in the past two decades. Part of the challenge in studying chatter is that machining equations that describe its occurrence are often nonlinear delay differential equations. The majority of the available tools for chatter identification rely on defining a metric that captures the characteristics of chatter, and a threshold that signals its occurrence. The difficulty in choosing these parameters can be somewhat alleviated by utilizing machine learning techniques. However, even with a successful classification algorithm, the transferability of typical machine learning methods from one data set to another remains very limited. In this paper we combine supervised machine learning with Topological Data Analysis (TDA) to obtain a descriptor of the process which can detect chatter. The features we use are derived from the persistence diagram of an attractor reconstructed from the time series via Takens embedding. We test the approach using deterministic and stochastic turning models, where the stochasticity is introduced via the cutting coefficient term. Our results show a 97% successful classification rate on the deterministic model labeled by the stability diagram obtained using the spectral element method. The features gleaned from the deterministic model are then utilized for characterization of chatter in a stochastic turning model where there are very limited analysis methods.
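    The delay (Takens) embedding step can be sketched as follows. The embedding dimension and delay are free parameters chosen by us for illustration, the function name is hypothetical, and the resulting point cloud would then be passed to a persistent homology routine to obtain the persistence diagram from which features are derived.

```python
import numpy as np


def delay_embed(x, dim: int, tau: int) -> np.ndarray:
    """Return the point cloud with rows [x[i], x[i+tau], ..., x[i+(dim-1)*tau]]."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("time series too short for this dim and tau")
    return np.stack([x[k * tau : k * tau + n_points] for k in range(dim)], axis=1)


# A periodic signal traces a closed loop in the embedding, which persistent
# homology can detect as a long-lived 1-dimensional class.
t = np.linspace(0, 8 * np.pi, 400)
cloud = delay_embed(np.sin(t), dim=2, tau=12)
```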
  209. Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials (2020)

    Aditi S. Krishnapriyan, Maciej Haranczyk, Dmitriy Morozov
    Abstract Machine learning has emerged as an attractive alternative to experiments and simulations for predicting material properties. Usually, such an approach relies on specific domain knowledge for feature design: each learning target requires careful selection of features that an expert recognizes as important for the specific task. The major drawback of this approach is that computation of only a few structural features has been implemented so far, and it is difficult to tell a priori which features are important for a particular application. The latter problem has been empirically observed for predictors of guest uptake in nanoporous materials: local and global porosity features become dominant descriptors at low and high pressures, respectively. We investigate a feature representation of materials using tools from topological data analysis. Specifically, we use persistent homology to describe the geometry of nanoporous materials at various scales. We combine our topological descriptor with traditional structural features and investigate the relative importance of each to the prediction tasks. We demonstrate an application of this feature representation by predicting methane adsorption in zeolites, for pressures in the range of 1–200 bar. Our results not only show a considerable improvement compared to the baseline, but they also highlight that topological features capture information complementary to the structural features: this is especially important for the adsorption at low pressure, a task particularly difficult for the traditional features. Furthermore, by investigating the importance of individual topological features in the adsorption model, we are able to pinpoint the location of the pores that correlate best to adsorption at different pressures, contributing to our atom-level understanding of structure-property relationships.
  210. Some Applications of TDA on Financial Markets (2022)

    Miguel Angel Ruiz-Ortiz, José Carlos Gómez-Larrañaga, Jesús Rodríguez-Viorato
    Abstract Topological data analysis (TDA) has had many applications; however, financial markets have received little study through TDA. Here we present a quick review of some recent applications of TDA to financial markets and propose a new turbulence index based on persistent homology (the fundamental tool of TDA) that seems to capture critical transitions in financial data, based on our experiment with S&P 500 data preceding the stock market crash of February 20, 2020, due to the COVID-19 pandemic. We review applications in the early detection of turbulence periods in financial markets and how TDA can provide new insights for investing and yield superior risk-adjusted returns compared with investing strategies that use classical turbulence indices such as the VIX and Chow's index based on the Mahalanobis distance. Furthermore, we include an introduction to persistent homology so that readers can follow this paper without prior knowledge of TDA.
  211. Persistent Homology for Breast Tumor Classification Using Mammogram Scans (2022)

    Aras Asaad, Dashti Ali, Taban Majeed, Rasber Rashid
    Abstract An important tool in the field of topological data analysis is persistent homology (PH), which is used to encode an abstract representation of the homology of data at different resolutions in the form of a persistence diagram (PD). In this work, we build more than one PD representation of a single image based on a landmark selection method, known as local binary patterns, that encodes different types of local textures from images. We employed different PD vectorizations using persistence landscapes, persistence images, persistence binning (Betti curve) and statistics. We tested the effectiveness of the proposed landmark-based PH on two publicly available breast abnormality detection datasets using mammogram scans. The sensitivity of landmark-based PH is over 90% on both datasets for the detection of abnormal breast scans. Finally, the experimental results give new insights into using different types of PD vectorizations, which help in utilising PH in conjunction with machine learning classifiers.
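    Two of the vectorizations listed, persistence landscapes and the Betti curve, are simple to sketch from a diagram of finite (birth, death) pairs. The grid, the number of landscape levels, and the function names are our assumptions for illustration, not the authors' settings.

```python
from typing import Sequence, Tuple

import numpy as np

Diagram = Sequence[Tuple[float, float]]


def landscape(diagram: Diagram, k: int, t: float) -> float:
    """k-th persistence landscape at t: k-th largest tent value (k >= 1)."""
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0


def landscape_vector(diagram: Diagram, k_max: int, grid: Sequence[float]) -> np.ndarray:
    """Concatenate landscapes 1..k_max sampled on a grid into one feature vector."""
    return np.array([[landscape(diagram, k, t) for t in grid]
                     for k in range(1, k_max + 1)]).ravel()


def betti_curve(diagram: Diagram, grid: Sequence[float]) -> np.ndarray:
    """Number of intervals [birth, death) alive at each grid value."""
    return np.array([sum(1 for b, d in diagram if b <= t < d) for t in grid])
```

    Either vector can be fed directly to a standard classifier, which is the role these vectorizations play in the pipeline described above.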
  212. A Simplified Algorithm for Identifying Abnormal Changes in Dynamic Networks (2022)

    Bouchaib Azamir, Driss Bennis, Bertrand Michel
    Abstract Topological data analysis has recently been applied to the study of dynamic networks. In this context, an algorithm was introduced that helps, among other things, to detect early warning signals of abnormal changes in the dynamic network under study. However, the complexity of this algorithm increases significantly as the database under study grows. In this paper, we propose a simplification of the algorithm that does not affect its performance. We give various applications and simulations of the new algorithm on some weighted networks. The obtained results clearly show the efficiency of the introduced approach. Moreover, in some cases, the proposed algorithm makes it possible to highlight local information and sometimes early warning signals of local abnormal changes.
  213. Cooperative Grasping Through Topological Object Representation (2014)

    A. Marzinotto, J. A. Stork, D. V. Dimarogonas, D. Kragic
    Abstract We present a cooperative grasping approach based on a topological representation of objects. Using point cloud data, we extract loops on objects suitable for generating entanglement. We use the Gauss Linking Integral to derive controllers for multi-agent systems that generate hooking grasps on such loops while minimizing the entanglement between robots. The approach copes well with noisy point cloud data and is computationally simple and robust. We demonstrate the method by performing object grasping and transportation, through a hooking maneuver, with two coordinated NAO robots.
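    The Gauss Linking Integral named above measures how many times one closed loop winds around another. The sketch below is a minimal numerical illustration of that invariant only, not the authors' controller design: it assumes both loops are given as closed polygons (vertex arrays whose last vertex joins the first) and approximates the double integral with a midpoint rule over segment pairs. The function name `gauss_linking_number` is hypothetical.

```python
import numpy as np


def gauss_linking_number(curve_a, curve_b):
    """Approximate the Gauss linking integral of two closed polygonal loops.

    curve_a, curve_b: arrays of shape (n, 3); the last vertex connects back
    to the first. Uses Lk = 1/(4*pi) * sum (r_a - r_b) . (dr_a x dr_b) / |r_a - r_b|^3
    evaluated at segment midpoints.
    """
    a = np.asarray(curve_a, dtype=float)
    b = np.asarray(curve_b, dtype=float)
    # Segment vectors of each closed polygon and their midpoints.
    da = np.roll(a, -1, axis=0) - a
    db = np.roll(b, -1, axis=0) - b
    ma = a + 0.5 * da
    mb = b + 0.5 * db
    # Pairwise midpoint differences and segment cross products, shape (n_a, n_b, 3).
    r = ma[:, None, :] - mb[None, :, :]
    cross = np.cross(da[:, None, :], db[None, :, :])
    dist3 = np.linalg.norm(r, axis=2) ** 3
    integrand = np.einsum("ijk,ijk->ij", r, cross) / dist3
    return integrand.sum() / (4.0 * np.pi)


# Two unit circles forming a Hopf link, plus a copy of the second circle moved far away.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
ring_a = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
ring_b = np.stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)], axis=1)
far_b = ring_b + np.array([5.0, 0.0, 0.0])

print(round(gauss_linking_number(ring_a, ring_b)))  # linked: +1 or -1 depending on orientation
print(round(gauss_linking_number(ring_a, far_b)))   # unlinked: 0
```

    The sign of the result depends on the orientation of the two loops; its rounded absolute value is the number of times they are hooked together.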
  214. Fibers of Failure: Classifying Errors in Predictive Processes (2020)

    Leo S. Carlsson, Mikael Vejdemo-Johansson, Gunnar Carlsson, Pär G. Jönsson
    Abstract Predictive models are used in many different fields of science and engineering and are always prone to make faulty predictions. These faulty predictions can be more or less malignant depending on the model application. We describe fibers of failure (FiFa), a method to classify failure modes of predictive processes. Our method uses Mapper, an algorithm from topological data analysis (TDA), to build a graphical model of input data stratified by prediction errors. We demonstrate two ways to use the failure mode groupings: either to produce a correction layer that adjusts predictions by similarity to the failure modes; or to inspect members of the failure modes to illustrate and investigate what characterizes each failure mode. We demonstrate FiFa on two scenarios: a convolutional neural network (CNN) predicting MNIST images with added noise, and an artificial neural network (ANN) predicting the electrical energy consumption of an electric arc furnace (EAF). The correction layer on the CNN model improved its prediction accuracy significantly while the inspection of failure modes for the EAF model provided guiding insights into the domain-specific reasons behind several high-error regions.
  215. Tree Decomposition of Reeb Graphs, Parametrized Complexity, and Applications to Phylogenetics (2020)

    Anastasios Stefanou
    Abstract Inspired by the interval decomposition of persistence modules and the extended Newick format of phylogenetic networks, we show that, inside the larger category of partially ordered Reeb graphs, every Reeb graph with $n$ leaves and first Betti number $s$ can be identified with a coproduct of at most $2^s$ partially ordered trees with $(n + s)$ leaves. Reeb graphs are therefore classified up to isomorphism by their tree decomposition. An implication of this result is that the isomorphism problem for Reeb graphs is fixed-parameter tractable when the parameter is the first Betti number. We propose partially ordered Reeb graphs as a model for time-consistent phylogenetic networks and propose a certain Hausdorff distance as a metric on these structures.
  216. Homological Scaffold via Minimal Homology Bases (2021)

    Marco Guerra, Alessandro De Gregorio, Ulderico Fugacci, Giovanni Petri, Francesco Vaccarino
    Abstract The homological scaffold leverages persistent homology to construct a topologically sound summary of a weighted network. However, its crucial dependency on the choice of representative cycles hinders the ability to trace back global features onto individual network components, unless one provides a principled way to make such a choice. In this paper, we apply recent advances in the computation of minimal homology bases to introduce a quasi-canonical version of the scaffold, called minimal, and employ it to analyze data both real and in silico. At the same time, we verify that, statistically, the standard scaffold is a good proxy of the minimal one for sufficiently complex networks.
  217. Path Homologies of Motifs and Temporal Network Representations (2022)

    Samir Chowdhury, Steve Huntsman, Matvey Yutin
    Abstract Path homology is a powerful method for attaching algebraic invariants to digraphs. While there have been growing theoretical developments on the algebro-topological framework surrounding path homology, bona fide applications to the study of complex networks have remained stagnant. We address this gap by presenting an algorithm for path homology that combines efficient pruning and indexing techniques and using it to topologically analyze a variety of real-world complex temporal networks. A crucial step in our analysis is the complete characterization of path homologies of certain families of small digraphs that appear as subgraphs in these complex networks. These families include all digraphs, directed acyclic graphs, and undirected graphs up to certain numbers of vertices, as well as some specially constructed cases. Using information from this analysis, we identify small digraphs contributing to path homology in dimension two for three temporal networks in an aggregated representation and relate these digraphs to network behavior. We then investigate alternative temporal network representations and identify complementary subgraphs as well as behavior that is preserved across representations. We conclude that path homology provides insight into temporal network structure, and in turn, emergent structures in temporal networks provide us with new subgraphs having interesting path homology.
  218. Determining Clinically Relevant Features in Cytometry Data Using Persistent Homology (2022)

    Soham Mukherjee, Darren Wethington, Tamal K. Dey, Jayajit Das
    Abstract Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets are the current standard for cytometry data analysis. However, this approach is unable to capture more subtle topological features hidden in data, especially if those features are further masked by data transforms or significant batch effects or donor-to-donor variations in clinical data. We show that persistent homology, a mathematical structure that summarizes the topological features, can effectively distinguish different sources of data, such as groups of healthy donors or patients. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between single cell protein expressions in COVID-19 patients and healthy controls. We identify proteins of interest by a decision-tree based classifier, sample points randomly and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions in cytometry datasets of varying density and identify protruded structures such as ‘elbows’. We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find that systematic structural differences exist between COVID-19 patients and healthy controls in the expression data for T-bet, Eomes, and Ki-67. Further analysis shows that expression of T-bet and Eomes is significantly downregulated in COVID-19 patient non-naïve CD8+ T cells compared to healthy controls. This counter-intuitive finding may indicate that canonical effector CD8+ T cells are less prevalent in COVID-19 patients than in healthy controls.
This method is applicable to any cytometry dataset for discovering novel insights through topological data analysis which may be difficult to ascertain otherwise with a standard gating strategy or existing bioinformatic tools.

  219. Uncovering Precision Phenotype-Biomarker Associations in Traumatic Brain Injury Using Topological Data Analysis (2017)

    Jessica L. Nielson, Shelly R. Cooper, John K. Yue, Marco D. Sorani, Tomoo Inoue, Esther L. Yuh, Pratik Mukherjee, Tanya C. Petrossian, Jesse Paquette, Pek Y. Lum, Gunnar E. Carlsson, Mary J. Vassar, Hester F. Lingsma, Wayne A. Gordon, Alex B. Valadka, David O. Okonkwo, Geoffrey T. Manley, Adam R. Ferguson, Track-Tbi Investigators
    Abstract Background: Traumatic brain injury (TBI) is a complex disorder that is traditionally stratified based on clinical signs and symptoms. Recent imaging and molecular biomarker innovations provide unprecedented opportunities for improved TBI precision medicine, incorporating patho-anatomical and molecular mechanisms. Complete integration of these diverse data for TBI diagnosis and patient stratification remains an unmet challenge. Methods and findings: The Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) Pilot multicenter study enrolled 586 acute TBI patients and collected diverse common data elements (TBI-CDEs) across the study population, including imaging, genetics, and clinical outcomes. We then applied topology-based data-driven discovery to identify natural subgroups of patients, based on the TBI-CDEs collected. Our hypothesis was two-fold: 1) A machine learning tool known as topological data analysis (TDA) would reveal data-driven patterns in patient outcomes to identify candidate biomarkers of recovery, and 2) TDA-identified biomarkers would significantly predict patient outcome recovery after TBI using more traditional methods of univariate statistical tests. TDA algorithms organized and mapped the data of TBI patients in multidimensional space, identifying a subset of mild TBI patients with a specific multivariate phenotype associated with unfavorable outcome at 3 and 6 months after injury. Further analyses revealed that this patient subset had high rates of post-traumatic stress disorder (PTSD), and enrichment in several distinct genetic polymorphisms associated with cellular responses to stress and DNA damage (PARP1), and in striatal dopamine processing (ANKK1, COMT, DRD2). Conclusions: TDA identified a unique diagnostic subgroup of patients with unfavorable outcome after mild TBI that were significantly predicted by the presence of specific genetic polymorphisms.
Machine learning methods such as TDA may provide a robust method for patient stratification and treatment planning targeting identified biomarkers in future clinical trials in TBI patients. Trial Registration: ClinicalTrials.gov Identifier NCT01565551
  220. Topological Data Analysis and Diagnostics of Compressible Magnetohydrodynamic Turbulence (2018)

    Irina Makarenko, Paul Bushby, Andrew Fletcher, Robin Henderson, Nikolay Makarenko, Anvar Shukurov
    Abstract The predictions of mean-field electrodynamics can now be probed using direct numerical simulations of random flows and magnetic fields. When modelling astrophysical magnetohydrodynamics, it is important to verify that such simulations are in agreement with observations. One of the main challenges in this area is to identify robust quantitative measures to compare structures found in simulations with those inferred from astrophysical observations. A similar challenge is to quantitatively compare results from different simulations. Topological data analysis offers a range of techniques, including the Betti numbers and persistence diagrams, that can be used to facilitate such a comparison. After describing these tools, we first apply them to synthetic random fields and demonstrate that, when the data are standardized in a straightforward manner, some topological measures are insensitive to either large-scale trends or the resolution of the data. Focusing upon one particular astrophysical example, we apply topological data analysis to H I observations of the turbulent interstellar medium (ISM) in the Milky Way and to recent magnetohydrodynamic simulations of the random, strongly compressible ISM. We stress that these topological techniques are generic and could be applied to any complex, multi-dimensional random field.
  221. Persistent Homology on Grassmann Manifolds for Analysis of Hyperspectral Movies (2016)

    Sofya Chepushtanova, Michael Kirby, Chris Peterson, Lori Ziegelmeier
    Abstract The existence of characteristic structure, or shape, in complex data sets has been recognized as increasingly important for mathematical data analysis. This realization has motivated the development of new tools such as persistent homology for exploring topological invariants, or features, in large data sets. In this paper, we apply persistent homology to the characterization of gas plumes in time-dependent sequences of hyperspectral cubes, i.e. the analysis of 4-way arrays. We investigate hyperspectral movies of Long-Wavelength Infrared data monitoring an experimental release of chemical simulant into the air. Our approach models regions of interest within the hyperspectral data cubes as points on the real Grassmann manifold $G(k,n)$, whose points parameterize the $k$-dimensional subspaces of $\mathbb{R}^n$, contrasting our approach with the more standard framework in Euclidean space. An advantage of this approach is that it allows a sequence of time slices in a hyperspectral movie to be collapsed to a sequence of points in such a way that some of the key structure within and between the slices is encoded by the points on the Grassmann manifold. This motivates the search for topological features, associated with the evolution of the frames of a hyperspectral movie, within the corresponding points on the Grassmann manifold. The proposed mathematical model affords the processing of large data sets while retaining valuable discriminatory information. In this paper, we discuss how embedding our data in the Grassmann manifold, together with topological data analysis, captures dynamical events that occur as the chemical plume is released and evolves.
  222. Sheaves Are the Canonical Data Structure for Sensor Integration (2017)

    Michael Robinson
    Abstract A sensor integration framework should be sufficiently general to accurately represent many sensor modalities, and also be able to summarize information in a faithful way that emphasizes important, actionable information. Few approaches adequately address these two discordant requirements. The purpose of this expository paper is to explain why sheaves are the canonical data structure for sensor integration and how the mathematics of sheaves satisfies our two requirements. We outline some of the powerful inferential tools that are not available to other representational frameworks.
  223. Hierarchical Clustering and Zeroth Persistent Homology (2020)

    İsmail Güzel, Atabey Kaygun
    Abstract In this article, we show that hierarchical clustering and the zeroth persistent homology deliver the same topological information about a given data set. We show this fact using cophenetic matrices constructed out of the filtered Vietoris-Rips complex of the data set at hand. As with any cophenetic matrix, one can also display the inter-relations of zeroth homology classes via a rooted tree, also known as a dendrogram. Since homological cophenetic matrices can be calculated for higher homologies, one can also sketch similar dendrograms for higher persistent homology classes.
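    The correspondence stated above is easiest to see for single-linkage clustering. The sketch below illustrates it under our own assumptions rather than reproducing the authors' cophenetic-matrix construction: it computes the finite zeroth persistence death times of a Vietoris-Rips filtration (where an edge enters at its length) with a union-find pass over edges sorted by length, and separately runs naive agglomerative single-linkage clustering. Both return the same list of merge scales. The function names are hypothetical.

```python
import itertools
import math


def h0_death_times(points):
    """Finite H0 death times of the Vietoris-Rips filtration (edge enters at its length).

    Processes edges in increasing length with a union-find structure (Kruskal);
    each merge of two components kills one H0 class at that scale.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 finite deaths; one H0 class never dies


def single_linkage_heights(points):
    """Merge heights of naive agglomerative single-linkage clustering."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the closest pair across clusters.
            d = min(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
        heights.append(d)
    return heights


pts = [(0, 0), (1, 0), (0, 3), (5, 5), (5.5, 5)]
print(h0_death_times(pts))
print(single_linkage_heights(pts))
```

    Both lists are the edge lengths of a minimum spanning tree of the points, which is why the single-linkage dendrogram and the zeroth persistence barcode carry the same scale information.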
  224. Cosmic Web Reconstruction Through Density Ridges: Method and Algorithm (2015)

    Yen-Chi Chen, Shirley Ho, Peter E. Freeman, Christopher R. Genovese, Larry Wasserman
    Abstract The detection and characterization of filamentary structures in the cosmic web allows cosmologists to constrain parameters that dictate the evolution of the Universe. While many filament estimators have been proposed, they generally lack estimates of uncertainty, reducing their inferential power. In this paper, we demonstrate how one may apply the subspace constrained mean shift (SCMS) algorithm (Ozertem & Erdogmus 2011; Genovese et al. 2014) to uncover filamentary structure in galaxy data. The SCMS algorithm is a gradient ascent method that models filaments as density ridges, one-dimensional smooth curves that trace high-density regions within the point cloud. We also demonstrate how augmenting the SCMS algorithm with bootstrap-based methods of uncertainty estimation allows one to place uncertainty bands around putative filaments. We first apply SCMS to the data set generated from the Voronoi model. The density ridges show strong agreement with the filaments from the Voronoi method. We then apply the SCMS method to data sets sampled from a P3M N-body simulation, with galaxy number densities consistent with SDSS and WFIRST-AFTA, and to LOWZ and CMASS data from the Baryon Oscillation Spectroscopic Survey (BOSS). To further assess the efficacy of SCMS, we compare the relative locations of BOSS filaments with galaxy clusters in the redMaPPer catalogue, and find that redMaPPer clusters are significantly closer (with p-values $< 10^{-9}$) to SCMS-detected filaments than to randomly selected galaxies.
  225. Time-Inhomogeneous Diffusion Geometry and Topology (2022)

    Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy
    Abstract Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic diffusion homology. We use this intrinsic topology as well as an ambient topology to study how the data changes over diffusion time. We demonstrate both homologies in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
  226. Topological Data Analysis for Electric Motor Eccentricity Fault Detection (2022)

    Bingnan Wang, Chungwei Lin, Hiroshi Inoue, Makoto Kanemaru
    Abstract In this paper, we develop a topological data analysis (TDA) method for motor current signature analysis (MCSA), and apply it to induction motor eccentricity fault detection. We introduce TDA and present the procedure of extracting topological features from time-domain data that will be represented using persistence diagrams and vectorized Betti sequences. The procedure is applied to induction machine phase current signal analysis, and shown to be highly effective in differentiating signals from different eccentricity levels. With TDA, we are able to use a simple regression model that can predict the fault levels with reasonable accuracy, even for data at eccentricity levels that are not seen in the training data. The proposed method is model-free, and only requires a small segment of time-domain data to make a prediction. These advantages make it attractive for a wide range of fault detection applications.
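    One simple way to turn a time-domain signal into a vectorized Betti sequence is to count the connected components (the zeroth Betti number) of its sublevel sets {t : f(t) <= c} as the threshold c sweeps upward. This is a minimal sketch of that idea only: the abstract does not say which filtration the authors use, so the sublevel-set choice, the threshold grid, and the name `betti0_curve` are assumptions.

```python
import numpy as np


def betti0_curve(signal, thresholds):
    """Betti-0 curve of the sublevel-set filtration of a sampled 1D signal.

    For each threshold c, counts the connected components (maximal runs of
    consecutive samples) of {i : signal[i] <= c}.
    """
    x = np.asarray(signal, dtype=float)
    curve = []
    for c in thresholds:
        below = x <= c
        # A component starts wherever a below-threshold sample is not preceded by one.
        starts = below & ~np.concatenate(([False], below[:-1]))
        curve.append(int(starts.sum()))
    return np.array(curve)


# Three valleys separated by two peaks: the valleys merge only once c reaches the peak height.
print(betti0_curve([0, 2, 0, 2, 0], thresholds=[-1, 0, 1, 2]))  # [0 3 3 1]
```

    Evaluating the curve on a fixed threshold grid gives a fixed-length feature vector per signal segment, which is the kind of input a simple regression model can consume.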
  227. Topological Data Analysis of Spatial Patterning in Heterogeneous Cell Populations: Clustering and Sorting With Varying Cell-Cell Adhesion (2023)

    Dhananjay Bhaskar, William Y. Zhang, Alexandria Volkening, Björn Sandstede, Ian Y. Wong
    Abstract Different cell types aggregate and sort into hierarchical architectures during the formation of animal tissues. The resulting spatial organization depends (in part) on the strength of adhesion of one cell type to itself relative to other cell types. However, automated and unsupervised classification of these multicellular spatial patterns remains challenging, particularly given their structural diversity and biological variability. Recent developments based on topological data analysis are promising for revealing similarities in tissue architecture, but these methods remain computationally expensive. In this article, we show that multicellular patterns organized from two interacting cell types can be efficiently represented through persistence images. Our optimized combination of dimensionality reduction via autoencoders and hierarchical clustering achieved high classification accuracy for simulations with constant cell numbers. We further demonstrate that persistence images can be normalized to improve classification for simulations with varying cell numbers due to proliferation. Finally, we systematically consider the importance of incorporating different topological features as well as information about each cell type to improve classification accuracy. We envision that topological machine learning based on persistence images will enable versatile and robust classification of complex tissue architectures that occur in development and disease.
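    Persistence images, as introduced by Adams et al., turn a persistence diagram into a fixed-size array: each point (b, d) is moved to birth-persistence coordinates (b, d - b), replaced by a Gaussian weighted by its persistence, and the resulting surface is discretized on a grid. The sketch below samples the surface at grid points rather than integrating over pixels, and its grid ranges, `sigma`, and linear weighting are illustrative assumptions, not the settings used in the study above.

```python
import numpy as np


def persistence_image(diagram, grid_size=20, sigma=0.05,
                      birth_range=(0.0, 1.0), pers_range=(0.0, 1.0)):
    """Simplified persistence image of a diagram given as (birth, death) pairs.

    Rows index persistence, columns index birth. Each point contributes a
    Gaussian centred at (birth, death - birth), weighted linearly by its
    persistence relative to the most persistent point.
    """
    dgm = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = dgm[:, 0]
    pers = dgm[:, 1] - dgm[:, 0]
    max_pers = pers.max() if len(pers) else 1.0
    weights = pers / max_pers if max_pers > 0 else np.zeros_like(pers)
    bx = np.linspace(*birth_range, grid_size)
    py = np.linspace(*pers_range, grid_size)
    B, P = np.meshgrid(bx, py, indexing="xy")  # B[i, j] = bx[j], P[i, j] = py[i]
    img = np.zeros((grid_size, grid_size))
    for b, p, w in zip(births, pers, weights):
        img += w * np.exp(-((B - b) ** 2 + (P - p) ** 2) / (2.0 * sigma ** 2))
    return img


img = persistence_image([(0.2, 0.7), (0.1, 0.15)], grid_size=11, sigma=0.05)
print(img.shape)  # (11, 11)
```

    Because every diagram maps to an array of the same shape, the flattened images can be fed directly to an autoencoder or clustering method, which is the role persistence images play in the pipeline described above.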
  228. Multivariate Data Analysis Using Persistence-Based Filtering and Topological Signatures (2012)

    B. Rieck, H. Mara, H. Leitte
    Abstract The extraction of significant structures in arbitrary high-dimensional data sets is a challenging task. Moreover, classifying data points as noise in order to reduce a data set bears special relevance for many application domains. Standard methods such as clustering serve to reduce problem complexity by providing the user with classes of similar entities. However, they usually do not highlight relations between different entities and require a stopping criterion, e.g. the number of clusters to be detected. In this paper, we present a visualization pipeline based on recent advancements in algebraic topology. More precisely, we employ methods from persistent homology that enable topological data analysis on high-dimensional data sets. Our pipeline inherently copes with noisy data and data sets of arbitrary dimensions. It extracts central structures of a data set in a hierarchical manner by using a persistence-based filtering algorithm that is theoretically well-founded. We furthermore introduce persistence rings, a novel visualization technique for a class of topological features (the persistence intervals) of large data sets. Persistence rings provide a unique topological signature of a data set, which helps in recognizing similarities. In addition, we provide interactive visualization techniques that assist the user in evaluating the parameter space of our method in order to extract relevant structures. We describe and evaluate our analysis pipeline by means of two very distinct classes of data sets: First, a class of synthetic data sets containing topological objects is employed to highlight the interaction capabilities of our method. Second, in order to affirm the utility of our technique, we analyse a class of high-dimensional real-world data sets arising from current research in cultural heritage.
  229. Visual Detection of Structural Changes in Time-Varying Graphs Using Persistent Homology (2018)

    Mustafa Hajij, Bei Wang, Carlos Scheidegger, Paul Rosen
    Abstract Topological data analysis is an emerging area in exploratory data analysis and data mining. Its main tool, persistent homology, has become a popular technique to study the structure of complex, high-dimensional data. In this paper, we propose a novel method using persistent homology to quantify structural changes in time-varying graphs. Specifically, we transform each instance of the time-varying graph into a metric space, extract topological features using persistent homology, and compare those features over time. We provide a visualization that assists in time-varying graph exploration and helps to identify patterns of behavior within the data. To validate our approach, we conduct several case studies on real-world datasets and show how our method can find cyclic patterns, deviations from those patterns, and one-time events in time-varying graphs. We also examine whether a persistence-based similarity measure satisfies a set of well-established, desirable properties for graph metrics.
  230. A Visual Analytics Approach for the Diagnosis of Heterogeneous and Multidimensional Machine Maintenance Data (2021)

    Xiaoyu Zhang, Takanori Fujiwara, Senthil Chandrasegaran, Michael P. Brundage, Thurston Sexton, Alden Dima, Kwan-Liu Ma
    Abstract Analysis of large, high-dimensional, and heterogeneous datasets is challenging as no one technique is suitable for visualizing and clustering such data in order to make sense of the underlying information. For instance, heterogeneous logs detailing machine repair and maintenance in an organization often need to be analyzed to diagnose errors and identify abnormal patterns, formalize root-cause analyses, and plan preventive maintenance. Such real-world datasets are also beset by issues such as inconsistent and/or missing entries. To conduct an effective diagnosis, it is important to extract and understand patterns from the data with support from analytic algorithms (e.g., finding that certain kinds of machine complaints occur more in the summer) while involving the human-in-the-loop. To address these challenges, we adopt existing techniques for dimensionality reduction (DR) and clustering of numerical, categorical, and text data dimensions, and introduce a visual analytics approach that uses multiple coordinated views to connect DR + clustering results across each of these kinds of data dimension. To help analysts label the clusters, each clustering view is supplemented with techniques and visualizations that contrast a cluster of interest with the rest of the dataset. Our approach helps analysts make sense of machine maintenance logs and their errors. The insights gained then help them carry out preventive maintenance. We illustrate and evaluate our approach through use cases and expert studies respectively, and discuss generalization of the approach to other heterogeneous data.
  231. A Primer on Topological Data Analysis to Support Image Analysis Tasks in Environmental Science (2023)

    Lander Ver Hoef, Henry Adams, Emily J. King, Imme Ebert-Uphoff
    Abstract Topological data analysis (TDA) is a tool from data science and mathematics that is beginning to make waves in environmental science. In this work, we seek to provide an intuitive and understandable introduction to a tool from TDA that is particularly useful for the analysis of imagery, namely, persistent homology. We briefly discuss the theoretical background but focus primarily on understanding the output of this tool and discussing what information it can glean. To this end, we frame our discussion around a guiding example of classifying satellite images from the sugar, fish, flower, and gravel dataset produced for the study of mesoscale organization of clouds by Rasp et al. We demonstrate how persistent homology and its vectorization, persistence landscapes, can be used in a workflow with a simple machine learning algorithm to obtain good results, and we explore in detail how we can explain this behavior in terms of image-level features. One of the core strengths of persistent homology is how interpretable it can be, so throughout this paper we discuss not just the patterns we find but why those results are to be expected given what we know about the theory of persistent homology. Our goal is that readers of this paper will leave with a better understanding of TDA and persistent homology, will be able to identify problems and datasets of their own for which persistent homology could be helpful, and will gain an understanding of the results they obtain from applying the included GitHub example code. Significance Statement: Information such as the geometric structure and texture of image data can greatly support the inference of the physical state of an observed Earth system, for example, in remote sensing to determine whether wildfires are active or to identify local climate zones.
Persistent homology is a branch of topological data analysis that allows one to extract such information in an interpretable way—unlike black-box methods like deep neural networks. The purpose of this paper is to explain in an intuitive manner what persistent homology is and how researchers in environmental science can use it to create interpretable models. We demonstrate the approach to identify certain cloud patterns from satellite imagery and find that the resulting model is indeed interpretable.
  232. A Classification of Topological Discrepancies in Additive Manufacturing (2019)

    Morad Behandish, Amir M. Mirzendehdel, Saigopal Nelaturi
    Abstract Additive manufacturing (AM) enables enormous freedom for design of complex structures. However, the process-dependent limitations that result in discrepancies between as-designed and as-manufactured shapes are not fully understood. The tradeoffs between infinitely many different ways to approximate a design by a manufacturable replica are even harder to characterize. To support design for AM (DfAM), one has to quantify local discrepancies introduced by AM processes, identify the detrimental deviations (if any) to the original design intent, and prescribe modifications to the design and/or process parameters to countervail their effects. Our focus in this work will be on topological analysis. There is ample evidence in many applications that preserving local topology (e.g., connectivity of beams in a lattice) is important even when slight geometric deviations can be tolerated. We first present a generic method to characterize local topological discrepancies due to material under- and over-deposition in AM, and show how it captures various types of defects in the as-manufactured structures. We use this information to systematically modify the as-manufactured outcomes within the limitations of available 3D printer resolution(s), which often comes at the expense of introducing more geometric deviations (e.g., thickening a beam to avoid disconnection). We validate the effectiveness of the method on 3D examples with nontrivial topologies such as lattice structures and foams.
  233. Topological Differential Testing (2020)

    Kristopher Ambrose, Steve Huntsman, Michael Robinson, Matvey Yutin
    Abstract We introduce topological differential testing (TDT), an approach to extracting the consensus behavior of a set of programs on a corpus of inputs. TDT uses the topological notion of a simplicial complex (and implicitly draws on richer topological notions such as sheaves and persistence) to determine inputs that cause inconsistent behavior and in turn reveal de facto input specifications. We gently introduce TDT with a toy example before detailing its application to understanding the PDF file format from the behavior of various parsers. Finally, we discuss theoretical details and other possible applications.
  234. Geometry and Topology of the Space of Sonar Target Echos (2018)

    Michael Robinson, Sean Fennell, Brian DiZio, Jennifer Dumiak
    Abstract Successful synthetic aperture sonar target classification depends on the “shape” of the scatterers within a target signature. This article presents a workflow that computes a target-to-target distance from persistence diagrams, since the “shape” of a signature informs its persistence diagram in a structure-preserving way. The target-to-target distances derived from persistence diagrams compare favorably against those derived from spectral features and have the advantage of being substantially more compact. While spectral features produce clusters associated to each target type that are reasonably dense and well formed, the clusters are not well-separated from one another. In rather dramatic contrast, a distance derived from persistence diagrams results in highly separated clusters at the expense of some misclassification of outliers.
  235. Analyzing Collective Motion With Machine Learning and Topology (2019)

    Dhananjay Bhaskar, Angelika Manhart, Jesse Milzman, John T. Nardini, Kathleen M. Storey, Chad M. Topaz, Lori Ziegelmeier
    Abstract We use topological data analysis and machine learning to study a seminal model of collective motion in biology [M. R. D’Orsogna et al., Phys. Rev. Lett. 96, 104302 (2006)]. This model describes agents interacting nonlinearly via attractive-repulsive social forces and gives rise to collective behaviors such as flocking and milling. To classify the emergent collective motion in a large library of numerical simulations and to recover model parameters from the simulation data, we apply machine learning techniques to two different types of input. First, we input time series of order parameters traditionally used in studies of collective motion. Second, we input measures based on topology that summarize the time-varying persistent homology of simulation data over multiple scales. This topological approach does not require prior knowledge of the expected patterns. For both unsupervised and supervised machine learning methods, the topological approach outperforms the one that is based on traditional order parameters.
  236. WDR76 Co-Localizes With Heterochromatin Related Proteins and Rapidly Responds to DNA Damage (2016)

    Joshua M. Gilmore, Mihaela E. Sardiu, Brad D. Groppe, Janet L. Thornton, Xingyu Liu, Gerald Dayebgadoh, Charles A. Banks, Brian D. Slaughter, Jay R. Unruh, Jerry L. Workman, Laurence Florens, Michael P. Washburn
    Abstract Proteins that respond to DNA damage play critical roles in normal and diseased states in human biology. Studies have suggested that the S. cerevisiae protein CMR1/YDL156w is associated with histones and is possibly associated with DNA repair and replication processes. Through a quantitative proteomic analysis of affinity purifications here we show that the human homologue of this protein, WDR76, shares multiple protein associations with the histones H2A, H2B, and H4. Furthermore, our quantitative proteomic analysis of WDR76 associated proteins demonstrated links to proteins in the DNA damage response like PARP1 and XRCC5 and heterochromatin related proteins like CBX1, CBX3, and CBX5. Co-immunoprecipitation studies validated these interactions. Next, quantitative imaging studies demonstrated that WDR76 was recruited to laser induced DNA damage immediately after induction, and we compared the recruitment of WDR76 to laser induced DNA damage to known DNA damage proteins like PARP1, XRCC5, and RPA1. In addition, WDR76 co-localizes to puncta with the heterochromatin proteins CBX1 and CBX5, which are also recruited to DNA damage but much less intensely than WDR76. This work demonstrates the chromatin and DNA damage protein associations of WDR76 and demonstrates the rapid response of WDR76 to laser induced DNA damage.
  237. Topology of Viral Evolution (2013)

    Joseph Minhow Chan, Gunnar Carlsson, Raul Rabadan
    Abstract The tree structure is currently the accepted paradigm to represent evolutionary relationships between organisms, species or other taxa. However, horizontal, or reticulate, genomic exchanges are pervasive in nature and confound characterization of phylogenetic trees. Drawing from algebraic topology, we present a unique evolutionary framework that comprehensively captures both clonal and reticulate evolution. We show that whereas clonal evolution can be summarized as a tree, reticulate evolution exhibits nontrivial topology of dimension greater than zero. Our method effectively characterizes clonal evolution, reassortment, and recombination in RNA viruses. Beyond detecting reticulate evolution, we succinctly recapitulate the history of complex genetic exchanges involving more than two parental strains, such as the triple reassortment of H7N9 avian influenza and the formation of circulating HIV-1 recombinants. In addition, we identify recurrent, large-scale patterns of reticulate evolution, including frequent PB2-PB1-PA-NP cosegregation during avian influenza reassortment. Finally, we bound the rate of reticulate events (i.e., 20 reassortments per year in avian influenza). Our method provides an evolutionary perspective that not only captures reticulate events precluding phylogeny, but also indicates the evolutionary scales where phylogenetic inference could be accurate.
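    The tree-versus-reticulate distinction can be illustrated on a single graph with the first Betti number, beta_1 = |E| - |V| + c, where c is the number of connected components: a tree has beta_1 = 0, and each extra edge that closes a loop (standing in here for a reassortment or recombination event) raises it by one. The Python sketch below is a toy illustration under that assumption, not the authors' method, which applies persistent homology to genomic distances; the function name and the example lineages are hypothetical.

```python
def first_betti_number(num_vertices, edges):
    """Return beta_1 = |E| - |V| + (number of connected components)
    for a simple undirected graph on vertices 0..num_vertices-1."""
    parent = list(range(num_vertices))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge two components
            components -= 1
    return len(edges) - num_vertices + components


# Hypothetical lineages: a purely clonal tree, then the same tree with one reticulate link.
tree = [(0, 1), (1, 2), (1, 3)]
print(first_betti_number(4, tree))             # 0: no cycles
print(first_betti_number(4, tree + [(2, 3)]))  # 1: one independent loop
```

Duplicate edges would each count toward |E|, so the sketch assumes a simple graph.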
  238. Knowledge Gaps in the Early Growth of Semantic Feature Networks (2018)

    Ann E. Sizemore, Elisabeth A. Karuza, Chad Giusti, Danielle S. Bassett
    Abstract Understanding language learning and more general knowledge acquisition requires the characterization of inherently qualitative structures. Recent work has applied network science to this task by creating semantic feature networks, in which words correspond to nodes and connections correspond to shared features, and then by characterizing the structure of strongly interrelated groups of words. However, the importance of sparse portions of the semantic network—knowledge gaps—remains unexplored. Using applied topology, we query the prevalence of knowledge gaps, which we propose manifest as cavities in the growing semantic feature network of toddlers. We detect topological cavities of multiple dimensions and find that, despite word order variation, the global organization remains similar. We also show that nodal network measures correlate with filling cavities better than basic lexical properties. Finally, we discuss the importance of semantic feature network topology in language learning and speculate that the progression through knowledge gaps may be a robust feature of knowledge acquisition.
  239. Raw Material Flow Optimization as a Capacitated Vehicle Routing Problem: A Visual Benchmarking Approach for Sustainable Manufacturing (2017)

    Michele Dassisti, Yasamin Eslami, Matin Mohaghegh
    Abstract Optimising material flows to increase efficiency while reducing relative resource consumption is one of the most pressing problems today. This study proposes a new visual benchmarking approach for selecting the best material-flow path from the depot to the production lines, formulated as the well-known Capacitated Vehicle Routing Problem (CVRP). An industrial case study is considered for this purpose. Two solution techniques, Mixed Integer Linear Programming and Ant Colony Optimization, were used to search for optimal solutions to the CVRP. The proposed visual benchmarking, based on persistent homology, supported the comparison of the optimal solutions based on the entropy of the output in different scenarios. Finally, using the non-standard Crossing Length Percentage (CLP) measure, the visual benchmarking procedure makes it possible to find the most practical and applicable solution to the CVRP by considering both the visual attractiveness and the quality of the routes.
  240. Topological Extraction and Tracking of Defects in Crystal Structures (2011)

    Sebastian Grottel, Carlos A. Dietrich, João L. D. Comba, Thomas Ertl
    Abstract Interfaces between materials with different mechanical properties play an important role in technical applications. Molecular dynamics simulations are now used to observe the behavior of such compound materials at the atomic level. Because of differing atom crystal sizes, dislocations in the crystal structure occur once external forces are applied, and it has been observed that studying how these dislocations change can provide further understanding of macroscopic attributes such as elasticity and plasticity. Standard visualization techniques, such as rendering individual atoms, work for 2D data or sectional views; however, visualizing dislocations in 3D with such methods usually fails due to occlusion and clutter. In this work we propose to extract and visualize the structure of dislocations, summarizing the commonly employed filtered atomistic renderings into a concise representation. The benefits of our approach are clearer images that retain the relevant data and easier visual tracking of topological changes over time.
  241. Uncovering the Topology of Time-Varying fMRI Data Using Cubical Persistence (2020)

    Bastian Rieck, Tristan Yates, Christian Bock, Karsten Borgwardt, Guy Wolf, Nicholas Turk-Browne, Smita Krishnaswamy
    Abstract Functional magnetic resonance imaging (fMRI) is a crucial technology for gaining insights into cognitive processes in humans. Data amassed from fMRI measurements result in volumetric data sets that vary over time. However, analysing such data presents a challenge due to the large degree of noise and person-to-person variation in how information is represented in the brain. To address this challenge, we present a novel topological approach that encodes each time point in an fMRI data set as a persistence diagram of topological features, i.e. high-dimensional voids present in the data. This representation naturally does not rely on voxel-by-voxel correspondence and is robust to noise. We show that these time-varying persistence diagrams can be clustered to find meaningful groupings between participants, and that they are also useful in studying within-subject brain state trajectories of subjects performing a particular task. Here, we apply both clustering and trajectory analysis techniques to a group of participants watching the movie 'Partly Cloudy'. We observe significant differences in both brain state trajectories and overall topological activity between adults and children watching the same movie.
  242. Grasping Objects With Holes: A Topological Approach (2013)

    F. T. Pokorny, J. A. Stork, D. Kragic
    Abstract This work proposes a topologically inspired approach for generating robot grasps on objects with 'holes'. Starting from a noisy point cloud, we generate a simplicial representation of an object of interest and use a recently developed method for approximating shortest homology generators to identify graspable loops. To control the movement of the robot hand, a topologically motivated coordinate system is used to wrap the hand around such loops. Finally, another concept from topology, namely the Gauss linking integral, is adapted to serve as evidence of secure caging grasps after a grasp has been executed. We evaluate our approach in simulation on a Barrett hand using several target objects of different sizes and shapes and present an initial experiment with real sensor data.
  243. Extremal Event Graphs: A (Stable) Tool for Analyzing Noisy Time Series Data (2022)

    Robin Belton, Bree Cummins, Brittany Terese Fasy, Tomáš Gedeon
    Abstract Local maxima and minima, or extremal events, in experimental time series can be used as a coarse summary to characterize data. However, the discrete sampling in recording experimental measurements introduces uncertainty in the true timing of extrema during the experiment. This in turn gives uncertainty in the timing order of extrema within the time series. Motivated by applications in genomic time series and biological network analysis, we construct a weighted directed acyclic graph (DAG), called an extremal event DAG, using techniques from persistent homology; it is robust to measurement noise. Furthermore, we define a distance between extremal event DAGs based on the edit distance between strings. We prove several properties, including local stability of the extremal event DAG distance with respect to pairwise $L_\infty$ distances between functions in the time series data. Lastly, we provide algorithms, free and publicly available software, and implementations for extremal event DAG construction and comparison.
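    The string edit distance mentioned above is the classical Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions turning one string into another. The Python sketch below shows only that building block, computed by dynamic programming; the paper's DAG distance is defined on top of it and is not reproduced here, and encoding an ordering of extrema as a string of labels (e.g. "ABC") is an assumption made purely for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between strings a and b (unit cost per edit)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical orderings of the extrema of three genes A, B, C in two replicates.
print(edit_distance("ABC", "ACB"))  # 2
```

Only the previous row of the dynamic-programming table is kept, so memory is linear in len(b).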
  244. Multiphase Mixing Quantification by Computational Homology and Imaging Analysis (2011)

    Jianxin Xu, Hua Wang, Hui Fang
    Abstract The purpose of this study is to introduce a new technique for quantifying the efficiency of multiphase mixing. The technique, based on algebraic topology, is illustrated using hydraulic modeling of gas-agitated reactors stirred by top-lance gas injection, together with image analysis. The zeroth Betti numbers are used to estimate the number of pieces in the patterns, yielding a useful parameter for characterizing mixture homogeneity. The first Betti numbers are introduced to characterize the nonhomogeneity of the mixture. The mixing efficiency can thus be characterized by the Betti numbers of binary images of the patterns. This novel method may be applied to a variety of multiphase mixing problems in which the phase components or tracers are visually distinguishable.
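    For a binary image, the zeroth Betti number is simply the number of connected pieces of foreground pixels, which is the quantity used above to gauge how finely a tracer has been dispersed. The Python sketch below counts those pieces with a breadth-first flood fill; it is an illustration, not the authors' software, and the choice of 4- or 8-connectivity is an assumption the user must make because it changes the answer for diagonally touching pixels.

```python
from collections import deque


def betti_0(image, connectivity=4):
    """Count connected components of foreground (truthy) pixels in a 2D binary image."""
    if connectivity == 4:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        steps = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    rows, cols = len(image), len(image[0]) if image else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            if image[i][j] and not seen[i][j]:
                count += 1  # found a new, unvisited piece
                seen[i][j] = True
                queue = deque([(i, j)])
                while queue:  # flood-fill the whole piece
                    ci, cj = queue.popleft()
                    for di, dj in steps:
                        ni, nj = ci + di, cj + dj
                        if 0 <= ni < rows and 0 <= nj < cols and image[ni][nj] and not seen[ni][nj]:
                            seen[ni][nj] = True
                            queue.append((ni, nj))
    return count


pattern = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(betti_0(pattern))     # 3 pieces with 4-connectivity
print(betti_0(pattern, 8))  # 1 piece with 8-connectivity
```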
  245. Blind Swarms for Coverage in 2-D (2005)

    V. D. Silva, R. Ghrist, A. Muhammad
    Abstract We consider coverage problems in robot sensor networks with minimal sensing capabilities. In particular, we demonstrate that a “blind” swarm of robots with no localization and only a weak form of distance estimation can rigorously determine coverage in a bounded planar domain of unknown size and shape. The methods we introduce come from algebraic topology. I. COVERAGE PROBLEMS Many of the potential applications of robot swarms require information about coverage in a given domain. For example, using a swarm of robot sensors for surveillance and security applications carries with it the charge to maximize, or, preferably, guarantee coverage. Such applications include networks of security cameras, mine field sweeping via networked robots [18], and oceanographic sampling [4]. In these contexts, each robot has some coverage domain, and one wishes to know about the union of these coverage domains. Such problems are also crucial in applications not involving robots directly, e.g., communication networks. As a preliminary analysis, we consider the static “field” coverage problem, in which robots are assumed stationary and the goal is to verify blanket coverage of a given domain. There is a large literature on this subject; see, e.g., [7], [1], [16]. In addition, there are variants on these problems involving “barrier” coverage to separate regions. Dynamic or “sweeping” coverage [3] is a common and challenging task with applications ranging from security to vacuuming. Although a sensor network composed of robots will have dynamic capabilities, we restrict attention in this brief paper to the static case in order to lay the groundwork for future inquiry. There are two primary approaches to static coverage problems in the literature. The first uses computational geometry tools applied to exact node coordinates. This typically involves ‘ruler-and-compass’ style geometry [10] or Delaunay triangulations of the domain [16], [14], [20]. 
Such approaches are very rigid with regard to inputs: one must know exact node coordinates and one must know the geometry of the domain precisely to determine the Delaunay complex. To alleviate the former requirement, many authors have turned to probabilistic tools. For example, in [13], the author assumes a randomly and uniformly distributed collection of nodes in a domain with a fixed geometry and proves expected area coverage. Other approaches [15], [19] give percolation-type results about coverage and network integrity for randomly distributed nodes. The drawback of these methods is the need for strong assumptions about the exact shape of the domain, as well as the need for a uniform distribution of nodes. In the sensor networks community, there is a compelling interest (and a correspondingly burgeoning literature) in determining properties of a network in which the nodes do not possess coordinate data. One example of a coordinate-free approach is [17], which gives a heuristic method for geographic routing without coordinate data; among the large literature arising from this paper, we note in particular the mathematical analysis of this approach in [11]. To our knowledge, no one has treated the coverage problem in a coordinate-free setting. In this note, we introduce a new set of tools for answering coverage problems in robotics and sensor networks with minimal assumptions about domain geometry and node localization. We provide a sufficiency criterion for coverage. We do not address how the nodes should be placed to maximize coverage, nor the minimum number of nodes necessary; neither do we address how to reallocate nodes to fill coverage holes.
  246. (Quasi)Periodicity Quantification in Video Data, Using Topology (2018)

    Christopher J. Tralie, Jose A. Perea
    Abstract This work introduces a novel framework for quantifying the presence and strength of recurrent dynamics in video data. Specifically, we provide continuous measures of periodicity (perfect repetition) and quasiperiodicity (superposition of periodic modes with noncommensurate periods), in a way which does not require segmentation, training, object tracking, or 1-dimensional surrogate signals. Our methodology operates directly on video data. The approach combines ideas from nonlinear time series analysis (delay embeddings) and computational topology (persistent homology) by translating the problem of finding recurrent dynamics in video data into the problem of determining the circularity or toroidality of an associated geometric space. Through extensive testing, we show the robustness of our scores with respect to several noise models/levels; we show that our periodicity score is superior to other methods when compared to human-generated periodicity rankings; and furthermore, we show that our quasiperiodicity score clearly indicates the presence of biphonation in videos of vibrating vocal folds, which has never before been accomplished quantitatively end to end.
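    The delay-embedding step described above turns a sequence of frames into a point cloud by concatenating each frame with the frames tau, 2*tau, ... steps later; a periodic video then traces a loop and a quasiperiodic one a torus. The NumPy sketch below shows only that step, under the assumption that each frame has already been flattened into a row vector; the window dimension, delay and function name are illustrative choices, and the paper's further normalisation and persistent-homology scoring are not reproduced.

```python
import numpy as np


def delay_embedding(frames, dim, tau=1):
    """Sliding-window (delay) embedding.

    frames: array of shape (T, D), one flattened frame per row; a 1D array is
    treated as a scalar series. Returns shape (T - (dim - 1) * tau, dim * D),
    where row t is [frames[t], frames[t + tau], ..., frames[t + (dim - 1) * tau]].
    """
    frames = np.asarray(frames, dtype=float)
    if frames.ndim == 1:
        frames = frames[:, None]  # a scalar series becomes 1-pixel "frames"
    n_windows = frames.shape[0] - (dim - 1) * tau
    if n_windows <= 0:
        raise ValueError("series too short for this dim and tau")
    # Column block k holds the frames shifted by k * tau.
    return np.hstack([frames[k * tau : k * tau + n_windows] for k in range(dim)])


# A sinusoid with a 100-sample period, embedded with a quarter-period delay,
# lands on the unit circle: (sin t, sin(t + pi/2)) = (sin t, cos t).
n = np.arange(400)
cloud = delay_embedding(np.sin(2 * np.pi * n / 100), dim=2, tau=25)
print(cloud.shape)  # (375, 2)
```

Detecting that loop (or a torus, for quasiperiodic motion) would then be a job for persistent homology in dimensions 1 and 2.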
  247. Persistent Homology of Geospatial Data: A Case Study With Voting (2021)

    Michelle Feng, Mason A. Porter
    Abstract A crucial step in the analysis of persistent homology is the transformation of data into an appropriate topological object (which, in our case, is a simplicial complex). Software packages for computing persistent homology typically construct Vietoris–Rips or other distance-based simplicial complexes on point clouds because they are relatively easy to compute. We investigate alternative methods of constructing simplicial complexes and the effects of making associated choices during simplicial-complex construction on the output of persistent-homology algorithms. We present two new methods for constructing simplicial complexes from two-dimensional geospatial data (such as maps). We apply these methods to a California precinct-level voting data set, and we thereby demonstrate that our new constructions can capture geometric characteristics that are missed by distance-based constructions. Our new constructions can thus yield more interpretable persistence modules and barcodes for geospatial data. In particular, they are able to distinguish short-persistence features that occur only for a narrow range of distance scales (e.g., voting patterns in densely populated cities) from short-persistence noise by incorporating information about other spatial relationships between regions.
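    For reference, the distance-based baseline mentioned above, the Vietoris–Rips complex at scale eps, includes a simplex whenever all pairwise distances among its vertices are at most eps. The brute-force Python sketch below builds it up to triangles; it illustrates only this standard construction, not the authors' new geospatial constructions, and its cost grows as O(n^(max_dim+1)), so it is meant for toy inputs only.

```python
from itertools import combinations
from math import dist


def vietoris_rips(points, eps, max_dim=2):
    """Simplices (sorted index tuples) of the Vietoris-Rips complex at scale eps,
    up to dimension max_dim. A vertex set spans a simplex when every pairwise
    distance is <= eps."""
    n = len(points)
    close = {(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= eps}
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):  # k = number of vertices in the simplex
        for cand in combinations(range(n), k):
            if all(pair in close for pair in combinations(cand, 2)):
                simplices.append(cand)
    return simplices


# Corners of a unit square: at eps = 1 only the four sides appear (one loop),
# at eps = 1.5 the diagonals and all four triangles fill the loop in.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(len(vietoris_rips(square, 1.0)))  # 8 = 4 vertices + 4 edges
print(len(vietoris_rips(square, 1.5)))  # 14 = 4 vertices + 6 edges + 4 triangles
```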
  248. Novel Production Prediction Model of Gasoline Production Processes for Energy Saving and Economic Increasing Based on AM-GRU Integrating the UMAP Algorithm (2023)

    Jintao Liu, Liangchao Chen, Wei Xu, Mingfei Feng, Yongming Han, Tao Xia, Zhiqiang Geng
    Abstract Gasoline, as an extremely important petroleum product, is of great significance for ensuring people's living standards and maintaining national energy security. In actual industrial gasoline production, the data collected by industrial devices are usually high-dimensional, noisy, and time-series in nature because of the instability of manual operation and equipment operation. It is therefore difficult to use traditional methods to predict and optimize gasoline production. In this paper, a novel production prediction model using an attention mechanism (AM) based gated recurrent unit (GRU) (AM-GRU) integrating uniform manifold approximation and projection (UMAP) is proposed. The data collected in the industrial plant are first screened with a box plot to remove outliers beyond the quartile-based limits. UMAP is then used to remove the strong correlation between the data, which can improve the running speed and the performance of the AM-GRU. Compared with existing time series prediction methods, the superiority of the AM-GRU is verified on University of California Irvine (UCI) benchmark datasets. Finally, a production prediction model of actual complex gasoline production processes for energy saving and economic increase is built with the proposed method. The experimental results show that, compared with other time series prediction models, the proposed model has better stability and higher accuracy, reaching 0.4171, 0.9969, 0.2538 and 0.5038 in terms of the mean absolute error, the average absolute accuracy, the mean squared error and the root mean square error, respectively. Moreover, according to the optimal raw-material scheme, gasoline yield at inefficient production points can be expected to increase by about 0.69 tons, with economic benefits of between about $645.1 and $925.6 for industrial production.
  249. Capturing Dynamics of Time-Varying Data via Topology (2020)

    Lu Xian, Henry Adams, Chad M. Topaz, Lori Ziegelmeier
    Abstract One approach to understanding complex data is to study its shape through the lens of algebraic topology. While the early development of topological data analysis focused primarily on static data, in recent years, theoretical and applied studies have turned to data that varies in time. A time-varying collection of metric spaces as formed, for example, by a moving school of fish or flock of birds, can contain a vast amount of information. There is often a need to simplify or summarize the dynamic behavior. We provide an introduction to topological summaries of time-varying metric spaces including vineyards [17], crocker plots [52], and multiparameter rank functions [34]. We then introduce a new tool to summarize time-varying metric spaces: a crocker stack. Crocker stacks are convenient for visualization, amenable to machine learning, and satisfy a desirable stability property which we prove. We demonstrate the utility of crocker stacks for a parameter identification task involving an influential model of biological aggregations [54]. Altogether, we aim to bring the broader applied mathematics community up-to-date on topological summaries of time-varying metric spaces.
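    A crocker plot records, for each time step and each scale eps, a Betti number of the snapshot's Vietoris–Rips complex. The Python sketch below computes the Betti-0 version (number of connected components) as a time-by-scale matrix using union-find; it is an illustration under that assumption only, and it does not compute higher Betti numbers or the additional smoothing parameter that defines a crocker stack. The function names and example snapshots are hypothetical.

```python
from itertools import combinations
from math import dist


def betti0_at_scale(points, eps):
    """Number of connected components of the Vietoris-Rips complex at scale eps."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = len(points)
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
    return components


def crocker_matrix(snapshots, scales):
    """Rows are time steps, columns are scales; each entry is a Betti-0 count."""
    return [[betti0_at_scale(pts, eps) for eps in scales] for pts in snapshots]


# Two hypothetical snapshots of three agents: a straggler rejoins the group.
snapshots = [[(0, 0), (0.5, 0), (5, 0)], [(0, 0), (0.5, 0), (1, 0)]]
print(crocker_matrix(snapshots, scales=[0.1, 0.6, 10]))  # [[3, 2, 1], [3, 1, 1]]
```

Plotting the matrix as a contour plot, with time on one axis and scale on the other, gives the usual crocker-plot picture.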
  250. Using Persistent Homology and Dynamical Distances to Analyze Protein Binding (2016)

    Violeta Kovacev-Nikolic, Peter Bubenik, Dragan Nikolić, Giseon Heo
    Abstract Persistent homology captures the evolution of topological features of a model as a parameter changes. The most commonly used summary statistics of persistent homology are the barcode and the persistence diagram. Another summary statistic, the persistence landscape, was recently introduced by Bubenik. It is a functional summary, so it is easy to calculate sample means and variances, and it is straightforward to construct various test statistics. Using a permutation test, we detect conformational changes between the closed and open forms of the maltose-binding protein, a large biomolecule consisting of 370 amino acid residues. Furthermore, persistence landscapes can be used with machine learning methods. A hyperplane from a support vector machine shows clear separation between the closed and open protein conformations. Moreover, because our approach captures dynamical properties of the protein, our results may help in identifying residues susceptible to ligand binding; we show that the majority of active site residues and allosteric pathway residues are located in the vicinity of the most persistent loop in the corresponding filtered Vietoris-Rips complex. This finding was not observed in the classical anisotropic network model.
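    The persistence landscape replaces each diagram point (b, d) with a "tent" function max(0, min(t - b, d - t)); the k-th landscape function lambda_k(t) is the k-th largest tent value at t. Because the result is a function, averaging landscapes pointwise gives the sample mean mentioned above. The Python sketch below follows that standard definition; it assumes finite (birth, death) pairs, and the helper names and the grid-sampling approach are illustrative choices rather than the authors' code.

```python
def landscape(diagram, k, t):
    """Value of the k-th persistence landscape function (k >= 1) at t.

    diagram: iterable of finite (birth, death) pairs.
    """
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0


def mean_landscape(diagrams, k, grid):
    """Pointwise sample mean of the k-th landscape over several diagrams, sampled on grid."""
    return [sum(landscape(dg, k, t) for dg in diagrams) / len(diagrams) for t in grid]


dg = [(0.0, 4.0), (1.0, 3.0)]
print(landscape(dg, 1, 2.0), landscape(dg, 2, 2.0))               # 2.0 1.0
print(mean_landscape([[(0.0, 4.0)], [(0.0, 2.0)]], 1, [1.0, 2.0]))  # [1.0, 1.0]
```

Sampled landscapes of this kind are plain vectors, which is what makes permutation tests and SVM classifiers straightforward to apply.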
  251. Transfer Learning for Autonomous Chatter Detection in Machining (2022)

    Melih C. Yesilli, Firas A. Khasawneh, Brian P. Mann
    Abstract Large-amplitude chatter vibrations are among the most important phenomena in machining processes. They are often detrimental in cutting operations, causing a poor surface finish and decreased tool life. Therefore, chatter detection using machine learning has been an active research area over the last decade. Three challenges can be identified in applying machine learning to chatter detection at large in industry: an insufficient understanding of the universality of chatter features across different processes, the need to automate feature extraction, and the limited data available for each specific workpiece-machine tool combination, e.g., when machining one-off products. These three challenges can be grouped under the umbrella of transfer learning, which studies how knowledge gained in one setting can be leveraged to obtain information in new settings. This paper studies automating chatter detection by evaluating transfer learning of prominent as well as novel chatter detection methods. We investigate chatter classification accuracy using a variety of features extracted from turning and milling experiments with different cutting configurations. The studied methods include the Fast Fourier Transform (FFT), Power Spectral Density (PSD), the Auto-correlation Function (ACF), and decomposition-based tools such as the Wavelet Packet Transform (WPT) and Ensemble Empirical Mode Decomposition (EEMD). We also examine more recent approaches based on Topological Data Analysis (TDA) and similarity measures of time series based on Dynamic Time Warping (DTW). We evaluate the transfer learning potential of each approach by training and testing both within and across the turning and milling data sets. Four supervised classification algorithms are explored: support vector machine (SVM), logistic regression, random forest classification, and gradient boosting.
In addition to accuracy, we also comment on the automation potential of feature extraction for each approach which is integral to creating autonomous manufacturing centers. Our results show that carefully chosen time-frequency features can lead to high classification accuracies albeit at the cost of requiring manual pre-processing and the tagging of an expert user. On the other hand, we found that the TDA and DTW approaches can provide accuracies and F1-scores on par with the time-frequency methods without the need for manual preprocessing via completely automatic pipelines. Further, we discovered that the DTW approach outperforms all other methods when trained using the milling data and tested on the turning data. Therefore, TDA and DTW approaches may be preferred over the time-frequency-based approaches for fully automated chatter detection schemes. DTW and TDA also can be more advantageous when pooling data from either limited workpiece-machine tool combinations, or from small data sets of one-off processes.
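    Dynamic time warping measures the similarity of two time series by aligning them with a monotone warping path that may stretch or compress time, and summing the per-sample costs along the cheapest path. The Python sketch below is the classic O(nm) dynamic-programming form with an absolute-difference cost; it is only the textbook building block, and the cost function, the lack of a warping-window constraint, and how the distances feed the classifiers are assumptions not taken from the paper.

```python
def dtw_distance(x, y):
    """Classic dynamic time warping distance between two 1D sequences,
    using |x_i - y_j| as the local cost and no warping-window constraint."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: advance x only, advance y only, advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# A repeated sample is absorbed by warping, so these two signals are at distance 0.
print(dtw_distance([0, 1, 2], [0, 1, 1, 2]))  # 0.0
```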
  252. The Topology of the Cosmic Web in Terms of Persistent Betti Numbers (2017)

    Pratyush Pranav, Herbert Edelsbrunner, Rien van de Weygaert, Gert Vegter, Michael Kerber, Bernard J. T. Jones, Mathijs Wintraecken
    Abstract We introduce a multiscale topological description of the Megaparsec web-like cosmic matter distribution. Betti numbers and topological persistence of
  253. PI-Net: A Deep Learning Approach to Extract Topological Persistence Images (2020)

    Anirudh Som, Hongjun Choi, Karthikeyan Natesan Ramamurthy, Matthew Buman, Pavan Turaga
    Abstract Topological features such as persistence diagrams and their functional approximations like persistence images (PIs) have been showing substantial promise for machine learning and computer vision applications. This is largely attributed to the robustness that topological representations provide against different types of physical nuisance variables seen in real-world data, such as viewpoint, illumination, and more. However, key bottlenecks to their large-scale adoption are computational expense and the difficulty of incorporating them in a differentiable architecture. We take an important step in this paper to mitigate these bottlenecks by proposing a novel one-step approach to generate PIs directly from the input data. We design two separate convolutional neural network architectures, one designed to take in multivariate time series signals as input and another that accepts multi-channel images as input. We call these networks Signal PI-Net and Image PI-Net, respectively. To the best of our knowledge, we are the first to propose the use of deep learning for computing topological features directly from data. We explore the use of the proposed PI-Net architectures on two applications: human activity recognition using tri-axial accelerometer sensor data and image classification. We demonstrate the ease of fusing PIs into supervised deep learning architectures and a speed-up of several orders of magnitude for extracting PIs from data. Our code is available at https://github.com/anirudhsom/PI-Net.
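    For context, a conventional persistence image is computed from an already-known persistence diagram: each point (b, d) is moved to (birth, persistence) = (b, d - b), replaced by a Gaussian bump weighted by its persistence, and the sum is rasterised on a grid. This is the target that PI-Net learns to produce directly from raw data. The NumPy sketch below is a simplified version under stated assumptions (linear weighting by persistence, values sampled at grid points rather than integrated over pixels, fixed ranges and sigma chosen for illustration); it is not the PI-Net code.

```python
import numpy as np


def persistence_image(diagram, resolution=(20, 20), sigma=0.1,
                      birth_range=(0.0, 1.0), pers_range=(0.0, 1.0)):
    """Simplified persistence image of finite (birth, death) pairs.

    Gaussian bumps in (birth, persistence) coordinates, weighted linearly by
    persistence and sampled at grid points. Returns shape (resolution[1], resolution[0]).
    """
    dg = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births, pers = dg[:, 0], dg[:, 1] - dg[:, 0]
    weights = pers / pers_range[1]  # 0 on the diagonal, 1 at maximal persistence
    bx = np.linspace(*birth_range, resolution[0])
    py = np.linspace(*pers_range, resolution[1])
    B, P = np.meshgrid(bx, py)  # rows index persistence, columns index birth
    img = np.zeros_like(B)
    for b, p, w in zip(births, pers, weights):
        img += w * np.exp(-((B - b) ** 2 + (P - p) ** 2) / (2 * sigma ** 2))
    return img / (2 * np.pi * sigma ** 2)


img = persistence_image([(0.5, 1.0)], resolution=(21, 21))
print(np.unravel_index(np.argmax(img), img.shape))  # peak at birth 0.5, persistence 0.5
```

The fixed-size output array is what makes persistence images easy to feed into standard neural networks.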
  254. Topological Data Analysis for Arrhythmia Detection Through Modular Neural Networks (2020)

    Meryll Dindin, Yuhei Umeda, Frederic Chazal
    Abstract This paper presents an innovative and generic deep learning approach to monitoring heart conditions from ECG signals. We focus on both the detection and the classification of abnormal heartbeats, known as arrhythmia. We emphasize generalization throughout the construction of a shallow deep-learning model that proves effective for new, unseen patients. The novelty of our approach lies in the use of topological data analysis to deal with individual differences. We show that our architecture matches the performance of state-of-the-art methods for both arrhythmia detection and classification.
  255. Topologically Densified Distributions (2020)

    Christoph Hofer, Florian Graf, Marc Niethammer, Roland Kwitt
    Abstract We study regularization in the context of small sample-size learning with over-parametrized neural networks. We shift the focus from architectural properties, such as norms on the network weights, to properties of the internal representations before a linear classifier. Specifically, we impose a topological constraint on samples drawn from the probability measure induced in that space. This provably leads to mass concentration effects around the representations of training instances, a property beneficial for generalization. By leveraging previous work on imposing topological constraints in a neural network setting, we provide empirical evidence (across various vision benchmarks) to support our claim of better generalization.
  256. Image-Based Phenotyping for Identification of QTL Determining Fruit Shape and Size in American Cranberry (Vaccinium Macrocarpon L.) (2018)

    Luis Diaz-Garcia, Giovanny Covarrubias-Pazaran, Brandon Schlautman, Edward Grygleski, Juan Zalapa
    Abstract Image-based phenotyping methodologies are powerful tools to determine quality parameters for fruit breeders and processors. The fruit size and shape of American cranberry (Vaccinium macrocarpon L.) are particularly important characteristics that determine the harvests’ processing value and potential end-use products (e.g., juice vs. sweetened dried cranberries). However, cranberry fruit size and shape attributes can be difficult and time consuming for breeders and processors to measure, especially when relying on manual measurements and visual ratings. Therefore, in this study, we implemented image-based phenotyping techniques for gathering data regarding basic cranberry fruit parameters such as length, width, length-to-width ratio, and eccentricity. Additionally, we applied a persistent homology algorithm to better characterize complex shape parameters. Using this high-throughput artificial vision approach, we characterized fruit from 351 progeny from a full-sib cranberry population over three field seasons. Using a covariate analysis to maximize the identification of well-supported quantitative trait loci (QTL), we found 252 single QTL in a 3-year period for cranberry fruit size and shape descriptors from which 20% were consistently found in all years. The present study highlights the potential for the identified QTL and the image-based methods to serve as a basis for future explorations of the genetic architecture of fruit size and shape in cranberry and other fruit crops.
  257. Classification of COVID-19 via Homology of CT-SCAN (2021)

    Sohail Iqbal, H. Fareed Ahmed, Talha Qaiser, Muhammad Imran Qureshi, Nasir Rajpoot
    Abstract During the worldwide spread of SARS-CoV-2 (COVID-19) infection, it is of utmost importance to detect the disease at an early stage, especially in the hot spots of this epidemic. There are more than 110 million infected cases worldwide so far. Due to its promptness and effective results, computed tomography (CT) scanning is preferred to the reverse-transcription polymerase chain reaction (RT-PCR) test. Early detection and isolation of patients is the only possible way of controlling the spread of the disease. Automated analysis of CT scans can provide enormous support in this process. In this article, we propose a novel approach to detect SARS-CoV-2 using CT-scan images. Our method is based on a very intuitive and natural idea of analyzing shapes, an attempt to mimic a medical professional. We mainly trace SARS-CoV-2 features by quantifying their topological properties. We primarily use a tool called persistent homology, from Topological Data Analysis (TDA), to compute these topological properties. We train and test our model on the "SARS-CoV-2 CT-scan dataset" [soares2020sars], an open-source dataset containing 2,481 CT scans of normal and COVID-19 patients. Our model yielded an overall benchmark F1 score of 99.42%, accuracy of 99.416%, precision of 99.41%, and recall of 99.42%. TDA techniques have great potential for efficient and prompt detection of COVID-19. This potential may be exploited in clinics for rapid and safe detection of COVID-19 globally, in particular in low- and middle-income countries where RT-PCR labs and/or kits face a serious crisis.
  258. Topological Data Analysis as a Morphometric Method: Using Persistent Homology to Demarcate a Leaf Morphospace (2018)

    Mao Li, Hong An, Ruthie Angelovici, Clement Bagaza, Albert Batushansky, Lynn Clark, Viktoriya Coneva, Michael J. Donoghue, Erika Edwards, Diego Fajardo, Hui Fang, Margaret H. Frank, Timothy Gallaher, Sarah Gebken, Theresa Hill, Shelley Jansky, Baljinder Kaur, Phillip C. Klahs, Laura L. Klein, Vasu Kuraparthy, Jason Londo, Zoë Migicovsky, Allison Miller, Rebekah Mohn, Sean Myles, Wagner C. Otoni, J. C. Pires, Edmond Rieffer, Sam Schmerler, Elizabeth Spriggs, Christopher N. Topp, Allen Van Deynze, Kuang Zhang, Linglong Zhu, Braden M. Zink, Daniel H. Chitwood