🍩 Database of Original & Non-Theoretical Uses of Topology

  1. ChainNet: Learning on Blockchain Graphs With Topological Features (2019)

    N. C. Abay, C. G. Akcora, Y. R. Gel, M. Kantarcioglu, U. D. Islambekov, Y. Tian, B. Thuraisingham
    Abstract The following topics are dealt with: learning (artificial intelligence); graph theory; neural nets; pattern classification; data mining; feature extraction; recommender systems; pattern clustering; social networking (online); optimisation.
  2. Statistical Inference for Persistent Homology Applied to Simulated fMRI Time Series Data (2023)

    Hassan Abdallah, Adam Regalski, Mohammad Behzad Kang, Maria Berishaj, Nkechi Nnadi, Asadur Chowdury, Vaibhav A. Diwadkar, Andrew Salch
    Abstract Time-series data are amongst the most widely-used in biomedical sciences, including domains such as functional Magnetic Resonance Imaging (fMRI). Structure within time series data can be captured by the tools of topological data analysis (TDA). Persistent homology is the most commonly used data-analytic tool in TDA, and can effectively summarize complex high-dimensional data into an interpretable 2-dimensional representation called a persistence diagram. Existing methods for statistical inference for persistent homology of data depend on an independence assumption being satisfied. While persistent homology can be computed for each time index in a time-series, time-series data often fail to satisfy the independence assumption. This paper develops a statistical test that obviates the independence assumption by implementing a multi-level block sampled Monte Carlo test with sets of persistence diagrams. Its efficacy for detecting task-dependent topological organization is then demonstrated on simulated fMRI data. This new statistical test is therefore suitable for analyzing persistent homology of fMRI data, and of non-independent data in general.
  3. Localization in the Crowd With Topological Constraints (2020)

    Shahira Abousamra, Minh Hoai, Dimitris Samaras, Chao Chen
    Abstract We address the problem of crowd localization, i.e., the prediction of dots corresponding to people in a crowded scene. Due to various challenges, a localization method is prone to spatial semantic errors, i.e., predicting multiple dots within the same person or collapsing multiple dots in a cluttered region. We propose a topological approach targeting these semantic errors. We introduce a topological constraint that teaches the model to reason about the spatial arrangement of dots. To enforce this constraint, we define a persistence loss based on the theory of persistent homology. The loss compares the topographic landscape of the likelihood map and the topology of the ground truth. Topological reasoning improves the quality of the localization algorithm, especially near cluttered regions. On multiple public benchmarks, our method outperforms previous localization methods. Additionally, we demonstrate the potential of our method in improving the performance in the crowd counting task.
  4. Evasion Paths in Mobile Sensor Networks (2015)

    Henry Adams, Gunnar Carlsson
    Abstract Suppose that ball-shaped sensors wander in a bounded domain. A sensor does not know its location but does know when it overlaps a nearby sensor. We say that an evasion path exists in this sensor network if a moving intruder can avoid detection. In ‘Coordinate-free coverage in sensor networks with controlled boundaries via homology’, Vin de Silva and Robert Ghrist give a necessary condition, depending only on the time-varying connectivity data of the sensors, for an evasion path to exist. Using zigzag persistent homology, we provide an equivalent condition that moreover can be computed in a streaming fashion. However, no method with time-varying connectivity data as input can give necessary and sufficient conditions for the existence of an evasion path. Indeed, we show that the existence of an evasion path depends not only on the fibrewise homotopy type of the region covered by sensors but also on its embedding in spacetime. For planar sensors that also measure weak rotation and distance information, we provide necessary and sufficient conditions for the existence of an evasion path.
  5. Persistence Images: A Stable Vector Representation of Persistent Homology (2017)

    Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, Lori Ziegelmeier
    Abstract Many data sets can be viewed as a noisy sampling of an underlying space, and tools from topological data analysis can characterize this structure for the purpose of knowledge discovery. One such tool is persistent homology, which provides a multiscale description of the homological features within a data set. A useful representation of this homological information is a persistence diagram (PD). Efforts have been made to map PDs into spaces with additional structure valuable to machine learning tasks. We convert a PD to a finite-dimensional vector representation which we call a persistence image (PI), and prove the stability of this transformation with respect to small perturbations in the inputs. The discriminatory power of PIs is compared against existing methods, showing significant performance gains. We explore the use of PIs with vector-based machine learning tools, such as linear sparse support vector machines, which identify features containing discriminating topological information. Finally, high accuracy inference of parameter values from the dynamic output of a discrete dynamical system (the linked twist map) and a partial differential equation (the anisotropic Kuramoto-Sivashinsky equation) provide a novel application of the discriminatory power of PIs.
  6. Graph Classification via Heat Diffusion on Simplicial Complexes (2020)

    Mehmet Emin Aktas, Esra Akbas
    Abstract In this paper, we study the graph classification problem in vertex-labeled graphs. Our main goal is to classify the graphs by comparing their higher-order structures using heat diffusion on their simplices. We first represent vertex-labeled graphs as simplex-weighted super-graphs. We then define the diffusion Fréchet function over their simplices to encode the higher-order network topology and finally reach our goal by combining the function values with machine learning algorithms. Our experiments on real-world bioinformatics networks show that using the diffusion Fréchet function on simplices is promising in graph classification and more effective than the baseline methods. To the best of our knowledge, this is the first paper in the literature using heat diffusion on higher-dimensional simplices in a graph mining problem. We believe that our method can be extended to different graph mining domains, not only the graph classification problem.
  7. Text Classification via Network Topology: A Case Study on the Holy Quran (2019)

    Mehmet Emin Aktas, Esra Akbas
    Abstract Due to the growth in the number of texts and documents available online, machine learning based text classification systems have recently become more popular. Feature extraction, converting unstructured text into a structured feature space, is one of the essential tasks for text classification. In this paper, we propose a novel feature extraction approach for text classification using the network representation of text, network topology, and machine learning techniques. We present experimental results on classifying the Holy Quran chapters based on the place each chapter was revealed to illustrate the effectiveness of the approach.
  8. A Framework for Topological Music Analysis (TMA) (2022)

    Alberto Alcalá-Alvarez, Pablo Padilla-Longoria
    Abstract In the present article we describe and discuss a framework for applying different topological data analysis (TDA) techniques to a music fragment given as a score in traditional Western notation. We first consider different sets of points in Euclidean spaces of different dimensions that correspond to musical events in the score, and obtain their persistent homology features. Then we introduce two families of simplicial complexes that can be associated to chord sequences, and calculate their main homological descriptors. These complexes lead us to the definition of dynamical systems modeling harmonic progressions. Finally, we show the results of applying the described methods to the analysis and stylistic comparison of fragments from three Brandenburg Concertos by J.S. Bach and two Graffiti by Mexican composer Armando Luna.
  9. Topological Differential Testing (2020)

    Kristopher Ambrose, Steve Huntsman, Michael Robinson, Matvey Yutin
    Abstract We introduce topological differential testing (TDT), an approach to extracting the consensus behavior of a set of programs on a corpus of inputs. TDT uses the topological notion of a simplicial complex (and implicitly draws on richer topological notions such as sheaves and persistence) to determine inputs that cause inconsistent behavior and in turn reveal de facto input specifications. We gently introduce TDT with a toy example before detailing its application to understanding the PDF file format from the behavior of various parsers. Finally, we discuss theoretical details and other possible applications.
  10. Genomics Data Analysis via Spectral Shape and Topology (2022)

    Erik J. Amézquita, Farzana Nasrin, Kathleen M. Storey, Masato Yoshizawa
    Abstract Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can help to gain a better understanding of the intrinsic shape of high-dimensional genomic data and to retain information that may be lost using standard dimension-reduction algorithms. We propose a novel workflow to process and analyze RNA-seq data from tumor and healthy subjects integrating Mapper and differential gene expression. Precisely, we show that a Gaussian mixture approximation method can be used to produce graphical structures that successfully separate tumor and healthy subjects, and produce two subgroups of tumor subjects. A further analysis using DESeq2, a popular tool for the detection of differentially expressed genes, shows that these two subgroups of tumor cells bear two distinct gene regulations, suggesting two discrete paths for forming lung cancer, which could not be highlighted by other popular clustering methods, including t-SNE. Although Mapper shows promise in analyzing high-dimensional data, tools to statistically analyze Mapper graphical structures are limited in the existing literature. In this paper, we develop a scoring method using heat kernel signatures that provides an empirical setting for statistical inferences such as hypothesis testing, sensitivity analysis, and correlation analysis.
  11. Measuring Hidden Phenotype: Quantifying the Shape of Barley Seeds Using the Euler Characteristic Transform (2021)

    Erik J. Amézquita, Michelle Y. Quigley, Tim Ophelders, Jacob B. Landis, Daniel Koenig, Elizabeth Munch, Daniel H. Chitwood
    Abstract Shape plays a fundamental role in biology. Traditional phenotypic analysis methods measure some features but fail to measure the information embedded in shape comprehensively. To extract, compare, and analyze this embedded information in a robust and concise way, we turn to Topological Data Analysis (TDA), specifically the Euler Characteristic Transform. TDA measures shape comprehensively using mathematical representations based on algebraic topology features. To study its use, we compute both traditional and topological shape descriptors to quantify the morphology of 3121 barley seeds scanned with X-ray Computed Tomography (CT) technology at 127 micron resolution. The Euler Characteristic Transform measures shape by analyzing topological features of an object at thresholds across a number of directional axes. A Kruskal-Wallis analysis of the information encoded by the topological signature reveals that the Euler Characteristic Transform successfully picks up the shape of the crease and bottom of the seeds. Moreover, while traditional shape descriptors can cluster the seeds based on their accession, topological shape descriptors can cluster them further based on their panicle. We then successfully train a support vector machine (SVM) to classify 28 different accessions of barley based exclusively on the shape of their grains. We observe that combining both traditional and topological descriptors classifies barley seeds better than using traditional descriptors alone. This improvement suggests that TDA is a powerful complement to traditional morphometrics to comprehensively describe a multitude of “hidden” shape nuances which are otherwise not detected.
  12. Weighted Persistent Homology for Osmolyte Molecular Aggregation and Hydrogen-Bonding Network Analysis (2020)

    D. Vijay Anand, Zhenyu Meng, Kelin Xia, Yuguang Mu
    Abstract It has long been observed that trimethylamine N-oxide (TMAO) and urea demonstrate dramatically different properties in a protein folding process. Even with the enormous theoretical and experimental research work on these two osmolytes, various aspects of their underlying mechanisms still remain largely elusive. In this paper, we propose to use weighted persistent homology to systematically study the osmolytes' molecular aggregation and their hydrogen-bonding network from a local topological perspective. We consider two weighted models, i.e., localized persistent homology (LPH) and interactive persistent homology (IPH). Boltzmann persistent entropy (BPE) is proposed to quantitatively characterize the topological features from LPH and IPH, together with persistent Betti number (PBN). More specifically, from the localized persistent homology models, we have found that TMAO and urea have very different local topology. TMAO is found to exhibit a local network structure. With the concentration increase, the circle elements in these networks show a clear increase in their total numbers and a decrease in their relative sizes. In contrast, urea shows two types of local topological patterns, i.e., local clusters around 6 Å and a few global circle elements at around 12 Å. From the interactive persistent homology models, it has been found that our persistent radial distribution function (PRDF) from the global-scale IPH has the same physical properties as the traditional radial distribution function. Moreover, PRDFs from the local-scale IPH can also be generated and used to characterize the local interaction information. Other than the clear difference of the first peak value of PRDFs at filtration size 4 Å, TMAO and urea also show very different behaviors at the second peak region from filtration size 5 Å to 10 Å. These differences are also reflected in the PBNs and BPEs of the local-scale IPH. This localized topological information has never been revealed before. Since graphs can be transferred into simplicial complexes by the clique complex, our weighted persistent homology models can be used in the analysis of various networks and graphs from any molecular structures and aggregation systems.
  13. The Topology of Higher-Order Complexes Associated With Brain Hubs in Human Connectomes (2020)

    Miroslav Andjelković, Bosiljka Tadić, Roderick Melnik
    Abstract Higher-order connectivity in complex systems described by simplexes of different orders provides a geometry for simplex-based dynamical variables and interactions. Simplicial complexes that constitute a functional geometry of the human connectome can be crucial for the brain complex dynamics. In this context, the best-connected brain areas, designated as hub nodes, play a central role in supporting integrated brain function. Here, we study the structure of simplicial complexes attached to eight global hubs in the female and male connectomes and identify the core networks among the affected brain regions. These eight hubs (Putamen, Caudate, Hippocampus and Thalamus-Proper in the left and right cerebral hemisphere) are the highest-ranking according to their topological dimension, defined as the number of simplexes of all orders in which the node participates. Furthermore, we analyse the weight-dependent heterogeneity of simplexes. We demonstrate changes in the structure of identified core networks and topological entropy when the threshold weight is gradually increased. These results highlight the role of higher-order interactions in human brain networks and provide additional evidence for (dis)similarity between the female and male connectomes.
  14. Machine Learning and Topological Data Analysis Identify Unique Features of Human Papillae in 3D Scans (2023)

    Rayna Andreeva, Anwesha Sarkar, Rik Sarkar
    Abstract The tongue surface houses a range of papillae that are integral to the mechanics and chemistry of taste and textural sensation. Although gustatory function of papillae is well investigated, the uniqueness of papillae within and across individuals remains elusive. Here, we present the first machine learning framework on 3D microscopic scans of human papillae (n = 2092), uncovering the uniqueness of geometric and topological features of papillae. The finer differences in shapes of papillae are investigated computationally based on a number of features derived from discrete differential geometry and computational topology. Interpretable machine learning techniques show that persistent homology features of the papillae shape are the most effective in predicting the biological variables. Models trained on these features with small volumes of data samples predict the type of papillae with an accuracy of 85%. The papillae type classification models can map the spatial arrangement of filiform and fungiform papillae on a surface. Remarkably, the papillae are found to be distinctive across individuals and an individual can be identified with an accuracy of 48% among the 15 participants from a single papilla. Collectively, this is the first evidence demonstrating that tongue papillae can serve as a unique identifier, inspiring new research directions for food preferences and oral diagnostics.
  15. Coexistence Holes Characterize the Assembly and Disassembly of Multispecies Systems (2021)

    Marco Tulio Angulo, Aaron Kelley, Luis Montejano, Chuliang Song, Serguei Saavedra
    Abstract A central goal of ecological research has been to understand the limits on the maximum number of species that can coexist under given constraints. However, we know little about the assembly and disassembly processes under which a community can reach such a maximum number, or whether this number is in fact attainable in practice. This limitation is partly due to the challenge of performing experimental work and partly due to the lack of a formalism under which one can systematically study such processes. Here, we introduce a formalism based on algebraic topology and homology theory to study the space of species coexistence formed by a given pool of species. We show that this space is characterized by ubiquitous discontinuities that we call coexistence holes (that is, empty spaces surrounded by filled space). Using theoretical and experimental systems, we provide direct evidence showing that these coexistence holes do not occur arbitrarily—their diversity is constrained by the internal structure of species interactions and their frequency can be explained by the external factors acting on these systems. Our work suggests that the assembly and disassembly of ecological systems is a discontinuous process that tends to obey regularities.
  16. Topological Edge Modes by Smart Patterning (2018)

    David J. Apigo, Kai Qian, Camelia Prodan, Emil Prodan
    Abstract We study identical coupled mechanical resonators whose collective dynamics are fully determined by the patterns in which they are arranged. In this work, we call a system topological if (1) boundary resonant modes fully fill all existing spectral gaps whenever the system is halved, and (2) the boundary spectrum cannot be removed or gapped by any boundary condition. We demonstrate that such topological characteristics can be induced solely through patterning, in a manner entirely independent of the structure of the resonators and the details of the couplings. The existence of such patterns is proven using K-theory and exemplified using an experimental platform based on magnetically coupled spinners. Topological metamaterials built on these principles can be easily engineered at any scale, providing a practical platform for applications and devices.
  17. Steinhaus Filtration and Stable Paths in the Mapper (2020)

    Dustin L. Arendt, Matthew Broussard, Bala Krishnamoorthy, Nathaniel Saul
    Abstract Two central concepts from topological data analysis are persistence and the Mapper construction. Persistence employs a sequence of objects built on data called a filtration. A Mapper produces insightful summaries of data, and has found widespread applications in diverse areas. We define a new filtration called the cover filtration built from a single cover based on a generalized Steinhaus distance, which is a generalization of Jaccard distance. We prove a stability result: the cover filtrations of two covers are $\alpha/m$-interleaved, where $\alpha$ is a bound on the bottleneck distance between covers and $m$ is the size of the smallest set in either cover. We also show our construction is equivalent to the Čech filtration under certain settings, and the Vietoris-Rips filtration completely determines the cover filtration in all cases. We then develop a theory for stable paths within this filtration. Unlike standard results on stability in topological persistence, our definition of path stability aligns exactly with the above result on stability of cover filtration. We demonstrate how our framework can be employed in a variety of applications where a metric is not obvious but a cover is readily available. First we present a new model for recommendation systems using cover filtration. For an explicit example, stable paths identified on a movies data set represent sequences of movies constituting gentle transitions from one genre to another. As a second application in explainable machine learning, we apply the Mapper for model induction, providing explanations in the form of paths between subpopulations. Stable paths in the Mapper from a supervised machine learning model trained on the FashionMNIST data set provide improved explanations of relationships between subpopulations of images.
  18. Current Theoretical Models Fail to Predict the Topological Complexity of the Human Genome (2015)

    Javier Arsuaga, Reyka G. Jayasinghe, Robert G. Scharein, Mark R. Segal, Robert H. Stolz, Mariel Vazquez
    Abstract Understanding the folding of the human genome is a key challenge of modern structural biology. The emergence of chromatin conformation capture assays (e.g., Hi-C) has revolutionized chromosome biology and provided new insights into the three dimensional structure of the genome. The experimental data are highly complex and need to be analyzed with quantitative tools. It has been argued that the data obtained from Hi-C assays are consistent with a fractal organization of the genome. A key characteristic of the fractal globule is the lack of topological complexity (knotting or inter-linking). However, the absence of topological complexity contradicts results from polymer physics showing that the entanglement of long linear polymers in a confined volume increases rapidly with the length and with decreasing volume. In vivo and in vitro assays support this claim in some biological systems. We simulate knotted lattice polygons confined inside a sphere and demonstrate that their contact frequencies agree with the human Hi-C data. We conclude that the topological complexity of the human genome cannot be inferred from current Hi-C data.
  19. Identification of Copy Number Aberrations in Breast Cancer Subtypes Using Persistence Topology (2015)

    Javier Arsuaga, Tyler Borrman, Raymond Cavalcante, Georgina Gonzalez, Catherine Park
    Abstract DNA copy number aberrations (CNAs) are of biological and medical interest because they help identify regulatory mechanisms underlying tumor initiation and evolution. Identification of tumor-driving CNAs (driver CNAs) however remains a challenging task, because they are frequently hidden by CNAs that are the product of random events that take place during tumor evolution. Experimental detection of CNAs is commonly accomplished through array comparative genomic hybridization (aCGH) assays followed by supervised and/or unsupervised statistical methods that combine the segmented profiles of all patients to identify driver CNAs. Here, we extend a previously-presented supervised algorithm for the identification of CNAs that is based on a topological representation of the data. Our method associates a two-dimensional (2D) point cloud with each aCGH profile and generates a sequence of simplicial complexes, mathematical objects that generalize the concept of a graph. This representation of the data permits segmenting the data at different resolutions and identifying CNAs by interrogating the topological properties of these simplicial complexes. We tested our approach on a published dataset with the goal of identifying specific breast cancer CNAs associated with specific molecular subtypes. Identification of CNAs associated with each subtype was performed by analyzing each subtype separately from the others and by taking the rest of the subtypes as the control. Our results found a new amplification in 11q at the location of the progesterone receptor in the Luminal A subtype. Aberrations in the Luminal B subtype were found only upon removal of the basal-like subtype from the control set. Under those conditions, all regions found in the original publication, except for 17q, were confirmed; all aberrations, except those in chromosome arms 8q and 12q were confirmed in the basal-like subtype. These two chromosome arms, however, were detected only upon removal of three patients with exceedingly large copy number values. More importantly, we detected 10 and 21 additional regions in the Luminal B and basal-like subtypes, respectively. Most of the additional regions were either validated on an independent dataset and/or using GISTIC. Furthermore, we found three new CNAs in the basal-like subtype: a combination of gains and losses in 1p, a gain in 2p and a loss in 14q. Based on these results, we suggest that topological approaches that incorporate multiresolution analyses and that interrogate topological properties of the data can help in the identification of copy number changes in cancer.
  20. Topological Analysis of Gene Expression Arrays Identifies High Risk Molecular Subtypes in Breast Cancer (2012)

    Javier Arsuaga, Nils A. Baas, Daniel DeWoskin, Hideaki Mizuno, Aleksandr Pankov, Catherine Park
    Abstract Genomic technologies measure thousands of molecular signals with the goal of understanding complex biological processes. In cancer these molecular signals have been used to characterize disease subtypes, signaling pathways and to identify subsets of patients with specific prognosis. However, molecular signals for any disease type are so vast and complex that novel mathematical approaches are required for further analyses. Persistent and computational homology provide a new method for these analyses. In our previous work we presented a new homology-based supervised classification method to identify copy number aberrations from comparative genomic hybridization arrays. In this work we first propose a theoretical framework for our classification method and second we extend our analysis to gene expression data. We analyze a published breast cancer data set and find that our method can distinguish most, but not all, different breast cancer subtypes. This result suggests that specific relationships between genes, captured by our algorithm, help distinguish between breast cancer subtypes. We propose that topological methods can be used for the classification and clustering of gene expression profiles.
  21. Persistent Homology for Breast Tumor Classification Using Mammogram Scans (2022)

    Aras Asaad, Dashti Ali, Taban Majeed, Rasber Rashid
    Abstract An important tool in the field of topological data analysis is persistent homology (PH), which is used to encode an abstract representation of the homology of data at different resolutions in the form of a persistence diagram (PD). In this work we build more than one PD representation of a single image based on a landmark selection method, known as local binary patterns, that encodes different types of local textures from images. We employed different PD vectorizations using persistence landscapes, persistence images, persistence binning (Betti Curve) and statistics. We tested the effectiveness of the proposed landmark-based PH on two publicly available breast abnormality detection datasets using mammogram scans. The sensitivity of landmark-based PH is over 90% in both datasets for the detection of abnormal breast scans. Finally, experimental results give new insights on using different types of PD vectorizations, which help in utilising PH in conjunction with machine learning classifiers.
  22. Characterising Epithelial Tissues Using Persistent Entropy (2019)

    N. Atienza, L. M. Escudero, M. J. Jimenez, M. Soriano-Trigueros
    Abstract In this paper, we apply persistent entropy, a novel topological statistic, for characterization of images of epithelial tissues. We have found that persistent entropy is able to summarize topological and geometric information encoded by $\alpha$-complexes and persistent homology. After using some statistical tests, we can guarantee the existence of significant differences in the studied tissues.
  23. Stable Topological Summaries for Analyzing the Organization of Cells in a Packed Tissue (2021)

    Nieves Atienza, Maria-Jose Jimenez, Manuel Soriano-Trigueros
    Abstract We use topological data analysis tools for studying the inner organization of cells in segmented images of epithelial tissues. More specifically, for each segmented image, we compute different persistence barcodes, which codify the lifetime of homology classes (persistent homology) along different filtrations (increasing nested sequences of simplicial complexes) that are built from the regions representing the cells in the tissue. We use a complete and well-grounded set of numerical variables over those persistence barcodes, also known as topological summaries. A novel combination of normalization methods for both the set of input segmented images and the produced barcodes allows for the proven stability results for those variables with respect to small changes in the input, as well as invariance to image scale. Our study provides new insights into this problem, such as a possible novel indicator for the development of the Drosophila wing disc tissue or the importance of centroids’ distribution to differentiate some tissues from their CVT-path counterpart (a mathematical model of epithelia based on Voronoi diagrams). We also show how the use of topological summaries may improve the classification accuracy of epithelial images using a Random Forest algorithm.
  24. A Simplified Algorithm for Identifying Abnormal Changes in Dynamic Networks (2022)

    Bouchaib Azamir, Driss Bennis, Bertrand Michel
    Abstract Topological data analysis has recently been applied to the study of dynamic networks. In this context, an algorithm was introduced that helps, among other things, to detect early warning signals of abnormal changes in the dynamic network under study. However, the complexity of this algorithm increases significantly as the database under study grows. In this paper, we propose a simplification of the algorithm without affecting its performance. We give various applications and simulations of the new algorithm on some weighted networks. The obtained results clearly show the efficiency of the introduced approach. Moreover, in some cases, the proposed algorithm makes it possible to highlight local information and sometimes early warning signals of local abnormal changes.
  25. Unveiling Patterns of International Communities in a Global City Using Mobile Phone Data (2015)

    Paolo Bajardi, Matteo Delfino, André Panisson, Giovanni Petri, Michele Tizzoni
    Abstract We analyse a large mobile phone activity dataset provided by Telecom Italia for the Telecom Big Data Challenge contest. The dataset reports the international country codes of every call/SMS made and received by mobile phone users in Milan, Italy, between November and December 2013, with a spatial resolution of about 200 meters. We first show that the observed spatial distribution of international codes well matches the distribution of international communities reported by official statistics, confirming the value of mobile phone data for demographic research. Next, we define an entropy function to measure the heterogeneity of the international phone activity in space and time. By comparing the entropy function to empirical data, we show that it can be used to identify the city’s hotspots, defined by the presence of points of interests. Eventually, we use the entropy function to characterize the spatial distribution of international communities in the city. Adopting a topological data analysis approach, we find that international mobile phone users exhibit some robust clustering patterns that correlate with basic socio-economic variables. Our results suggest that mobile phone records can be used in conjunction with topological data analysis tools to study the geography of migrant communities in a global city.
  26. Mind the Gap: A Study in Global Development Through Persistent Homology (2018)

    Andrew Banman, Lori Ziegelmeier
    Abstract The Gapminder project set out to use statistics to dispel simplistic notions about global development. In the same spirit, we use persistent homology, a technique from computational algebraic topology, to explore the relationship between country development and geography. For each country, four indicators, gross domestic product per capita; average life expectancy; infant mortality; and gross national income per capita, were used to quantify the development. Two analyses were performed. The first considers clusters of the countries based on these indicators, and the second uncovers cycles in the data when combined with geographic border structure. Our analysis is a multi-scale approach that reveals similarities and connections among countries at a variety of levels. We discover localized development patterns that are invisible in standard statistical methods.
  27. A Machine-Learning-Based Early Warning System Boosted by Topological Data Analysis (2019)

    Devraj Basu, Tieqiang Li
    Abstract We propose a novel early warning system for detecting financial market crashes that utilizes the information extracted from the shape of financial market movement. Our system incorporates Topological Data Analysis (TDA), a new set of data analytics techniques specialised in profiling the shape of data, into a more traditional machine learning framework. Incorporating TDA leads to substantial improvements in timely detecting the onset of a sharp market decline. Our framework is both able to generate new features and also unlock more value from existing factors. Our results illustrate the importance of understanding the shape of financial market data and suggest that incorporating TDA into a machine learning framework could be beneficial in a number of financial market settings.
  28. Topology of Frame Field Meshing (2020)

    Piotr Beben
    Abstract In the past decade frame fields have emerged as a promising approach for generating hexahedral meshes for CFD and CAE applications. One important problem asks for construction of a boundary aligned frame field with prescribed singularity constraints that correspond to a valid hexahedral mesh. We give a necessary and sufficient condition in terms of solutions to a system of monomial equations whose variables are in the binary octahedral group. Along the way we look at frame field design from an algebraic topological perspective, proving various results, some known, some new.
  29. Unsupervised Topological Learning Approach of Crystal Nucleation in Pure Tantalum (2021)

    Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse
    Abstract Nucleation phenomena commonly observed in our every day life are of fundamental, technological and societal importance in many areas, but some of their most intimate mechanisms remain however to be unraveled. Crystal nucleation, the early stages where the liquid-to-solid transition occurs upon undercooling, initiates at the atomic level on nanometer length and sub-picoseconds time scales and involves complex multidimensional mechanisms with local symmetry breaking that can hardly be observed experimentally in the very details. To reveal their structural features in simulations without a priori, an unsupervised learning approach founded on topological descriptors loaned from persistent homology concepts is proposed. Applied here to a monatomic metal, namely Tantalum (Ta), it shows that both translational and orientational ordering always come into play simultaneously when homogeneous nucleation starts in regions with low five-fold symmetry.
  30. Unsupervised Topological Learning Approach of Crystal Nucleation (2022)

    Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse
    Abstract Nucleation phenomena commonly observed in our every day life are of fundamental, technological and societal importance in many areas, but some of their most intimate mechanisms remain however to be unravelled. Crystal nucleation, the early stages where the liquid-to-solid transition occurs upon undercooling, initiates at the atomic level on nanometre length and sub-picoseconds time scales and involves complex multidimensional mechanisms with local symmetry breaking that can hardly be observed experimentally in the very details. To reveal their structural features in simulations without a priori, an unsupervised learning approach founded on topological descriptors loaned from persistent homology concepts is proposed. Applied here to monatomic metals, it shows that both translational and orientational ordering always come into play simultaneously as a result of the strong bonding when homogeneous nucleation starts in regions with low five-fold symmetry. It also reveals the specificity of the nucleation pathways depending on the element considered, with features beyond the hypothesis of Classical Nucleation Theory.
  31. Unsupervised Topological Learning for Identification of Atomic Structures (2022)

    Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse
    Abstract We propose an unsupervised learning methodology with descriptors based on topological data analysis (TDA) concepts to describe the local structural properties of materials at the atomic scale. Based only on atomic positions and without a priori knowledge, our method allows for an autonomous identification of clusters of atomic structures through a Gaussian mixture model. We apply successfully this approach to the analysis of elemental Zr in the crystalline and liquid states as well as homogeneous nucleation events under deep undercooling conditions. This opens the way to deeper and autonomous study of complex phenomena in materials at the atomic scale.
  32. Barcodes Distinguish Morphology of Neuronal Tauopathy (2022)

    David Beers, Despoina Goniotaki, Diane P. Hanger, Alain Goriely, Heather A. Harrington
    Abstract The geometry of neurons is known to be important for their functions. Hence, neurons are often classified by their morphology. Two recent methods, persistent homology and the topological morphology descriptor, assign a morphology descriptor called a barcode to a neuron equipped with a given function, such as the Euclidean distance from the root of the neuron. These barcodes can be converted into matrices called persistence images, which can then be averaged across groups. We show that when the defining function is the path length from the root, both the topological morphology descriptor and persistent homology are equivalent. We further show that persistence images arising from the path length procedure provide an interpretable summary of neuronal morphology. We introduce “topological morphology functions”, a class of functions similar to Sholl functions, that can be recovered from the associated topological morphology descriptor. To demonstrate this topological approach, we compare healthy cortical and hippocampal mouse neurons to those affected by progressive tauopathy. We find a significant difference in the morphology of healthy neurons and those with a tauopathy at a postsymptomatic age. We use persistence images to conclude that the diseased group tends to have neurons with shorter branches as well as fewer branches far from the soma.
  33. A Classification of Topological Discrepancies in Additive Manufacturing (2019)

    Morad Behandish, Amir M. Mirzendehdel, Saigopal Nelaturi
    Abstract Additive manufacturing (AM) enables enormous freedom for design of complex structures. However, the process-dependent limitations that result in discrepancies between as-designed and as-manufactured shapes are not fully understood. The tradeoffs between infinitely many different ways to approximate a design by a manufacturable replica are even harder to characterize. To support design for AM (DfAM), one has to quantify local discrepancies introduced by AM processes, identify the detrimental deviations (if any) to the original design intent, and prescribe modifications to the design and/or process parameters to countervail their effects. Our focus in this work will be on topological analysis. There is ample evidence in many applications that preserving local topology (e.g., connectivity of beams in a lattice) is important even when slight geometric deviations can be tolerated. We first present a generic method to characterize local topological discrepancies due to material under- and over-deposition in AM, and show how it captures various types of defects in the as-manufactured structures. We use this information to systematically modify the as-manufactured outcomes within the limitations of available 3D printer resolution(s), which often comes at the expense of introducing more geometric deviations (e.g., thickening a beam to avoid disconnection). We validate the effectiveness of the method on 3D examples with nontrivial topologies such as lattice structures and foams.
  34. Lung Topology Characteristics in Patients With Chronic Obstructive Pulmonary Disease (2018)

    Francisco Belchi, Mariam Pirashvili, Joy Conway, Michael Bennett, Ratko Djukanovic, Jacek Brodzki
    Abstract Quantitative features that can currently be obtained from medical imaging do not provide a complete picture of Chronic Obstructive Pulmonary Disease (COPD). In this paper, we introduce a novel analytical tool based on persistent homology that extracts quantitative features from chest CT scans to describe the geometric structure of the airways inside the lungs. We show that these new radiomic features stratify COPD patients in agreement with the GOLD guidelines for COPD and can distinguish between inspiratory and expiratory scans. These CT measurements are very different to those currently in use and we demonstrate that they convey significant medical information. The results of this study are a proof of concept that topological methods can enhance the standard methodology to create a finer classification of COPD and increase the possibilities of more personalized treatment.
  35. Extremal Event Graphs: A (Stable) Tool for Analyzing Noisy Time Series Data (2022)

    Robin Belton, Bree Cummins, Brittany Terese Fasy, Tomáš Gedeon
    Abstract Local maxima and minima, or extremal events, in experimental time series can be used as a coarse summary to characterize data. However, the discrete sampling in recording experimental measurements suggests uncertainty on the true timing of extrema during the experiment. This in turn gives uncertainty in the timing order of extrema within the time series. Motivated by applications in genomic time series and biological network analysis, we construct a weighted directed acyclic graph (DAG) called an extremal event DAG using techniques from persistent homology that is robust to measurement noise. Furthermore, we define a distance between extremal event DAGs based on the edit distance between strings. We prove several properties including local stability for the extremal event DAG distance with respect to pairwise $L_\infty$ distances between functions in the time series data. Lastly, we provide algorithms, publicly free software, and implementations on extremal event DAG construction and comparison.
  36. Euler Characteristic Surfaces (2021)

    Gabriele Beltramo, Rayna Andreeva, Ylenia Giarratano, Miguel O. Bernabeu, Rik Sarkar, Primoz Skraba
    Abstract We study the use of the Euler characteristic for multiparameter topological data analysis. Euler characteristic is a classical, well-understood topological invariant that has appeared in numerous applications, including in the context of random fields. The goal of this paper is to present the extension of using the Euler characteristic in higher-dimensional parameter spaces. While topological data analysis of higher-dimensional parameter spaces using stronger invariants such as homology continues to be the subject of intense research, Euler characteristic is more manageable theoretically and computationally, and this analysis can be seen as an important intermediary step in multi-parameter topological data analysis. We show the usefulness of the techniques using artificially generated examples, and a real-world application of detecting diabetic retinopathy in retinal images.
  37. Computing Robustness and Persistence for Images (2010)

    P. Bendich, H. Edelsbrunner, M. Kerber
    Abstract We are interested in 3-dimensional images given as arrays of voxels with intensity values. Extending these values to a continuous function, we study the robustness of homology classes in its level and interlevel sets, that is, the amount of perturbation needed to destroy these classes. The structure of the homology classes and their robustness, over all level and interlevel sets, can be visualized by a triangular diagram of dots obtained by computing the extended persistence of the function. We give a fast hierarchical algorithm using the dual complexes of oct-tree approximations of the function. In addition, we show that for balanced oct-trees, the dual complexes are geometrically realized in $\mathbb{R}^3$ and can thus be used to construct level and interlevel sets. We apply these tools to study 3-dimensional images of plant root systems.
  38. Persistent Homology Analysis of Brain Artery Trees (2016)

    Paul Bendich, J. S. Marron, Ezra Miller, Alex Pieloch, Sean Skwerer
    Abstract New representations of tree-structured data objects, using ideas from topological data analysis, enable improved statistical analyses of a population of brain artery trees. A number of representations of each data tree arise from persistence diagrams that quantify branching and looping of vessels at multiple scales. Novel approaches to the statistical analysis, through various summaries of the persistence diagrams, lead to heightened correlations with covariates such as age and sex, relative to earlier analyses of this data set. The correlation with age continues to be significant even after controlling for correlations from earlier significant summaries.
  39. Analyzing Collective Motion With Machine Learning and Topology (2019)

    Dhananjay Bhaskar, Angelika Manhart, Jesse Milzman, John T. Nardini, Kathleen M. Storey, Chad M. Topaz, Lori Ziegelmeier
    Abstract We use topological data analysis and machine learning to study a seminal model of collective motion in biology [M. R. D’Orsogna et al., Phys. Rev. Lett. 96, 104302 (2006)]. This model describes agents interacting nonlinearly via attractive-repulsive social forces and gives rise to collective behaviors such as flocking and milling. To classify the emergent collective motion in a large library of numerical simulations and to recover model parameters from the simulation data, we apply machine learning techniques to two different types of input. First, we input time series of order parameters traditionally used in studies of collective motion. Second, we input measures based on topology that summarize the time-varying persistent homology of simulation data over multiple scales. This topological approach does not require prior knowledge of the expected patterns. For both unsupervised and supervised machine learning methods, the topological approach outperforms the one that is based on traditional order parameters.
  40. Topological Data Analysis of Collective and Individual Epithelial Cells Using Persistent Homology of Loops (2021)

    Dhananjay Bhaskar, William Y. Zhang, Ian Y. Wong
    Abstract Interacting, self-propelled particles such as epithelial cells can dynamically self-organize into complex multicellular patterns, which are challenging to classify without a priori information. Classically, different phases and phase transitions have been described based on local ordering, which may not capture structural features at larger length scales. Instead, topological data analysis (TDA) determines the stability of spatial connectivity at varying length scales (i.e. persistent homology), and can compare different particle configurations based on the “cost” of reorganizing one configuration into another. Here, we demonstrate a topology-based machine learning approach for unsupervised profiling of individual and collective phases based on large-scale loops. We show that these topological loops (i.e. dimension 1 homology) are robust to variations in particle number and density, particularly in comparison to connected components (i.e. dimension 0 homology). We use TDA to map out phase diagrams for simulated particles with varying adhesion and propulsion, at constant population size as well as when proliferation is permitted. Next, we use this approach to profile our recent experiments on the clustering of epithelial cells in varying growth factor conditions, which are compared to our simulations. Finally, we characterize the robustness of this approach at varying length scales, with sparse sampling, and over time. Overall, we envision TDA will be broadly applicable as a model-agnostic approach to analyze active systems with varying population size, from cytoskeletal motors to motile cells to flocking or swarming animals.
  41. Topological Data Analysis of Spatial Patterning in Heterogeneous Cell Populations: Clustering and Sorting With Varying Cell-Cell Adhesion (2023)

    Dhananjay Bhaskar, William Y. Zhang, Alexandria Volkening, Björn Sandstede, Ian Y. Wong
    Abstract Different cell types aggregate and sort into hierarchical architectures during the formation of animal tissues. The resulting spatial organization depends (in part) on the strength of adhesion of one cell type to itself relative to other cell types. However, automated and unsupervised classification of these multicellular spatial patterns remains challenging, particularly given their structural diversity and biological variability. Recent developments based on topological data analysis are intriguing to reveal similarities in tissue architecture, but these methods remain computationally expensive. In this article, we show that multicellular patterns organized from two interacting cell types can be efficiently represented through persistence images. Our optimized combination of dimensionality reduction via autoencoders, combined with hierarchical clustering, achieved high classification accuracy for simulations with constant cell numbers. We further demonstrate that persistence images can be normalized to improve classification for simulations with varying cell numbers due to proliferation. Finally, we systematically consider the importance of incorporating different topological features as well as information about each cell type to improve classification accuracy. We envision that topological machine learning based on persistence images will enable versatile and robust classification of complex tissue architectures that occur in development and disease.
  42. Persistent Homology for Path Planning in Uncertain Environments (2015)

    S. Bhattacharya, R. Ghrist, V. Kumar
    Abstract We address the fundamental problem of goal-directed path planning in an uncertain environment represented as a probability (of occupancy) map. Most methods generally use a threshold to reduce the grayscale map to a binary map before applying off-the-shelf techniques to find the best path. This raises the somewhat ill-posed question, what is the right (optimal) value to threshold the map? We instead suggest a persistent homology approach to the problem: a topological approach in which we seek the homology class of trajectories that is most persistent for the given probability map. In other words, we want the class of trajectories that is free of obstacles over the largest range of threshold values. In order to make this problem tractable, we use homology in ℤ₂ coefficients (instead of the standard ℤ coefficients), and describe how graph search-based algorithms can be used to find trajectories in different homology classes. Our simulation results demonstrate the efficiency and practical applicability of the algorithm proposed in this paper.
  43. The Persistence of Large Scale Structures I: Primordial Non-Gaussianity (2020)

    Matteo Biagetti, Alex Cole, Gary Shiu
    Abstract We develop an analysis pipeline for characterizing the topology of large scale structure and extracting cosmological constraints based on persistent homology. Persistent homology is a technique from topological data analysis that quantifies the multiscale topology of a data set, in our context unifying the contributions of clusters, filament loops, and cosmic voids to cosmological constraints. We describe how this method captures the imprint of primordial local non-Gaussianity on the late-time distribution of dark matter halos, using a set of N-body simulations as a proxy for real data analysis. For our best single statistic, running the pipeline on several cubic volumes of size $40~({\rm Gpc}/h)^3$, we detect $f_{\rm NL}^{\rm loc}=10$ at $97.5\%$ confidence on $\sim 85\%$ of the volumes. Additionally we test our ability to resolve degeneracies between the topological signature of $f_{\rm NL}^{\rm loc}$ and variation of $\sigma_8$ and argue that correctly identifying nonzero $f_{\rm NL}^{\rm loc}$ in this case is possible via an optimal template method. Our method relies on information living at $\mathcal{O}(10)$ Mpc/h, a complementary scale with respect to commonly used methods such as the scale-dependent bias in the halo/galaxy power spectrum. Therefore, while still requiring a large volume, our method does not require sampling long-wavelength modes to constrain primordial non-Gaussianity. Moreover, our statistics are interpretable: we are able to reproduce previous results in certain limits and we make new predictions for unexplored observables, such as filament loops formed by dark matter halos in a simulation box.
  44. Geometric Feature Performance Under Downsampling for EEG Classification Tasks (2021)

    Bryan Bischof, Eric Bunch
    Abstract We experimentally investigate a collection of feature engineering pipelines for use with a CNN for classifying eyes-open or eyes-closed from electroencephalogram (EEG) time-series from the Bonn dataset. Using the Takens' embedding--a geometric representation of time-series--we construct simplicial complexes from EEG data. We then compare $\epsilon$-series of Betti-numbers and $\epsilon$-series of graph spectra (a novel construction)--two topological invariants of the latent geometry from these complexes--to raw time series of the EEG to fill in a gap in the literature for benchmarking. These methods, inspired by Topological Data Analysis, are used for feature engineering to capture local geometry of the time-series. Additionally, we test these feature pipelines' robustness to downsampling and data reduction. This paper seeks to establish clearer expectations for both time-series classification via geometric features, and how CNNs for time-series respond to data of degraded resolution.
  45. The Accumulated Persistence Function, a New Useful Functional Summary Statistic for Topological Data Analysis, With a View to Brain Artery Trees and Spatial Point Process Applications (2019)

    C.A.N. Biscio, J. Møller
    Abstract We start with a simple introduction to topological data analysis where the most popular tool is called a persistence diagram. Briefly, a persistence diagram is a multiset of points in the plane describing the persistence of topological features of a compact set when a scale parameter varies. Since statistical methods are difficult to apply directly on persistence diagrams, various alternative functional summary statistics have been suggested, but either they do not contain the full information of the persistence diagram or they are two-dimensional functions. We suggest a new functional summary statistic that is one-dimensional and hence easier to handle, and which under mild conditions contains the full information of the persistence diagram. Its usefulness is illustrated in statistical settings concerned with point clouds and brain artery trees. The supplementary materials include additional methods and examples, technical details, and the R code used for all examples.
  46. Topology Identifies Emerging Adaptive Mutations in SARS-CoV-2 (2021)

    Michael Bleher, Lukas Hahn, Juan Angel Patino-Galindo, Mathieu Carriere, Ulrich Bauer, Raul Rabadan, Andreas Ott
    Abstract The COVID-19 pandemic has led to a worldwide effort to characterize its evolution through the mapping of mutations in the genome of the coronavirus SARS-CoV-2. Ideally, one would like to quickly identify new mutations that could confer adaptive advantages (e.g. higher infectivity or immune evasion) by leveraging the large number of genomes. One way of identifying adaptive mutations is by looking at convergent mutations, mutations in the same genomic position that occur independently. However, the large number of currently available genomes precludes the efficient use of phylogeny-based techniques. Here, we establish a fast and scalable Topological Data Analysis approach for the early warning and surveillance of emerging adaptive mutations based on persistent homology. It identifies convergent events merely by their topological footprint and thus overcomes limitations of current phylogenetic inference techniques. This allows for an unbiased and rapid analysis of large viral datasets. We introduce a new topological measure for convergent evolution and apply it to the GISAID dataset as of February 2021, comprising 303,651 high-quality SARS-CoV-2 isolates collected since the beginning of the pandemic. We find that topologically salient mutations on the receptor-binding domain appear in several variants of concern and are linked with an increase in infectivity and immune escape, and for many adaptive mutations the topological signal precedes an increase in prevalence. We show that our method effectively identifies emerging adaptive mutations at an early stage. By localizing topological signals in the dataset, we extract geo-temporal information about the early occurrence of emerging adaptive mutations. The identification of these mutations can help to develop an alert system to monitor mutations of concern and guide experimentalists to focus the study of specific circulating variants.
  47. Persistence-Based Pooling for Shape Pose Recognition (2016)

    Thomas Bonis, Maks Ovsjanikov, Steve Oudot, Frédéric Chazal
    Abstract In this paper, we propose a novel pooling approach for shape classification and recognition using the bag-of-words pipeline, based on topological persistence, a recent tool from Topological Data Analysis. Our technique extends the standard max-pooling, which summarizes the distribution of a visual feature with a single number, thereby losing any notion of spatiality. Instead, we propose to use topological persistence, and the derived persistence diagrams, to provide significantly more informative and spatially sensitive characterizations of the feature functions, which can lead to better recognition performance. Unfortunately, despite their conceptual appeal, persistence diagrams are difficult to handle, since they are not naturally represented as vectors in Euclidean space and even the standard metric, the bottleneck distance is not easy to compute. Furthermore, classical distances between diagrams, such as the bottleneck and Wasserstein distances, do not allow to build positive definite kernels that can be used for learning. To handle this issue, we provide a novel way to transform persistence diagrams into vectors, in which comparisons are trivial. Finally, we demonstrate the performance of our construction on the Non-Rigid 3D Human Models SHREC 2014 dataset, where we show that topological pooling can provide significant improvements over the standard pooling methods for the shape pose recognition within the bag-of-words pipeline.
  48. Topological Phase Estimation Method for Reparameterized Periodic Functions (2022)

    Thomas Bonis, Frédéric Chazal, Bertrand Michel, Wojciech Reise
    Abstract We consider a signal composed of several periods of a periodic function, of which we observe a noisy reparametrisation. The phase estimation problem consists of finding that reparametrisation, and, in particular, the number of observed periods. Existing methods are well-suited to the setting where the periodic function is known, or at least, simple. We consider the case when it is unknown and we propose an estimation method based on the shape of the signal. We use the persistent homology of sublevel sets of the signal to capture the temporal structure of its local extrema. We infer the number of periods in the signal by counting points in the persistence diagram and their multiplicities. Using the estimated number of periods, we construct an estimator of the reparametrisation. It is based on counting the number of sufficiently prominent local minima in the signal. This work is motivated by a vehicle positioning problem, on which we evaluated the proposed method.
  49. Structural Insight Into RNA Hairpin Folding Intermediates (2008)

    Gregory R. Bowman, Xuhui Huang, Yuan Yao, Jian Sun, Gunnar Carlsson, Leonidas J. Guibas, Vijay S. Pande
    Abstract Hairpins are a ubiquitous secondary structure motif in RNA molecules. Despite their simple structure, there is some debate over whether they fold in a two-state or multi-state manner. We have studied the folding of a small tetraloop hairpin using a serial version of replica exchange molecular dynamics on a distributed computing environment. On the basis of these simulations, we have identified a number of intermediates that are consistent with experimental results. We also find that folding is not simply the reverse of high-temperature unfolding and suggest that this may be a general feature of biomolecular folding.
  50. Atom-Specific Persistent Homology and Its Application to Protein Flexibility Analysis (2020)

    David Bramer, Guo-Wei Wei
    Abstract Recently, persistent homology has had tremendous success in biomolecular data analysis. It works by examining the topological relationship or connectivity of a group of atoms in a molecule at a variety of scales, then rendering a family of topological representations of the molecule. However, persistent homology is rarely employed for the analysis of atomic properties, such as biomolecular flexibility analysis or B-factor prediction. This work introduces atom-specific persistent homology to provide a local atomic level representation of a molecule via a global topological tool. This is achieved through the construction of a pair of conjugated sets of atoms and corresponding conjugated simplicial complexes, as well as conjugated topological spaces. The difference between the topological invariants of the pair of conjugated sets is measured by Bottleneck and Wasserstein metrics and leads to an atom-specific topological representation of individual atomic properties in a molecule. Atom-specific topological features are integrated with various machine learning algorithms, including gradient boosting trees and convolutional neural network for protein thermal fluctuation analysis and B-factor prediction. Extensive numerical results indicate the proposed method provides a powerful topological tool for analyzing and predicting localized information in complex macromolecules.
  51. Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology (2021)

    Nicole Bussola, Bruno Papa, Ombretta Melaiu, Aurora Castellano, Doriana Fruci, Giuseppe Jurman
    Abstract We introduce here a novel machine learning (ML) framework to address the issue of the quantitative assessment of the immune content in neuroblastoma (NB) specimens. First, the EUNet, a U-Net with an EfficientNet encoder, is trained to detect lymphocytes on tissue digital slides stained with the CD3 T-cell marker. The training set consists of 3782 images extracted from an original collection of 54 whole slide images (WSIs), manually annotated for a total of 73,751 lymphocytes. Resampling strategies, data augmentation, and transfer learning approaches are adopted to warrant reproducibility and to reduce the risk of overfitting and selection bias. Topological data analysis (TDA) is then used to define activation maps from different layers of the neural network at different stages of the training process, described by persistence diagrams (PD) and Betti curves. TDA is further integrated with the uniform manifold approximation and projection (UMAP) dimensionality reduction and the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm for clustering, by the deep features, the relevant subgroups and structures, across different levels of the neural network. Finally, the recent TwoNN approach is leveraged to study the variation of the intrinsic dimensionality of the U-Net model. As the main task, the proposed pipeline is employed to evaluate the density of lymphocytes over the whole tissue area of the WSIs. The model achieves good results with mean absolute error 3.1 on test set, showing significant agreement between densities estimated by our EUNet model and by trained pathologists, thus indicating the potentialities of a promising new strategy in the quantification of the immune content in NB specimens. Moreover, the UMAP algorithm unveiled interesting patterns compatible with pathological characteristics, also highlighting novel insights into the dynamics of the intrinsic dataset dimensionality at different stages of the training process. All the experiments were run on the Microsoft Azure cloud platform.
  52. Inference of Ancestral Recombination Graphs Through Topological Data Analysis (2016)

    Pablo G. Cámara, Arnold J. Levine, Raúl Rabadán
    Abstract The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. Evolution occurs through different mechanisms, including point mutations, gene duplication, horizontal gene transfer, and recombinations. Some of these mechanisms cannot be captured by tree graphs. We present a framework, based on the mathematical tools of computational topology, that can explicitly accommodate both recombination and mutation events across the evolutionary history of a sample of genomic sequences. This approach generates a new type of summary graph and algebraic structures that provide quantitative information on the evolutionary scale and frequency of recombination events. The accompanying software, TARGet, is applied to several examples, including migration between sexually-reproducing populations, human recombination, and recombination in Darwin’s finches.
  53. Topological Data Analysis Generates High-Resolution, Genome-Wide Maps of Human Recombination (2016)

    Pablo G. Camara, Daniel I. S. Rosenbloom, Kevin J. Emmett, Arnold J. Levine, Raul Rabadan
    Abstract Meiotic recombination is a fundamental evolutionary process driving diversity in eukaryotes. In mammals, recombination is known to occur preferentially at specific genomic regions. Using topological data analysis (TDA), a branch of applied topology that extracts global features from large data sets, we developed an efficient method for mapping recombination at fine scales. When compared to standard linkage-based methods, TDA can deal with a larger number of SNPs and genomes without incurring prohibitive computational costs. We applied TDA to 1,000 Genomes Project data and constructed high-resolution whole-genome recombination maps of seven human populations. Our analysis shows that recombination is generally under-represented within transcription start sites. However, the binding sites of specific transcription factors are enriched for sites of recombination. These include transcription factors that regulate the expression of meiosis- and gametogenesis-specific genes, cell cycle progression, and differentiation blockage. Additionally, our analysis identifies an enrichment for sites of recombination at repeat-derived loci matched by piwi-interacting RNAs.
  54. Evolutionary Homology on Coupled Dynamical Systems With Applications to Protein Flexibility Analysis (2020)

    Zixuan Cang, Elizabeth Munch, Guo-Wei Wei
    Abstract While the spatial topological persistence is naturally constructed from a radius-based filtration, it has hardly been derived from a temporal filtration. Most topological models are designed for the global topology of a given object as a whole. There is no method reported in the literature for the topology of an individual component in an object to the best of our knowledge. For many problems in science and engineering, the topology of an individual component is important for describing its properties. We propose evolutionary homology (EH) constructed via a time evolution-based filtration and topological persistence. Our approach couples a set of dynamical systems or chaotic oscillators by the interactions of a physical system, such as a macromolecule. The interactions are approximated by weighted graph Laplacians. Simplices, simplicial complexes, algebraic groups and topological persistence are defined on the coupled trajectories of the chaotic oscillators. The resulting EH gives rise to time-dependent topological invariants or evolutionary barcodes for an individual component of the physical system, revealing its topology-function relationship. In conjunction with Wasserstein metrics, the proposed EH is applied to protein flexibility analysis, an important problem in computational biophysics. Numerical results for the B-factor prediction of a benchmark set of 364 proteins indicate that the proposed EH outperforms all the other state-of-the-art methods in the field.
  55. Representability of Algebraic Topology for Biomolecules in Machine Learning Based Scoring and Virtual Screening (2018)

    Zixuan Cang, Lin Mu, Guo-Wei Wei
    Abstract This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
  56. Fibers of Failure: Classifying Errors in Predictive Processes (2020)

    Leo S. Carlsson, Mikael Vejdemo-Johansson, Gunnar Carlsson, Pär G. Jönsson
    Abstract Predictive models are used in many different fields of science and engineering and are always prone to make faulty predictions. These faulty predictions can be more or less malignant depending on the model application. We describe fibers of failure (FiFa), a method to classify failure modes of predictive processes. Our method uses Mapper, an algorithm from topological data analysis (TDA), to build a graphical model of input data stratified by prediction errors. We demonstrate two ways to use the failure mode groupings: either to produce a correction layer that adjusts predictions by similarity to the failure modes; or to inspect members of the failure modes to illustrate and investigate what characterizes each failure mode. We demonstrate FiFa on two scenarios: a convolutional neural network (CNN) predicting MNIST images with added noise, and an artificial neural network (ANN) predicting the electrical energy consumption of an electric arc furnace (EAF). The correction layer on the CNN model improved its prediction accuracy significantly while the inspection of failure modes for the EAF model provided guiding insights into the domain-specific reasons behind several high-error regions.
  57. On the Local Behavior of Spaces of Natural Images (2008)

    Gunnar Carlsson, Tigran Ishkhanov, Vin de Silva, Afra Zomorodian
    Abstract In this study we concentrate on qualitative topological analysis of the local behavior of the space of natural images. To this end, we use a space of 3 by 3 high-contrast patches ℳ. We develop a theoretical model for the high-density 2-dimensional submanifold of ℳ showing that it has the topology of the Klein bottle. Using our topological software package PLEX we experimentally verify our theoretical conclusions. We use polynomial representation to give coordinatization to various subspaces of ℳ. We find the best-fitting embedding of the Klein bottle into the ambient space of ℳ. Our results are currently being used in developing a compression algorithm based on a Klein bottle dictionary.
  58. Topological Pattern Recognition for Point Cloud Data (2014)

    Gunnar Carlsson
    Abstract In this paper we discuss the adaptation of the methods of homology from algebraic topology to the problem of pattern recognition in point cloud data sets. The method is referred to as persistent homology, and has numerous applications to scientific problems. We discuss the definition and computation of homology in the standard setting of simplicial complexes and topological spaces, then show how one can obtain useful signatures, called barcodes, from finite metric spaces, thought of as sampled from a continuous object. We present several different cases where persistent homology is used, to illustrate the different ways in which the method can be applied.
  59. Topological Data Analysis of Single-Cell Hi-C Contact Maps (2020)

    Mathieu Carrière, Raúl Rabadán
    Abstract Due to recent breakthroughs in high-throughput sequencing, it is now possible to use chromosome conformation capture (CCC) to understand the three dimensional conformation of DNA at the whole genome level, and to characterize it with the so-called contact maps. This is very useful since many biological processes are correlated with DNA folding, such as DNA transcription. However, the methods for the analysis of such conformations are still lacking mathematical guarantees and statistical power. To handle this issue, we propose to use the Mapper, which is a standard tool of Topological Data Analysis (TDA) that allows one to efficiently encode the inherent continuity and topology of underlying biological processes in data, in the form of a graph with various features such as branches and loops. In this article, we show how recent statistical techniques developed in TDA for the Mapper algorithm can be extended and leveraged to formally define and statistically quantify the presence of topological structures coming from biological phenomena, such as the cell cycle, in datasets of CCC contact maps.
  60. Severe Slugging Flow Identification From Topological Indicators (2022)

    Simone Casolo
    Abstract In this work, topological data analysis is used to identify the onset of severe slug flow in offshore petroleum production systems. Severe slugging is a multiphase flow regime known to be very inefficient and potentially harmful to process equipment and it is characterized by large oscillations in the production fluid pressure. Time series from pressure sensors in subsea oil wells are processed by means of Takens embedding to produce point clouds of data. Embedded sensor data is then analyzed using persistent homology to obtain topological indicators capable of revealing the occurrence of severe slugging in a condition-based monitoring approach. A large dataset of well events consisting of both real and simulated data is used to demonstrate the possibility of automating severe slugging detection from live data via topological data analysis. Methods based on persistence diagrams are shown to accurately identify severe slugging and to classify different flow regimes from pressure signals of producing wells with supervised machine learning.
  61. Imaging-Based Representation and Stratification of Intra-Tumor Heterogeneity via Tree-Edit Distance (2022)

    Lara Cavinato, Matteo Pegoraro, Alessandra Ragni, Francesca Ieva
    Abstract Personalized medicine is the future of medical practice. In oncology, tumor heterogeneity assessment represents a pivotal step for effective treatment planning and prognosis prediction. Despite new procedures for DNA sequencing and analysis, non-invasive methods for tumor characterization are needed to have an impact on daily routine. To this end, imaging texture analysis is rapidly scaling, holding the promise to serve as a surrogate for histopathological assessment of tumor lesions. In this work, we propose a tree-based representation strategy for describing intra-tumor heterogeneity of patients affected by metastatic cancer. We leverage radiomics information extracted from PET/CT imaging and we provide an exhaustive and easily readable summary of the disease spreading. We exploit this novel patient representation to perform cancer subtyping via hierarchical clustering. For this purpose, a new heterogeneity-based distance between trees is defined and applied to a case study of prostate cancer. Cluster interpretation is explored in terms of concordance with severity status, tumor burden and biological characteristics. Results are promising, as the proposed method outperforms current literature approaches. Ultimately, the proposed method draws a general analysis framework that would allow knowledge to be extracted from daily acquired imaging data of patients and provide insights for effective treatment planning.
  62. Topology of Viral Evolution (2013)

    Joseph Minhow Chan, Gunnar Carlsson, Raul Rabadan
    Abstract The tree structure is currently the accepted paradigm to represent evolutionary relationships between organisms, species or other taxa. However, horizontal, or reticulate, genomic exchanges are pervasive in nature and confound characterization of phylogenetic trees. Drawing from algebraic topology, we present a unique evolutionary framework that comprehensively captures both clonal and reticulate evolution. We show that whereas clonal evolution can be summarized as a tree, reticulate evolution exhibits nontrivial topology of dimension greater than zero. Our method effectively characterizes clonal evolution, reassortment, and recombination in RNA viruses. Beyond detecting reticulate evolution, we succinctly recapitulate the history of complex genetic exchanges involving more than two parental strains, such as the triple reassortment of H7N9 avian influenza and the formation of circulating HIV-1 recombinants. In addition, we identify recurrent, large-scale patterns of reticulate evolution, including frequent PB2-PB1-PA-NP cosegregation during avian influenza reassortment. Finally, we bound the rate of reticulate events (i.e., 20 reassortments per year in avian influenza). Our method provides an evolutionary perspective that not only captures reticulate events precluding phylogeny, but also indicates the evolutionary scales where phylogenetic inference could be accurate.
  63. Topological Analysis Reveals State Transitions in Human Gut and Marine Bacterial Communities (2020)

    William K. Chang, David VanInsberghe, Libusha Kelly
    Abstract Microbiome dynamics influence the health and functioning of human physiology and the environment and are driven in part by interactions between large numbers of microbial taxa, making large-scale prediction and modeling a challenge. Here, using topological data analysis, we identify states and dynamical features relevant to macroscopic processes. We show that gut disease processes and marine geochemical events are associated with transitions between community states, defined as topological features of the data density. We find a reproducible two-state succession during recovery from cholera in the gut microbiomes of multiple patients, evidence of dynamic stability in the gut microbiome of a healthy human after experiencing diarrhea during travel, and periodic state transitions in a marine Prochlorococcus community driven by water column cycling. Our approach bridges small-scale fluctuations in microbiome composition and large-scale changes in phenotype without details of underlying mechanisms, and provides an assessment of microbiome stability and its relation to human and environmental health.
  64. Cosmic Web Reconstruction Through Density Ridges: Method and Algorithm (2015)

    Yen-Chi Chen, Shirley Ho, Peter E. Freeman, Christopher R. Genovese, Larry Wasserman
    Abstract The detection and characterization of filamentary structures in the cosmic web allows cosmologists to constrain parameters that dictate the evolution of the Universe. While many filament estimators have been proposed, they generally lack estimates of uncertainty, reducing their inferential power. In this paper, we demonstrate how one may apply the subspace constrained mean shift (SCMS) algorithm (Ozertem & Erdogmus 2011; Genovese et al. 2014) to uncover filamentary structure in galaxy data. The SCMS algorithm is a gradient ascent method that models filaments as density ridges, one-dimensional smooth curves that trace high-density regions within the point cloud. We also demonstrate how augmenting the SCMS algorithm with bootstrap-based methods of uncertainty estimation allows one to place uncertainty bands around putative filaments. We apply the SCMS first to the data set generated from the Voronoi model. The density ridges show strong agreement with the filaments from the Voronoi method. We then apply the SCMS method to data sets sampled from a P3M N-body simulation, with galaxy number densities consistent with SDSS and WFIRST-AFTA, and to LOWZ and CMASS data from the Baryon Oscillation Spectroscopic Survey (BOSS). To further assess the efficacy of SCMS, we compare the relative locations of BOSS filaments with galaxy clusters in the redMaPPer catalogue, and find that redMaPPer clusters are significantly closer (with p-values $< 10^{-9}$) to SCMS-detected filaments than to randomly selected galaxies.
  65. Persistent Homology on Grassmann Manifolds for Analysis of Hyperspectral Movies (2016)

    Sofya Chepushtanova, Michael Kirby, Chris Peterson, Lori Ziegelmeier
    Abstract The existence of characteristic structure, or shape, in complex data sets has been recognized as increasingly important for mathematical data analysis. This realization has motivated the development of new tools such as persistent homology for exploring topological invariants, or features, in large data sets. In this paper, we apply persistent homology to the characterization of gas plumes in time dependent sequences of hyperspectral cubes, i.e. the analysis of 4-way arrays. We investigate hyperspectral movies of Long-Wavelength Infrared data monitoring an experimental release of chemical simulant into the air. Our approach models regions of interest within the hyperspectral data cubes as points on the real Grassmann manifold $G(k, n)$ whose points parameterize the k-dimensional subspaces of $\mathbb{R}^n$, contrasting our approach with the more standard framework in Euclidean space. An advantage of this approach is that it allows a sequence of time slices in a hyperspectral movie to be collapsed to a sequence of points in such a way that some of the key structure within and between the slices is encoded by the points on the Grassmann manifold. This motivates the search for topological features, associated with the evolution of the frames of a hyperspectral movie, within the corresponding points on the Grassmann manifold. The proposed mathematical model affords the processing of large data sets while retaining valuable discriminatory information. In this paper, we discuss how embedding our data in the Grassmann manifold, together with topological data analysis, captures dynamical events that occur as the chemical plume is released and evolves.
  66. The Importance of Forgetting: Limiting Memory Improves Recovery of Topological Characteristics From Neural Data (2018)

    Samir Chowdhury, Bowen Dai, Facundo Mémoli
    Abstract We develop a line of work initiated by Curto and Itskov towards understanding the amount of information contained in the spike trains of hippocampal place cells via topology considerations. Previously, it was established that simply knowing which groups of place cells fire together in an animal’s hippocampus is sufficient to extract the global topology of the animal’s physical environment. We model a system where collections of place cells group and ungroup according to short-term plasticity rules. In particular, we obtain the surprising result that in experiments with spurious firing, the accuracy of the extracted topological information decreases with the persistence (beyond a certain regime) of the cell groups. This suggests that synaptic transience, or forgetting, is a mechanism by which the brain counteracts the effects of spurious place cell activity.
  67. Path Homologies of Motifs and Temporal Network Representations (2022)

    Samir Chowdhury, Steve Huntsman, Matvey Yutin
    Abstract Path homology is a powerful method for attaching algebraic invariants to digraphs. While there have been growing theoretical developments on the algebro-topological framework surrounding path homology, bona fide applications to the study of complex networks have remained stagnant. We address this gap by presenting an algorithm for path homology that combines efficient pruning and indexing techniques and using it to topologically analyze a variety of real-world complex temporal networks. A crucial step in our analysis is the complete characterization of path homologies of certain families of small digraphs that appear as subgraphs in these complex networks. These families include all digraphs, directed acyclic graphs, and undirected graphs up to certain numbers of vertices, as well as some specially constructed cases. Using information from this analysis, we identify small digraphs contributing to path homology in dimension two for three temporal networks in an aggregated representation and relate these digraphs to network behavior. We then investigate alternative temporal network representations and identify complementary subgraphs as well as behavior that is preserved across representations. We conclude that path homology provides insight into temporal network structure, and in turn, emergent structures in temporal networks provide us with new subgraphs having interesting path homology.
  68. The Shape of Cancer Relapse: Topological Data Analysis Predicts Recurrence in Paediatric Acute Lymphoblastic Leukaemia (2021)

    Salvador Chulián, Bernadette J. Stolz, Álvaro Martínez-Rubio, Cristina Blázquez Goñi, Juan F. Rodríguez Gutiérrez, Teresa Caballero Velázquez, Águeda Molinos Quintana, Manuel Ramírez Orellana, Ana Castillo Robleda, José Luis Fuster Soler, Alfredo Minguela Puras, María Victoria Martínez Sánchez, María Rosa, Víctor M. Pérez-García, Helen Byrne
    Abstract Acute Lymphoblastic Leukaemia (ALL) is the most frequent paediatric cancer. Modern therapies have improved survival rates, but approximately 15-20% of patients relapse. At present, patients’ risk of relapse is assessed by projecting high-dimensional flow cytometry data onto a subset of biomarkers and manually estimating the shape of this reduced data. Here, we apply methods from topological data analysis (TDA), which quantify shape in data via features such as connected components and loops, to pre-treatment ALL datasets with known outcomes. We combine these fully unsupervised analyses with machine learning to identify features in the pre-treatment data that are prognostic for risk of relapse. We find significant topological differences between relapsing and non-relapsing patients and confirm the predictive power of CD10, CD20, CD38, and CD45. Further, we are able to use the TDA descriptors to predict patients who relapsed. We propose three prognostic pipelines that readily extend to other haematological malignancies. Teaser: Topology reveals features in flow cytometry data which predict relapse of patients with acute lymphoblastic leukemia.
  69. Pattern Characterization Using Topological Data Analysis: Application to Piezo Vibration Striking Treatment (2023)

    Max M. Chumley, Melih C. Yesilli, Jisheng Chen, Firas A. Khasawneh, Yang Guo
    Abstract Quantifying patterns in visual or tactile textures provides important information about the process or phenomena that generated these patterns. In manufacturing, these patterns can be intentionally introduced as a design feature, or they can be a byproduct of a specific process. Since surface texture has significant impact on the mechanical properties and the longevity of the workpiece, it is important to develop tools for quantifying surface patterns and, when applicable, comparing them to their nominal counterparts. While existing tools may be able to indicate the existence of a pattern, they typically do not provide more information about the pattern structure, or how much it deviates from a nominal pattern. Further, prior works do not provide automatic or algorithmic approaches for quantifying other pattern characteristics such as depths’ consistency, and variations in the pattern motifs at different level sets. This paper leverages persistent homology from Topological Data Analysis (TDA) to derive noise-robust scores for quantifying motifs’ depth and roundness in a pattern. Specifically, sublevel persistence is used to derive scores that quantify the consistency of indentation depths at any level set in Piezo Vibration Striking Treatment (PVST) surfaces. Moreover, we combine sublevel persistence with the distance transform to quantify the consistency of the indentation radii, and to compare them with the nominal ones. Although the tool in our PVST experiments had a semi-spherical profile, we present a generalization of our approach to tools/motifs of arbitrary shapes thus making our method applicable to other pattern-generating manufacturing processes.
  70. A Multi-Parameter Persistence Framework for Mathematical Morphology (2021)

    Yu-Min Chung, Sarah Day, Chuan-Shen Hu
    Abstract The field of mathematical morphology offers well-studied techniques for image processing. In this work, we view morphological operations through the lens of persistent homology, a tool at the heart of the field of topological data analysis. We demonstrate that morphological operations naturally form a multiparameter filtration and that persistent homology can then be used to extract information about both topology and geometry in the images as well as to automate methods for optimizing the study and rendering of structure in images. For illustration, we apply this framework to analyze noisy binary, grayscale, and color images.
  71. Reviews: Topological Distances and Losses for Brain Networks (2021)

    Moo K. Chung, Alexander Smith, Gary Shiu
    Abstract Almost all statistical and machine learning methods in analyzing brain networks rely on distances and loss functions, which are mostly Euclidean or matrix norms. The Euclidean or matrix distances may fail to capture underlying subtle topological differences in brain networks. Further, Euclidean distances are sensitive to outliers. A few extreme edge weights may severely affect the distance. Thus it is necessary to use distances and loss functions that recognize topology of data. In this review paper, we survey various topological distance and loss functions from topological data analysis (TDA) and persistent homology that can be used in brain network analysis more effectively. Although there are many recent brain imaging studies that are based on TDA methods, possibly due to the lack of method awareness, TDA has not yet been adopted as a mainstream tool in the brain imaging field. The main purpose of this paper is to provide a relevant technical survey of these powerful tools that are immediately applicable to brain network data.
  72. Lipschitz Functions Have Lp-Stable Persistence (2010)

    David Cohen-Steiner, Herbert Edelsbrunner, John Harer, Yuriy Mileyko
    Abstract We prove two stability results for Lipschitz functions on triangulable, compact metric spaces and consider applications of both to problems in systems biology. Given two functions, the first result is formulated in terms of the Wasserstein distance between their persistence diagrams and the second in terms of their total persistence.
  73. Interpretable Phase Detection and Classification With Persistent Homology (2020)

    Alex Cole, Gregory J. Loges, Gary Shiu
    Abstract We apply persistent homology to the task of discovering and characterizing phase transitions, using lattice spin models from statistical physics for working examples. Persistence images provide a useful representation of the homological data for conducting statistical tasks. To identify the phase transitions, a simple logistic regression on these images is sufficient for the models we consider, and interpretable order parameters are then read from the weights of the regression. Magnetization, frustration and vortex-antivortex structure are identified as relevant features for characterizing phase transitions.
  74. Quantitative and Interpretable Order Parameters for Phase Transitions From Persistent Homology (2020)

    Alex Cole, Gregory J. Loges, Gary Shiu
    Abstract We apply modern methods in computational topology to the task of discovering and characterizing phase transitions. As illustrations, we apply our method to four two-dimensional lattice spin models: the Ising, square ice, XY, and fully-frustrated XY models. In particular, we use persistent homology, which computes the births and deaths of individual topological features as a coarse-graining scale or sublevel threshold is increased, to summarize multiscale and high-point correlations in a spin configuration. We employ vector representations of this information called persistence images to formulate and perform the statistical task of distinguishing phases. For the models we consider, a simple logistic regression on these images is sufficient to identify the phase transition. Interpretable order parameters are then read from the weights of the regression. This method suffices to identify magnetization, frustration, and vortex-antivortex structure as relevant features for phase transitions in our models. We also define "persistence" critical exponents and study how they are related to those critical exponents usually considered.
  75. Topological Echoes of Primordial Physics in the Universe at Large Scales (2020)

    Alex Cole, Matteo Biagetti, Gary Shiu
    Abstract We present a pipeline for characterizing and constraining initial conditions in cosmology via persistent homology. The cosmological observable of interest is the cosmic web of large scale structure, and the initial conditions in question are non-Gaussianities (NG) of primordial density perturbations. We compute persistence diagrams and derived statistics for simulations of dark matter halos with Gaussian and non-Gaussian initial conditions. For computational reasons and to make contact with experimental observations, our pipeline computes persistence in sub-boxes of full simulations and simulations are subsampled to uniform halo number. We use simulations with large NG ($f_{\rm NL}^{\rm loc}=250$) as templates for identifying data with mild NG ($f_{\rm NL}^{\rm loc}=10$), and running the pipeline on several cubic volumes of size $40~(\textrm{Gpc/h})^{3}$, we detect $f_{\rm NL}^{\rm loc}=10$ at $97.5\%$ confidence on $\sim 85\%$ of the volumes for our best single statistic. Throughout we benefit from the interpretability of topological features as input for statistical inference, which allows us to make contact with previous first-principles calculations and make new predictions.
  76. A Barcode Shape Descriptor for Curve Point Cloud Data (2004)

    Anne Collins, Afra Zomorodian, Gunnar Carlsson, Leonidas J. Guibas
    Abstract In this paper, we present a complete computational pipeline for extracting a compact shape descriptor for curve point cloud data (PCD). Our shape descriptor, called a barcode, is based on a blend of techniques from differential geometry and algebraic topology. We also provide a metric over the space of barcodes, enabling fast comparison of PCDs for shape recognition and clustering. To demonstrate the feasibility of our approach, we implement our pipeline and provide experimental evidence in shape classification and parametrization.
  77. A Topological Machine Learning Pipeline for Classification (2022)

    Francesco Conti, Davide Moroni, Maria Antonietta Pascali
    Abstract In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered. Using a grid search approach, this pipeline determines optimal representation methods and parameters. The development of such a topological pipeline for Machine Learning involves two crucial steps that strongly affect its performance: firstly, digital data must be represented as an algebraic object with a proper associated filtration in order to compute its topological summary, the Persistence Diagram. Secondly, the persistence diagram must be transformed with suitable representation methods in order to be introduced in a Machine Learning algorithm. We assess the performance of our pipeline, and in parallel, we compare the different representation methods on popular benchmark datasets. This work is a first step toward both an easy and ready-to-use pipeline for data classification using persistent homology and Machine Learning, and to understand the theoretical reasons why, given a dataset and a task to be performed, a pair (filtration, topological representation) is better than another.
  78. Modelling Topological Features of Swarm Behaviour in Space and Time With Persistence Landscapes (2017)

    P. Corcoran, C. B. Jones
    Abstract This paper presents a model of swarm behavior that encodes the spatial-temporal characteristics of topological features, such as holes and connected components. Specifically, the persistence of topological features with respect to time is computed using zig-zag persistent homology. This information is in turn modelled as a persistence landscape, which forms a normed vector space and facilitates the application of statistical and data mining techniques. Validation of the proposed model is performed using a real data set corresponding to a swarm of fish. It is demonstrated that the proposed model may be used to perform retrieval and clustering of swarm behavior in terms of topological features. In fact, it is discovered that clustering returns clusters corresponding to the swarm behaviors of flock, torus, and disordered. These are the most frequently occurring types of behavior exhibited by swarms in general.
  79. Tenfold Topology of Crystals (2020)

    Eyal Cornfeld, Shachar Carmeli
    Abstract The celebrated tenfold-way of Altland-Zirnbauer symmetry classes discerns any quantum system by its pattern of non-spatial symmetries. It lies at the core of the periodic table of topological insulators and superconductors which provided a complete classification of weakly-interacting electrons' non-crystalline topological phases for all symmetry classes. Over recent years, a plethora of topological phenomena with diverse surface states has been discovered in crystalline materials. In this paper, we obtain an exhaustive classification of topologically distinct groundstates as well as topological phases with anomalous surface states of crystalline topological insulators and superconductors for key space-groups, layer-groups, and rod-groups. This is done in a unified manner for the full tenfold-way of Altland-Zirnbauer non-spatial symmetry classes. We establish a comprehensive paradigm that harnesses the modern mathematical framework of equivariant spectra; it allows us to obtain results applicable to generic topological classification problems. In particular, this paradigm provides efficient computational tools that enable an inherently unified treatment of the full tenfold-way.
  80. Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis (2019)

    Lorin Crawford, Anthea Monod, Andrew X. Chen, Sayan Mukherjee, Raúl Rabadán
    Abstract Glioblastoma multiforme (GBM) is an aggressive form of human brain cancer that is under active study in the field of cancer biology. Its rapid progression and the relative time cost of obtaining molecular data make other readily available forms of data, such as images, an important resource for actionable measures in patients. Our goal is to use information given by medical images taken from GBM patients in statistical settings. To do this, we design a novel statistic—the smooth Euler characteristic transform (SECT)—that quantifies magnetic resonance images of tumors. Due to its well-defined inner product structure, the SECT can be used in a wider range of functional and nonparametric modeling approaches than other previously proposed topological summary statistics. When applied to a cohort of GBM patients, we find that the SECT is a better predictor of clinical outcomes than both existing tumor shape quantifications and common molecular assays. Specifically, we demonstrate that SECT features alone explain more of the variance in GBM patient survival than gene expression, volumetric features, and morphometric features. The main takeaways from our findings are thus 2-fold. First, they suggest that images contain valuable information that can play an important role in clinical prognosis and other medical decisions. Second, they show that the SECT is a viable tool for the broader study of medical imaging informatics. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
  81. What Can Topology Tell Us About the Neural Code? (2017)

    Carina Curto
    Abstract Neuroscience is undergoing a period of rapid experimental progress and expansion. New mathematical tools, previously unknown in the neuroscience community, are now being used to tackle fundamental questions and analyze emerging data sets. Consistent with this trend, the last decade has seen an uptick in the use of topological ideas and methods in neuroscience. In this paper I will survey recent applications of topology in neuroscience, and explain why topology is an especially natural tool for understanding neural codes.
  82. Reconceiving the Hippocampal Map as a Topological Template (2014)

    Yuri Dabaghian, Vicky L. Brandt, Loren M. Frank
    Abstract The role of the hippocampus in spatial cognition is incontrovertible yet controversial. Place cells, initially thought to be location-specifiers, turn out to respond promiscuously to a wide range of stimuli. Here we test the idea, which we have recently demonstrated in a computational model, that the hippocampal place cells may ultimately be interested in a space's topological qualities (its connectivity) more than its geometry (distances and angles); such higher-order functioning would be more consistent with other known hippocampal functions. We recorded place cell activity in rats exploring morphing linear tracks that allowed us to dissociate the geometry of the track from its topology. The resulting place fields preserved the relative sequence of places visited along the track but did not vary with the metrical features of the track or the direction of the rat's movement. These results suggest a reinterpretation of previous studies and new directions for future experiments.
  83. A Topological Paradigm for Hippocampal Spatial Map Formation Using Persistent Homology (2012)

    Y. Dabaghian, F. Mémoli, L. Frank, G. Carlsson
    Abstract An animal's ability to navigate through space rests on its ability to create a mental map of its environment. The hippocampus is the brain region centrally responsible for such maps, and it has been assumed to encode geometric information (distances, angles). Given, however, that hippocampal output consists of patterns of spiking across many neurons, and downstream regions must be able to translate those patterns into accurate information about an animal's spatial environment, we hypothesized that 1) the temporal pattern of neuronal firing, particularly co-firing, is key to decoding spatial information, and 2) since co-firing implies spatial overlap of place fields, a map encoded by co-firing will be based on connectivity and adjacency, i.e., it will be a topological map. Here we test this topological hypothesis with a simple model of hippocampal activity, varying three parameters (firing rate, place field size, and number of neurons) in computer simulations of rat trajectories in three topologically and geometrically distinct test environments. Using a computational algorithm based on recently developed tools from Persistent Homology theory in the field of algebraic topology, we find that the patterns of neuronal co-firing can, in fact, convey topological information about the environment in a biologically realistic length of time. Furthermore, our simulations reveal a “learning region” that highlights the interplay between the parameters in combining to produce hippocampal states that are more or less adept at map formation. For example, within the learning region a lower number of neurons firing can be compensated by adjustments in firing rate or place field size, but beyond a certain point map formation begins to fail. We propose that this learning region provides a coherent theoretical lens through which to view conditions that impair spatial learning by altering place cell firing rates or spatial specificity. Author Summary: Our ability to navigate our environments relies on the ability of our brains to form an internal representation of the spaces we're in. The hippocampus plays a central role in forming this internal spatial map, and it is thought that the ensemble of active “place cells” (neurons that are sensitive to location) somehow encode metrical information about the environment, akin to a street map. Several considerations suggested to us, however, that the brain might be more interested in topological information (i.e., connectivity, containment, and adjacency, more akin to a subway map), so we employed new methods in computational topology to estimate how basic properties of neuronal firing affect the time required to form a hippocampal spatial map of three test environments. Our analysis suggests that, in order to encode topological information correctly and in a biologically reasonable amount of time, the hippocampal place cells must operate within certain parameters of neuronal activity that vary with both the geometric and topological properties of the environment. The interplay of these parameters forms a “learning region” in which changes in one parameter can successfully compensate for changes in the others; values beyond the limits of this region, however, impair map formation.
  84. From Topological Analyses to Functional Modeling: The Case of Hippocampus (2021)

    Yuri Dabaghian
    Abstract Topological data analyses are widely used for describing and conceptualizing large volumes of neurobiological data, e.g., for quantifying spiking outputs of large neuronal ensembles and thus understanding the functions of the corresponding networks. Below we discuss an approach in which convergent topological analyses produce insights into how information may be processed in mammalian hippocampus—a brain part that plays a key role in learning and memory. The resulting functional model provides a unifying framework for integrating spiking data at different timescales and following the course of spatial learning at different levels of spatiotemporal granularity. This approach allows accounting for contributions from various physiological phenomena into spatial cognition—the neuronal spiking statistics, the effects of spiking synchronization by different brain waves, the roles played by synaptic efficacies and so forth. In particular, it is possible to demonstrate that networks with plastic and transient synaptic architectures can encode stable cognitive maps, revealing the characteristic timescales of memory processing.
  85. Raw Material Flow Optimization as a Capacitated Vehicle Routing Problem: A Visual Benchmarking Approach for Sustainable Manufacturing (2017)

    Michele Dassisti, Yasamin Eslami, Matin Mohaghegh
    Abstract Optimising material flows to increase efficiency while reducing relative resource consumption is one of the most pressing problems today. The focus of this study is to propose a new visual benchmarking approach to select the best material-flow path from the depot to the production lines, referring to the well-known Capacitated Vehicle Routing Problem (CVRP). An example industrial case study is considered to this aim. Two different solution techniques were adopted (namely Mixed Integer Linear Programming and Ant Colony Optimization) in searching for optimal solutions to the CVRP. The proposed visual benchmarking, based on the persistent homology approach, supported the comparison of the optimal solutions based on the entropy of the output in different scenarios. Finally, based on the non-standard measurements of Crossing Length Percentage (CLP), the visual benchmarking procedure makes it possible to find the most practical and applicable solution to the CVRP by considering the visual attractiveness and the quality of the routes.
  86. Coordinate-Free Coverage in Sensor Networks With Controlled Boundaries via Homology (2006)

    V. de Silva, R. Ghrist
    Abstract Tools from computational homology are introduced to verify coverage in an idealized sensor network. These methods are unique in that, while they are coordinate-free and assume no localization or orientation capabilities for the nodes, there are also no probabilistic assumptions. The key ingredient is the theory of homology from algebraic topology. The robustness of these tools is demonstrated by adapting them to a variety of settings, including static planar coverage, 3-D barrier coverage, and time-dependent sweeping coverage. Results are also given on hole repair, error tolerance, optimal coverage, and variable radii. An overview of implementation is given.
  87. Morse Theory and Persistent Homology for Topological Analysis of 3D Images of Complex Materials (2014)

    O. Delgado-Friedrichs, V. Robins, A. Sheppard
    Abstract We develop topologically accurate and compatible definitions for the skeleton and watershed segmentation of a 3D digital object that are computed by a single algorithm. These definitions are based on a discrete gradient vector field derived from a signed distance transform. This gradient vector field is amenable to topological analysis and simplification via Forman's discrete Morse theory and provides a filtration that can be used as input to persistent homology algorithms. Efficient implementations allow us to process large-scale x-ray micro-CT data of rock cores and other materials.
  88. Skeletonization and Partitioning of Digital Images Using Discrete Morse Theory (2015)

    Olaf Delgado-Friedrichs, Vanessa Robins, Adrian Sheppard
    Abstract We show how discrete Morse theory provides a rigorous and unifying foundation for defining skeletons and partitions of grayscale digital images. We model a grayscale image as a cubical complex with a real-valued function defined on its vertices (the voxel values). This function is extended to a discrete gradient vector field using the algorithm presented in Robins, Wood, Sheppard TPAMI 33:1646 (2011). In the current paper we define basins (the building blocks of a partition) and segments of the skeleton using the stable and unstable sets associated with critical cells. The natural connection between Morse theory and homology allows us to prove the topological validity of these constructions; for example, that the skeleton is homotopic to the initial object. We simplify the basins and skeletons via Morse-theoretic cancellation of critical cells in the discrete gradient vector field using a strategy informed by persistent homology. Simple working Python code for our algorithms for efficient vector field traversal is included. Example data are taken from micro-CT images of porous materials, an application area where accurate topological models of pore connectivity are vital for fluid-flow modelling.
  89. Gene Expression Data Classification Using Topology and Machine Learning Models (2022)

    Tamal K. Dey, Sayan Mandal, Soham Mukherjee
    Abstract Interpretation of high-throughput gene expression data continues to require mathematical tools in data analysis that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors which are used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. These representatives of the entire data facilitate better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.
  90. A Mayer–Vietoris Formula for Persistent Homology With an Application to Shape Recognition in the Presence of Occlusions (2011)

    Barbara Di Fabio, Claudia Landi
    Abstract In algebraic topology it is well known that, using the Mayer–Vietoris sequence, the homology of a space X can be studied by splitting X into subspaces A and B and computing the homology of A, B, and A∩B. A natural question is: To what extent does persistent homology benefit from a similar property? In this paper we show that persistent homology has a Mayer–Vietoris sequence that is generally not exact but only of order 2. However, we obtain a Mayer–Vietoris formula involving the ranks of the persistent homology groups of X, A, B, and A∩B plus three extra terms. This implies that persistent homological features of A and B can be found either as persistent homological features of X or of A∩B. As an application of this result, we show that persistence diagrams are able to recognize an occluded shape by showing a common subset of points.
  91. Image-Based Phenotyping for Identification of QTL Determining Fruit Shape and Size in American Cranberry (Vaccinium Macrocarpon L.) (2018)

    Luis Diaz-Garcia, Giovanny Covarrubias-Pazaran, Brandon Schlautman, Edward Grygleski, Juan Zalapa
    Abstract Image-based phenotyping methodologies are powerful tools to determine quality parameters for fruit breeders and processors. The fruit size and shape of American cranberry (Vaccinium macrocarpon L.) are particularly important characteristics that determine the harvests’ processing value and potential end-use products (e.g., juice vs. sweetened dried cranberries). However, cranberry fruit size and shape attributes can be difficult and time consuming for breeders and processors to measure, especially when relying on manual measurements and visual ratings. Therefore, in this study, we implemented image-based phenotyping techniques for gathering data regarding basic cranberry fruit parameters such as length, width, length-to-width ratio, and eccentricity. Additionally, we applied a persistent homology algorithm to better characterize complex shape parameters. Using this high-throughput artificial vision approach, we characterized fruit from 351 progeny from a full-sib cranberry population over three field seasons. Using a covariate analysis to maximize the identification of well-supported quantitative trait loci (QTL), we found 252 single QTL in a 3-year period for cranberry fruit size and shape descriptors from which 20% were consistently found in all years. The present study highlights the potential for the identified QTL and the image-based methods to serve as a basis for future explorations of the genetic architecture of fruit size and shape in cranberry and other fruit crops.
  92. Topological Data Analysis for Arrhythmia Detection Through Modular Neural Networks (2020)

    Meryll Dindin, Yuhei Umeda, Frederic Chazal
    Abstract This paper presents an innovative and generic deep learning approach to monitor heart conditions from ECG signals. We focus our attention on both the detection and classification of abnormal heartbeats, known as arrhythmia. We strongly insist on generalization throughout the construction of a shallow deep-learning model that turns out to be effective for new, unseen patients. The novelty of our approach relies on the use of topological data analysis to deal with individual differences. We show that our structure reaches the performance of state-of-the-art methods for both arrhythmia detection and classification.
  93. Ghrist Barcoded Video Frames. Application in Detecting Persistent Visual Scene Surface Shapes Captured in Videos (2019)

    Arjuna P. H. Don, James F. Peters
    Abstract This article introduces an application of Ghrist barcodes in the study of persistent Betti numbers derived from vortex nerve complexes found in triangulations of video frames. A Ghrist barcode (also called a persistence barcode) is a topology of data pictograph useful in representing the persistence of the features of changing shapes. The basic approach is to introduce a free Abelian group representation of intersecting filled polygons on the barycenters of the triangles of Alexandroff nerves. An Alexandroff nerve is a maximal collection of triangles of a common vertex in the triangulation of a finite, bounded planar region. In our case, the planar region is a video frame. A Betti number is a count of the number of generators in a finite Abelian group. The focus here is on the persistent Betti numbers across sequences of triangulated video frames. Each Betti number is mapped to an entry in a Ghrist barcode. Two main results are given, namely, vortex nerves are Edelsbrunner-Harer nerve complexes and the Betti number of a vortex nerve equals k + 2 for a vortex nerve containing k edges attached between a pair of vortex cycles in the nerve.
  94. RGB Image-Based Data Analysis via Discrete Morse Theory and Persistent Homology (2018)

    Chuan Du, Christopher Szul, Adarsh Manawa, Nima Rasekh, Rosemary Guzman, Ruth Davidson
    Abstract Understanding and comparing images for the purposes of data analysis is currently a very computationally demanding task. A group at Australian National University (ANU) recently developed open-source code that can detect fundamental topological features of a grayscale image in a computationally feasible manner. This is made possible by the fact that computers store grayscale images as cubical cellular complexes. These complexes can be studied using the techniques of discrete Morse theory. We expand the functionality of the ANU code by introducing methods and software for analyzing images encoded in red, green, and blue (RGB), because this image encoding is very popular for publicly available data. Our methods allow the extraction of key topological information from RGB images via informative persistence diagrams by introducing novel methods for transforming RGB-to-grayscale. This paradigm allows us to perform data analysis directly on RGB images representing water scarcity variability as well as crime variability. We introduce software enabling a user to predict future image properties, towards the eventual aim of more rapid image-based data behavior prediction.
  95. The (Homological) Persistence of Gerrymandering (2021)

    Moon Duchin, Tom Needham, Thomas Weighill
    Abstract We apply persistent homology, the dominant tool from the field of topological data analysis, to study electoral redistricting. We begin by combining geographic and electoral data from a districting plan to produce a persistence diagram. Then, to see beyond a particular plan and understand the possibilities afforded by the choices made in redistricting, we build methods to visualize and analyze large ensembles of alternative plans. Our detailed case studies use zero-dimensional homology (persistent components) of filtered graphs constructed from voting data to analyze redistricting in Pennsylvania and North Carolina. We find that, across large ensembles of partitions, the features cluster in the persistence diagrams in a way that corresponds strongly to geographic location, so that we can construct an average diagram for an ensemble, with each point identified with a geographical region. Using this localization lets us produce zonings of each state at Congressional, state Senate, and state House scales, show the regional non-uniformity of election shifts, and identify attributes of partitions that tend to correspond to partisan advantage. The methods here are set up to be broadly applicable to the use of TDA on large ensembles of data. Many studies will benefit from interpretable summaries of large sets of samples or simulations, and the work here on localization and zoning will readily generalize to other partition problems, which are abundant in scientific applications. For the mathematically and politically rich problem of redistricting in particular, TDA provides a powerful and elegant summarization tool whose findings will be useful for practitioners.
  96. Improving Health Care Management Through Persistent Homology of Time-Varying Variability of Emergency Department Patient Flow (2018)

    Mael Dugast, Guillaume Bouleux, Olivier Mory, Eric Marcon
    Abstract Excessive admissions at the Emergency Department (ED) is a phenomenon very closely linked to the propagation of viruses. It is a cause of overcrowding for EDs and a public health problem. The aim of this work is to give EDs’ leaders more time for decision making during this period. Based on the admissions time series associated with specific clinical diagnoses, we will first perform a Detrended Fluctuation Analysis (DFA) to obtain the corresponding variability time series. Next, we will embed this time series on a manifold to obtain a point cloud representation and use Topological Data Analysis (TDA), through the persistent homology technique, to propose two early real-time indicators. One is the early indicator of abnormal arrivals at the ED whereas the second gives the information on the time index of the maximum number of arrivals. The performance of the detectors is parameter dependent and it can evolve each year. That is why we also propose to solve a bi-objective optimization problem to track the variations of this parameter.
  97. Gene Coexpression Network Comparison via Persistent Homology (2018)

    Ali Nabi Duman, Harun Pirim
    Abstract Persistent homology, a topological data analysis (TDA) method, is applied to microarray data sets. Although there are a few papers referring to TDA methods in microarray analysis, persistent homology has not, to the best of our knowledge, been used before in the comparison of several weighted gene coexpression networks (WGCN). We calculate the persistent homology of weighted networks constructed from 38 Arabidopsis microarray data sets to test the relevance and the success of this approach in distinguishing the stress factors. We quantify multiscale topological features of each network using persistent homology and apply a hierarchical clustering algorithm to the distance matrix whose entries are pairwise bottleneck distances between the networks. The immunoresponses to different stress factors are distinguishable by our method. The networks of similar immunoresponses are found to be close with respect to bottleneck distance, indicating the similar topological features of WGCNs. This computationally efficient technique for analyzing networks provides a quick test for advanced studies.
  98. The Classification of Endoscopy Images With Persistent Homology (2016)

    Olga Dunaeva, Herbert Edelsbrunner, Anton Lukyanov, Michael Machin, Daria Malkova, Roman Kuvaev, Sergey Kashin
    Abstract Aiming at the automatic diagnosis of tumors using narrow band imaging (NBI) magnifying endoscopic (ME) images of the stomach, we combine methods from image processing, topology, geometry, and machine learning to classify patterns into three classes: oval, tubular and irregular. Training the algorithm on a small number of images of each type, we achieve a high rate of correct classifications. The analysis of the learning algorithm reveals that a handful of geometric and topological features are responsible for the overwhelming majority of decisions.
  99. When Remote Sensing Meets Topological Data Analysis (2018)

    Ludovic Duponchel
    Abstract Author Summary: Hyperspectral remote sensing plays an increasingly important role in many scientific domains and everyday life problems. Indeed, this imaging concept ends up in applications as varied as catching tax-evaders red-handed by locating new construction and building alterations, searching for aircraft and saving lives after fatal crashes, detecting oil spills for marine life and environmental preservation, spying on enemies with reconnaissance satellites, watching algae grow as an indicator of environmental health, forecasting weather to warn about natural disasters and much more. From an instrumental point of view, we can say that the actual spectrometers have rather good characteristics, even if we can always increase spatial resolution and spectral range. In order to extract ever more information from such experiments and develop new applications, we must, therefore, propose multivariate data analysis tools able to capture the shape of data sets and their specific features. Nevertheless, actual methods often impose a data model which implicitly defines the geometry of the data set. The aim of the paper is thus to introduce the concept of topological data analysis in the framework of remote sensing, making no assumptions about the global shape of the data set, but also allowing the capture of its local features.
  100. Simplicial Neural Networks (2020)

    Stefania Ebli, Michaël Defferrard, Gard Spreemann
    Abstract We present simplicial neural networks (SNNs), a generalization of graph neural networks to data that live on a class of topological spaces called simplicial complexes. These are natural multi-dimensional extensions of graphs that encode not only pairwise relationships but also higher-order interactions between vertices - allowing us to consider richer data, including vector fields and $n$-fold collaboration networks. We define an appropriate notion of convolution that we leverage to construct the desired convolutional neural networks. We test the SNNs on the task of imputing missing data on coauthorship complexes.
  101. TDAExplore: Quantitative Analysis of Fluorescence Microscopy Images Through Topology-Based Machine Learning (2021)

    Parker Edwards, Kristen Skruber, Nikola Milićević, James B. Heidings, Tracy-Ann Read, Peter Bubenik, Eric A. Vitriol
    Abstract Recent advances in machine learning have greatly enhanced automatic methods to extract information from fluorescence microscopy data. However, current machine-learning-based models can require hundreds to thousands of images to train, and the most readily accessible models classify images without describing which parts of an image contributed to classification. Here, we introduce TDAExplore, a machine learning image analysis pipeline based on topological data analysis. It can classify different types of cellular perturbations after training with only 20–30 high-resolution images and performs robustly on images from multiple subjects and microscopy modes. Using only images and whole-image labels for training, TDAExplore provides quantitative, spatial information, characterizing which image regions contribute to classification. Computational requirements to train TDAExplore models are modest and a standard PC can perform training with minimal user input. TDAExplore is therefore an accessible, powerful option for obtaining quantitative information about imaging data in a wide variety of applications.
  102. Classification of Skin Lesions by Topological Data Analysis Alongside With Neural Network (2020)

    Naiereh Elyasi, Mehdi Hosseini Moghadam
    Abstract In this paper we use TDA mapper alongside deep convolutional neural networks in the classification of 7 major skin diseases. First we apply kepler mapper with a neural network as one of its filter steps to classify the dataset HAM10000. Mapper visualizes the classification result as a simplicial complex, which a neural network cannot do alone, while as a filter step the neural network helps to classify the data better. Furthermore we apply TDA mapper and persistent homology to understand the weights of the layers of the mobilenet network in different training epochs on HAM10000. We also use persistence diagrams to visualize the results of the analysis of the layers of the mobilenet network.
  103. Characterizing Scales of Genetic Recombination and Antibiotic Resistance in Pathogenic Bacteria Using Topological Data Analysis (2014)

    Kevin J. Emmett, Raul Rabadan
    Abstract Pathogenic bacteria present a large disease burden on human health. Control of these pathogens is hampered by rampant lateral gene transfer, whereby pathogenic strains may acquire genes conferring resistance to common antibiotics. Here we introduce tools from topological data analysis to characterize the frequency and scale of lateral gene transfer in bacteria, focusing on a set of pathogens of significant public health relevance. As a case study, we examine the spread of antibiotic resistance in Staphylococcus aureus. Finally, we consider the possible role of the human microbiome as a reservoir for antibiotic resistance genes.
  104. Parametric Inference Using Persistence Diagrams: a Case Study in Population Genetics (2014)

    Kevin Emmett, Daniel Rosenbloom, Pablo Camara, Raul Rabadan
    Abstract Persistent homology computes topological invariants from point cloud data. Recent work has focused on developing statistical methods for data analysis in this framework. We show that, in certain models, parametric inference can be performed using statistics defined on the computed invariants. We develop this idea with a model from population genetics, the coalescent with recombination. We apply our model to an influenza dataset, identifying two scales of topological structure which have a distinct biological interpretation.
  105. Mapping Firms' Locations in Technological Space: A Topological Analysis of Patent Statistics (2020)

    Emerson G. Escolar, Yasuaki Hiraoka, Mitsuru Igami, Yasin Ozcan
    Abstract Where do firms innovate? Mapping their locations in technological space is difficult, because it is high dimensional and unstructured. We address this issue by using a method in computational topology called the Mapper algorithm, which combines local clustering with global reconstruction. We apply this method to a panel of 333 major firms’ patent portfolios in 1976–2005 across 430 technological areas. Results suggest the Mapper graph captures salient patterns in firms’ patenting histories, and our measures of their uniqueness (the length of “flares”) are correlated with firms’ financial performances in a statistically and economically significant manner. We then compare this approach with a widely used clustering method by Jaffe (1989) to highlight additional findings.
  106. Persistent Homology of Geospatial Data: A Case Study With Voting (2021)

    Michelle Feng, Mason A. Porter
    Abstract A crucial step in the analysis of persistent homology is the transformation of data into an appropriate topological object (which, in our case, is a simplicial complex). Software packages for computing persistent homology typically construct Vietoris–Rips or other distance-based simplicial complexes on point clouds because they are relatively easy to compute. We investigate alternative methods of constructing simplicial complexes and the effects of making associated choices during simplicial-complex construction on the output of persistent-homology algorithms. We present two new methods for constructing simplicial complexes from two-dimensional geospatial data (such as maps). We apply these methods to a California precinct-level voting data set, and we thereby demonstrate that our new constructions can capture geometric characteristics that are missed by distance-based constructions. Our new constructions can thus yield more interpretable persistence modules and barcodes for geospatial data. In particular, they are able to distinguish short-persistence features that occur only for a narrow range of distance scales (e.g., voting patterns in densely populated cities) from short-persistence noise by incorporating information about other spatial relationships between regions.
  107. Spatial Applications of Topological Data Analysis: Cities, Snowflakes, Random Structures, and Spiders Spinning Under the Influence (2020)

    Michelle Feng, Mason A. Porter
    Abstract Spatial networks are ubiquitous in social, geographic, physical, and biological applications. To understand their large-scale structure, it is important to develop methods that allow one to directly probe the effects of space on structure and dynamics. Historically, algebraic topology has provided one framework for rigorously and quantitatively describing the global structure of a space, and recent advances in topological data analysis (TDA) have given scholars a new lens for analyzing network data. In this paper, we study a variety of spatial networks, including both synthetic and natural ones, using novel topological methods that we recently developed specifically for analyzing spatial networks. We demonstrate that our methods are able to capture meaningful quantities, with specifics that depend on context, in spatial networks and thereby provide useful insights into the structure of those networks, including a novel approach for characterizing them based on their topological structures. We illustrate these ideas with examples of synthetic networks and dynamics on them, street networks in cities, snowflakes, and webs spun by spiders under the influence of various psychotropic substances.
  108. Determining Structural Properties of Artificial Neural Networks Using Algebraic Topology (2021)

    David Pérez Fernández, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marta Villegas
    Abstract Artificial Neural Networks (ANNs) are widely used for approximating complex functions. The process that is usually followed to define the most appropriate architecture for an ANN given a specific function is mostly empirical. Once this architecture has been defined, weights are usually optimized according to the error function. On the other hand, we observe that ANNs can be represented as graphs and their topological 'fingerprints' can be obtained using Persistent Homology (PH). In this paper, we describe a proposal focused on designing more principled architecture search procedures. To do this, different architectures for solving problems related to a heterogeneous set of datasets have been analyzed. The results of the evaluation corroborate that PH effectively characterizes the ANN invariants: when ANN density (layers and neurons) or sample feeding order is the only difference, PH topological invariants appear; in the opposite direction in different sub-problems (i.e. different labels), PH varies. This approach based on topological analysis helps towards the goal of designing more principled architecture search procedures and having a better understanding of ANNs.
  109. Topological Biomarkers for Real-Time Detection of Epileptic Seizures (2022)

    Ximena Fernández, Diego Mateos
    Abstract Automated seizure detection is a fundamental problem in computational neuroscience, aimed at improving the diagnosis and treatment of epileptic disease. We propose a real-time computational method for automated tracking and detection of epileptic seizures from raw neurophysiological recordings. Our mechanism is based on the topological analysis of the sliding-window embedding of the time series derived from simultaneously recorded channels. We extract topological biomarkers from the signals via the computation of the persistent homology of time-evolving topological spaces. Remarkably, the proposed biomarkers robustly capture the change in the brain dynamics during the ictal state. We apply our methods to different types of signals including scalp and intracranial EEG and MEG, in patients during interictal and ictal states, showing high accuracy in a range of clinical situations.
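    The sliding-window embedding mentioned in this abstract can be sketched as follows. This is a single-channel simplification with an assumed function name and assumed `dim` and `delay` parameters; the paper works with simultaneously recorded channels, and its exact construction may differ. The persistent homology step that would follow is not shown.

```python
import math


def sliding_window_embedding(series, dim, delay=1):
    """Map a scalar series to points (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    n_points = len(series) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("series too short for this dim and delay")
    return [
        tuple(series[i + j * delay] for j in range(dim))
        for i in range(n_points)
    ]


# A periodic toy signal: its 2-D embedding with a quarter-period delay
# traces out a closed loop (here, points on the unit circle).
period = 40
signal = [math.sin(2 * math.pi * t / period) for t in range(200)]
cloud = sliding_window_embedding(signal, dim=2, delay=period // 4)
```

    Periodic dynamics show up as a loop in the embedded point cloud, which is the kind of feature that one-dimensional persistent homology is designed to detect.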
  110. Persistent Betti Numbers for a Noise Tolerant Shape-Based Approach to Image Retrieval (2011)

    Patrizio Frosini, Claudia Landi
    Abstract In content-based image retrieval a major problem is the presence of noisy shapes. It is well known that persistent Betti numbers are a shape descriptor that admits a dissimilarity distance, the matching distance, stable under continuous shape deformations. In this paper we focus on the problem of dealing with noise that changes the topology of the studied objects. We present a general method to turn persistent Betti numbers into stable descriptors also in the presence of topological changes. Retrieval tests on the Kimia-99 database show the effectiveness of the method.
  111. Topological Regularization for Dense Prediction (2021)

    Deqing Fu, Bradley J. Nelson
    Abstract Dense prediction tasks such as depth perception and semantic segmentation are important applications in computer vision that have a concrete topological description in terms of partitioning an image into connected components or estimating a function with a small number of local extrema corresponding to objects in the image. We develop a form of topological regularization based on persistent homology that can be used in dense prediction tasks with these topological descriptions. Experimental results show that the output topology can also appear in the internal activations of trained neural networks which allows for a novel use of topological regularization to the internal states of neural networks during training, reducing the computational cost of the regularization. We demonstrate that this topological regularization of internal activations leads to improved convergence and test benchmarks on several problems and architectures.
  112. A Topological Measurement of Protein Compressibility (2015)

    Marcio Gameiro, Yasuaki Hiraoka, Shunsuke Izumi, Miroslav Kramar, Konstantin Mischaikow, Vidit Nanda
    Abstract In this paper we partially clarify the relation between the compressibility of a protein and its molecular geometric structure. To identify and understand the relevant topological features within a given protein, we model its molecule as an alpha filtration and hence obtain multi-scale insight into the structure of its tunnels and cavities. The persistence diagrams of this alpha filtration capture the sizes and robustness of such tunnels and cavities in a compact and meaningful manner. From these persistence diagrams, we extract a measure of compressibility derived from those topological features whose relevance is suggested by physical and chemical properties. Due to recent advances in combinatorial topology, this measure is efficiently and directly computable from information found in the Protein Data Bank (PDB). Our main result establishes a clear linear correlation between the topological measure and the experimentally-determined compressibility of most proteins for which both PDB information and experimental compressibility data are available. Finally, we establish that both the topological measurement and the linear correlation are stable with respect to small perturbations in the input data, such as those arising from experimental errors in compressibility and X-ray crystallography experiments.
  113. The Geometry of Synchronization Problems and Learning Group Actions (2019)

    Tingran Gao, Jacek Brodzki, Sayan Mukherjee
    Abstract We develop a geometric framework, based on the classical theory of fibre bundles, to characterize the cohomological nature of a large class of synchronization-type problems in the context of graph inference and combinatorial optimization. We identify each synchronization problem in topological group G on connected graph Γ with a flat principal G-bundle over Γ, thus establishing a classification result for synchronization problems using the representation variety of the fundamental group of Γ into G. We then develop a twisted Hodge theory on flat vector bundles associated with these flat principal G-bundles, and provide a geometric realization of the graph connection Laplacian as the lowest-degree Hodge Laplacian in the twisted de Rham–Hodge cochain complex. Motivated by these geometric intuitions, we propose to study the problem of learning group actions (partitioning a collection of objects based on the local synchronizability of pairwise correspondence relations) and provide a heuristic synchronization-based algorithm for solving this type of problem. We demonstrate the efficacy of this algorithm on simulated and real datasets.
  114. Toroidal Topology of Population Activity in Grid Cells (2022)

    Richard J. Gardner, Erik Hermansen, Marius Pachitariu, Yoram Burak, Nils A. Baas, Benjamin A. Dunn, May-Britt Moser, Edvard I. Moser
    Abstract The medial entorhinal cortex is part of a neural system for mapping the position of an individual within a physical environment. Grid cells, a key component of this system, fire in a characteristic hexagonal pattern of locations, and are organized in modules that collectively form a population code for the animal’s allocentric position. The invariance of the correlation structure of this population code across environments and behavioural states, independent of specific sensory inputs, has pointed to intrinsic, recurrently connected continuous attractor networks (CANs) as a possible substrate of the grid pattern. However, whether grid cell networks show continuous attractor dynamics, and how they interface with inputs from the environment, has remained unclear owing to the small samples of cells obtained so far. Here, using simultaneous recordings from many hundreds of grid cells and subsequent topological data analysis, we show that the joint activity of grid cells from an individual module resides on a toroidal manifold, as expected in a two-dimensional CAN. Positions on the torus correspond to positions of the moving animal in the environment. Individual cells are preferentially active at singular positions on the torus. Their positions are maintained between environments and from wakefulness to sleep, as predicted by CAN models for grid cells but not by alternative feedforward models. This demonstration of network dynamics on a toroidal manifold provides a population-level visualization of CAN dynamics in grid cells.
  115. The Emergence of Higher-Order Structure in Scientific and Technological Knowledge Networks (2020)

    Thomas Gebhart, Russell J. Funk
    Abstract The growth of science and technology is primarily a recombinative process, wherein new discoveries and inventions are generally built from prior knowledge. While the recent past has seen rapid growth in scientific and technological knowledge, relatively little is known about the manner in which science and technology develop and coalesce knowledge into larger structures that enable or constrain future breakthroughs. Network science has recently emerged as a framework for measuring the structure and dynamics of knowledge. While helpful, these existing approaches struggle to capture the global structural properties of the underlying networks, leading to conflicting observations about the nature of scientific and technological progress. We bridge this methodological gap using tools from algebraic topology to characterize the higher-order structure of knowledge networks in science and technology across scale. We observe rapid and varied growth in the high-dimensional structure in many fields of science and technology, and find this high-dimensional growth coincides with decline in lower-dimensional structure. This higher-order growth in knowledge networks has historically far outpaced the growth in scientific and technological collaboration networks. We also characterize the relationship between higher-order structure and the nature of the science and technology produced within these structural environments and find a positive relationship between the abstractness of language used within fields and increasing high-dimensional structure. We also find a robust relationship between high-dimensional structure and number of metrics for publication success, implying this high-dimensional structure may be linked to discovery and invention.
  116. Go With the Flow? A Large-Scale Analysis of Health Care Delivery Networks in the United States Using Hodge Theory (2021)

    Thomas Gebhart, Xiaojun Fu, Russell J. Funk
    Abstract Health care delivery is a collaborative process, requiring close coordination among networks of providers with specialized expertise. Yet in the United States, care is often spread across multiple disconnected providers (e.g., primary care physicians, specialists), leading to fragmented care delivery networks, and contributing to higher costs and lower quality. While this problem is well known, there are relatively few quantitative tools available for characterizing the dynamics of care delivery networks at scale, thereby inhibiting deeper understanding of care fragmentation and efforts to address it. In this study, we conduct a large-scale analysis of care delivery networks across the United States using the discrete Hodge decomposition, an emerging method of topological data analysis. Using this technique, we decompose networks of patient flows among physicians into three orthogonal subspaces: gradient (acyclic flow), harmonic (global cyclic flow), and curl (local cyclic flow). We document substantial variation in the relative importance of each subspace, suggesting that there may be systematic differences in the organization of care delivery networks across health care markets. Moreover, we find that the relative importance of each subspace is predictive of local care cost and quality, with outcomes tending to be better with greater curl flow and worse with greater harmonic flow.
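    The three-way split named in this abstract can be illustrated on a toy graph. The sketch below is an assumption-laden illustration, not the authors' pipeline: it projects an edge flow onto the gradient space (spanned by gradients of node potentials) and the curl space (spanned by boundaries of filled triangles) using Gram-Schmidt, and treats the remainder as harmonic. The graph, the flow values and all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(f, spanning):
    """Orthogonal projection of f onto span(spanning) via Gram-Schmidt."""
    basis = []
    for v in spanning:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip linearly dependent vectors
            basis.append([wi / norm for wi in w])
    proj = [0.0] * len(f)
    for b in basis:
        c = dot(f, b)
        proj = [p + c * bi for p, bi in zip(proj, b)]
    return proj


def hodge_decompose(n_nodes, edges, triangles, flow):
    """Split an edge flow into gradient, curl and harmonic parts.

    edges are (i, j) with i < j; flow[k] > 0 means net flow from i to j.
    triangles are (a, b, c) with a < b < c and all three edges present.
    """
    index = {e: k for k, e in enumerate(edges)}
    # Gradient space: gradients of the node indicator functions.
    grad_span = []
    for v in range(n_nodes):
        row = [0.0] * len(edges)
        for (i, j), k in index.items():
            if j == v:
                row[k] = 1.0
            elif i == v:
                row[k] = -1.0
        grad_span.append(row)
    # Curl space: oriented boundaries of the filled triangles.
    curl_span = []
    for a, b, c in triangles:
        col = [0.0] * len(edges)
        col[index[(a, b)]] = 1.0
        col[index[(b, c)]] = 1.0
        col[index[(a, c)]] = -1.0
        curl_span.append(col)
    grad = project(flow, grad_span)
    curl = project(flow, curl_span)
    harm = [f - g - c for f, g, c in zip(flow, grad, curl)]
    return grad, curl, harm


# Toy network: triangle 0-1-2 is filled, cycle 1-2-3 is left unfilled,
# so flow circulating around 1-2-3 counts as harmonic (global cyclic).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2)]
flow = [1.0, 2.0, 0.5, -1.0, 3.0]
grad, curl, harm = hodge_decompose(4, edges, triangles, flow)
```

    The squared norms of `grad`, `curl` and `harm` add up to the squared norm of `flow`, so their ratios give one possible reading of the "relative importance of each subspace" mentioned in the abstract.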
  117. Phase-Field Investigation of the Coarsening of Porous Structures by Surface Diffusion (2019)

    Pierre-Antoine Geslin, Mickaël Buchet, Takeshi Wada, Hidemi Kato
    Abstract Nano and microporous connected structures have attracted increasing attention in the past decades due to their high surface area, presenting interesting properties for a number of applications. These structures generally coarsen by surface diffusion, leading to an enlargement of the structure characteristic length scale. We propose to study this coarsening behavior using a phase-field model for surface diffusion. In addition to reproducing the expected scaling law, our simulations enable a precise investigation of the evolution of the topological and morphological characteristics along the coarsening process. In particular, we show that after a transient regime, the coarsening is self-similar as exhibited by the evolution of both morphological and topological features. In addition, the influence of surface anisotropy is discussed and comparisons with experimental tomographic observations are presented.
  118. A Novel Method of Extracting Topological Features From Word Embeddings (2020)

    Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny
    Abstract In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there are only a few works on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representation of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the embedding high-dimensional space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly used tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show our defined topological features may outperform conventional text mining features.
  119. Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining (2018)

    Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny
    Abstract Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although TDA methods have recently been used in many areas of data mining, they have not been widely applied to text mining tasks. In most text processing algorithms, the order in which different entities appear or co-appear is lost. Assuming these lost orders are informative features of the data, TDA may play a significant role in addressing the resulting gap in the text processing state of the art. Once provided, the topology of different entities through a textual document may reveal some additive information regarding the document that is not reflected in any other features from conventional text processing methods. In this paper, we introduce a novel approach that employs TDA in text processing in order to capture and use the topology of different same-type entities in textual documents. First, we will show how to extract some topological signatures in the text using persistent homology, i.e., a TDA tool that captures the topological signature of a data cloud. Then we will show how to utilize these signatures for text classification.
  120. Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

    Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, Wlodek Zadrozny
    Abstract While the strength of Topological Data Analysis has been explored in many studies on high dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in the text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extraction of topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as high dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe that adding topological features to the conventional features in ensemble models improves the classification results (up to 5%). On the other hand, as expected, topological features by themselves may not be sufficient for effective classification. It is an open problem to see whether TDA features from word embeddings might be sufficient, as they seem to perform within a range of few points from top results obtained with a linear support vector classifier.
  121. Positive Alexander Duality for Pursuit and Evasion (2017)

    Robert Ghrist, Sanjeevi Krishnan
    Abstract Considered is a class of pursuit-evasion games, in which an evader tries to avoid detection. Such games can be formulated as the search for sections to the complement of a coverage region in a Euclidean space over time. Prior results give homological criteria for evasion in the general case that are not necessary and sufficient. This paper provides a necessary and sufficient positive cohomological criterion for evasion in the general case. The principal tools are (1) a refinement of the Čech cohomology of a coverage region with a positive cone encoding spatial orientation, (2) a refinement of the Borel–Moore homology of the coverage gaps with a positive cone encoding time orientation, and (3) a positive variant of Alexander Duality. Positive cohomology decomposes as the global sections of a sheaf of local positive cohomology over the time axis; we show how this decomposition makes positive cohomology computable using techniques of computational polyhedral geometry and linear programming.
  122. Persistent Homology Machine Learning for Fingerprint Classification (2019)

    N. Giansiracusa, R. Giansiracusa, C. Moon
    Abstract The fingerprint classification problem is to sort fingerprints into predetermined groups, such as arch, loop, and whorl. It was asserted in the literature that minutiae points, which are commonly used for fingerprint matching, are not useful for classification. We show that, to the contrary, near state-of-the-art classification accuracy rates can be achieved when applying topological data analysis (TDA) to 3-dimensional point clouds of oriented minutiae points. We also apply TDA to fingerprint ink-roll images, which yields a lower accuracy rate but still shows promise; moreover, combining the two approaches outperforms each one individually. These methods use supervised learning applied to persistent homology and allow us to explore feature selection on barcodes, an important topic at the interface between TDA and machine learning. We test our classification algorithms on the NIST fingerprint database SD-27.
  123. Topological Data Analysis of Financial Time Series: Landscapes of Crashes (2017)

    Marian Gidea, Yuri Katz
    Abstract We explore the evolution of daily returns of four major US stock market indices during the technology crash of 2000, and the financial crisis of 2007-2009. Our methodology is based on topological data analysis (TDA). We use persistent homology to detect and quantify topological patterns that appear in multidimensional time series. Using a sliding window, we extract time-dependent point cloud data sets, to which we associate a topological space. We detect transient loops that appear in this space, and we measure their persistence. This is encoded in real-valued functions referred to as 'persistence landscapes'. We quantify the temporal changes in persistence landscapes via their $L^p$-norms. We test this procedure on multidimensional time series generated by various non-linear and non-equilibrium models. We find that, in the vicinity of financial meltdowns, the $L^p$-norms exhibit strong growth prior to the primary peak, which ascends during a crash. Remarkably, the average spectral density at low frequencies of the time series of $L^p$-norms of the persistence landscapes demonstrates a strong rising trend for 250 trading days prior to either the dotcom crash on 03/10/2000 or the Lehman bankruptcy on 09/15/2008. Our study suggests that TDA provides a new type of econometric analysis, which goes beyond the standard statistical measures. The method can be used to detect early warning signals of imminent market crashes. We believe that this approach can be used beyond the analysis of financial time series presented here.
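    A rough sketch of the last step of the procedure in this abstract: turning a persistence diagram into its landscape functions and summarizing them by an $L^p$-norm. The diagrams are assumed to have been computed already for each sliding window (that step is not shown), and the function names, the trapezoid-rule integration and the toy diagrams are illustrative assumptions rather than the authors' implementation.

```python
def landscape_value(diagram, k, t):
    """k-th persistence landscape function (k = 1 is the largest) at t."""
    tents = sorted(
        (max(0.0, min(t - birth, death - t)) for birth, death in diagram),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0


def landscape_lp_norm(diagram, p=1, steps=2000):
    """Approximate the L^p norm of the landscape with the trapezoid rule."""
    if not diagram:
        return 0.0
    lo = min(b for b, _ in diagram)
    hi = max(d for _, d in diagram)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(1, len(diagram) + 1):
        values = [
            landscape_value(diagram, k, lo + i * h) ** p
            for i in range(steps + 1)
        ]
        total += h * (sum(values) - 0.5 * (values[0] + values[-1]))
    return total ** (1.0 / p)


# Hypothetical loop diagrams for a "calm" and a "turbulent" window.
calm = [(0.0, 0.2)]
turbulent = [(0.0, 1.0), (0.2, 0.6)]
norms = [landscape_lp_norm(d, p=1) for d in (calm, turbulent)]
```

    Applied window by window, this yields a scalar time series of norms; the abstract reports that such a series grows ahead of the crashes studied.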
  124. WDR76 Co-Localizes With Heterochromatin Related Proteins and Rapidly Responds to DNA Damage (2016)

    Joshua M. Gilmore, Mihaela E. Sardiu, Brad D. Groppe, Janet L. Thornton, Xingyu Liu, Gerald Dayebgadoh, Charles A. Banks, Brian D. Slaughter, Jay R. Unruh, Jerry L. Workman, Laurence Florens, Michael P. Washburn
    Abstract Proteins that respond to DNA damage play critical roles in normal and diseased states in human biology. Studies have suggested that the S. cerevisiae protein CMR1/YDL156w is associated with histones and is possibly associated with DNA repair and replication processes. Through a quantitative proteomic analysis of affinity purifications here we show that the human homologue of this protein, WDR76, shares multiple protein associations with the histones H2A, H2B, and H4. Furthermore, our quantitative proteomic analysis of WDR76 associated proteins demonstrated links to proteins in the DNA damage response like PARP1 and XRCC5 and heterochromatin related proteins like CBX1, CBX3, and CBX5. Co-immunoprecipitation studies validated these interactions. Next, quantitative imaging studies demonstrated that WDR76 was recruited to laser induced DNA damage immediately after induction, and we compared the recruitment of WDR76 to laser induced DNA damage to known DNA damage proteins like PARP1, XRCC5, and RPA1. In addition, WDR76 co-localizes to puncta with the heterochromatin proteins CBX1 and CBX5, which are also recruited to DNA damage but much less intensely than WDR76. This work demonstrates the chromatin and DNA damage protein associations of WDR76 and demonstrates the rapid response of WDR76 to laser induced DNA damage.
  125. Clique Topology Reveals Intrinsic Geometric Structure in Neural Correlations (2015)

    Chad Giusti, Eva Pastalkova, Carina Curto, Vladimir Itskov
    Abstract Detecting structure in neural activity is critical for understanding the function of neural circuits. The coding properties of neurons are typically investigated by correlating their responses to external stimuli. It is not clear, however, if the structure of neural activity can be inferred intrinsically, without a priori knowledge of the relevant stimuli. We introduce a novel method, called clique topology, that detects intrinsic structure in neural activity that is invariant under nonlinear monotone transformations. Using pairwise correlations of neurons in the hippocampus, we demonstrate that our method is capable of detecting geometric structure from neural activity alone, without appealing to external stimuli or receptive fields. Detecting meaningful structure in neural activity and connectivity data is challenging in the presence of hidden nonlinearities, where traditional eigenvalue-based methods may be misleading. We introduce a novel approach to matrix analysis, called clique topology, that extracts features of the data invariant under nonlinear monotone transformations. These features can be used to detect both random and geometric structure, and depend only on the relative ordering of matrix entries. We then analyzed the activity of pyramidal neurons in rat hippocampus, recorded while the animal was exploring a 2D environment, and confirmed that our method is able to detect geometric organization using only the intrinsic pattern of neural correlations. Remarkably, we found similar results during nonspatial behaviors such as wheel running and rapid eye movement (REM) sleep. This suggests that the geometric structure of correlations is shaped by the underlying hippocampal circuits and is not merely a consequence of position coding. We propose that clique topology is a powerful new tool for matrix analysis in biological settings, where the relationship of observed quantities to more meaningful variables is often nonlinear and unknown.
  126. Topological Electronic Structure and Weyl Points in Nonsymmorphic Hexagonal Materials (2020)

    Rafael González-Hernández, Erick Tuiran, Bernardo Uribe
    Abstract Using topological band theory analysis we show that the nonsymmorphic symmetry operations in hexagonal lattices enforce Weyl points at the screw-invariant high-symmetry lines of the band structure. The corepresentation theory and connectivity group theory show that Weyl points are generated by band crossings in accordion-like and hourglass-like dispersion relations. These Weyl points are stable against weak perturbations and are protected by the screw rotation symmetry. Based on first-principles calculations we found complete agreement between the topologically predicted energy dispersion relations and real hexagonal materials. Topological charge (chirality) and Berry curvature calculations show the simultaneous formation of Weyl points and nodal-lines in 4d transition-metal trifluorides such as AgF3 and AuF3. Furthermore, a large intrinsic spin-Hall conductivity was found due to the combined strong spin-orbit coupling and multiple Weyl-point crossings in the electronic structure. These materials could be used for spin/charge conversion in more energy-efficient spintronic devices.
  127. Prediction in Cancer Genomics Using Topological Signatures and Machine Learning (2020)

    Georgina Gonzalez, Arina Ushakova, Radmila Sazdanovic, Javier Arsuaga
    Abstract Copy Number Aberrations, gains and losses of genomic regions, are a hallmark of cancer and can be experimentally detected using microarray comparative genomic hybridization (aCGH). In previous works, we developed a topology based method to analyze aCGH data whose output are regions of the genome where copy number is altered in patients with a predetermined cancer phenotype. We call this method Topological Analysis of array CGH (TAaCGH). Here we combine TAaCGH with machine learning techniques to build classifiers using copy number aberrations. We chose logistic regression on two different binary phenotypes related to breast cancer to illustrate this approach. The first case consists of patients with over-expression of the ERBB2 gene. Over-expression of ERBB2 is commonly regulated by a copy number gain in chromosome arm 17q. TAaCGH found the region 17q11-q22 associated with the phenotype and using logistic regression we reduced this region to 17q12-q21.31 correctly classifying 78% of the ERBB2 positive individuals (sensitivity) in a validation data set. We also analyzed over-expression in Estrogen Receptor (ER), a second phenotype commonly observed in breast cancer patients and found that the region 5p14.3-12 together with six full arms were associated with the phenotype. Our method identified 4p, 6p and 16q as the strongest predictors correctly classifying 76% of ER positives in our validation data set. However, for this set there was a significant increase in the false positive rate (specificity). We suggest that topological and machine learning methods can be combined for prediction of phenotypes using genetic data.
  128. Complexes of Tournaments, Directionality Filtrations and Persistent Homology (2020)

    Dejan Govc, Ran Levi, Jason P. Smith
    Abstract Complete digraphs are referred to in the combinatorics literature as tournaments. We consider a family of semi-simplicial complexes, that we refer to as "tournaplexes", whose simplices are tournaments. In particular, given a digraph $\mathcal{G}$, we associate with it a "flag tournaplex" which is a tournaplex containing the directed flag complex of $\mathcal{G}$, but also the geometric realisation of cliques that are not directed. We define several types of filtrations on tournaplexes, and exploiting persistent homology, we observe that flag tournaplexes provide finer means of distinguishing graph dynamics than the directed flag complex. We then demonstrate the power of these ideas by applying them to graph data arising from the Blue Brain Project's digital reconstruction of a rat's neocortex.
  129. Development of the Functional Connectome Topology in Adolescence: Evidence From Topological Data Analysis (2021)

    Zeus Gracia-Tabuenca, Juan Carlos Díaz-Patiño, Isaac Arelio, Martha Beatriz Moreno, Fernando A. Barrios, Sarael Alcauter
    Abstract Adolescence is a crucial developmental period in terms of behavior and mental health. Therefore, understanding how the brain develops during this stage is a fundamental challenge for neuroscience. Recent studies have modelled the brain as a network or connectome, mainly applying measures from graph theory, showing a change in its functional organization such as an increase in its segregation and integration. Topological Data Analysis (TDA) complements such modelling by extracting high-dimensional features across the whole range of connectivity values, instead of exploring a fixed set of connections. This study enquires into the developmental trajectories of such properties using a longitudinal sample of typically developing participants (N = 98; 53/45 F/M; 6.7-18.1 years), applying TDA to their functional connectomes. In addition, we explore the effect of puberty on the individual developmental trajectories. Results showed that compared to random networks, the adolescent brain is more segregated at the global level, but more densely connected at the local level. Furthermore, developmental effects showed nonlinear trajectories for the integration of the whole brain and fronto-parietal networks, with an inflection point and increasing trajectories after puberty onset. These results add to the insights in the development of the functional organization of the adolescent. Significance Statement Topological Data Analysis may be used to explore the topology of the brain along the whole range of connectivity values instead of selecting only a fixed set of connectivity thresholds. Here, we explored some properties of the topology of the brain functional connectome, and how they develop in adolescence. First, we show that developmental trajectories are nonlinear and better explained by the puberty status than chronological age, with an inflection point around the puberty onset. The greatest effect is the increase in functional integration for the whole brain, and particularly for the Fronto-Parietal Network when exploring functional subnetworks.
  130. Topological Data Analysis Reveals Robust Alterations in the Whole-Brain and Frontal Lobe Functional Connectomes in Attention-Deficit/Hyperactivity Disorder (2020)

    Zeus Gracia-Tabuenca, Juan Carlos Díaz-Patiño, Isaac Arelio, Sarael Alcauter
    Abstract Attention-deficit/hyperactivity disorder (ADHD) is a developmental disorder characterized by difficulty controlling one's own behavior. Neuroimaging studies have related ADHD with the interplay of fronto-parietal attention systems with the default mode network (DMN; Castellanos and Aoki, 2016). However, some results have been inconsistent, potentially due to methodological differences in the analytical strategies when defining the brain functional network, i.e., the functional connectivity threshold and/or the brain parcellation scheme. Here, we make use of topological data analysis (TDA) to explore the brain connectome as a function of the filtration value (i.e., the connectivity threshold), instead of using a static connectivity threshold. Specifically, we characterized the transition from all nodes being isolated to being connected into a single component as a function of the filtration value. We explored the utility of such a method to identify differences between 81 children with ADHD (45 male, age: 7.26–17.61 years old) and 96 typically developing children (TDC; 59 male, age: 7.17–17.96 years old), using a public dataset of resting state (rs)fMRI in human subjects. Results were highly congruent when using four different brain segmentations (atlases), and exhibited significant differences for the brain topology of children with ADHD, both at the whole-brain network and the functional subnetwork levels, particularly involving the frontal lobe and the DMN. Therefore, this is a solid approach that complements connectomics-related methods and may contribute to identifying the neurophysio-pathology of ADHD.
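The transition from isolated nodes to a single connected component, tracked as a function of the filtration value, can be illustrated with a small union-find sketch. It is an illustration under stated assumptions rather than the study's pipeline: it assumes that an edge is present whenever its connectivity value is at least the threshold, and the function names and the 4-node matrix are hypothetical.

```python
def count_components(weights, threshold):
    """Connected components when only edges with weight >= threshold are kept.

    The ">=" convention is an assumption of this sketch.
    """
    n = len(weights)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if weights[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components


def component_curve(weights, thresholds):
    """Number of components at each filtration value."""
    return [count_components(weights, t) for t in thresholds]


# Hypothetical symmetric connectivity matrix for four regions.
toy = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.6, 0.3],
    [0.2, 0.6, 1.0, 0.8],
    [0.1, 0.3, 0.8, 1.0],
]
curve = component_curve(toy, [0.95, 0.85, 0.7, 0.5])  # [4, 3, 2, 1]
```

Lowering the threshold adds edges, so the curve is non-increasing and reaches 1 once every node has joined the same component.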
  131. Topological Extraction and Tracking of Defects in Crystal Structures (2011)

    Sebastian Grottel, Carlos A. Dietrich, João L. D. Comba, Thomas Ertl
    Abstract Interfaces between materials with different mechanical properties play an important role in technical applications. Nowadays molecular dynamics simulations are used to observe the behavior of such compound materials at the atomic level. Due to different atom crystal sizes, dislocations in the atom crystal structure occur once external forces are applied, and it has been observed that studying the change of these dislocations can provide further understanding of macroscopic attributes like elasticity and plasticity. Standard visualization techniques such as the rendering of individual atoms work for 2D data or sectional views; however, visualizing dislocations in 3D using such methods usually fails due to occlusion and clutter. In this work we propose to extract and visualize the structure of dislocations, which summarizes the commonly employed filtered atomistic renderings into a concise representation. The benefits of our approach are clearer images while retaining relevant data and easier visual tracking of topological changes over time.
  132. Homological Scaffold via Minimal Homology Bases (2021)

    Marco Guerra, Alessandro De Gregorio, Ulderico Fugacci, Giovanni Petri, Francesco Vaccarino
    Abstract The homological scaffold leverages persistent homology to construct a topologically sound summary of a weighted network. However, its crucial dependency on the choice of representative cycles hinders the ability to trace back global features onto individual network components, unless one provides a principled way to make such a choice. In this paper, we apply recent advances in the computation of minimal homology bases to introduce a quasi-canonical version of the scaffold, called minimal, and employ it to analyze data both real and in silico. At the same time, we verify that, statistically, the standard scaffold is a good proxy of the minimal one for sufficiently complex networks.
  133. Identification of Key Features Using Topological Data Analysis for Accurate Prediction of Manufacturing System Outputs (2017)

    Wei Guo, Ashis G. Banerjee
    Abstract Topological data analysis (TDA) has emerged as one of the most promising approaches to extract insights from high-dimensional data of varying types such as images, point clouds, and meshes, in an unsupervised manner. To the best of our knowledge, here, we provide the first successful application of TDA in the manufacturing systems domain. We apply a widely used TDA method, known as the Mapper algorithm, on two benchmark data sets for chemical process yield prediction and semiconductor wafer fault detection, respectively. The algorithm yields topological networks that capture the intrinsic clusters and connections among the clusters present in the data sets, which are difficult to detect using traditional methods. We select key process variables or features that impact the system outcomes by analyzing the network shapes. We then use predictive models to evaluate the impact of the selected features. Results show that the models achieve at least the same level of high prediction accuracy as with all the process variables, thereby, providing a way to carry out process monitoring and control in a more cost-effective manner.
  134. Persistent Homology to Quantify the Quality of Surface-Supported Covalent Networks (2019)

    Abraham Gutierrez, Mickaël Buchet, Sylvain Clair
    Abstract Covalent networks formed by on-surface synthesis usually suffer from the presence of a large number of defects. We report on a methodology to characterize such two-dimensional networks from their experimental images obtained by scanning probe microscopy. The computation is based on a persistent homology approach and provides a quantitative score indicative of the network homogeneity. We compare our scoring method with results previously obtained using minimal spanning tree analyses and we apply it to some molecular systems appearing in the existing literature.
  135. Hierarchical Clustering and Zeroth Persistent Homology (2020)

    İsmail Güzel, Atabey Kaygun
    Abstract In this article, we show that hierarchical clustering and the zeroth persistent homology do deliver the same topological information about a given data set. We show this fact using cophenetic matrices constructed out of the filtered Vietoris-Rips complex of the data set at hand. As in any cophenetic matrix, one can also display the inter-relations of zeroth homology classes via a rooted tree, also known as a dendrogram. Since homological cophenetic matrices can be calculated for higher homologies, one can also sketch similar dendrograms for higher persistent homology classes.
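A small numerical check can make this correspondence concrete in the single-linkage case, which is the standard setting for it. The abstract does not name a linkage rule, so that choice is an assumption of this sketch. The code compares the scales at which zeroth homology classes die in a Vietoris-Rips filtration (taking the pairwise distance as the filtration value, also an assumption) with the merge heights of a naive single-linkage clustering. Function names and the point set are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Scales at which components merge in the Vietoris-Rips filtration.

    Edges are processed in increasing length (Kruskal's algorithm); each edge
    that joins two components kills one zeroth homology class.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths


def single_linkage_heights(points):
    """Merge heights of a naive agglomerative single-linkage clustering."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        heights.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because b > a
    return heights


# Hypothetical planar point set.
pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 2.0), (9.0, 9.0)]
same = h0_death_times(pts) == single_linkage_heights(pts)
```

For this point set both lists agree, which is the zeroth-dimensional statement of the abstract in miniature. The dendrogram would additionally record which clusters merge at each height.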
  136. Signal Enrichment With Strain-Level Resolution in Metagenomes Using Topological Data Analysis (2019)

    Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida
    Abstract Background A metagenome is a collection of genomes, usually in a micro-environment, and sequencing a metagenomic sample en masse is a powerful means for investigating the community of the constituent microorganisms. One of the challenges is in distinguishing between similar organisms due to rampant multiple possible assignments of sequencing reads, resulting in false positive identifications. We map the problem to a topological data analysis (TDA) framework that extracts information from the geometric structure of data. Here the structure is defined by multi-way relationships between the sequencing reads using a reference database. Results Based primarily on the patterns of co-mapping of the reads to multiple organisms in the reference database, we use two models: one a subcomplex of a Barycentric subdivision complex and the other a Čech complex. The Barycentric subcomplex allows a natural mapping of the reads along with their coverage of organisms while the Čech complex takes simply the number of reads into account to map the problem to homology computation. Using simulated genome mixtures we show not just enrichment of signal but also microbe identification with strain-level resolution. Conclusions In particular, in the most refractory of cases where alternative algorithms that exploit unique reads (i.e., mapped to unique organisms) fail, we show that the TDA approach continues to show consistent performance. The Čech model that uses less information is equally effective, suggesting that even partial information when augmented with the appropriate structure is quite powerful.
  137. Cell Complex Neural Networks (2020)

    Mustafa Hajij, Kyle Istvan, Ghada Zamzmi
    Abstract Cell complexes are topological spaces constructed from simple blocks called cells. They generalize graphs, simplicial complexes, and polyhedral complexes that form important domains for practical applications. We propose a general, combinatorial, and unifying construction for performing neural network-type computations on cell complexes. Furthermore, we introduce inter-cellular message passing schemes, message passing schemes on cell complexes that take the topology of the underlying space into account. In particular, our method generalizes many of the most popular types of graph neural networks.
  138. TDA-Net: Fusion of Persistent Homology and Deep Learning Features for COVID-19 Detection From Chest X-Ray Images (2021)

    Mustafa Hajij, Ghada Zamzmi, Fawwaz Batayneh
    Abstract Topological Data Analysis (TDA) has emerged recently as a robust tool to extract and compare the structure of datasets. TDA identifies features in data (e.g., connected components and holes) and assigns a quantitative measure to these features. Several studies reported that topological features extracted by TDA tools provide unique information about the data, discover new insights, and determine which feature is more related to the outcome. On the other hand, the overwhelming success of deep neural networks in learning patterns and relationships has been proven on various data applications including images. To capture the characteristics of both worlds, we propose TDA-Net, a novel ensemble network that fuses topological and deep features for the purpose of enhancing model generalizability and accuracy. We apply the proposed TDA-Net to a critical application, which is the automated detection of COVID-19 from CXR images. Experimental results showed that the proposed network achieved excellent performance and suggested the applicability of our method in practice.
  139. A Topological Framework for Deep Learning (2020)

    Mustafa Hajij, Kyle Istvan
    Abstract We utilize classical facts from topology to show that the classification problem in machine learning is always solvable under very mild conditions. Furthermore, we show that a softmax classification network acts on an input topological space by a finite sequence of topological moves to achieve the classification task. Moreover, given a training dataset, we show how topological formalism can be used to suggest the appropriate architectural choices for neural networks designed to be trained as classifiers on the data. Finally, we show how the architecture of a neural network cannot be chosen independently from the shape of the underlying data. To demonstrate these results, we provide example datasets and show how they are acted upon by neural nets from this topological perspective.
  140. Visual Detection of Structural Changes in Time-Varying Graphs Using Persistent Homology (2018)

    Mustafa Hajij, Bei Wang, Carlos Scheidegger, Paul Rosen
    Abstract Topological data analysis is an emerging area in exploratory data analysis and data mining. Its main tool, persistent homology, has become a popular technique to study the structure of complex, high-dimensional data. In this paper, we propose a novel method using persistent homology to quantify structural changes in time-varying graphs. Specifically, we transform each instance of the time-varying graph into a metric space, extract topological features using persistent homology, and compare those features over time. We provide a visualization that assists in time-varying graph exploration and helps to identify patterns of behavior within the data. To validate our approach, we conduct several case studies on real-world datasets and show how our method can find cyclic patterns, deviations from those patterns, and one-time events in time-varying graphs. We also examine whether a persistence-based similarity measure satisfies a set of well-established, desirable properties for graph metrics.
  141. Shape Terra: Mechanical Feature Recognition Based on a Persistent Heat Signature (2017)

    Ramy Harik, Yang Shi, Stephen Baek
    Abstract This paper presents a novel approach to recognizing mechanical features through a multiscale persistent heat signature similarity identification technique. First, heat signature is computed using a modified Laplacian in the application of the heat kernel. Regularly, matrices tend to include an indicator to the manifold curvature (the cotangent in our case), but we add a mesh uniformity factor to overcome mesh proportionality and skewness. Second, once heat retention values are computed, we apply persistent homology to extract significant subsets of the global mesh at different time intervals. Subsets are computed based on similarity of heat retention levels and/or retention values. Third, we present a multiscale persistence identification approach where we scan the part at different persistence levels to detect the presence of a feature. Once features are recognized and their geometrical descriptors identified, the next stage in future work will be feature matching.
  142. The Growing Topology of the C. Elegans Connectome (2020)

    Alec Helm, Ann S. Blevins, Danielle S. Bassett
    Abstract Probing the developing neural circuitry in Caenorhabditis elegans has enhanced our understanding of nervous systems. The C. elegans connectome, like those of other species, is characterized by a rich club of densely connected neurons embedded within a small-world architecture. This organization of neuronal connections, captured by quantitative network statistics, provides insight into the system's capacity to perform integrative computations. Yet these network measures are limited in their ability to detect weakly connected motifs, such as topological cavities, that may support the system's capacity to perform segregated computations. We address this limitation by using persistent homology to track the evolution of topological cavities in the growing C. elegans connectome throughout neural development, and assess the degree to which the growing connectome's topology is resistant to biological noise. We show that the developing connectome topology is both relatively robust to changes in neuron birth times and not captured by similar growth models. Additionally, we quantify the consequence of a neuron's specific birth time and ask if this metric tracks other biological properties of neurons. Our results suggest that the connectome's growing topology is a robust feature of the developing connectome that is distinct from other network properties, and that the growing topology is particularly sensitive to the exact birth times of a small set of predominantly motor neurons. By utilizing novel measurements that track biological features, we anticipate that our study will be helpful in the construction of more accurate models of neuronal development in C. elegans.
  143. Topological Persistence for Relating Microstructure and Capillary Fluid Trapping in Sandstones (2019)

    A. L. Herring, V. Robins, A. P. Sheppard
    Abstract Results from a series of two-phase fluid flow experiments in Leopard, Berea, and Bentheimer sandstones are presented. Fluid configurations are characterized using laboratory-based and synchrotron-based 3-D X-ray computed tomography. All flow experiments are conducted under capillary-dominated conditions. We conduct geometry-topology analysis via persistent homology and compare this to standard topological and watershed-partition-based pore-network statistics. Metrics identified as predictors of nonwetting fluid trapping are calculated from the different analytical methods and are compared to levels of trapping measured during drainage-imbibition cycles in the experiments. Metrics calculated from pore networks (i.e., pore body-throat aspect ratio and coordination number) and topological analysis (Euler characteristic) do not correlate well with trapping in these samples. In contrast, a new metric derived from the persistent homology analysis, which incorporates counts of topological features as well as their length scale and spatial distribution, correlates very well ($R^2$ = 0.97) to trapping for all systems. This correlation encompasses a wide range of porous media and initial fluid configurations, and also applies to data sets of different imaging and image processing protocols.
  144. Persistent Homology in Cosmic Shear: Constraining Parameters With Topological Data Analysis (2021)

    Sven Heydenreich, Benjamin Brück, Joachim Harnois-Déraps
    Abstract In recent years, cosmic shear has emerged as a powerful tool for studying the statistical distribution of matter in our Universe. Apart from the standard two-point correlation functions, several alternative methods such as peak count statistics offer competitive results. Here we show that persistent homology, a tool from topological data analysis, can extract more cosmological information than previous methods from the same data set. For this, we use persistent Betti numbers to efficiently summarise the full topological structure of weak lensing aperture mass maps. This method can be seen as an extension of the peak count statistics, in which we additionally capture information about the environment surrounding the maxima. We first demonstrate the performance in a mock analysis of the KiDS+VIKING-450 data: We extract the Betti functions from a suite of N-body simulations and use these to train a Gaussian process emulator that provides rapid model predictions; we next run a Markov chain Monte Carlo analysis on independent mock data to infer the cosmological parameters and their uncertainties. When comparing our results, we recover the input cosmology and achieve a constraining power on that is 3% tighter than that on peak count statistics. Performing the same analysis on 100 deg$^2$ of Euclid-like simulations, we are able to improve the constraints on $S_8$ and $\Omega_m$ by 19% and 12%, respectively, while breaking some of the degeneracy between $S_8$ and the dark energy equation of state. To our knowledge, the methods presented here are the most powerful topological tools for constraining cosmological parameters with lensing data.
  145. Persistent Homology in Cosmic Shear - II. A Tomographic Analysis of DES-Y1 (2022)

    Sven Heydenreich, Benjamin Brück, Pierre Burger, Joachim Harnois-Déraps, Sandra Unruh, Tiago Castro, Klaus Dolag, Nicolas Martinet
    Abstract We demonstrate how to use persistent homology for cosmological parameter inference in a tomographic cosmic shear survey. We obtain the first cosmological parameter constraints from persistent homology by applying our method to the first-year data of the Dark Energy Survey. To obtain these constraints, we analyse the topological structure of the matter distribution by extracting persistence diagrams from signal-to-noise maps of aperture masses. This presents a natural extension to the widely used peak count statistics. Extracting the persistence diagrams from the cosmo-SLICS, a suite of N-body simulations with variable cosmological parameters, we interpolate the signal using Gaussian processes and marginalise over the most relevant systematic effects, including intrinsic alignments and baryonic effects. For the structure growth parameter, we find , which is in full agreement with other late-time probes. We also constrain the intrinsic alignment parameter to A = 1.54 ± 0.52, which constitutes a detection of the intrinsic alignment effect at almost 3σ.
  146. Coverage Criterion in Sensor Networks Stable Under Perturbation (2014)

    Yasuaki Hiraoka, Genki Kusano
    Abstract To the coverage problem of sensor networks, V. de Silva and R. Ghrist (2007) developed several approaches based on (persistent) homology theory. Their criteria for the coverage are formulated on the Rips complexes constructed by the sensors, in which their locations are supposed to be fixed. However, the sensors are in general affected by perturbations (e.g., natural phenomena), and hence the stability of the coverage criteria should be also discussed. In this paper, we present a coverage theorem stable under perturbation. Furthermore, we also introduce a method of eliminating redundant cover after perturbation. The coverage theorem is derived by extending the Rips interleaving theorem studied by F. Chazal, V. de Silva, and S. Oudot (2013) into an appropriate relative version.
  147. Hierarchical Structures of Amorphous Solids Characterized by Persistent Homology (2016)

    Yasuaki Hiraoka, Takenobu Nakamura, Akihiko Hirata, Emerson G. Escolar, Kaname Matsue, Yasumasa Nishiura
    Abstract This article proposes a topological method that extracts hierarchical structures of various amorphous solids. The method is based on the persistence diagram (PD), a mathematical tool for capturing shapes of multiscale data. The input to the PDs is given by an atomic configuration and the output is expressed as 2D histograms. Then, specific distributions such as curves and islands in the PDs identify meaningful shape characteristics of the atomic configuration. Although the method can be applied to a wide variety of disordered systems, it is applied here to silica glass, the Lennard-Jones system, and Cu-Zr metallic glass as standard examples of continuous random network and random packing structures. In silica glass, the method classified the atomic rings as short-range and medium-range orders and unveiled hierarchical ring structures among them. These detailed geometric characterizations clarified a real space origin of the first sharp diffraction peak and also indicated that PDs contain information on elastic response. Even in the Lennard-Jones system and Cu-Zr metallic glass, the hierarchical structures in the atomic configurations were derived in a similar way using PDs, although the glass structures and properties substantially differ from silica glass. These results suggest that the PDs provide a unified method that extracts greater depth of geometric information in amorphous solids than conventional methods.
  148. A Primer on Topological Data Analysis to Support Image Analysis Tasks in Environmental Science (2023)

    Lander Ver Hoef, Henry Adams, Emily J. King, Imme Ebert-Uphoff
    Abstract Topological data analysis (TDA) is a tool from data science and mathematics that is beginning to make waves in environmental science. In this work, we seek to provide an intuitive and understandable introduction to a tool from TDA that is particularly useful for the analysis of imagery, namely, persistent homology. We briefly discuss the theoretical background but focus primarily on understanding the output of this tool and discussing what information it can glean. To this end, we frame our discussion around a guiding example of classifying satellite images from the sugar, fish, flower, and gravel dataset produced for the study of mesoscale organization of clouds by Rasp et al. We demonstrate how persistent homology and its vectorization, persistence landscapes, can be used in a workflow with a simple machine learning algorithm to obtain good results, and we explore in detail how we can explain this behavior in terms of image-level features. One of the core strengths of persistent homology is how interpretable it can be, so throughout this paper we discuss not just the patterns we find but why those results are to be expected given what we know about the theory of persistent homology. Our goal is that readers of this paper will leave with a better understanding of TDA and persistent homology, will be able to identify problems and datasets of their own for which persistent homology could be helpful, and will gain an understanding of the results they obtain from applying the included GitHub example code. Significance Statement Information such as the geometric structure and texture of image data can greatly support the inference of the physical state of an observed Earth system, for example, in remote sensing to determine whether wildfires are active or to identify local climate zones.
Persistent homology is a branch of topological data analysis that allows one to extract such information in an interpretable way—unlike black-box methods like deep neural networks. The purpose of this paper is to explain in an intuitive manner what persistent homology is and how researchers in environmental science can use it to create interpretable models. We demonstrate the approach to identify certain cloud patterns from satellite imagery and find that the resulting model is indeed interpretable.
  149. Constructing Shape Spaces From a Topological Perspective (2017)

    Christoph Hofer, Roland Kwitt, Marc Niethammer, Yvonne Höller, Eugen Trinka, Andreas Uhl
    Abstract We consider the task of constructing (metric) shape space(s) from a topological perspective. In particular, we present a generic construction scheme and demonstrate how to apply this scheme when shape is interpreted as the differences that remain after factoring out translation, scaling and rotation. This is achieved by leveraging a recently proposed injective functional transform of 2D/3D (binary) objects, based on persistent homology. The resulting shape space is then equipped with a similarity measure that is (1) by design robust to noise and (2) fulfills all metric axioms. From a practical point of view, analyses of object shape can then be carried out directly on segmented objects obtained from some imaging modality without any preprocessing, such as alignment, smoothing, or landmark selection. We demonstrate the utility of the approach on the problem of distinguishing segmented hippocampi from normal controls vs. patients with Alzheimer’s disease in a challenging setup where volume changes are no longer discriminative.
  150. Graph Filtration Learning (2020)

    Christoph Hofer, Florian Graf, Bastian Rieck, Marc Niethammer, Roland Kwitt
    Abstract We propose an approach to learning with graph-structured data in the problem domain of graph classification. In particular, we present a novel type of readout operation to aggregate node features into a graph-level representation. To this end, we leverage persistent homology computed via a real-valued, learnable, filter function. We establish the theoretical foundation for differentiating through the persistent homology computation. Empirically, we show that this type of readout operation compares favorably to previous techniques, especially when the graph connectivity structure is informative for the learning problem.
  151. Learning Representations of Persistence Barcodes (2019)

    Christoph D. Hofer, Roland Kwitt, Marc Niethammer
    Abstract We consider the problem of supervised learning with summary representations of topological features in data. In particular, we focus on persistent homology, the prevalent tool used in topological data analysis. As the summary representations, referred to as barcodes or persistence diagrams, come in the unusual format of multisets, equipped with computationally expensive metrics, they cannot readily be processed with conventional learning techniques. While different approaches to address this problem have been proposed, either in the context of kernel-based learning, or via carefully designed vectorization techniques, it remains an open problem how to leverage advances in representation learning via deep neural networks. Appropriately handling topological summaries as input to neural networks would address the disadvantage of previous strategies which handle this type of data in a task-agnostic manner. In particular, we propose an approach that is designed to learn a task-specific representation of barcodes. In other words, we aim to learn a representation that adapts to the learning problem while, at the same time, preserving theoretical properties (such as stability). This is done by projecting barcodes into a finite dimensional vector space using a collection of parametrized functionals, so-called structure elements, for which we provide a generic construction scheme. A theoretical analysis of this approach reveals sufficient conditions to preserve stability, and also shows that different choices of structure elements lead to great differences with respect to their suitability for numerical optimization. When implemented as a neural network input layer, our approach demonstrates compelling performance on various types of problems, including graph classification and eigenvalue prediction, the classification of 2D/3D object shapes and recognizing activities from EEG signals.
  152. Topologically Densified Distributions (2020)

    Christoph Hofer, Florian Graf, Marc Niethammer, Roland Kwitt
    Abstract We study regularization in the context of small sample-size learning with over-parametrized neural networks. Specifically, we shift focus from architectural properties, such as norms on the network weights, to properties of the internal representations before a linear classifier. Specifically, we impose a topological constraint on samples drawn from the probability measure induced in that space. This provably leads to mass concentration effects around the representations of training instances, i.e., a property beneficial for generalization. By leveraging previous work to impose topological constraints in a neural network setting, we provide empirical evidence (across various vision benchmarks) to support our claim for better generalization.
  153. Using Persistent Homology as a New Approach for Super-Resolution Localization Microscopy Data Analysis and Classification of γH2AX Foci/Clusters (2018)

    Andreas Hofmann, Matthias Krufczik, Dieter W. Heermann, Michael Hausmann
    Abstract DNA double strand breaks (DSB) are the most severe damages in chromatin induced by ionizing radiation. In response to such environmentally determined stress situations, cells have developed repair mechanisms. Although many investigations have contributed to a detailed understanding of repair processes, e.g., homologous recombination repair or non-homologous end-joining, the question is not sufficiently answered, how a cell decides to apply a certain repair process at a certain damage site, since all different repair pathways could simultaneously occur in the same cell nucleus. One of the first processes after DSB induction is phosphorylation of the histone variant H2AX to γH2AX in the given surroundings of the damaged locus. Since the spatial organization of chromatin is not random, it may be conclusive that the spatial organization of γH2AX foci is also not random, and rather, contributes to accessibility of special repair proteins to the damaged site, and thus, to the following repair pathway at this given site. The aim of this article is to demonstrate a new approach to analyze repair foci by their topology in order to obtain a cell independent method of categorization. During the last decade, novel super-resolution fluorescence light microscopic techniques have enabled new insights into genome structure and spatial organization on the nano-scale in the order of 10 nm. One of these techniques is single molecule localization microscopy (SMLM) with which the spatial coordinates of single fluorescence molecules can precisely be determined and density and distance distributions can be calculated. This method is an appropriate tool to quantify complex changes of chromatin and to describe repair foci on the single molecule level. 
Based on the pointillist information obtained by SMLM from specifically labeled heterochromatin and γH2AX foci reflecting the chromatin morphology and repair foci topology, we have developed a new analytical methodology of foci or foci cluster characterization, respectively, by means of persistent homology. This method allows, for the first time, a cell independent comparison of two point distributions (here the point distributions of two γH2AX clusters) with each other of a selected ensemble and to give a mathematical measure of their similarity. In order to demonstrate the feasibility of this approach, cells were irradiated by low LET (linear energy transfer) radiation with different doses and the heterochromatin and γH2AX foci were fluorescently labeled by antibodies for SMLM. By means of our new analysis method, we were able to show that the topology of clusters of γH2AX foci can be categorized depending on the distance to heterochromatin. This method opens up new possibilities to categorize spatial organization of point patterns by parameterization of topological similarity.
  154. Topological Graph Neural Networks (2021)

    Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten Borgwardt
    Abstract Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures, such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive in terms of the Weisfeiler–Lehman test of isomorphism. Augmenting GNNs with our layer leads to beneficial predictive performance, both on synthetic data sets, which can be trivially classified by humans but not by ordinary GNNs, and on real-world data.
  155. Exploring the Geometry and Topology of Neural Network Loss Landscapes (2022)

    Stefan Horoi, Jessie Huang, Bastian Rieck, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy
    Abstract Recent work has established clear links between the generalization performance of trained neural networks and the geometry of their loss landscape near the local minima to which they converge. This suggests that qualitative and quantitative examination of the loss landscape geometry could yield insights about neural network generalization performance during training. To this end, researchers have proposed visualizing the loss landscape through the use of simple dimensionality reduction techniques. However, such visualization methods have been limited by their linear nature and only capture features in one or two dimensions, thus restricting sampling of the loss landscape to lines or planes. Here, we expand and improve upon these in three ways. First, we present a novel “jump and retrain” procedure for sampling relevant portions of the loss landscape. We show that the resulting sampled data holds more meaningful information about the network’s ability to generalize. Next, we show that non-linear dimensionality reduction of the jump and retrain trajectories via PHATE, a trajectory and manifold-preserving method, allows us to visualize differences between networks that are generalizing well vs poorly. Finally, we combine PHATE trajectories with a computational homology characterization to quantify trajectory differences.
  156. A Novel Quality Clustering Methodology on Fab-Wide Wafer Map Images in Semiconductor Manufacturing (2022)

    Yuan-Ming Hsu, Xiaodong Jia, Wenzhe Li, Jay Lee
    Abstract In semiconductor manufacturing, clustering the fab-wide wafer map images is of critical importance for practitioners to understand the subclusters of wafer defects, recognize novel clusters or anomalies, and develop fast reactions to quality issues. However, due to the high-mix manufacturing of diversified wafer products of different sizes and technologies, it is difficult to cluster the wafer map images across the fab. This paper addresses this challenge by proposing a novel methodology for fab-wide wafer map data clustering. In the proposed methodology, a well-known deep learning technique, vision transformer with multi-head attention is first trained to convert binary wafer images of different sizes into condensed feature vectors for efficient clustering. Then, the Topological Data Analysis (TDA), which is widely used in biomedical applications, is employed to visualize the data clusters and identify the anomalies. The TDA yields a topological representation of high-dimensional big data as well as its local clusters by creating a graph that shows nodes corresponding to the clusters within the data. The effectiveness of the proposed methodology is demonstrated by clustering the public wafer map dataset WM-811k from the real application which has a total of 811,457 wafer map images. We further demonstrate the potential applicability of topology data analytics in the semiconductor area by visualization.
  157. A Sheaf and Topology Approach to Generating Local Branch Numbers in Digital Images (2020)

    Chuan-Shen Hu, Yu-Min Chung
    Abstract This paper concerns a theoretical approach that combines topological data analysis (TDA) and sheaf theory. Topological data analysis, a rising field in mathematics and computer science, concerns the shape of the data and has been proven effective in many scientific disciplines. Sheaf theory, a mathematics subject in algebraic geometry, provides a framework for describing the local consistency in geometric objects. Persistent homology (PH) is one of the main driving forces in TDA, and the idea is to track changes of geometric objects at different scales. The persistence diagram (PD) summarizes the information of PH in the form of a multi-set. While PD provides useful information about the underlying objects, it lacks fine relations about the local consistency of specific pairs of generators in PD, such as the merging relation between two connected components in the PH. The sheaf structure provides a novel point of view for describing the merging relation of local objects in PH. It is the goal of this paper to establish a theoretic framework that utilizes the sheaf theory to uncover finer information from the PH. We also show that the proposed theory can be applied to identify the branch numbers of local objects in digital images.
  158. Topology-Aware Segmentation Using Discrete Morse Theory (2021)

    Xiaoling Hu, Yusu Wang, Li Fuxin, Dimitris Samaras, Chao Chen
    Abstract In the segmentation of fine-scale structures from natural and biomedical images, per-pixel accuracy is not the only metric of concern. Topological correctness, such as vessel connectivity and membrane closure, is crucial for downstream analysis tasks. In this paper, we propose a new approach to train deep image segmentation networks for better topological accuracy. In particular, leveraging the power of discrete Morse theory (DMT), we identify global structures, including 1D skeletons and 2D patches, which are important for topological accuracy. Trained with a novel loss based on these global structures, the network performance is significantly improved especially near topologically challenging locations (such as weak spots of connections and membranes). On diverse datasets, our method achieves superior performance on both the DICE score and topological metrics.
  159. Time-Inhomogeneous Diffusion Geometry and Topology (2022)

    Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy
    Abstract Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic diffusion homology. We use this intrinsic topology as well as an ambient topology to study how the data changes over diffusion time. We demonstrate both homologies in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
  160. Fast Estimation of Recombination Rates Using Topological Data Analysis (2019)

    Devon P. Humphreys, Melissa R. McGuirl, Michael Miyagi, Andrew J. Blumberg
    Abstract Accurate estimation of recombination rates is critical for studying the origins and maintenance of genetic diversity. Because the inference of recombination rates under a full evolutionary model is computationally expensive, we developed an alternative approach using topological data analysis (TDA) on genome sequences. We find that this method can analyze datasets larger than what can be handled by any existing recombination inference software, and has accuracy comparable to commonly used model-based methods with significantly less processing time. Previous TDA methods used information contained solely in the first Betti number (b₁) of a set of genomes, which aims to capture the number of loops that can be detected within a genealogy. These explorations have proven difficult to connect to the theory of the underlying biological process of recombination, and, consequently, have unpredictable behavior under perturbations of the data. We introduce a new topological feature, which we call ψ, with a natural connection to coalescent models, and present novel arguments relating b₁ to population genetic models. Using simulations, we show that ψ and b₁ are differentially affected by missing data, and package our approach as TREE (Topological Recombination Estimator). TREE’s efficiency and accuracy make it well suited as a first-pass estimator of recombination rate heterogeneity or hotspots throughout the genome.
Our work empirically and theoretically justifies the use of topological statistics as summaries of genome sequences and describes a new, unintuitive relationship between topological features of the distribution of sequence data and the footprint of recombination on genomes.
  161. Path Homology as a Stronger Analogue of Cyclomatic Complexity (2020)

    Steve Huntsman
    Abstract Cyclomatic complexity is an incompletely specified but mathematically principled software metric that can be usefully applied to both source and binary code. We consider the application of path homology as a stronger analogue of cyclomatic complexity. We have implemented an algorithm to compute path homology in arbitrary dimension and applied it to several classes of relevant flow graphs, including randomly generated flow graphs representing structured and unstructured control flow. We also compared path homology and cyclomatic complexity on a set of disassembled binaries obtained from the grep utility. There exist control flow graphs realizable at the assembly level with nontrivial path homology in arbitrary dimension. We exhibit several classes of examples in this vein while also experimentally demonstrating that path homology gives identical results to cyclomatic complexity for at least one detailed notion of structured control flow. We also experimentally demonstrate that the two notions differ on disassembled binaries, and we highlight an example of extreme disagreement. Path homology empirically generalizes cyclomatic complexity for an elementary notion of structured code and appears to identify more structurally relevant features of control flow in general. Path homology therefore has the potential to substantially improve upon cyclomatic complexity.
  162. Topology in Cyber Research (2022)

    Steve Huntsman, Jimmy Palladino, Michael Robinson
    Abstract We give an idiosyncratic overview of applications of topology to cyber research, spanning the analysis of variables/assignments and control flow in computer programs, a brief sketch of topological data analysis in one dimension, and the use of sheaves to analyze wireless networks. The text is from a chapter in the forthcoming book Mathematics in Cyber Research, to be published by Taylor and Francis.
  163. Topology Highlights Mesoscopic Functional Equivalence Between Imagery and Perception: The Case of Hypnotizability (2019)

    Esther Ibáñez-Marcelo, Lisa Campioni, Angkoon Phinyomark, Giovanni Petri, Enrica L. Santarcangelo
    Abstract The functional equivalence (FE) between imagery and perception or motion has been proposed on the basis of neuroimaging evidence of large spatially overlapping activations between real and imagined sensori-motor conditions. However, similar local activation patterns do not imply the same mesoscopic integration of brain regions, which can be described by tools from Topological Data Analysis (TDA). On the basis of behavioral findings, stronger FE has been hypothesized in individuals with high hypnotizability scores (highs) with respect to low hypnotizable participants (lows) who differ between each other in the proneness to modify memory, perception and behavior according to specific imaginative suggestions. Here we present the first EEG evidence of stronger FE in highs. In fact, persistent homology shows that the highs' EEG topological asset during real and imagined sensory conditions is significantly more similar than the lows'. As a corollary finding, persistent homology shows lower restructuring of the EEG asset in highs than in lows during both sensory and imagery tasks with respect to basal conditions. Present findings support the view that greater embodiment of mental images may be responsible for the highs' greater proneness to respond to sensori-motor suggestions and to report involuntariness in action. In addition, findings indicate hypnotizability-related sensory and cognitive information processing and suggest that the psycho-physiological trait of hypnotizability may modulate more than one aspect of everyday life.
  164. Topological Data Analysis of Escherichia Coli O157:H7 and Non-O157 Survival in Soils (2014)

    Abasiofiok M. Ibekwe, Jincai Ma, David E. Crowley, Ching-Hong Yang, Alexis M. Johnson, Tanya C. Petrossian, Pek Y. Lum
    Abstract Shiga toxin-producing E. coli O157:H7 and non-O157 have been implicated in many foodborne illnesses caused by the consumption of contaminated fresh produce. However, data on their persistence in soils are limited due to the complexity in datasets generated from different environmental variables and bacterial taxa. There is a continuing need to distinguish the various environmental variables and different bacterial groups to understand the relationships among these factors and the pathogen survival. Using an approach called Topological Data Analysis (TDA), we reconstructed the relationship structure of E. coli O157 and non-O157 survival in 32 soils (16 organic and 16 conventionally managed soils) from California (CA) and Arizona (AZ) with a multi-resolution output. In our study, we took a community approach based on total soil microbiome to study community level survival and to examine the network of the community as a whole and the relationship between its topology and biological processes. TDA produces a geometric representation of complex data sets. Network analysis showed that Shiga toxin negative strain E. coli O157:H7 4554 survived significantly longer in comparison to E. coli O157:H7 EDL933, while the survival time of E. coli O157:NM was comparable to that of E. coli O157:H7 strain 933 in all of the tested soils. Two non-O157 strains, E. coli O26:H11 and E. coli O103:H2 survived much longer than E. coli O91:H21 and the three strains of E. coli O157. We show that there are complex interactions between E. coli strain survival, microbial community structures, and soil parameters.
  165. Protein-Folding Analysis Using Features Obtained by Persistent Homology (2020)

    Takashi Ichinomiya, Ippei Obayashi, Yasuaki Hiraoka
    Abstract Understanding the protein-folding process is an outstanding issue in biophysics; recent developments in molecular dynamics simulation have provided insights into this phenomenon. However, the large freedom of atomic motion hinders the understanding of this process. In this study, we applied persistent homology, an emerging method to analyze topological features in a data set, to reveal protein-folding dynamics. We developed a new, to our knowledge, method to characterize the protein structure based on persistent homology and applied this method to molecular dynamics simulations of chignolin. Using principal component analysis or nonnegative matrix factorization, our analysis method revealed two stable states and one saddle state, corresponding to the native, misfolded, and transition states, respectively. We also identified an unfolded state with slow dynamics in the reduced space. Our method serves as a promising tool to understand the protein-folding process.
  166. Classification of COVID-19 via Homology of CT-SCAN (2021)

    Sohail Iqbal, H. Fareed Ahmed, Talha Qaiser, Muhammad Imran Qureshi, Nasir Rajpoot
    Abstract In this worldwide spread of SARS-CoV-2 (COVID-19) infection, it is of utmost importance to detect the disease at an early stage especially in the hot spots of this epidemic. There are more than 110 million infected cases on the globe, so far. Due to its promptness and effective results computed tomography (CT)-scan image is preferred to the reverse-transcription polymerase chain reaction (RT-PCR). Early detection and isolation of the patient is the only possible way of controlling the spread of the disease. Automated analysis of CT-Scans can provide enormous support in this process. In this article, we propose a novel approach to detect SARS-CoV-2 using CT-scan images. Our method is based on a very intuitive and natural idea of analyzing shapes, an attempt to mimic a professional medic. We mainly trace SARS-CoV-2 features by quantifying their topological properties. We primarily use a tool called persistent homology, from Topological Data Analysis (TDA), to compute these topological properties. We train and test our model on the "SARS-CoV-2 CT-scan dataset" \citep{soares2020sars}, an open-source dataset, containing 2,481 CT-scans of normal and COVID-19 patients. Our model yielded an overall benchmark F1 score of 99.42%, accuracy 99.416%, precision 99.41%, and recall 99.42%. The TDA techniques have great potential that can be utilized for efficient and prompt detection of COVID-19. The immense potential of TDA may be exploited in clinics for rapid and safe detection of COVID-19 globally, in particular in the low and middle-income countries where RT-PCR labs and/or kits are in a serious crisis.
  167. Two-Tier Mapper, an Unbiased Topology-Based Clustering Method for Enhanced Global Gene Expression Analysis (2019)

    Rachel Jeitziner, Mathieu Carrière, Jacques Rougemont, Steve Oudot, Kathryn Hess, Cathrin Brisken
    Abstract MOTIVATION: Unbiased clustering methods are needed to analyze growing numbers of complex datasets. Currently available clustering methods often depend on parameters that are set by the user, they lack stability, and are not applicable to small datasets. To overcome these shortcomings we used topological data analysis, an emerging field of mathematics that discerns additional features and discovers hidden insights on datasets and has a wide application range. RESULTS: We have developed a topology-based clustering method called Two-Tier Mapper (TTMap) for enhanced analysis of global gene expression datasets. First, TTMap discerns divergent features in the control group, adjusts for them, and identifies outliers. Second, the deviation of each test sample from the control group in a high-dimensional space is computed, and the test samples are clustered using a new Mapper-based topological algorithm at two levels: a global tier and local tiers. All parameters are either carefully chosen or data-driven, avoiding any user-induced bias. The method is stable, different datasets can be combined for analysis, and significant subgroups can be identified. It outperforms current clustering methods in sensitivity and stability on synthetic and biological datasets, in particular when sample sizes are small; outcome is not affected by removal of control samples, by choice of normalization, or by subselection of data. TTMap is readily applicable to complex, highly variable biological samples and holds promise for personalized medicine. AVAILABILITY AND IMPLEMENTATION: TTMap is supplied as an R package in Bioconductor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  168. Pore Geometry Characterization by Persistent Homology Theory (2018)

    Fei Jiang, Takeshi Tsuji, Tomoyuki Shirai
    Abstract Rock pore geometry has heterogeneous characteristics and is scale dependent. This feature in a geological formation differs significantly from artificial materials and makes it difficult to predict hydrologic and elastic properties. To characterize pore heterogeneity, we propose an evaluation method that exploits the recently developed persistent homology theory. In the proposed method, complex pore geometry is first represented as sphere cloud data using a pore-network extraction method. Then, a persistence diagram (PD) is calculated from the point cloud, which represents the spatial distribution of pore bodies. A new parameter (distance index H) derived from the PD is proposed to characterize the degree of rock heterogeneity. A low H value indicates high heterogeneity. A new empirical equation using this index H is proposed to predict the effective elastic modulus of porous media. The results indicate that the proposed PD analysis is very efficient for extracting topological features of pore geometry.
  169. The Weighted Euler Curve Transform for Shape and Image Analysis (2020)

    Qitong Jiang, Sebastian Kurtek, Tom Needham
    Abstract The Euler Curve Transform (ECT) of Turner et al. is a complete invariant of an embedded simplicial complex, which is amenable to statistical analysis. We generalize the ECT to provide a similarly convenient representation for weighted simplicial complexes, objects which arise naturally, for example, in certain medical imaging applications. We leverage work of Ghrist et al. on Euler integral calculus to prove that this invariant—dubbed the Weighted Euler Curve Transform (WECT)—is also complete. We explain how to transform a segmented region of interest in a grayscale image into a weighted simplicial complex and then into a WECT representation. This WECT representation is applied to study Glioblastoma Multiforme brain tumor shape and texture data. We show that the WECT representation is effective at clustering tumors based on qualitative shape and texture features and that this clustering correlates with patient survival time.
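    Illustration (not from the paper): the idea of an Euler curve along a direction can be sketched in a few lines. The sketch assumes a simplex enters the filtration at the largest height of its vertices along the chosen direction, and that the weighted Euler characteristic is the alternating sum of simplex weights. The names `euler_curve`, `weights`, and the example complex are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Simplex = Tuple[int, ...]


def euler_curve(
    vertices: Sequence[Sequence[float]],
    simplices: Sequence[Simplex],
    direction: Sequence[float],
    thresholds: Sequence[float],
    weights: Optional[Dict[Simplex, float]] = None,
) -> List[float]:
    """Euler characteristic of the sublevel complex at each threshold.

    A simplex is included once all of its vertices have height <= t,
    where height is the dot product with `direction`. If `weights` is
    given, each simplex contributes its weight instead of 1 (assumed
    form of the weighted Euler characteristic).
    """
    heights = [sum(x * d for x, d in zip(v, direction)) for v in vertices]
    curve = []
    for t in thresholds:
        chi = 0.0
        for s in simplices:
            if max(heights[i] for i in s) <= t:
                w = 1.0 if weights is None else weights.get(s, 1.0)
                chi += (-1) ** (len(s) - 1) * w  # sign depends on dimension
        curve.append(chi)
    return curve


# Hollow triangle in the plane, scanned along the y-axis.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
simplices = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
curve = euler_curve(vertices, simplices, (0.0, 1.0), [-1.0, 0.0, 1.0])
```

    Repeating this over many directions gives one curve per direction, which is the kind of collection the transform assembles.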
  170. Cubical Ripser: Software for Computing Persistent Homology of Image and Volume Data (2020)

    Shizuo Kaji, Takeki Sudo, Kazushi Ahara
    Abstract We introduce Cubical Ripser for computing persistent homology of image and volume data. To the best of our knowledge, Cubical Ripser is currently the fastest and the most memory-efficient program for computing persistent homology of image and volume data. We demonstrate our software with an example of image analysis in which persistent homology and convolutional neural networks are successfully combined. Our open source implementation is available at [14].
  171. From Trees to Barcodes and Back Again: Theoretical and Statistical Perspectives (2020)

    Lida Kanari, Adélie Garin, Kathryn Hess
    Abstract Methods of topological data analysis have been successfully applied in a wide range of fields to provide useful summaries of the structure of complex data sets in terms of topological descriptors, such as persistence diagrams. While there are many powerful techniques for computing topological descriptors, the inverse problem, i.e., recovering the input data from topological descriptors, has proved to be challenging. In this article we study in detail the Topological Morphology Descriptor (TMD), which assigns a persistence diagram to any tree embedded in Euclidean space, and a sort of stochastic inverse to the TMD, the Topological Neuron Synthesis (TNS) algorithm, gaining both theoretical and computational insights into the relation between the two. We propose a new approach to classify barcodes using symmetric groups, which provides a concrete language to formulate our results. We investigate to what extent the TNS recovers a geometric tree from its TMD and describe the effect of different types of noise on the process of tree generation from persistence diagrams. We prove moreover that the TNS algorithm is stable with respect to specific types of noise.
  172. Inferring COVID-19 Biological Pathways From Clinical Phenotypes via Topological Analysis (2021)

    Negin Karisani, Daniel E. Platt, Saugata Basu, Laxmi Parida
    Abstract COVID-19 has caused thousands of deaths around the world and also resulted in a large international economic disruption. Identifying the pathways associated with this illness can help medical researchers to better understand the properties of the condition. This process can be carried out by analyzing the medical records. It is crucial to develop tools and models that can aid researchers with this process in a timely manner. However, medical records are often unstructured clinical notes, and this poses significant challenges to developing the automated systems. In this article, we propose a pipeline to aid practitioners in analyzing clinical notes and revealing the pathways associated with this disease. Our pipeline relies on topological properties and consists of three steps: 1) pre-processing the clinical notes to extract the salient concepts, 2) constructing a feature space of the patients to characterize the extracted concepts, and finally, 3) leveraging the topological properties to distill the available knowledge and visualize the result. Our experiments on a publicly available dataset of COVID-19 clinical notes testify that our pipeline can indeed extract meaningful pathways.
  173. Persistent Voids: A New Structural Metric for Membrane Fusion (2007)

    Peter M. Kasson, Afra Zomorodian, Sanghyun Park, Nina Singhal, Leonidas J. Guibas, Vijay S. Pande
    Abstract Motivation: Membrane fusion constitutes a key stage in cellular processes such as synaptic neurotransmission and infection by enveloped viruses. Current experimental assays for fusion have thus far been unable to resolve early fusion events in fine structural detail. We have previously used molecular dynamics simulations to develop mechanistic models of fusion by small lipid vesicles. Here, we introduce a novel structural measurement of vesicle topology and fusion geometry: persistent voids. Results: Persistent voids calculations enable systematic measurement of structural changes in vesicle fusion by assessing fusion stalk widths. They also constitute a generally applicable technique for assessing lipid topological change. We use persistent voids to compute dynamic relationships between hemifusion neck widening and formation of a full fusion pore in our simulation data. We predict that a tightly coordinated process of hemifusion neck expansion and pore formation is responsible for the rapid vesicle fusion mechanism, while isolated enlargement of the hemifusion diaphragm leads to the formation of a metastable hemifused intermediate. These findings suggest that rapid fusion between small vesicles proceeds via a small hemifusion diaphragm rather than a fully expanded one. Availability: Software available upon request pending public release. Contact: kasson@cmgm.stanford-edu or pande@stanford.edu. Supplementary information: Supplementary data are available on Bioinformatics online.
  174. Chatter Classification in Turning Using Machine Learning and Topological Data Analysis (2018)

    Firas A. Khasawneh, Elizabeth Munch, Jose A. Perea
    Abstract Chatter identification and detection in machining processes has been an active area of research in the past two decades. Part of the challenge in studying chatter is that machining equations that describe its occurrence are often nonlinear delay differential equations. The majority of the available tools for chatter identification rely on defining a metric that captures the characteristics of chatter, and a threshold that signals its occurrence. The difficulty in choosing these parameters can be somewhat alleviated by utilizing machine learning techniques. However, even with a successful classification algorithm, the transferability of typical machine learning methods from one data set to another remains very limited. In this paper we combine supervised machine learning with Topological Data Analysis (TDA) to obtain a descriptor of the process which can detect chatter. The features we use are derived from the persistence diagram of an attractor reconstructed from the time series via Takens embedding. We test the approach using deterministic and stochastic turning models, where the stochasticity is introduced via the cutting coefficient term. Our results show a 97% successful classification rate on the deterministic model labeled by the stability diagram obtained using the spectral element method. The features gleaned from the deterministic model are then utilized for characterization of chatter in a stochastic turning model where there are very limited analysis methods.
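    Illustration (not from the paper): the Takens (delay) embedding step mentioned in the abstract turns a scalar time series into a point cloud, whose persistence diagram can then be computed with any persistent homology tool. A minimal sketch follows; the function name `delay_embedding` and the parameter names `dim` and `delay` are assumptions made here for illustration.

```python
from typing import List, Sequence, Tuple


def delay_embedding(
    series: Sequence[float], dim: int, delay: int
) -> List[Tuple[float, ...]]:
    """Map x_0, x_1, ... to points (x_i, x_{i+delay}, ..., x_{i+(dim-1)*delay})."""
    if dim < 1 or delay < 1:
        raise ValueError("dim and delay must be positive integers")
    n_points = len(series) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("series too short for this dim and delay")
    return [
        tuple(series[i + j * delay] for j in range(dim))
        for i in range(n_points)
    ]


# Six samples embedded in 3 dimensions with a delay of 2 samples.
points = delay_embedding([0, 1, 2, 3, 4, 5], dim=3, delay=2)
```

    For a periodic signal the embedded points trace out a closed loop, which is why one-dimensional persistence is a natural descriptor of the reconstructed attractor.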
  175. Topological Data Analysis for True Step Detection in Periodic Piecewise Constant Signals (2018)

    Firas A. Khasawneh, Elizabeth Munch
    Abstract This paper introduces a simple yet powerful approach based on topological data analysis for detecting true steps in a periodic, piecewise constant (PWC) signal. The signal is a two-state square wave with randomly varying in-between-pulse spacing, subject to spurious steps at the rising or falling edges which we call digital ringing. We use persistent homology to derive mathematical guarantees for the resulting change detection which enables accurate identification and counting of the true pulses. The approach is tested using both synthetic and experimental data obtained using an engine lathe instrumented with a laser tachometer. The described algorithm enables accurate and automatic calculations of the spindle speed without any choice of parameters. The results are compared with the frequency and sequency methods of the Fourier and Walsh–Hadamard transforms, respectively. Both our approach and the Fourier analysis yield comparable results for pulses with regular spacing and digital ringing while the latter causes large errors using the Walsh–Hadamard method. Further, the described approach significantly outperforms the frequency/sequency analyses when the spacing between the peaks is varied. We discuss generalizing the approach to higher dimensional PWC signals, although using this extension remains an interesting question for future research.
  176. Unifying Immunology With Informatics and Multiscale Biology (2014)

    Brian A Kidd, Lauren A Peters, Eric E Schadt, Joel T Dudley
    Abstract The immune system is a highly complex and dynamic system. Historically, the most common scientific and clinical practice has been to evaluate its individual components. This kind of approach cannot always expose the interconnecting pathways that control immune-system responses and does not reveal how the immune system works across multiple biological systems and scales. High-throughput technologies can be used to measure thousands of parameters of the immune system at a genome-wide scale. These system-wide surveys yield massive amounts of quantitative data that provide a means to monitor and probe immune-system function. New integrative analyses can help synthesize and transform these data into valuable biological insight. Here we review some of the computational analysis tools for high-dimensional data and how they can be applied to immunology.
  177. Investigation of Flash Crash via Topological Data Analysis (2020)

    Wonse Kim, Younng-Jin Kim, Gihyun Lee, Woong Kook
    Abstract Topological data analysis has been acknowledged as one of the most successful mathematical data analytic methodologies in various fields including medicine, genetics, and image analysis. In this paper, we explore the potential of this methodology in finance by applying persistence landscape and dynamic time series analysis to analyze an extreme event in the stock market, known as Flash Crash. We will provide results of our empirical investigation to confirm the effectiveness of our new method not only for the characterization of this extreme event but also for its prediction purposes.
  178. Stable Signatures for Dynamic Graphs and Dynamic Metric Spaces via Zigzag Persistence (2018)

    Woojin Kim, Facundo Memoli
    Abstract When studying flocking/swarming behaviors in animals one is interested in quantifying and comparing the dynamics of the clustering induced by the coalescence and disbanding of animals in different groups. In a similar vein, studying the dynamics of social networks leads to the problem of characterizing groups/communities as they form and disperse throughout time. Motivated by this, we study the problem of obtaining persistent homology based summaries of time-dependent data. Given a finite dynamic graph (DG), we first construct a zigzag persistence module arising from linearizing the dynamic transitive graph naturally induced from the input DG. Based on standard results, we then obtain a persistence diagram or barcode from this zigzag persistence module. We prove that these barcodes are stable under perturbations in the input DG under a suitable distance between DGs that we identify. More precisely, our stability theorem can be interpreted as providing a lower bound for the distance between DGs. Since it relies on barcodes, and their bottleneck distance, this lower bound can be computed in polynomial time from the DG inputs. Since DGs arise from applying the Rips functor (with a fixed threshold) to dynamic metric spaces, we are also able to derive related stable invariants for this richer class of dynamic objects. Along the way, we propose a summarization of dynamic graphs that captures their time-dependent clustering features which we call formigrams. These set-valued functions generalize the notion of dendrogram, a prevalent tool for hierarchical clustering. In order to elucidate the relationship between our distance between two DGs and the bottleneck distance between their associated barcodes, we exploit recent advances in the stability of zigzag persistence due to Botnan and Lesnick, and to Bjerkevik.
  179. Using Persistent Homology and Dynamical Distances to Analyze Protein Binding (2016)

    Violeta Kovacev-Nikolic, Peter Bubenik, Dragan Nikolić, Giseon Heo
    Abstract Persistent homology captures the evolution of topological features of a model as a parameter changes. The most commonly used summary statistics of persistent homology are the barcode and the persistence diagram. Another summary statistic, the persistence landscape, was recently introduced by Bubenik. It is a functional summary, so it is easy to calculate sample means and variances, and it is straightforward to construct various test statistics. Implementing a permutation test we detect conformational changes between closed and open forms of the maltose-binding protein, a large biomolecule consisting of 370 amino acid residues. Furthermore, persistence landscapes can be applied to machine learning methods. A hyperplane from a support vector machine shows the clear separation between the closed and open proteins conformations. Moreover, because our approach captures dynamical properties of the protein our results may help in identifying residues susceptible to ligand binding; we show that the majority of active site residues and allosteric pathway residues are located in the vicinity of the most persistent loop in the corresponding filtered Vietoris-Rips complex. This finding was not observed in the classical anisotropic network model.
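    Illustration (not from the paper): the persistence landscape is a function, so sample means can be taken pointwise. The sketch below uses the standard definition, in which the k-th landscape at t is the k-th largest value of max(0, min(t - b, d - t)) over the (birth, death) pairs of a diagram. The function names and the toy diagrams are hypothetical.

```python
from typing import List, Sequence, Tuple

Diagram = Sequence[Tuple[float, float]]


def landscape(diagram: Diagram, k: int, t: float) -> float:
    """Value of the k-th persistence landscape (k >= 1) at parameter t."""
    # Each (birth, death) pair contributes a "tent" function peaking mid-interval.
    tents = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True
    )
    return tents[k - 1] if k <= len(tents) else 0.0


def mean_landscape(diagrams: Sequence[Diagram], k: int, t: float) -> float:
    """Pointwise sample mean of the k-th landscape over several diagrams."""
    return sum(landscape(dgm, k, t) for dgm in diagrams) / len(diagrams)


diagram = [(0.0, 4.0), (1.0, 3.0)]
first = landscape(diagram, 1, 2.0)
second = landscape(diagram, 2, 2.0)
third = landscape(diagram, 3, 2.0)
average = mean_landscape([[(0.0, 4.0)], [(1.0, 3.0)]], 1, 2.0)
```

    Evaluating the landscapes on a fixed grid of t values gives fixed-length vectors, which is one way such summaries can be passed to a permutation test or a support vector machine.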
  180. Finite Topology as Applied to Image Analysis (1989)

    V. A Kovalevsky
    Abstract The notion of a cellular complex, which is well known in topology, is applied to describe the structure of images. It is shown that the topology of cellular complexes is the only possible topology of finite sets. Under this topology no contradictions or paradoxes arise when defining connected subsets and their boundaries. Ways of encoding images as cellular complexes are discussed. The process of image segmentation is considered as splitting (in the topological sense) a cellular complex into blocks of cells. The notion of a cell list is introduced as a precise and compact data structure for encoding segmented images. Some applications of this data structure to image analysis are demonstrated.
  181. Analysis of Kolmogorov Flow and Rayleigh–Bénard Convection Using Persistent Homology (2016)

    Miroslav Kramár, Rachel Levanger, Jeffrey Tithof, Balachandra Suri, Mu Xu, Mark Paul, Michael F. Schatz, Konstantin Mischaikow
    Abstract We use persistent homology to build a quantitative understanding of large complex systems that are driven far-from-equilibrium. In particular, we analyze image time series of flow field patterns from numerical simulations of two important problems in fluid dynamics: Kolmogorov flow and Rayleigh–Bénard convection. For each image we compute a persistence diagram to yield a reduced description of the flow field; by applying different metrics to the space of persistence diagrams, we relate characteristic features in persistence diagrams to the geometry of the corresponding flow patterns. We also examine the dynamics of the flow patterns by a second application of persistent homology to the time series of persistence diagrams. We demonstrate that persistent homology provides an effective method both for quotienting out symmetries in families of solutions and for identifying multiscale recurrent dynamics. Our approach is quite general and it is anticipated to be applicable to a broad range of open problems exhibiting complex spatio-temporal behavior.
  182. Persistent Homology Advances Interpretable Machine Learning for Nanoporous Materials (2020)

    Aditi S. Krishnapriyan, Joseph Montoya, Jens Hummelshøj, Dmitriy Morozov
    Abstract Machine learning for nanoporous materials design and discovery has emerged as a promising alternative to more time-consuming experiments and simulations. The challenge with this approach is the selection of features that enable universal and interpretable materials representations across multiple prediction tasks. We use persistent homology to construct holistic representations of the materials structure. We show that these representations can also be augmented with other generic features such as word embeddings from natural language processing to capture chemical information. We demonstrate our approach on multiple metal-organic framework datasets by predicting a variety of gas adsorption targets. Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from commonly used manually curated features. Persistent homology features allow us to locate the pores that correlate best to adsorption at different pressures, contributing to understanding atomic level structure-property relationships for materials design.
  183. Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials (2020)

    Aditi S. Krishnapriyan, Maciej Haranczyk, Dmitriy Morozov
    Abstract Machine learning has emerged as an attractive alternative to experiments and simulations for predicting material properties. Usually, such an approach relies on specific domain knowledge for feature design: each learning target requires careful selection of features that an expert recognizes as important for the specific task. The major drawback of this approach is that computation of only a few structural features has been implemented so far, and it is difficult to tell a priori which features are important for a particular application. The latter problem has been empirically observed for predictors of guest uptake in nanoporous materials: local and global porosity features become dominant descriptors at low and high pressures, respectively. We investigate a feature representation of materials using tools from topological data analysis. Specifically, we use persistent homology to describe the geometry of nanoporous materials at various scales. We combine our topological descriptor with traditional structural features and investigate the relative importance of each to the prediction tasks. We demonstrate an application of this feature representation by predicting methane adsorption in zeolites, for pressures in the range 1–200 bar. Our results not only show a considerable improvement compared to the baseline, but they also highlight that topological features capture information complementary to the structural features. This is especially important for the adsorption at low pressure, a task particularly difficult for the traditional features. Furthermore, by investigation of the importance of individual topological features in the adsorption model, we are able to pinpoint the location of the pores that correlate best to adsorption at different pressure, contributing to our atom-level understanding of structure–property relationships.
  185. Statistical Topological Data Analysis - A Kernel Perspective (2015)

    Roland Kwitt, Stefan Huber, Marc Niethammer, Weili Lin, Ulrich Bauer
    Abstract We consider the problem of statistical computations with persistence diagrams, a summary representation of topological features in data. These diagrams encode persistent homology, a widely used invariant in topological data analysis. While several avenues towards a statistical treatment of the diagrams have been explored recently, we follow an alternative route that is motivated by the success of methods based on the embedding of probability measures into reproducing kernel Hilbert spaces. In fact, a positive definite kernel on persistence diagrams has recently been proposed, connecting persistent homology to popular kernel-based learning techniques such as support vector machines. However, important properties of that kernel enabling a principled use in the context of probability measure embeddings remain to be explored. Our contribution is to close this gap by proving universality of a variant of the original kernel, and to demonstrate its effective use in two-sample hypothesis testing on synthetic as well as real-world data.
  186. Persistent Brain Network Homology From the Perspective of Dendrogram (2012)

    Hyekyoung Lee, Hyejin Kang, Moo K. Chung, Bung-Nyun Kim, Dong Soo Lee
    Abstract The brain network is usually constructed by estimating the connectivity matrix and thresholding it at an arbitrary level. The problem with this standard method is that we do not have any generally accepted criteria for determining a proper threshold. Thus, we propose a novel multiscale framework that models all brain networks generated over every possible threshold. Our approach is based on persistent homology and its various representations such as the Rips filtration, barcodes, and dendrograms. This new persistent homological framework enables us to quantify various persistent topological features at different scales in a coherent manner. The barcode is used to quantify and visualize the evolutionary changes of topological features such as the Betti numbers over different scales. By incorporating additional geometric information to the barcode, we obtain a single linkage dendrogram that shows the overall evolution of the network. The difference between the two networks is then measured by the Gromov-Hausdorff distance over the dendrograms. As an illustration, we modeled and differentiated the FDG-PET based functional brain networks of 24 attention-deficit hyperactivity disorder children, 26 autism spectrum disorder children, and 11 pediatric control subjects.
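    Illustration (not from the paper): tracking connected components over every possible threshold, rather than one arbitrary cutoff, can be sketched with a union-find pass over the sorted pairwise distances. The merge heights recorded are those of the single-linkage dendrogram. The function names and the toy distance matrix (for instance, one minus correlation) are assumptions made for this example.

```python
from typing import List, Sequence


def single_linkage_merges(dist: Sequence[Sequence[float]]) -> List[float]:
    """Thresholds at which two connected components merge (n - 1 values)."""
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist[i][j], i, j) for i in range(n) for j in range(i + 1, n)
    )
    merges = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge joins two components
            parent[ri] = rj
            merges.append(w)
    return merges


def betti0(merges: Sequence[float], n: int, eps: float) -> int:
    """Number of connected components when edges with distance <= eps are kept."""
    return n - sum(1 for w in merges if w <= eps)


dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
merges = single_linkage_merges(dist)
components = [betti0(merges, 3, eps) for eps in (0.5, 1, 2)]
```

    The list of merge heights plays the role of the zero-dimensional barcode: each component except the last one dies at one of these thresholds.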
  187. Possible Clinical Use of Big Data: Personal Brain Connectomics (2018)

    Dong Soo Lee
    Abstract The biggest data is brain imaging data, which waited for clinical use during the last three decades. Topographic data interpretation prevailed for the first two decades, and only during the last decade, connectivity or connectomics data began to be analyzed properly. Owing to topological data interpretation and timely introduction of likelihood method based on hierarchical generalized linear model, we now foresee the clinical use of personal connectomics for classification and prediction of disease prognosis for brain diseases without any clue by currently available diagnostic methods.
  188. Quantifying Genetic Innovation: Mathematical Foundations for the Topological Study of Reticulate Evolution (2020)

    Michael Lesnick, Raúl Rabadán, Daniel I. S. Rosenbloom
    Abstract A topological approach to the study of genetic recombination, based on persistent homology, was introduced by Chan, Carlsson, and Rabadán in 2013. This associates a sequence of signatures called barcodes to genomic data sampled from an evolutionary history. In this paper, we develop theoretical foundations for this approach. First, we present a novel formulation of the underlying inference problem. Specifically, we introduce and study the novelty profile, a simple, stable statistic of an evolutionary history which not only counts recombination events but also quantifies how recombination creates genetic diversity. We propose that the (hitherto implicit) goal of the topological approach to recombination is the estimation of novelty profiles. We then study the problem of obtaining a lower bound on the novelty profile using barcodes. We focus on a low-recombination regime, where the evolutionary history can be described by a directed acyclic graph called a galled tree, which differs from a tree only by isolated topological defects. We show that in this regime, under a complete sampling assumption, the $1^{\mathrm{st}}$ barcode yields a lower bound on the novelty profile, and hence on the number of recombination events. For $i>1$, the $i^{\mathrm{th}}$ barcode is empty. In addition, we use a stability principle to strengthen these results to ones which hold for any subsample of an arbitrary evolutionary history. To establish these results, we describe the topology of the Vietoris-Rips filtrations arising from evolutionary histories indexed by galled trees. As a step towards a probabilistic theory, we also show that for a random history indexed by a fixed galled tree and satisfying biologically reasonable conditions, the intervals of the $1^{\mathrm{st}}$ barcode are independent random variables. Using simulations, we explore the sensitivity of these intervals to recombination.
  189. The Persistent Homology Mathematical Framework Provides Enhanced Genotype-to-Phenotype Associations for Plant Morphology (2018)

    Mao Li, Margaret H. Frank, Viktoriya Coneva, Washington Mio, Daniel H. Chitwood, Christopher N. Topp
    Abstract Efforts to understand the genetic and environmental conditioning of plant morphology are hindered by the lack of flexible and effective tools for quantifying morphology. Here, we demonstrate that persistent-homology-based topological methods can improve measurement of variation in leaf shape, serrations, and root architecture. We apply these methods to 2D images of leaves and root systems in field-grown plants of a domesticated introgression line population of tomato (Solanum pennellii). We find that compared with some commonly used conventional traits, (1) persistent-homology-based methods can more comprehensively capture morphological variation; (2) these techniques discriminate between genotypes with a larger normalized effect size and detect a greater number of unique quantitative trait loci (QTLs); (3) multivariate traits, whether statistically derived from univariate or persistent-homology-based traits, improve our ability to understand the genetic basis of phenotype; and (4) persistent-homology-based techniques detect unique QTLs compared to conventional traits or their multivariate derivatives, indicating that previously unmeasured aspects of morphology are now detectable. The QTL results further imply that genetic contributions to morphology can affect both the shoot and root, revealing a pleiotropic basis to natural variation in tomato. Persistent homology is a versatile framework to quantify plant morphology and developmental processes that complements and extends existing methods.
  190. Topological Data Analysis as a Morphometric Method: Using Persistent Homology to Demarcate a Leaf Morphospace (2018)

    Mao Li, Hong An, Ruthie Angelovici, Clement Bagaza, Albert Batushansky, Lynn Clark, Viktoriya Coneva, Michael J. Donoghue, Erika Edwards, Diego Fajardo, Hui Fang, Margaret H. Frank, Timothy Gallaher, Sarah Gebken, Theresa Hill, Shelley Jansky, Baljinder Kaur, Phillip C. Klahs, Laura L. Klein, Vasu Kuraparthy, Jason Londo, Zoë Migicovsky, Allison Miller, Rebekah Mohn, Sean Myles, Wagner C. Otoni, J. C. Pires, Edmond Rieffer, Sam Schmerler, Elizabeth Spriggs, Christopher N. Topp, Allen Van Deynze, Kuang Zhang, Linglong Zhu, Braden M. Zink, Daniel H. Chitwood
    Abstract Current morphometric methods that comprehensively measure shape cannot compare the disparate leaf shapes found in seed plants and are sensitive to processing artifacts. We explore the use of persistent homology, a topological method applied as a filtration across simplicial complexes (or more simply, a method to measure topological features of spaces across different spatial resolutions), to overcome these limitations. The described method isolates subsets of shape features and measures the spatial relationship of neighboring pixel densities in a shape. We apply the method to the analysis of 182,707 leaves, both published and unpublished, representing 141 plant families collected from 75 sites throughout the world. By measuring leaves from throughout the seed plants using persistent homology, a defined morphospace comparing all leaves is demarcated. Clear differences in shape between major phylogenetic groups are detected and estimates of leaf shape diversity within plant families are made. The approach predicts plant family above chance. The application of a persistent homology method, using topological features, to measure leaf shape allows for a unified morphometric framework to measure plant form, including shapes, textures, patterns, and branching architectures.
  191. Topological Data Analysis for Aviation Applications (2019)

    Max Z. Li, Megan S. Ryerson, Hamsa Balakrishnan
    Abstract Aviation data sets are increasingly high-dimensional and sparse. Consequently, the underlying features and interactions are not easily uncovered by traditional data analysis methods. Recent advancements in applied mathematics introduce topological methods, offering a new approach to obtain these features. This paper applies the fundamental notions underlying topological data analysis and persistent homology (TDA/PH) to aviation data analytics. We review past aviation research that leverages topological methods, and present a new computational case study exploring the topology of airport surface connectivity. In each case, we connect abstract topological features with real-world processes in aviation, and highlight potential operational and managerial insights.
  192. Topology of Force Networks in Granular Media Under Impact (2017)

    M. X. Lim, R. P. Behringer
    Abstract We investigate the evolution of the force network in experimental systems of two-dimensional granular materials under impact. We use the first Betti number, $\beta_1$, and persistence diagrams, as measures of the topological properties of the force network. We show that the structure of the network has a complex, hysteretic dependence on both the intruder acceleration and the total force response of the granular material. $\beta_1$ can also distinguish between the nonlinear formation and relaxation of the force network. In addition, using the persistence diagram of the force network, we show that the size of the loops in the force network has a Poisson-like distribution, the characteristic size of which changes over the course of the impact.
  193. Extracting Insights From the Shape of Complex Data Using Topology (2013)

    P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. Carlsson
    Abstract This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find. Our method also permits the analysis of individual data sets as well as the analysis of relationships between related data sets. We illustrate the use of our method by applying it to three very different kinds of data, namely gene expression from breast tumors, voting data from the United States House of Representatives and player performance data from the NBA, in each case finding stratifications of the data which are more refined than those produced by standard methods.
  194. Combining Geometric and Topological Information in Image Segmentation (2019)

    Hengrui Luo, Justin Strait
    Abstract A fundamental problem in computer vision is image segmentation, where the goal is to delineate the boundary of an object in the image. The focus of this work is on the segmentation of grayscale images and its purpose is two-fold. First, we conduct an in-depth study comparing active contour and topology-based methods in a statistical framework, two popular approaches for boundary detection of 2-dimensional images. Certain properties of the image dataset may favor one method over the other, both from an interpretability perspective as well as through evaluation of performance measures. Second, we propose the use of topological knowledge to assist an active contour method, which can potentially incorporate prior shape information. The latter is known to be extremely sensitive to algorithm initialization, and thus, we use a topological model to provide an automatic initialization. In addition, our proposed model can handle objects in images with more complex topological structures, including objects with holes and multiple objects within one image. We demonstrate this on artificially-constructed image datasets from computer vision, as well as real medical image data.
  195. Generalized Penalty for Circular Coordinate Representation (2020)

    Hengrui Luo, Alice Patania, Jisu Kim, Mikael Vejdemo-Johansson
    Abstract Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization for high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method to adapt the circular coordinate framework to take into account sparsity in high-dimensional applications. We use a generalized penalty function instead of an $L_2$ penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analysis to support our claim that circular coordinates with generalized penalty will accommodate the sparsity in high-dimensional datasets under different sampling schemes while preserving the topological structures.
  196. Contagion Dynamics for Manifold Learning (2020)

    Barbara I. Mahler
    Abstract Contagion maps exploit activation times in threshold contagions to assign vectors in high-dimensional Euclidean space to the nodes of a network. A point cloud that is the image of a contagion map reflects both the structure underlying the network and the spreading behaviour of the contagion on it. Intuitively, such a point cloud exhibits features of the network's underlying structure if the contagion spreads along that structure, an observation which suggests contagion maps as a viable manifold-learning technique. We test contagion maps as a manifold-learning tool on a number of different real-world and synthetic data sets, and we compare their performance to that of Isomap, one of the most well-known manifold-learning algorithms. We find that, under certain conditions, contagion maps are able to reliably detect underlying manifold structure in noisy data, while Isomap fails due to noise-induced error. This consolidates contagion maps as a technique for manifold learning.
  197. Topological Data Analysis and Diagnostics of Compressible Magnetohydrodynamic Turbulence (2018)

    Irina Makarenko, Paul Bushby, Andrew Fletcher, Robin Henderson, Nikolay Makarenko, Anvar Shukurov
    Abstract The predictions of mean-field electrodynamics can now be probed using direct numerical simulations of random flows and magnetic fields. When modelling astrophysical magnetohydrodynamics, it is important to verify that such simulations are in agreement with observations. One of the main challenges in this area is to identify robust quantitative measures to compare structures found in simulations with those inferred from astrophysical observations. A similar challenge is to compare quantitatively results from different simulations. Topological data analysis offers a range of techniques, including the Betti numbers and persistence diagrams, that can be used to facilitate such a comparison. After describing these tools, we first apply them to synthetic random fields and demonstrate that, when the data are standardized in a straightforward manner, some topological measures are insensitive to either large-scale trends or the resolution of the data. Focusing upon one particular astrophysical example, we apply topological data analysis to H i observations of the turbulent interstellar medium (ISM) in the Milky Way and to recent magnetohydrodynamic simulations of the random, strongly compressible ISM. We stress that these topological techniques are generic and could be applied to any complex, multi-dimensional random field.
  198. Automatic Tree Ring Detection Using Jacobi Sets (2020)

    Kayla Makela, Tim Ophelders, Michelle Quigley, Elizabeth Munch, Daniel Chitwood, Asia Dowtin
    Abstract Tree ring widths are an important source of climatic and historical data, but measuring these widths typically requires extensive manual work. Computer vision techniques provide promising directions towards the automation of tree ring detection, but most automated methods still require a substantial amount of user interaction to obtain high accuracy. We perform analysis on 3D X-ray CT images of a cross-section of a tree trunk, known as a tree disk. We present novel automated methods for locating the pith (center) of a tree disk, and ring boundaries. Our methods use a combination of standard image processing techniques and tools from topological data analysis. We evaluate the efficacy of our method for two different CT scans by comparing its results to manually located rings and centers and show that it is better than current automatic methods in terms of correctly counting each ring and its location. Our methods have several parameters, which we optimize experimentally by minimizing edit distances to the manually obtained locations.
  199. A Topological Data Analysis Approach On Predicting Phenotypes From Gene Expression Data (2020)

    Sayan Mandal, Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida
    Abstract The goal of this study was to investigate if gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore we devised an approach augmenting machine learning with topological data analysis. We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson’s disease phenotype prediction when measured against standard machine learning methods. This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high dimensional data reveals itself when considering the intricate topological connections between expressed genes.
  200. A Morphometric Analysis of Vegetation Patterns in Dryland Ecosystems (2017)

    Luke Mander, Stefan C. Dekker, Mao Li, Washington Mio, Surangi W. Punyasena, Timothy M. Lenton
    Abstract Vegetation in dryland ecosystems often forms remarkable spatial patterns. These range from regular bands of vegetation alternating with bare ground, to vegetated spots and labyrinths, to regular gaps of bare ground within an otherwise continuous expanse of vegetation. It has been suggested that spotted vegetation patterns could indicate that collapse into a bare ground state is imminent, and the morphology of spatial vegetation patterns, therefore, represents a potentially valuable source of information on the proximity of regime shifts in dryland ecosystems. In this paper, we have developed quantitative methods to characterize the morphology of spatial patterns in dryland vegetation. Our approach is based on algorithmic techniques that have been used to classify pollen grains on the basis of textural patterning, and involves constructing feature vectors to quantify the shapes formed by vegetation patterns. We have analysed images of patterned vegetation produced by a computational model and a small set of satellite images from South Kordofan (South Sudan), which illustrates that our methods are applicable to both simulated and real-world data. Our approach provides a means of quantifying patterns that are frequently described using qualitative terminology, and could be used to classify vegetation patterns in large-scale satellite surveys of dryland ecosystems.
  201. Cooperative Grasping Through Topological Object Representation (2014)

    A. Marzinotto, J. A. Stork, D. V. Dimarogonas, D. Kragic
    Abstract We present a cooperative grasping approach based on a topological representation of objects. Using point cloud data we extract loops on objects suitable for generating entanglement. We use the Gauss Linking Integral to derive controllers for multi-agent systems that generate hooking grasps on such loops while minimizing the entanglement between robots. The approach copes well with noisy point cloud data, it is computationally simple and robust. We demonstrate the method for performing object grasping and transportation, through a hooking maneuver, with two coordinated NAO robots.
  202. Topological Feature Extraction for Comparison of Terascale Combustion Simulation Data (2011)

    Ajith Mascarenhas, Ray W. Grout, Peer-Timo Bremer, Evatt R. Hawkes, Valerio Pascucci, Jacqueline H. Chen
    Abstract We describe a combinatorial streaming algorithm to extract features which identify regions of local intense rates of mixing in two terascale turbulent combustion simulations. Our algorithm allows simulation data comprised of scalar fields represented on 728x896x512 or 2025x1600x400 grids to be processed on a single relatively lightweight machine. The turbulence-induced mixing governs the rate of reaction and hence is of principal interest in these combustion simulations. We use our feature extraction algorithm to compare two very different simulations and find that in both the thickness of the extracted features grows with decreasing turbulence intensity. Simultaneous consideration of results of applying the algorithm to the HO2 mass fraction field indicates that autoignition kernels near the base of a lifted flame tend not to overlap with the high mixing rate regions.
  203. Molecular Phenotyping Using Networks, Diffusion, and Topology: Soft Tissue Sarcoma (2019)

    James C. Mathews, Maryam Pouryahya, Caroline Moosmüller, Yannis G. Kevrekidis, Joseph O. Deasy, Allen Tannenbaum
    Abstract Many biological datasets are high-dimensional yet manifest an underlying order. In this paper, we describe an unsupervised data analysis methodology that operates in the setting of a multivariate dataset and a network which expresses influence between the variables of the given set. The technique involves network geometry employing the Wasserstein distance, global spectral analysis in the form of diffusion maps, and topological data analysis using the Mapper algorithm. The prototypical application is to gene expression profiles obtained from RNA-Seq experiments on a collection of tissue samples, considering only genes whose protein products participate in a known pathway or network of interest. Employing the technique, we discern several coherent states or signatures displayed by the gene expression profiles of the sarcomas in the Cancer Genome Atlas along the TP53 (p53) signaling network. The signatures substantially recover the leiomyosarcoma, dedifferentiated liposarcoma (DDLPS), and synovial sarcoma histological subtype diagnoses, and they also include a new signature defined by activation and inactivation of about a dozen genes, including activation of serine endopeptidase inhibitor SERPINE1 and inactivation of TP53-family tumor suppressor gene TP73.
  204. Delineation of a Conserved Arrestin-Biased Signaling Repertoire in Vivo (2015)

    Stuart Maudsley, Bronwen Martin, Diane Gesty-Palmer, Huey Cheung, Calvin Johnson, Shamit Patel, Kevin G. Becker, William H. Wood, Yongqing Zhang, Elin Lehrmann, Louis M. Luttrell
    Abstract Biased G protein–coupled receptor agonists engender a restricted repertoire of downstream events from their cognate receptors, permitting them to produce mixed agonist-antagonist effects in vivo. While this opens the possibility of novel therapeutics, it complicates rational drug design, since the in vivo response to a biased agonist cannot be reliably predicted from its in cellula efficacy. We have employed novel informatic approaches to characterize the in vivo transcriptomic signature of the arrestin pathway-selective parathyroid hormone analog [d-Trp12, Tyr34]bovine PTH(7-34) in six different murine tissues after chronic drug exposure. We find that [d-Trp12, Tyr34]bovine PTH(7-34) elicits a distinctive arrestin-signaling focused transcriptomic response that is more coherently regulated across tissues than that of the pluripotent agonist, human PTH(1-34). This arrestin-focused network is closely associated with transcriptional control of cell growth and development. Our demonstration of a conserved arrestin-dependent transcriptomic signature suggests a framework within which the in vivo outcomes of arrestin-biased signaling may be generalized.
  205. Specimen-Based Analysis of Morphology and the Environment in Ecologically Dominant Grasses: The Power of the Herbarium (2019)

    Christine A. McAllister, Michael R. McKain, Mao Li, Bess Bookout, Elizabeth A. Kellogg
    Abstract Herbaria contain a cumulative sample of the world's flora, assembled by thousands of people over centuries. To capitalize on this resource, we conducted a specimen-based analysis of a major clade in the grass tribe Andropogoneae, including the dominant species of the world's grasslands in the genera Andropogon, Schizachyrium, Hyparrhenia and several others. We imaged 186 of the 250 named species of the clade, georeferenced the specimens and extracted climatic variables for each. Using semi- and fully automated image analysis techniques, we extracted spikelet morphological characters and correlated these with environmental variables. We generated chloroplast genome sequences to correct for phylogenetic covariance and here present a new phylogeny for 81 of the species. We confirm and extend earlier studies to show that Andropogon and Schizachyrium are not monophyletic. In addition, we find all morphological and ecological characters are homoplasious but variable among clades. For example, sessile spikelet length is positively correlated with awn length when all accessions are considered, but when separated by clade, the relationship is positive for three sub-clades and negative for three others. Climate variables showed no correlation with morphological variation in the spikelet pair; only very weak effects of temperature and precipitation were detected on macrohair density. This article is part of the theme issue ‘Biological collections for understanding biodiversity in the Anthropocene’.
  206. Topological Descriptors for Coral Reef Resilience Using a Stochastic Spatial Model (2022)

    Robert A. McDonald, Rosanna Neuhausler, Martin Robinson, Laurel G. Larsen, Heather A. Harrington, Maria Bruna
    Abstract A complex interplay between species governs the evolution of spatial patterns in ecology. An open problem in the biological sciences is characterizing spatio-temporal data and understanding how changes at the local scale affect global dynamics/behavior. We present a toolkit of multiscale methods and use them to analyze coral reef resilience and dynamics. Here, we extend a well-studied temporal mathematical model of coral reef dynamics to include stochastic and spatial interactions and then generate data to study different ecological scenarios. We present descriptors to characterize patterns in heterogeneous spatio-temporal data surpassing spatially averaged measures. We apply these descriptors to simulated coral data and demonstrate the utility of two topological data analysis techniques--persistent homology and zigzag persistence--for characterizing the spatiotemporal evolution of reefs and generating insight into mechanisms of reef resilience. We show that the introduction of local competition between species leads to the appearance of coral clusters in the reef. Furthermore, we use our analyses to distinguish the temporal dynamics that stem from different initial configurations of coral, showing that the neighborhood composition of coral sites determines their long-term survival. Finally, we use zigzag persistence to quantify spatial behavior in the metastable regime as the level of fish grazing on algae varies and determine which spatial configurations protect coral from extinction in different environments.
  207. Topological Data Analysis of Zebrafish Patterns (2020)

    Melissa R. McGuirl, Alexandria Volkening, Björn Sandstede
    Abstract Self-organized pattern behavior is ubiquitous throughout nature, from fish schooling to collective cell dynamics during organism development. Qualitatively these patterns display impressive consistency, yet variability inevitably exists within pattern-forming systems on both microscopic and macroscopic scales. Quantifying variability and measuring pattern features can inform the underlying agent interactions and allow for predictive analyses. Nevertheless, current methods for analyzing patterns that arise from collective behavior capture only macroscopic features or rely on either manual inspection or smoothing algorithms that lose the underlying agent-based nature of the data. Here we introduce methods based on topological data analysis and interpretable machine learning for quantifying both agent-level features and global pattern attributes on a large scale. Because the zebrafish is a model organism for skin pattern formation, we focus specifically on analyzing its skin patterns as a means of illustrating our approach. Using a recent agent-based model, we simulate thousands of wild-type and mutant zebrafish patterns and apply our methodology to better understand pattern variability in zebrafish. Our methodology is able to quantify the differential impact of stochasticity in cell interactions on wild-type and mutant patterns, and we use our methods to predict stripe and spot statistics as a function of varying cellular communication. Our work provides an approach to automatically quantifying biological patterns and analyzing agent-based dynamics so that we can now answer critical questions in pattern formation at a much larger scale.
  208. Weighted Persistent Homology for Biomolecular Data Analysis (2020)

    Zhenyu Meng, D. Vijay Anand, Yunpeng Lu, Jie Wu, Kelin Xia
    Abstract In this paper, we systematically review weighted persistent homology (WPH) models and their applications in biomolecular data analysis. Essentially, the weight value, which reflects physical, chemical and biological properties, can be assigned to vertices (atom centers), edges (bonds), or higher order simplexes (cluster of atoms), depending on the biomolecular structure, function, and dynamics properties. Further, we propose the first localized weighted persistent homology (LWPH). Inspired by the great success of element specific persistent homology (ESPH), we do not treat biomolecules as an inseparable system like all previous weighted models, instead we decompose them into a series of local domains, which may be overlapped with each other. The general persistent homology or weighted persistent homology analysis is then applied on each of these local domains. In this way, functional properties, that are embedded in local structures, can be revealed. Our model has been applied to systematically study DNA structures. It has been found that our LWPH based features can be used to successfully discriminate the A-, B-, and Z-types of DNA. More importantly, our LWPH based principal component analysis (PCA) model can identify two configurational states of DNA structures in ion liquid environment, which can be revealed only by the complicated helical coordinate system. The great consistence with the helical-coordinate model demonstrates that our model captures local structure variations so well that it is comparable with geometric models. Moreover, geometric measurements are usually defined in local regions. For instance, the helical-coordinate system is limited to one or two basepairs. However, our LWPH can quantitatively characterize structure information in regions or domains with arbitrary sizes and shapes, where traditional geometrical measurements fail.
  209. Morphometrics Reveals Complex and Heritable Apple Leaf Shapes (2018)

    Zoë Migicovsky, Mao Li, Daniel H. Chitwood, Sean Myles
    Abstract Apple (Malus spp.) is a widely grown and valuable fruit crop. Leaf shape is important for flowering in apple and may also be an early indicator for other agriculturally valuable traits. We examined 9,000 leaves from 869 unique apple accessions using linear measurements and comprehensive morphometric techniques. We identified allometric variation as the result of differing length-to-width aspect ratios between accessions and species of apple. The allometric variation was due to variation in the width of the leaf blade, not the length. Aspect ratio was highly correlated with the first principal component (PC1) of morphometric variation quantified using elliptical Fourier descriptors (EFDs) and persistent homology (PH). While the primary source of variation was aspect ratio, subsequent PCs corresponded to complex shape variation not captured by linear measurements. After linking the morphometric information with over 122,000 genome-wide single nucleotide polymorphisms (SNPs), we found high SNP heritability values even at later PCs, indicating that comprehensive morphometrics can capture complex, heritable phenotypes. Thus, techniques such as EFDs and PH are capturing heritable biological variation that would be missed using linear measurements alone.
  210. Rootstock Effects on Scion Phenotypes in a ‘Chambourcin’ Experimental Vineyard (2019)

    Zoë Migicovsky, Zachary N Harris, Laura L Klein, Mao Li, Adam McDermaid, Daniel H Chitwood, Anne Fennell, Laszlo G Kovacs, Misha Kwasniewski, Jason P Londo, Qin Ma, Allison J Miller
    Abstract Understanding how root systems modulate shoot system phenotypes is a fundamental question in plant biology and will be useful in developing resilient agricultural crops. Grafting is a common horticultural practice that joins the roots (rootstock) of one plant to the shoot (scion) of another, providing an excellent method for investigating how these two organ systems affect each other. In this study, we used the French-American hybrid grapevine ‘Chambourcin’ (Vitis L.) as a model to explore the rootstock–scion relationship. We examined leaf shape, ion concentrations, and gene expression in ‘Chambourcin’ grown ungrafted as well as grafted to three different rootstocks (‘SO4’, ‘1103P’ and ‘3309C’) across 2 years and three different irrigation treatments. We found that a significant amount of the variation in leaf shape could be explained by the interaction between rootstock and irrigation. For ion concentrations, the primary source of variation identified was the position of a leaf in a shoot, although rootstock and rootstock by irrigation interaction also explained a significant amount of variation for most ions. Lastly, we found rootstock-specific patterns of gene expression in grafted plants when compared to ungrafted vines. Thus, our work reveals the subtle and complex effect of grafting on ‘Chambourcin’ leaf morphology, ionomics, and gene expression.
  211. Musical Stylistic Analysis: A Study of Intervallic Transition Graphs via Persistent Homology (2022)

    Martín Mijangos, Alessandro Bravetti, Pablo Padilla
    Abstract Topological data analysis has been recently applied to investigate stylistic signatures and trends in musical compositions. A useful tool in this area is Persistent Homology. In this paper, we develop a novel method to represent a weighted directed graph as a finite metric space and then use persistent homology to extract useful features. We apply this method to weighted directed graphs obtained from pitch transition information of a given musical fragment and use these techniques to study stylistic trends. In particular, we are interested in using these tools to make quantitative stylistic comparisons. As a first illustration, we analyze a selection of string quartets by Haydn, Mozart and Beethoven and discuss possible implications of our results in terms of different approaches by these composers to stylistic exploration and variety. We observe that Haydn is stylistically the most conservative, followed by Mozart, while Beethoven is the most innovative, expanding and modifying the string quartet as a musical form. Finally we also compare the variability of different genres, namely minuets, allegros, prestos and adagios, by a given composer and conclude that the minuet is the most stable form of the string quartet movements.
  212. Fruit Flies and Moduli: Interactions Between Biology and Mathematics (2015)

    Ezra Miller
    Abstract Possibilities for using geometry and topology to analyze statistical problems in biology raise a host of novel questions in geometry, probability, algebra, and combinatorics that demonstrate the power of biology to influence the future of pure mathematics. This expository article is a tour through some biological explorations and their mathematical ramifications. The article starts with evolution of novel topological features in wing veins of fruit flies, which are quantified using the algebraic structure of multiparameter persistent homology. The statistical issues involved highlight mathematical implications of sampling from moduli spaces. These lead to geometric probability on stratified spaces, including the sticky phenomenon for Frechet means and the origin of this mathematical area in the reconstruction of phylogenetic trees.
  213. Representations of Energy Landscapes by Sublevelset Persistent Homology: An Example With N-Alkanes (2020)

    Joshua Mirth, Yanqin Zhai, Johnathan Bush, Enrique G. Alvarado, Howie Jordan, Mark Heim, Bala Krishnamoorthy, Markus Pflaum, Aurora Clark, Y. Z, Henry Adams
    Abstract Encoding the complex features of an energy landscape is a challenging task, and often chemists pursue the most salient features (minima and barriers) along a highly reduced space, i.e. 2- or 3-dimensions. Even though disconnectivity graphs or merge trees summarize the connectivity of the local minima of an energy landscape via the lowest-barrier pathways, there is more information to be gained by also considering the topology of each connected component at different energy thresholds (or sublevelsets). We propose sublevelset persistent homology as an appropriate tool for this purpose. Our computations on the configuration phase space of n-alkanes from butane to octane allow us to conjecture, and then prove, a complete characterization of the sublevelset persistent homology of the alkane $C_m H_{2m+2}$ potential energy landscapes, for all $m$, and in all homological dimensions. We further compare both the analytical configurational potential energy landscapes and sampled data from molecular dynamics simulation, using the united and all-atom descriptions of the intramolecular interactions. In turn, this supports the application of distance metrics to quantify sampling fidelity and lays the foundation for future work regarding new metrics that quantify differences between the topological features of high-dimensional energy landscapes.
  214. Construction of Symbolic Dynamics From Experimental Time Series (1999)

    K. Mischaikow, M. Mrozek, J. Reiss, A. Szymczak
    Abstract Symbolic dynamics play a central role in the description of the evolution of nonlinear systems. Yet there are few methods for determining symbolic dynamics of chaotic data. One difficulty is that the data contains random fluctuations associated with the experimental process. Using data obtained from a magnetoelastic ribbon experiment we show how a topological approach that allows for experimental error and bounded noise can be used to obtain a description of the dynamics in terms of subshift dynamics on a finite set of symbols.
  215. Diverse 3D Cellular Patterns Underlie the Development of Cardamine Hirsuta and Arabidopsis Thaliana Ovules (2023)

    Tejasvinee Atul Mody, Alexander Rolle, Nico Stucki, Fabian Roll, Ulrich Bauer, Kay Schneitz
    Abstract A fundamental question in biology is how organ morphogenesis comes about. The ovules of Arabidopsis thaliana have been established as a successful model to study numerous aspects of tissue morphogenesis; however, little is known regarding the relative contributions and dynamics of differential tissue and cellular growth and architecture in establishing ovule morphogenesis in different species. To address this issue, we generated a 3D digital atlas of Cardamine hirsuta ovule development with full cellular resolution. We combined quantitative comparative morphometrics and topological analysis to explore similarities and differences in the 3D cellular architectures underlying ovule development of the two species. We discovered that they show diversity in the way the three radial cell layers of the primordium contribute to its growth, in the formation of a new cell layer in the inner integument and, in certain cases, in the topological properties of the 3D cell architectures of homologous tissues despite their similar shape. Our work demonstrates the power of comparative 3D cellular morphometry and the importance of internal tissues and their cellular architecture in organ morphogenesis. Summary Statement: Quantitative morphometric comparison of 3D digital ovules at full cellular resolution reveals diversity in internal 3D cellular architectures between similarly shaped ovules of Cardamine hirsuta and Arabidopsis thaliana.
  216. Can Neural Networks Learn Persistent Homology Features? (2020)

    Guido Montúfar, Nina Otter, Yuguang Wang
    Abstract Topological data analysis uses tools from topology -- the mathematical area that studies shapes -- to create representations of data. In particular, in persistent homology, one studies one-parameter families of spaces associated with data, and persistence diagrams describe the lifetime of topological invariants, such as connected components or holes, across the one-parameter family. In many applications, one is interested in working with features associated with persistence diagrams rather than the diagrams themselves. In our work, we explore the possibility of learning several types of features extracted from persistence diagrams using neural networks.
  217. Hypothesis Testing for Shapes Using Vectorized Persistence Diagrams (2020)

    Chul Moon, Nicole A. Lazar
    Abstract Topological data analysis involves the statistical characterization of the shape of data. Persistent homology is a primary tool of topological data analysis, which can be used to analyze those topological features and perform statistical inference. In this paper, we present a two-stage hypothesis test for vectorized persistence diagrams. The first stage filters elements in the vectorized persistence diagrams to reduce false positives. The second stage consists of multiple hypothesis tests, with false positives controlled by false discovery rates. We demonstrate applications of the proposed procedure on simulated point clouds and three-dimensional rock image data. Our results show that the proposed hypothesis tests can provide flexible and informative inferences on the shape of data with lower computational cost compared to the permutation test.
  218. Topological Autoencoders (2020)

    Michael Moor, Max Horn, Bastian Rieck, Karsten Borgwardt
    Abstract We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we construct this loss in a differentiable manner, such that the encoding learns to retain multi-scale connectivity information. We show that our approach is theoretically well-founded and that it exhibits favourable latent representations on a synthetic manifold as well as on real-world image data sets, while preserving low reconstruction errors.
  219. Hyperparameter Optimization of Topological Features for Machine Learning Applications (2019)

    Francis Motta, Christopher Tralie, Rossella Bedini, Fabiano Bini, Gilberto Bini, Hamed Eramian, Marcio Gameiro, Steve Haase, Hugh Haddox, John Harer, Nick Leiby, Franco Marinozzi, Scott Novotney, Gabe Rocklin, Jed Singer, Devin Strickland, Matt Vaughn
    Abstract This paper describes a general pipeline for generating optimal vector representations of topological features of data for use with machine learning algorithms. This pipeline can be viewed as a costly black-box function defined over a complex configuration space, each point of which specifies both how features are generated and how predictive models are trained on those features. We propose using state-of-the-art Bayesian optimization algorithms to inform the choice of topological vectorization hyperparameters while simultaneously choosing learning model parameters. We demonstrate the need for and effectiveness of this pipeline using two difficult biological learning problems, and illustrate the nontrivial interactions between topological feature generation and learning model hyperparameters.
  220. Determining Clinically Relevant Features in Cytometry Data Using Persistent Homology (2022)

    Soham Mukherjee, Darren Wethington, Tamal K. Dey, Jayajit Das
    Abstract Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets is the current standard for cytometry data analysis. However, this approach is unable to capture more subtle topological features hidden in data, especially if those features are further masked by data transforms or significant batch effects or donor-to-donor variations in clinical data. We show that persistent homology, a mathematical structure that summarizes the topological features, can distinguish different sources of data, such as from groups of healthy donors or patients, effectively. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between single cell protein expressions in COVID-19 patients and healthy controls. We identify proteins of interest by a decision-tree based classifier, sample points randomly and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions in cytometry datasets of varying density and identify protruded structures such as ‘elbows’. We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find that systematic structural differences exist between COVID-19 patients and healthy controls in the expression data for T-bet, Eomes, and Ki-67. Further analysis shows that expression of T-bet and Eomes are significantly downregulated in COVID-19 patient non-naïve CD8+ T cells compared to healthy controls. This counter-intuitive finding may indicate that canonical effector CD8+ T cells are less prevalent in COVID-19 patients than healthy controls. This method is applicable to any cytometry dataset for discovering novel insights through topological data analysis which may be difficult to ascertain otherwise with a standard gating strategy or existing bioinformatic tools.
  221. Applications of Persistent Homology to Time Varying Systems (2013)

    Elizabeth Munch
    Abstract This dissertation extends the theory of persistent homology to time varying systems. Most of the previous work has been dedicated to using this powerful tool in topological data analysis to study static point clouds. In particular, given a point cloud, we can construct its persistence diagram. Since the diagram varies continuously as the point cloud varies continuously, we study the space of time varying persistence diagrams, called vineyards when they were introduced by Cohen-Steiner, Edelsbrunner, and Morozov. We will first show that with a good choice of metric, these vineyards are stable for small perturbations of their associated point clouds. We will also define a new mean for a set of persistence diagrams based on the work of Mileyko et al. which, unlike the previously defined mean, is continuous for geodesic vineyards. Next, we study the sensor network problem posed by Ghrist and de Silva, and their application of persistent homology to understand when a set of sensors covers a given region. Giving each of these sensors a probability of failure over time, we show that an exact computation of the probability of failure of the whole system is NP-hard, but give an algorithm which can predict failure in the case of a monitored system. Finally, we apply these methods to an automated system which can cluster agents moving in aerial images by their behaviors. We build a data structure for storing and querying the information in real-time, and define behavior vectors which quantify behaviors of interest. This clustering by behavior can be used to find groups of interest, for which we can also quantify behaviors in order to determine whether the group is working together to achieve a common goal, and we speculate that this work can be extended to improving tracking algorithms as well as behavioral predictors.
  222. Ultrahigh-Pressure Form of SiO2 Glass With Dense Pyrite-Type Crystalline Homology (2019)

    M. Murakami, S. Kohara, N. Kitamura, J. Akola, H. Inoue, A. Hirata, Y. Hiraoka, Y. Onodera, I. Obayashi, J. Kalikka, N. Hirao, T. Musso, A. S. Foster, Y. Idemoto, O. Sakata, Y. Ohishi
    Abstract High-pressure synthesis of denser glass has been a longstanding interest in condensed-matter physics and materials science because of its potentially broad industrial application. Nevertheless, understanding its nature under extreme pressures has yet to be clarified due to experimental and theoretical challenges. Here we reveal the formation of OSi4 tetraclusters associated with that of SiO7 polyhedra in SiO2 glass under ultrahigh pressures to 200 gigapascal confirmed both experimentally and theoretically. Persistent homology analyses with molecular dynamics simulations found increased packing fraction of atoms whose topological diagram at ultrahigh pressures is similar to a pyrite-type crystalline phase, although the formation of tetraclusters is prohibited in the crystalline phase. This critical difference would be caused by the potential structural tolerance in the glass for distortion of oxygen clusters. Furthermore, an expanded electronic band gap demonstrates that chemical bonds survive at ultrahigh pressure. This opens up the synthesis of topologically disordered dense oxide glasses.
  223. Dynamic State Analysis of a Driven Magnetic Pendulum Using Ordinal Partition Networks and Topological Data Analysis (2020)

    Audun Myers, Firas A. Khasawneh
    Abstract The use of complex networks for time series analysis has recently been shown to be useful as a tool for detecting dynamic state changes for a wide variety of applications. In this work, we implement the commonly used ordinal partition network to transform a time series into a network for detecting these state changes for the simple magnetic pendulum. The time series that we used are obtained experimentally from a base-excited magnetic pendulum apparatus, and numerically from the corresponding governing equations. The magnetic pendulum provides a relatively simple, non-linear example demonstrating transitions from periodic to chaotic motion with the variation of system parameters. For our method, we implement persistent homology, a shape measuring tool from Topological Data Analysis (TDA), to summarize the shape of the resulting ordinal partition networks as a tool for detecting state changes. We show that this network analysis tool provides a clear distinction between periodic and chaotic time series. Another contribution of this work is the successful application of the networks-TDA pipeline, for the first time, to signals from non-autonomous nonlinear systems. This opens the door for our approach to be used as an automatic design tool for studying the effect of design parameters on the resulting system response. Other uses of this approach include fault detection from sensor signals in a wide variety of engineering operations.
  224. Persistent Homology and Many-Body Atomic Structure for Medium-Range Order in the Glass (2015)

    Takenobu Nakamura, Yasuaki Hiraoka, Akihiko Hirata, Emerson G. Escolar, Yasumasa Nishiura
    Abstract The characterization of the medium-range order (MRO) in amorphous materials and its relation to the short-range order is discussed. A new topological approach to extract a hierarchical structure of amorphous materials is presented, which is robust against small perturbations and allows us to distinguish it from periodic or random configurations. This method is called the persistence diagram (PD) and introduces scales to many-body atomic structures to facilitate size and shape characterization. We first illustrate the representation of perfect crystalline and random structures in PDs. Then, the MRO in amorphous silica is characterized using the appropriate PD. The PD approach compresses the size of the data set significantly, to much smaller geometrical summaries, and has considerable potential for application to a wide range of materials, including complex molecular liquids, granular materials, and metallic glasses.
  225. Topological Data Analysis Distinguishes Parameter Regimes in the Anderson-Chaplain Model of Angiogenesis (2021)

    John T. Nardini, Bernadette J. Stolz, Kevin B. Flores, Heather A. Harrington, Helen M. Byrne
    Abstract Angiogenesis is the process by which blood vessels form from pre-existing vessels. It plays a key role in many biological processes, including embryonic development and wound healing, and contributes to many diseases including cancer and rheumatoid arthritis. The structure of the resulting vessel networks determines their ability to deliver nutrients and remove waste products from biological tissues. Here we simulate the Anderson-Chaplain model of angiogenesis at different parameter values and quantify the vessel architectures of the resulting synthetic data. Specifically, we propose a topological data analysis (TDA) pipeline for systematic analysis of the model. TDA is a vibrant and relatively new field of computational mathematics for studying the shape of data. We compute topological and standard descriptors of model simulations generated by different parameter values. We show that TDA of model simulation data stratifies parameter space into regions with similar vessel morphology. The methodologies proposed here are widely applicable to other synthetic and experimental data including wound healing, development, and plant biology.
  226. Topology Based Data Analysis Identifies a Subgroup of Breast Cancers With a Unique Mutational Profile and Excellent Survival (2011)

    Monica Nicolau, Arnold J. Levine, Gunnar Carlsson
    Abstract High-throughput biological data, whether generated as sequencing, transcriptional microarrays, proteomic, or other means, continues to require analytic methods that address its high dimensional aspects. Because the computational part of data analysis ultimately identifies shape characteristics in the organization of data sets, the mathematics of shape recognition in high dimensions continues to be a crucial part of data analysis. This article introduces a method that extracts information from high-throughput microarray data and, by using topology, provides greater depth of information than current analytic techniques. The method, termed Progression Analysis of Disease (PAD), first identifies robust aspects of cluster analysis, then goes deeper to find a multitude of biologically meaningful shape characteristics in these data. Additionally, because PAD incorporates a visualization tool, it provides a simple picture or graph that can be used to further explore these data. Although PAD can be applied to a wide range of high-throughput data types, it is used here as an example to analyze breast cancer transcriptional data. This identified a unique subgroup of Estrogen Receptor-positive (ER+) breast cancers that express high levels of c-MYB and low levels of innate inflammatory genes. These patients exhibit 100% survival and no metastasis. No supervised step beyond distinction between tumor and healthy patients was used to identify this subtype. The group has a clear and distinct, statistically significant molecular signature, it highlights coherent biology but is invisible to cluster methods, and does not fit into the accepted classification of Luminal A/B, Normal-like subtypes of ER+ breast cancers. We denote the group as c-MYB+ breast cancer.
  227. Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury (2015)

    Jessica L. Nielson, Jesse Paquette, Aiwen W. Liu, Cristian F. Guandique, C. Amy Tovar, Tomoo Inoue, Karen-Amanda Irvine, John C. Gensel, Jennifer Kloke, Tanya C. Petrossian, Pek Y. Lum, Gunnar E. Carlsson, Geoffrey T. Manley, Wise Young, Michael S. Beattie, Jacqueline C. Bresnahan, Adam R. Ferguson
    Abstract Data-driven discovery in complex neurological disorders has potential to extract meaningful knowledge from large, heterogeneous datasets. Here the authors apply topological data analysis to assess therapeutic effects in preclinical traumatic brain injury and spinal cord injury research studies.
  228. Uncovering Precision Phenotype-Biomarker Associations in Traumatic Brain Injury Using Topological Data Analysis (2017)

    Jessica L. Nielson, Shelly R. Cooper, John K. Yue, Marco D. Sorani, Tomoo Inoue, Esther L. Yuh, Pratik Mukherjee, Tanya C. Petrossian, Jesse Paquette, Pek Y. Lum, Gunnar E. Carlsson, Mary J. Vassar, Hester F. Lingsma, Wayne A. Gordon, Alex B. Valadka, David O. Okonkwo, Geoffrey T. Manley, Adam R. Ferguson, TRACK-TBI Investigators
    Abstract Background: Traumatic brain injury (TBI) is a complex disorder that is traditionally stratified based on clinical signs and symptoms. Recent imaging and molecular biomarker innovations provide unprecedented opportunities for improved TBI precision medicine, incorporating patho-anatomical and molecular mechanisms. Complete integration of these diverse data for TBI diagnosis and patient stratification remains an unmet challenge. Methods and findings: The Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) Pilot multicenter study enrolled 586 acute TBI patients and collected diverse common data elements (TBI-CDEs) across the study population, including imaging, genetics, and clinical outcomes. We then applied topology-based data-driven discovery to identify natural subgroups of patients, based on the TBI-CDEs collected. Our hypothesis was two-fold: 1) A machine learning tool known as topological data analysis (TDA) would reveal data-driven patterns in patient outcomes to identify candidate biomarkers of recovery, and 2) TDA-identified biomarkers would significantly predict patient outcome recovery after TBI using more traditional methods of univariate statistical tests. TDA algorithms organized and mapped the data of TBI patients in multidimensional space, identifying a subset of mild TBI patients with a specific multivariate phenotype associated with unfavorable outcome at 3 and 6 months after injury. Further analyses revealed that this patient subset had high rates of post-traumatic stress disorder (PTSD), and enrichment in several distinct genetic polymorphisms associated with cellular responses to stress and DNA damage (PARP1), and in striatal dopamine processing (ANKK1, COMT, DRD2). Conclusions: TDA identified a unique diagnostic subgroup of patients with unfavorable outcome after mild TBI that were significantly predicted by the presence of specific genetic polymorphisms. Machine learning methods such as TDA may provide a robust method for patient stratification and treatment planning targeting identified biomarkers in future clinical trials in TBI patients. Trial Registration: ClinicalTrials.gov Identifier NCT01565551
  229. Visualizing Emergent Identity of Assemblages in the Consumer Internet of Things: A Topological Data Analysis Approach (2016)

    Thomas Novak, Donna L. Hoffman
    Abstract The identity of a consumer Internet of Things (IoT) assemblage emerges through a historical process of ongoing interactions among consumers, smart devices, and digital information. Topological Data Analysis (TDA), consistent with mathematical aspects of assemblage theory, is used to visualize the underlying possibility space from which individual IoT assemblages emerge.
  230. Filtration Curves for Graph Representation (2021)

    Leslie O'Bray, Bastian Rieck, Karsten Borgwardt
    Abstract The two predominant approaches to graph comparison in recent years are based on (i) enumerating matching subgraphs or (ii) comparing neighborhoods of nodes. In this work, we complement these two perspectives with a third way of representing graphs: using filtration curves from topological data analysis that capture both edge weight information and global graph structure. Filtration curves are highly efficient to compute and lead to expressive representations of graphs, which we demonstrate on graph classification benchmark datasets. Our work opens the door to a new form of graph representation in data mining.
  231. Topological Data Analysis: A Promising Big Data Exploration Tool in Biology, Analytical Chemistry and Physical Chemistry (2016)

    Marc Offroy, Ludovic Duponchel
    Abstract An important feature of experimental science is that data of various kinds is being produced at an unprecedented rate. This is mainly due to the development of new instrumental concepts and experimental methodologies. It is also clear that the nature of acquired data is significantly different. Indeed, in every area of science, data take the form of ever bigger tables, where all but a few of the columns (i.e. variables) turn out to be irrelevant to the questions of interest, and further we do not necessarily know which coordinates are the interesting ones. Big data in our lab of biology, analytical chemistry or physical chemistry is a future that might be closer than any of us suppose. It is in this sense that new tools have to be developed in order to explore and valorize such data sets. Topological data analysis (TDA) is one of these. It was developed recently by topologists who discovered that topological concepts could be useful for data analysis. The main objective of this paper is to answer the question of why topology is well suited for the analysis of big data sets in many areas and even more efficient than conventional data analysis methods. Raman analysis of single bacteria should provide a good opportunity to demonstrate the potential of TDA for the exploration of various spectroscopic data sets considering different experimental conditions (with high noise level, with/without spectral preprocessing, with wavelength shift, with different spectral resolution, with missing data).
  232. Understanding Diffraction Patterns of Glassy, Liquid and Amorphous Materials via Persistent Homology Analyses (2019)

    Yohei Onodera, Shinji Kohara, Shuta Tahara, Atsunobu Masuno, Hiroyuki Inoue, Motoki Shiga, Akihiko Hirata, Koichi Tsuchiya, Yasuaki Hiraoka, Ippei Obayashi, Koji Ohara, Akitoshi Mizuno, Osami Sakata
    Abstract The structure of glassy, liquid, and amorphous materials is still not well understood, due to the insufficient structural information from diffraction data. In this article, attempts are made to understand the origin of diffraction peaks, particularly of the first sharp diffraction peak (FSDP, Q1), the principal peak (PP, Q2), and the third peak (Q3), observed in the measured diffraction patterns of disordered materials whose structure contains tetrahedral motifs. It is confirmed that the FSDP (Q1) is not a signature of the formation of a network, because an FSDP is observed in tetrahedral molecular liquids. It is found that the PP (Q2) reflects orientational correlations of tetrahedra. Q3, that can be observed in all disordered materials, even in common liquid metals, stems from simple pair correlations. Moreover, information on the topology of disordered materials was revealed by utilizing persistent homology analyses. The persistence diagram of silica (SiO2) glass suggests that the shape of rings in the glass is similar not only to those in the crystalline phase with comparable density (α-cristobalite), but also to rings present in crystalline phases with higher density (α-quartz and coesite); this is thought to be the signature of disorder. Furthermore, we have succeeded in revealing the differences, in terms of persistent homology, between tetrahedral networks and tetrahedral molecular liquids, and the difference/similarity between liquid and amorphous (glassy) states. Our series of analyses demonstrated that a combination of diffraction data and persistent homology analyses is a useful tool for allowing us to uncover structural features hidden in halo pattern of disordered materials.
  233. Hepatic Tumor Classification Using Texture and Topology Analysis of Non-Contrast-Enhanced Three-Dimensional T1-Weighted MR Images With a Radiomics Approach (2019)

    Asuka Oyama, Yasuaki Hiraoka, Ippei Obayashi, Yusuke Saikawa, Shigeru Furui, Kenshiro Shiraishi, Shinobu Kumagai, Tatsuya Hayashi, Jun’ichi Kotoku
    Abstract The purpose of this study is to evaluate the accuracy for classification of hepatic tumors by characterization of T1-weighted magnetic resonance (MR) images using two radiomics approaches with machine learning models: texture analysis and topological data analysis using persistent homology. This study assessed non-contrast-enhanced fat-suppressed three-dimensional (3D) T1-weighted images of 150 hepatic tumors. The lesions included 50 hepatocellular carcinomas (HCCs), 50 metastatic tumors (MTs), and 50 hepatic hemangiomas (HHs) found respectively in 37, 23, and 33 patients. For classification, texture features were calculated, and also persistence images of three types (degree 0, degree 1 and degree 2) were obtained for each lesion from the 3D MR imaging data. We used three classification models. In the classification of HCC and MT (resp. HCC and HH, HH and MT), we obtained accuracy of 92% (resp. 90%, 73%) by texture analysis, and the highest accuracy of 85% (resp. 84%, 74%) when degree 1 (resp. degree 1, degree 2) persistence images were used. Our methods using texture analysis or topological data analysis allow for classification of the three hepatic tumors with considerable accuracy, and thus might be useful when applied for computer-aided diagnosis with MR images.
  234. Four-Dimensional Observation of Ductile Fracture in Sintered Iron Using Synchrotron X-Ray Laminography (2019)

    Y. Ozaki, Y. Mugita, M. Aramaki, O. Furukimi, S. Oue, F. Jiang, T. Tsuji, A. Takeuchi, M. Uesugi, K. Ashizuka
    Abstract Synchrotron X-ray laminography was used to examine the time-dependent evolution of the three-dimensional (3D) morphology of micropores in sintered iron during the tensile test. 3D snapshots showed that the networked open pores grow wider than 20 µm along the tensile direction, resulting in the internal necking of the specimen. Subsequently, these pores initiated cracks perpendicular to the tensile direction by coalescing with the surrounding pre-existing microvoids or with the secondary-generated voids immediately before fracture. Topological analysis of the barycentric positions of these microvoids showed that they form two-dimensional networks within an area of ∼20 µm radius. These observations strongly indicate that the microvoid coalescence could occur on shear planes formed close to the enlarged open pores or between closed pores by strain accumulation and play an important role in the crack initiation.
  235. Topology-Based Kernels With Application to Inference Problems in Alzheimer’s Disease (2011)

    Deepti Pachauri, Chris Hinrichs, Moo K. Chung, Sterling C. Johnson, Vikas Singh
    Abstract Alzheimer’s disease (AD) research has recently witnessed a great deal of activity focused on developing new statistical learning tools for automated inference using imaging data. The workhorse for many of these techniques is the Support Vector Machine (SVM) framework (or more generally kernel based methods). Most of these require, as a first step, specification of a kernel matrix between input examples (i.e., images). The inner product between images Ii and Ij in a feature space can generally be written in closed form, and so it is convenient to treat as “given”. However, in certain neuroimaging applications such an assumption becomes problematic. As an example, it is rather challenging to provide a scalar measure of similarity between two instances of highly attributed data such as cortical thickness measures on cortical surfaces. Note that cortical thickness is known to be discriminative for neurological disorders, so leveraging such information in an inference framework, especially within a multi-modal method, is potentially advantageous. But despite being clinically meaningful, relatively few works have successfully exploited this measure for classification or regression. Motivated by these applications, our paper presents novel techniques to compute similarity matrices for such topologically-based attributed data. Our ideas leverage recent developments to characterize signals (e.g., cortical thickness) motivated by the persistence of their topological features, leading to a scheme for simple constructions of kernel matrices. As a proof of principle, on a dataset of 356 subjects from the ADNI study, we report good performance on several statistical inference tasks without any feature selection, dimensionality reduction, or parameter tuning.
  236. Revisiting Abnormalities in Brain Network Architecture Underlying Autism Using Topology-Inspired Statistical Inference (2018)

    Sourabh Palande, Vipin Jose, Brandon Zielinski, Jeffrey Anderson, P. Thomas Fletcher, Bei Wang
    Abstract A large body of evidence relates autism with abnormal structural and functional brain connectivity. Structural covariance magnetic resonance imaging (scMRI) is a technique that maps brain regions with covarying gray matter densities across subjects. It provides a way to probe the anatomical structure underlying intrinsic connectivity networks (ICNs) through analysis of gray matter signal covariance. In this article, we apply topological data analysis in conjunction with scMRI to explore network-specific differences in the gray matter structure in subjects with autism versus age-, gender-, and IQ-matched controls. Specifically, we investigate topological differences in gray matter structure captured by structural correlation graphs derived from three ICNs strongly implicated in autism, namely the salience network, default mode network, and executive control network. By combining topological data analysis with statistical inference, our results provide evidence of statistically significant network-specific structural abnormalities in autism.
  237. Emotion Recognition in Talking-Face Videos Using Persistent Entropy and Neural Networks (2022)

    Eduardo Paluzo-Hidalgo, Rocio Gonzalez-Diaz, Guillermo Aguirre-Carrazana
    Abstract The automatic recognition of a person's emotional state has become a very active research field that involves scientists specialized in different areas such as artificial intelligence, computer vision, or psychology, among others. Our main objective in this work is to develop a novel approach, using persistent entropy and neural networks as main tools, to recognise and classify emotions from talking-face videos. Specifically, we combine audio-signal and image-sequence information to compute a topology signature (a 9-dimensional vector) for each video. We prove that small changes in the video produce small changes in the signature, ensuring the stability of the method. These topological signatures are used to feed a neural network to distinguish between the following emotions: calm, happy, sad, angry, fearful, disgust, and surprised. The results reached are promising and competitive, beating the performances achieved in other state-of-the-art works found in the literature.
  238. Towards a Philological Metric Through a Topological Data Analysis Approach (2020)

    Eduardo Paluzo-Hidalgo, Rocio Gonzalez-Diaz, Miguel A. Gutiérrez-Naranjo
    Abstract The canon of the baroque Spanish literature has been thoroughly studied with philological techniques. The major representatives of the poetry of this epoch are Francisco de Quevedo and Luis de Góngora y Argote. They are commonly classified by the literary experts in two different streams: Quevedo belongs to the Conceptismo and Góngora to the Culteranismo. Besides, traditionally, even if Quevedo is considered the most representative of the Conceptismo, Lope de Vega is also considered to be, at least, closely related to this literary trend. In this paper, we use Topological Data Analysis techniques to provide a first approach to a metric distance between the literary styles of these poets. As a consequence, we reach results that are in line with the literary experts' criteria, locating the literary style of Lope de Vega closer to that of Quevedo than to that of Góngora.
  239. Topological Gene Expression Networks Recapitulate Brain Anatomy and Function (2019)

    Alice Patania, Pierluigi Selvaggi, Mattia Veronese, Ottavia Dipasquale, Paul Expert, Giovanni Petri
    Abstract Understanding how gene expression translates to and affects human behavior is one of the ultimate goals of neuroscience. In this paper, we present a pipeline based on Mapper, a topological simplification tool, to analyze gene co-expression data. We first validate the method by reproducing key results from the literature on the Allen Human Brain Atlas and the correlations between resting-state fMRI and gene co-expression maps. We then analyze a dopamine-related gene set and find that co-expression networks produced by Mapper return a structure that matches the well-known anatomy of the dopaminergic pathway. Our results suggest that network-based descriptions can be a powerful tool to explore the relationships between genetic pathways and their association with brain function and its perturbation due to illness and/or pharmacological challenges. In this paper, we described a gene co-expression analysis pipeline that produces networks that we show to be closely related both to brain function and to neurotransmitter pathways. Our results suggest that this pipeline could be developed into a platform enabling the exploration of the effects of physiological and pathological alterations to specific gene sets, including profiling drug effects.
  240. MRI and Biomechanics Multidimensional Data Analysis Reveals R2-R1ρ as an Early Predictor of Cartilage Lesion Progression in Knee Osteoarthritis (2017)

    Valentina Pedoia, Jenny Haefeli, Kazuhito Morioka, Hsiang-Ling Teng, Lorenzo Nardo, Richard B. Souza, Adam R. Ferguson, Sharmila Majumdar
    Abstract PURPOSE: To couple quantitative compositional MRI, gait analysis, and machine learning multidimensional data analysis to study osteoarthritis (OA). OA is a multifactorial disorder accompanied by biochemical and morphological changes in the articular cartilage, modulated by skeletal biomechanics and gait. While we can now acquire detailed information about the knee joint structure and function, we are not yet able to leverage the multifactorial factors for diagnosis and disease management of knee OA. MATERIALS AND METHODS: We mapped 178 subjects in a multidimensional space integrating: demographic, clinical information, gait kinematics and kinetics, cartilage compositional T1ρ and T2 and R2-R1ρ (1/T2 - 1/T1ρ) acquired at 3T and whole-organ magnetic resonance imaging score morphological grading. Topological data analysis (TDA) and Kolmogorov-Smirnov test were adopted for data integration, analysis, and hypothesis generation. Regression models were used for hypothesis testing. RESULTS: The results of the TDA showed a network composed of three main patient subpopulations, thus potentially identifying new phenotypes. T2 and T1ρ values (T2 lateral femur P = 1.45×10⁻⁸, T1ρ medial tibia P = 1.05×10⁻⁵), the presence of femoral cartilage defects (P = 0.0013), lesions in the meniscus body (P = 0.0035), and race (P = 2.44×10⁻⁴) were key markers in the subpopulation classification. Within one of the subpopulations we observed an association between the composite metric R2-R1ρ and the longitudinal progression of cartilage lesions. CONCLUSION: The analysis presented demonstrates some of the complex multitissue biochemical and biomechanical interactions that define joint degeneration and OA using a multidimensional approach, and potentially indicates that R2-R1ρ may be an imaging biomarker for early OA. LEVEL OF EVIDENCE: 3 Technical Efficacy: Stage 2 J. Magn. Reson. Imaging 2018;47:78-90.
  241. A Klein-Bottle-Based Dictionary for Texture Representation (2014)

    Jose A. Perea, Gunnar Carlsson
    Abstract A natural object of study in texture representation and material classification is the probability density function, in pixel-value space, underlying the set of small patches from the given image. Inspired by the fact that small $n \times n$ high-contrast patches from natural images in gray-scale accumulate with high density around a surface $\mathcal{K} \subset \mathbb{R}^{n^2}$ with the topology of a Klein bottle (Carlsson et al. International Journal of Computer Vision 76(1):1–12, 2008), we present in this paper a novel framework for the estimation and representation of distributions around $\mathcal{K}$, of patches from texture images. More specifically, we show that most $n \times n$ patches from a given image can be projected onto $\mathcal{K}$ yielding a finite sample $S \subset \mathcal{K}$, whose underlying probability density function can be represented in terms of Fourier-like coefficients, which in turn, can be estimated from $S$. We show that image rotation acts as a linear transformation at the level of the estimated coefficients, and use this to define a multi-scale rotation-invariant descriptor. We test it by classifying the materials in three popular data sets: The CUReT, UIUCTex and KTH-TIPS texture databases.
  242. Multiscale Projective Coordinates via Persistent Cohomology of Sparse Filtrations (2018)

    Jose A. Perea
    Abstract We present a framework which leverages the underlying topology of a data set, in order to produce appropriate coordinate representations. In particular, we show how to construct maps to real and complex projective spaces, given appropriate persistent cohomology classes. An initial map is obtained in two steps: First, the persistent cohomology of a sparse filtration is used to compute systems of transition functions for (real and complex) line bundles over neighborhoods of the data. Next, the transition functions are used to produce explicit classifying maps for the induced bundles. A framework for dimensionality reduction in projective space (Principal Projective Components) is also developed, aimed at decreasing the target dimension of the original map. Several examples are provided as well as theorems addressing choices in the construction.
  243. Sliding Windows and Persistence: An Application of Topological Methods to Signal Analysis (2015)

    Jose A. Perea, John Harer
    Abstract We develop in this paper a theoretical framework for the topological study of time series data. Broadly speaking, we describe geometrical and topological properties of sliding window embeddings, as seen through the lens of persistent homology. In particular, we show that maximum persistence at the point-cloud level can be used to quantify periodicity at the signal level, prove structural and convergence theorems for the resulting persistence diagrams, and derive estimates for their dependency on window size and embedding dimension. We apply this methodology to quantifying periodicity in synthetic data sets and compare the results with those obtained using state-of-the-art methods in gene expression analysis. We call this new method SW1PerS, which stands for Sliding Windows and 1-Dimensional Persistence Scoring.
  244. Homological Scaffolds of Brain Functional Networks (2014)

    G. Petri, P. Expert, F. Turkheimer, R. Carhart-Harris, D. Nutt, P. J. Hellyer, F. Vaccarino
    Abstract Networks, as efficient representations of complex systems, have appealed to scientists for a long time and now permeate many areas of science, including neuroimaging (Bullmore and Sporns 2009 Nat. Rev. Neurosci. 10, 186–198. (doi:10.1038/nrn2618)). Traditionally, the structure of complex networks has been studied through their statistical properties and metrics concerned with node and link properties, e.g. degree-distribution, node centrality and modularity. Here, we study the characteristics of functional brain networks at the mesoscopic level from a novel perspective that highlights the role of inhomogeneities in the fabric of functional connections. This can be done by focusing on the features of a set of topological objects (homological cycles) associated with the weighted functional network. We leverage the detected topological information to define the homological scaffolds, a new set of objects designed to represent compactly the homological features of the correlation network and simultaneously make their homological properties amenable to networks theoretical methods. As a proof of principle, we apply these tools to compare resting-state functional brain activity in 15 healthy volunteers after intravenous infusion of placebo and psilocybin, the main psychoactive component of magic mushrooms. The results show that the homological structure of the brain's functional patterns undergoes a dramatic change post-psilocybin, characterized by the appearance of many transient structures of low stability and of a small number of persistent ones that are not observed in the case of placebo.
  245. Homological Analysis of Multi-Qubit Entanglement (2018)

    Alessandra di Pierro, Stefano Mancini, Laleh Memarzadeh, Riccardo Mengoni
    Abstract We propose the usage of persistent homologies to characterize multipartite entanglement. On a multi-qubit data set we introduce metric-like measures defined in terms of bipartite entanglement and then we derive barcodes. We show that, depending on the distance, they are able to produce different classifications. In one case, it is possible to obtain the standard separability classes. In the other case, a new classification of entangled states of three and four qubits is provided.
  246. Topological Data Analysis Quantifies Biological Nano-Structure From Single Molecule Localization Microscopy (2020)

    Jeremy A. Pike, Abdullah O. Khan, Chiara Pallini, Steven G. Thomas, Markus Mund, Jonas Ries, Natalie S. Poulter, Iain B. Styles
    Abstract Motivation. Localization microscopy data is represented by a set of spatial coordinates, each corresponding to a single detection, that form a point cl
  247. Improved Understanding of Aqueous Solubility Modeling Through Topological Data Analysis (2018)

    Mariam Pirashvili, Lee Steinberg, Francisco Belchi Guillamon, Mahesan Niranjan, Jeremy G. Frey, Jacek Brodzki
    Abstract Topological data analysis is a family of recent mathematical techniques seeking to understand the ‘shape’ of data, and has been used to understand the structure of the descriptor space produced from a standard chemical informatics software from the point of view of solubility. We have used the mapper algorithm, a TDA method that creates low-dimensional representations of data, to create a network visualization of the solubility space. While descriptors with clear chemical implications are prominent features in this space, reflecting their importance to the chemical properties, an unexpected and interesting correlation between chlorine content and rings and their implication for solubility prediction is revealed. A parallel representation of the chemical space was generated using persistent homology applied to molecular graphs. Links between this chemical space and the descriptor space were shown to be in agreement with chemical heuristics. The use of persistent homology on molecular graphs, extended by the use of norms on the associated persistence landscapes, allows the conversion of discrete shape descriptors to continuous ones, and a perspective of the application of these descriptors to quantitative structure property relations is presented.
  248. Grasping Objects With Holes: A Topological Approach (2013)

    F. T. Pokorny, J. A. Stork, D. Kragic
    Abstract This work proposes a topologically inspired approach for generating robot grasps on objects with 'holes'. Starting from a noisy point-cloud, we generate a simplicial representation of an object of interest and use a recently developed method for approximating shortest homology generators to identify graspable loops. To control the movement of the robot hand, a topologically motivated coordinate system is used in order to wrap the hand around such loops. Finally, another concept from topology - namely the Gauss linking integral - is adapted to serve as evidence for secure caging grasps after a grasp has been executed. We evaluate our approach in simulation on a Barrett hand using several target objects of different sizes and shapes and present an initial experiment with real sensor data.
  249. Loops Abound in the Cosmic Microwave Background: A 4σ Anomaly on Super-Horizon Scales (2021)

    Pratyush Pranav
    Abstract We present a topological analysis of the temperature fluctuation maps from the Planck 2020 Data release 4 (DR4) based on the NPIPE data processing pipeline. For comparison, we also present the topological characteristics of the maps from Planck 2018 Data release 3 (DR3). We perform our analysis in terms of the homology characteristics of the maps, invoking relative homology to account for analysis in the presence of masks. We perform our analysis for a range of smoothing scales spanning sub- and super-horizon scales corresponding to FWHM = 5', 10', 20', 40', 80', 160', 320', 640'. Our main result indicates a significantly anomalous behavior of the loops in the observed maps compared to simulations that are modeled as isotropic and homogeneous Gaussian random fields. Specifically, we observe a 4σ deviation between the observation and simulations in the number of loops at FWHM = 320' and FWHM = 640', corresponding to super-horizon scales of 5 degrees and larger. In addition, we also notice a mildly significant deviation at 2σ for all the topological descriptors for almost all the scales analyzed. Our results show a consistency across different data releases, and therefore, the anomalous behavior deserves a careful consideration regarding its origin and ramifications. Disregarding the unlikely source of the anomaly being instrumental systematics, the origin of the anomaly may be genuinely astrophysical, perhaps due to a yet unresolved foreground, or truly primordial in nature. Given the nature of the topological descriptors, which potentially encode information of all orders, non-Gaussianities, of either primordial or late-type nature, may be potential candidates. Alternate possibilities include the Universe admitting a non-trivial global topology, including effects induced by large-scale topological defects.
  250. The Topology of the Cosmic Web in Terms of Persistent Betti Numbers (2017)

    Pratyush Pranav, Herbert Edelsbrunner, Rien van de Weygaert, Gert Vegter, Michael Kerber, Bernard J. T. Jones, Mathijs Wintraecken
    Abstract We introduce a multiscale topological description of the Megaparsec web-like cosmic matter distribution. Betti numbers and topological persistence of
  251. Unexpected Topology of the Temperature Fluctuations in the Cosmic Microwave Background (2019)

    Pratyush Pranav, Robert J. Adler, Thomas Buchert, Herbert Edelsbrunner, Bernard J. T. Jones, Armin Schwartzman, Hubert Wagner, Rien van de Weygaert
    Abstract We study the topology generated by the temperature fluctuations of the cosmic microwave background (CMB) radiation, as quantified by the number of components and holes, formally given by the Betti numbers, in the growing excursion sets. We compare CMB maps observed by the Planck satellite with a thousand simulated maps generated according to the ΛCDM paradigm with Gaussian distributed fluctuations. The comparison is multi-scale, being performed on a sequence of degraded maps with mean pixel separation ranging from 0.05 to 7.33°. The survey of the CMB over 𝕊² is incomplete due to obfuscation effects by bright point sources and other extended foreground objects like our own galaxy. To deal with such situations, where analysis in the presence of “masks” is of importance, we introduce the concept of relative homology. The parametric χ²-test shows differences between observations and simulations, yielding p-values at percent to less than permil levels roughly between 2 and 7°, with the difference in the number of components and holes peaking at more than 3σ sporadically at these scales. The highest observed deviation between the observations and simulations for b₀ and b₁ is approximately between 3σ and 4σ at scales of 3–7°. There are reports of mildly unusual behaviour of the Euler characteristic at 3.66° in the literature, computed from independent measurements of the CMB temperature fluctuations by Planck’s predecessor, the Wilkinson Microwave Anisotropy Probe (WMAP) satellite. The mildly anomalous behaviour of the Euler characteristic is phenomenologically related to the strongly anomalous behaviour of components and holes, or the zeroth and first Betti numbers, respectively. Further, since these topological descriptors show consistent anomalous behaviour over independent measurements of Planck and WMAP, instrumental and systematic errors may be an unlikely source. These are also the scales at which the observed maps exhibit low variance compared to the simulations, and approximately the range of scales at which the power spectrum exhibits a dip with respect to the theoretical model. Non-parametric tests show even stronger differences at almost all scales. Crucially, Gaussian simulations based on power-spectrum matching the characteristics of the observed dipped power spectrum are not able to resolve the anomaly. Understanding the origin of the anomalies in the CMB, whether cosmological in nature or arising due to late-time effects, is an extremely challenging task. Regardless, beyond the trivial possibility that this may still be a manifestation of an extreme Gaussian case, these observations, along with the super-horizon scales involved, may motivate the study of primordial non-Gaussianity. Alternative scenarios worth exploring may be models with non-trivial topology, including topological defect models.
  252. Weighted-Persistent-Homology-Based Machine Learning for RNA Flexibility Analysis (2020)

    Chi Seng Pun, Brandon Yung Sin Yong, Kelin Xia
    Abstract Given the great significance of biomolecular flexibility in biomolecular dynamics and functional analysis, various experimental and theoretical models have been developed. Experimentally, the Debye-Waller factor, also known as the B-factor, measures atomic mean-square displacement and is usually considered an important measurement for flexibility. Theoretically, elastic network models, the Gaussian network model, the flexibility-rigidity model, and other computational models have been proposed for flexibility analysis by shedding light on the biomolecular inner topological structures. Recently, a topology-based machine learning model has been proposed. By using the features from persistent homology, this model achieves a remarkably high Pearson correlation coefficient (PCC) in protein B-factor prediction. Motivated by its success, we propose weighted-persistent-homology (WPH)-based machine learning (WPHML) models for RNA flexibility analysis. Our WPH is a newly proposed model, which incorporates physical, chemical and biological information into topological measurements using a weight function. In particular, we use local persistent homology (LPH) to focus on the topological information of local regions. Our WPHML model is validated on a well-established RNA dataset, and numerical experiments show that our model can achieve a PCC of up to 0.5822. The comparison with the previous sequence-information-based learning models shows that a consistent improvement in performance by at least 10% is achieved in our current model.
  253. Fast and Accurate Tumor Segmentation of Histology Images Using Persistent Homology and Deep Convolutional Features (2019)

    Talha Qaiser, Yee-Wah Tsang, Daiki Taniyama, Naoya Sakamoto, Kazuaki Nakane, David Epstein, Nasir Rajpoot
    Abstract Tumor segmentation in whole-slide images of histology slides is an important step towards computer-assisted diagnosis. In this work, we propose a tumor segmentation framework based on the novel concept of persistent homology profiles (PHPs). For a given image patch, the homology profiles are derived by efficient computation of persistent homology, which is an algebraic tool from homology theory. We propose an efficient way of computing topological persistence of an image, alternative to simplicial homology. The PHPs are devised to distinguish tumor regions from their normal counterparts by modeling the atypical characteristics of tumor nuclei. We propose two variants of our method for tumor segmentation: one that targets speed without compromising accuracy and the other that targets higher accuracy. The fast version is based on a selection of exemplar image patches from a convolutional neural network (CNN) and patch classification by quantifying the divergence between the PHPs of exemplars and the input image patch. Detailed comparative evaluation shows that the proposed algorithm is significantly faster than competing algorithms while achieving comparable results. The accurate version combines the PHPs and high-level CNN features and employs a multi-stage ensemble strategy for image patch labeling. Experimental results demonstrate that the combination of PHPs and CNN features outperforms competing algorithms. This study is performed on two independently collected colorectal datasets containing adenoma, adenocarcinoma, signet, and healthy cases. Collectively, the accurate tumor segmentation produces the highest average patch-level F1-score, as compared with competing algorithms, on malignant and healthy cases from both the datasets. Overall the proposed framework highlights the utility of persistent homology for histopathology image analysis.
  254. Identification of Relevant Genetic Alterations in Cancer Using Topological Data Analysis (2020)

    Raúl Rabadán, Yamina Mohamedi, Udi Rubin, Tim Chu, Adam N. Alghalith, Oliver Elliott, Luis Arnés, Santiago Cal, Álvaro J. Obaya, Arnold J. Levine, Pablo G. Cámara
    Abstract Large-scale cancer genomic studies enable the systematic identification of mutations that lead to the genesis and progression of tumors, uncovering the underlying molecular mechanisms and potential therapies. While some such mutations are recurrently found in many tumors, many others exist solely within a few samples, precluding detection by conventional recurrence-based statistical approaches. Integrated analysis of somatic mutations and RNA expression data across 12 tumor types reveals that mutations of cancer genes are usually accompanied by substantial changes in expression. We use topological data analysis to leverage this observation and uncover 38 elusive candidate cancer-associated genes, including inactivating mutations of the metalloproteinase ADAMTS12 in lung adenocarcinoma. We show that ADAMTS12−/− mice have a five-fold increase in the susceptibility to develop lung tumors, confirming the role of ADAMTS12 as a tumor suppressor gene. Our results demonstrate that data integration through topological techniques can increase our ability to identify previously unreported cancer-related alterations. Rare cancer mutations are often missed using recurrence-based statistical approaches, but are usually accompanied by changes in expression. Here the authors leverage this information to uncover several elusive candidate cancer-associated genes using topological data analysis.