🍩 Database of Original & Non-Theoretical Uses of Topology
Topological Detection of Trojaned Neural Networks (2021)
Songzhu Zheng, Yikai Zhang, Hubert Wagner, Mayank Goswami, Chao Chen
Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when the attackers stealthily manipulate the model’s behavior through Trojaned training samples, which can later be exploited. Guided by basic neuroscientific principles, we discover subtle – yet critical – structural deviations characterizing Trojaned models. In our analysis we use topological tools. They allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from shallow to deep layers. Inspired by these observations, we devise a strategy for robust detection of Trojaned models. Compared to standard baselines, it displays better performance on multiple benchmarks.

Machine Learning and Topological Data Analysis Identify Unique Features of Human Papillae in 3D Scans (2023)
Rayna Andreeva, Anwesha Sarkar, Rik Sarkar
The tongue surface houses a range of papillae that are integral to the mechanics and chemistry of taste and textural sensation. Although gustatory function of papillae is well investigated, the uniqueness of papillae within and across individuals remains elusive. Here, we present the first machine learning framework on 3D microscopic scans of human papillae (n = 2092), uncovering the uniqueness of geometric and topological features of papillae. The finer differences in shapes of papillae are investigated computationally based on a number of features derived from discrete differential geometry and computational topology. Interpretable machine learning techniques show that persistent homology features of the papillae shape are the most effective in predicting the biological variables. Models trained on these features with small volumes of data samples predict the type of papillae with an accuracy of 85%. The papillae type classification models can map the spatial arrangement of filiform and fungiform papillae on a surface. Remarkably, the papillae are found to be distinctive across individuals, and an individual can be identified from a single papilla with an accuracy of 48% among the 15 participants. Collectively, this is the first evidence demonstrating that tongue papillae can serve as a unique identifier, inspiring new research directions for food preferences and oral diagnostics.

The Persistence of Large Scale Structures I: Primordial Non-Gaussianity (2020)
Matteo Biagetti, Alex Cole, Gary Shiu
We develop an analysis pipeline for characterizing the topology of large scale structure and extracting cosmological constraints based on persistent homology. Persistent homology is a technique from topological data analysis that quantifies the multiscale topology of a data set, in our context unifying the contributions of clusters, filament loops, and cosmic voids to cosmological constraints. We describe how this method captures the imprint of primordial local non-Gaussianity on the late-time distribution of dark matter halos, using a set of N-body simulations as a proxy for real data analysis. For our best single statistic, running the pipeline on several cubic volumes of size $40~(\mathrm{Gpc}/h)^3$, we detect $f_{\rm NL}^{\rm loc}=10$ at $97.5\%$ confidence on $\sim 85\%$ of the volumes. Additionally we test our ability to resolve degeneracies between the topological signature of $f_{\rm NL}^{\rm loc}$ and variation of $\sigma_8$ and argue that correctly identifying nonzero $f_{\rm NL}^{\rm loc}$ in this case is possible via an optimal template method. Our method relies on information living at $\mathcal{O}(10)~\mathrm{Mpc}/h$, a complementary scale with respect to commonly used methods such as the scale-dependent bias in the halo/galaxy power spectrum. Therefore, while still requiring a large volume, our method does not require sampling long-wavelength modes to constrain primordial non-Gaussianity. Moreover, our statistics are interpretable: we are able to reproduce previous results in certain limits and we make new predictions for unexplored observables, such as filament loops formed by dark matter halos in a simulation box.

Felix: A Topology Based Framework for Visual Exploration of Cosmic Filaments (2016)
Nithin Shivshankar, Pratyush Pranav, Vijay Natarajan, Rien van de Weygaert, E. G. Patrick Bos, Steven Rieder
The large-scale structure of the universe is composed of virialized blob-like clusters, linear filaments, sheet-like walls and huge, nearly empty three-dimensional voids. Characterizing the large-scale universe is essential to our understanding of the formation and evolution of galaxies. The density ranges of clusters, walls and voids are relatively well separated, whereas filaments span a relatively larger range. The large-scale filamentary network thus forms an intricate part of the cosmic web. In this paper, we describe Felix, a topology-based framework for visual exploration of filaments in the cosmic web. The filamentary structure is represented by the ascending manifold geometry of the 2-saddles in the Morse-Smale complex of the density field. We generate a hierarchy of Morse-Smale complexes and query for filaments based on the density ranges at the end points of the filaments. The query is processed efficiently over the entire hierarchical Morse-Smale complex, allowing for interactive visualization. We apply Felix to computer simulations based on the heuristic Voronoi kinematic model and the standard $\Lambda$CDM cosmology, and demonstrate its usefulness through two case studies. First, we extract cosmic filaments within and across cluster-like regions in Voronoi kinematic simulation datasets. We demonstrate that we produce similar results to existing structure finders. Filaments that form the spine of the cosmic web, which exist in high-density regions in the current epoch, are isolated using Felix. Also, filaments present in void-like regions are isolated and visualized. These filamentary structures are often overshadowed by higher density range filaments and are not easily characterizable and extractable using other filament extraction methodologies.

Determining Clinically Relevant Features in Cytometry Data Using Persistent Homology (2022)
Soham Mukherjee, Darren Wethington, Tamal K. Dey, Jayajit Das
Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets are the current standard for cytometry data analysis. However, this approach is unable to capture more subtle topological features hidden in data, especially if those features are further masked by data transforms, significant batch effects, or donor-to-donor variations in clinical data. We show that persistent homology, a mathematical structure that summarizes topological features, can effectively distinguish different sources of data, such as groups of healthy donors or patients. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between single cell protein expressions in COVID-19 patients and healthy controls. We identify proteins of interest with a decision-tree based classifier, sample points randomly, and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions in cytometry datasets of varying density and identify protruded structures such as ‘elbows’. We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find that systematic structural differences exist between COVID-19 patients and healthy controls in the expression data for T-bet, Eomes, and Ki-67. Further analysis shows that T-bet and Eomes are significantly downregulated in non-naïve CD8+ T cells of COVID-19 patients compared to healthy controls. This counter-intuitive finding may indicate that canonical effector CD8+ T cells are less prevalent in COVID-19 patients than in healthy controls.
This method is applicable to any cytometry dataset for discovering novel insights through topological data analysis which may be difficult to ascertain otherwise with a standard gating strategy or existing bioinformatic tools.

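The Wasserstein distance mentioned above matches each point of one persistence diagram either to a point of the other diagram or to the diagonal, then sums the matching costs. The abstract does not state the exponent p or the ground metric, so the Python sketch below assumes the L-infinity ground metric common in TDA and a caller-chosen p. It brute-forces every matching, which is only feasible for a handful of points; `wasserstein` and `_to_diagonal` are hypothetical names, and real analyses would use a dedicated TDA library.

```python
import itertools


def _to_diagonal(point):
    """L-infinity distance from a (birth, death) point to the diagonal."""
    birth, death = point
    return (death - birth) / 2.0


def wasserstein(diag_a, diag_b, p=2.0):
    """Exact p-Wasserstein distance between two small persistence diagrams.

    Each diagram is augmented with diagonal slots so that every point may be
    matched either to a point of the other diagram or to the diagonal.
    All matchings are enumerated, so this is only practical for a few points.
    """
    n, m = len(diag_a), len(diag_b)
    size = n + m

    def cost(i, j):
        # Rows i < n are points of diag_a, columns j < m are points of diag_b;
        # the remaining rows and columns stand for the diagonal.
        if i < n and j < m:
            (b1, d1), (b2, d2) = diag_a[i], diag_b[j]
            return max(abs(b1 - b2), abs(d1 - d2)) ** p
        if i < n:
            return _to_diagonal(diag_a[i]) ** p
        if j < m:
            return _to_diagonal(diag_b[j]) ** p
        return 0.0  # diagonal matched to diagonal costs nothing

    if size == 0:
        return 0.0
    best = min(
        sum(cost(i, perm[i]) for i in range(size))
        for perm in itertools.permutations(range(size))
    )
    return best ** (1.0 / p)
```

For example, `wasserstein([(0, 2)], [(0, 3)])` returns 1.0: shifting the death time by 1 is cheaper than sending both points to the diagonal.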
Persistent Homology and the Branching Topologies of Plants (2017)
Mao Li, Keith Duncan, Christopher N. Topp, Daniel H. Chitwood

Topological Characteristics of Oil and Gas Reservoirs and Their Applications (2017)
V. A. Baikov, R. R. Gilmanov, I. A. Taimanov, A. A. Yakovlev
We demonstrate applications of topological characteristics of oil and gas reservoirs considered as three-dimensional bodies to geological modeling.

Alpha, Betti and the Megaparsec Universe: On the Topology of the Cosmic Web (2011)
Rien Van De Weygaert, Gert Vegter, Herbert Edelsbrunner, Bernard J. T. Jones, Pratyush Pranav, Changbom Park, Wojciech A. Hellwing, Bob Eldering, Nico Kruithof, E. G. P. Bos, Johan Hidding, Job Feldbrugge, Eline Ten Have, Matti Van Engelen, Manuel Caroli, Monique Teillaud
We study the topology of the Megaparsec Cosmic Web in terms of the scale-dependent Betti numbers, which formalize the topological information content of...

A Method to the Madness: Using Persistent Homology to Measure Plant Morphology (2018)
Emily R. Larson

Topological Data Analysis Quantifies Biological Nano-Structure From Single Molecule Localization Microscopy (2020)
Jeremy A. Pike, Abdullah O. Khan, Chiara Pallini, Steven G. Thomas, Markus Mund, Jonas Ries, Natalie S. Poulter, Iain B. Styles
Motivation. Localization microscopy data is represented by a set of spatial coordinates, each corresponding to a single detection, that form a point cl...

Reconstructing Linearly Embedded Graphs: A First Step to Stratified Space Learning (2021)
Yossi Bokor, Christopher Williams, Katharine Turner

Skyler (2023)
Yossi Bokor Bleile
Julia package for recovering stratified spaces underlying point clouds.

Parametric Inference Using Persistence Diagrams: a Case Study in Population Genetics (2014)
Kevin Emmett, Daniel Rosenbloom, Pablo Camara, Raul Rabadan
Persistent homology computes topological invariants from point cloud data. Recent work has focused on developing statistical methods for data analysis in this framework. We show that, in certain models, parametric inference can be performed using statistics defined on the computed invariants. We develop this idea with a model from population genetics, the coalescent with recombination. We apply our model to an influenza dataset, identifying two scales of topological structure which have a distinct biological interpretation.

A Barcode Shape Descriptor for Curve Point Cloud Data (2004)
Anne Collins, Afra Zomorodian, Gunnar Carlsson, Leonidas J. Guibas
In this paper, we present a complete computational pipeline for extracting a compact shape descriptor for curve point cloud data (PCD). Our shape descriptor, called a barcode, is based on a blend of techniques from differential geometry and algebraic topology. We also provide a metric over the space of barcodes, enabling fast comparison of PCDs for shape recognition and clustering. To demonstrate the feasibility of our approach, we implement our pipeline and provide experimental evidence in shape classification and parametrization.

Hierarchical Clustering and Zeroth Persistent Homology (2020)
İsmail Güzel, Atabey Kaygun
In this article, we show that hierarchical clustering and the zeroth persistent homology deliver the same topological information about a given data set. We show this fact using cophenetic matrices constructed out of the filtered Vietoris-Rips complex of the data set at hand. As with any cophenetic matrix, one can also display the inter-relations of zeroth homology classes via a rooted tree, also known as a dendrogram. Since homological cophenetic matrices can be calculated for higher homologies, one can also sketch similar dendrograms for higher persistent homology classes.

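The correspondence is easiest to see in its classical form, single-linkage clustering: in a Vietoris-Rips filtration where an edge appears at a scale equal to its length, each zeroth-homology class dies exactly when single linkage merges two clusters. The Python sketch below assumes Euclidean points and that edge-length convention (some authors use half the length instead); `h0_persistence` is a hypothetical helper, not code from the paper.

```python
import itertools
import math


def h0_persistence(points):
    """Finite death times of 0-dimensional Vietoris-Rips persistence.

    Every point is born at scale 0. Edges are processed by increasing length
    (Kruskal's algorithm); each edge that joins two components kills one bar.
    One bar never dies and is not included in the returned list.
    """
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)  # same height at which single linkage merges
    return deaths
```

For the points 0, 1, 3, 7 on a line, `h0_persistence` returns `[1.0, 2.0, 4.0]`, which should coincide with the merge heights in the third column of the linkage matrix produced by SciPy's `scipy.cluster.hierarchy.linkage(..., method='single')` on the same points given as an n×1 array.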
Interpretable Phase Detection and Classification With Persistent Homology (2020)
Alex Cole, Gregory J. Loges, Gary Shiu
We apply persistent homology to the task of discovering and characterizing phase transitions, using lattice spin models from statistical physics for working examples. Persistence images provide a useful representation of the homological data for conducting statistical tasks. To identify the phase transitions, a simple logistic regression on these images is sufficient for the models we consider, and interpretable order parameters are then read from the weights of the regression. Magnetization, frustration and vortex-antivortex structure are identified as relevant features for characterizing phase transitions.

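A persistence image turns a diagram into a fixed-size vector that a logistic regression can consume. The abstract does not give the grid, weighting, or kernel width, so this Python sketch assumes a linear persistence weight, an isotropic Gaussian, and evaluation at pixel centres rather than integration over each pixel; `persistence_image` and all of its defaults are illustrative assumptions, not the authors' settings.

```python
import math


def persistence_image(diagram, resolution=4, extent=(0.0, 1.0, 0.0, 1.0), sigma=0.1):
    """Rasterize (birth, death) pairs into a resolution x resolution grid.

    Rows index persistence (death - birth) and columns index birth, both
    within `extent` = (birth_min, birth_max, pers_min, pers_max). Each point
    is weighted linearly by its persistence, so points on the diagonal add
    nothing, and spread with a Gaussian evaluated at pixel centres.
    """
    b_min, b_max, p_min, p_max = extent
    bw = (b_max - b_min) / resolution
    pw = (p_max - p_min) / resolution
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in diagram:
        pers = death - birth
        weight = pers  # linear weighting, an assumed choice
        for r in range(resolution):
            py = p_min + (r + 0.5) * pw  # persistence coordinate of the pixel centre
            for c in range(resolution):
                bx = b_min + (c + 0.5) * bw  # birth coordinate of the pixel centre
                sq = (bx - birth) ** 2 + (py - pers) ** 2
                image[r][c] += weight * norm * math.exp(-sq / (2.0 * sigma ** 2))
    return image
```

Flattening the grid row by row, e.g. `[v for row in image for v in row]`, gives one feature vector per sample. The regression weights then live on the same (birth, persistence) grid, which is presumably what makes them readable as order parameters.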
Topological Singularity Detection at Multiple Scales (2023)
Julius von Rohrscheidt, Bastian Rieck
The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the ’manifoldness’ of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.

Hypothesis Testing for Shapes Using Vectorized Persistence Diagrams (2020)
Chul Moon, Nicole A. Lazar
Topological data analysis involves the statistical characterization of the shape of data. Persistent homology is a primary tool of topological data analysis, which can be used to analyze those topological features and perform statistical inference. In this paper, we present a two-stage hypothesis test for vectorized persistence diagrams. The first stage filters elements in the vectorized persistence diagrams to reduce false positives. The second stage consists of multiple hypothesis tests, with false positives controlled by false discovery rates. We demonstrate applications of the proposed procedure on simulated point clouds and three-dimensional rock image data. Our results show that the proposed hypothesis tests can provide flexible and informative inferences on the shape of data with lower computational cost compared to the permutation test.

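The second stage runs one test per retained element of the vectorized diagram and controls the false discovery rate across them. The abstract does not name the FDR procedure; the Benjamini-Hochberg step-up rule sketched here in Python is the standard choice and is only an assumption about what the authors used. `benjamini_hochberg` is a hypothetical helper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    Sort the m p-values, find the largest rank k with p_(k) <= k / m * alpha,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank  # step-up: keep the largest passing rank
    return sorted(order[:k_max])
```

Because the rule steps up, a p-value that fails its own threshold can still be rejected if a larger p-value passes a later one: with p-values 0.03, 0.01, 0.036, 0.9, the first three are all rejected at alpha = 0.05.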
Topological Data Analysis: Concepts, Computation, and Applications in Chemical Engineering (2021)
Alexander D. Smith, Paweł Dłotko, Victor M. Zavala
A primary hypothesis that drives scientific and engineering studies is that data has structure. The dominant paradigms for describing such structure are statistics (e.g., moments, correlation functions) and signal processing (e.g., convolutional neural nets, Fourier series). Topological Data Analysis (TDA) is a field of mathematics that analyzes data from a fundamentally different perspective. TDA represents datasets as geometric objects and provides dimensionality reduction techniques that project such objects onto low-dimensional descriptors. The key properties of these descriptors (also known as topological features) are that they provide multiscale information and that they are stable under perturbations (e.g., noise, translation, and rotation). In this work, we review the key mathematical concepts and methods of TDA and present different applications in chemical engineering.

Motor Eccentricity Fault Detection: Physics-Based and Data-Driven Approaches (2023)
Bingnan Wang, Hiroshi Inoue, Makoto Kanemaru
Fault detection using motor current signature analysis (MCSA) is attractive for industrial applications due to its simplicity, with no additional sensor installation required. However, current components associated with faults are often very subtle and much smaller than the supply frequency component, making it challenging to detect and quantify fault levels. In this paper, we present our work on quantitative eccentricity fault diagnosis technologies for electric motors, including a physical-model approach using improved winding function theory, which can simulate motor dynamics under faulty conditions and agrees well with experimental data, and a data-driven approach using topological data analysis (TDA), which can effectively differentiate signals measured at different eccentricity levels. The advantages and limitations of each approach are discussed. Both methods can be extended to the detection and quantification of other types of electric motor faults.

Contagion Dynamics for Manifold Learning (2020)
Barbara I. Mahler
Contagion maps exploit activation times in threshold contagions to assign vectors in high-dimensional Euclidean space to the nodes of a network. A point cloud that is the image of a contagion map reflects both the structure underlying the network and the spreading behaviour of the contagion on it. Intuitively, such a point cloud exhibits features of the network's underlying structure if the contagion spreads along that structure, an observation which suggests contagion maps as a viable manifold-learning technique. We test contagion maps as a manifold-learning tool on a number of different real-world and synthetic data sets, and we compare their performance to that of Isomap, one of the most well-known manifold-learning algorithms. We find that, under certain conditions, contagion maps are able to reliably detect underlying manifold structure in noisy data, while Isomap fails due to noise-induced error. This consolidates contagion maps as a technique for manifold learning.

Shape Terra: Mechanical Feature Recognition Based on a Persistent Heat Signature (2017)
Ramy Harik, Yang Shi, Stephen Baek
This paper presents a novel approach to recognizing mechanical features through a multiscale persistent heat signature similarity identification technique. First, the heat signature is computed using a modified Laplacian in the application of the heat kernel. Laplacian matrices typically include an indicator of the manifold curvature (the cotangent in our case), but we add a mesh uniformity factor to overcome mesh proportionality and skewness. Second, once heat retention values are computed, we apply persistent homology to extract significant subsets of the global mesh at different time intervals. Subsets are computed based on similarity of heat retention levels and/or retention values. Third, we present a multiscale persistence identification approach where we scan the part at different persistence levels to detect the presence of a feature. Once features are recognized and their geometrical descriptors identified, the next stage in future work will be feature matching.

Finding Universal Structures in Quantum Many-Body Dynamics via Persistent Homology (2020)
Daniel Spitz, Jürgen Berges, Markus K. Oberthaler, Anna Wienhard
Inspired by topological data analysis techniques, we introduce persistent homology observables and apply them in a geometric analysis of the dynamics of quantum field theories. As a prototype application, we consider simulated data of a two-dimensional Bose gas far from equilibrium. We discover a continuous spectrum of dynamical scaling exponents, which provides a refined classification of nonequilibrium universal phenomena. A possible explanation of the underlying processes is provided in terms of mixing wave turbulence and vortex kinetics components in point clouds. We find that the persistent homology scaling exponents are inherently linked to the geometry of the system, as the derivation of a packing relation reveals. The approach opens new ways of analyzing quantum many-body dynamics in terms of robust topological structures beyond standard field theoretic techniques.

A Topology-Based Object Representation for Clasping, Latching and Hooking (2013)
J. A. Stork, F. T. Pokorny, D. Kragic
We present a loop-based topological object representation for objects with holes. The representation is used to model object parts suitable for grasping, e.g. handles, and it incorporates local volume information about these. Furthermore, we present a grasp synthesis framework that utilizes this representation for synthesizing caging grasps that are robust under measurement noise. The approach is complementary to a local contact-based force-closure analysis as it depends on global topological features of the object. We perform an extensive evaluation with four robotic hands on synthetic data. Additionally, we provide real world experiments using a Kinect sensor on two robotic platforms: a Schunk dexterous hand attached to a Kuka robot arm as well as a Nao humanoid robot. In the case of the Nao platform, we provide initial experiments showing that our approach can be used to plan whole arm hooking as well as caging grasps involving only one hand.

Statistical Topological Data Analysis - A Kernel Perspective (2015)
Roland Kwitt, Stefan Huber, Marc Niethammer, Weili Lin, Ulrich Bauer
We consider the problem of statistical computations with persistence diagrams, a summary representation of topological features in data. These diagrams encode persistent homology, a widely used invariant in topological data analysis. While several avenues towards a statistical treatment of the diagrams have been explored recently, we follow an alternative route that is motivated by the success of methods based on the embedding of probability measures into reproducing kernel Hilbert spaces. In fact, a positive definite kernel on persistence diagrams has recently been proposed, connecting persistent homology to popular kernel-based learning techniques such as support vector machines. However, important properties of that kernel enabling a principled use in the context of probability measure embeddings remain to be explored. Our contribution is to close this gap by proving universality of a variant of the original kernel, and to demonstrate its effective use in two-sample hypothesis testing on synthetic as well as real-world data.

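The "original kernel" in this abstract appears to be the persistence scale-space kernel of Reininghaus et al. (2015), which places a Gaussian at each diagram point and subtracts a copy mirrored across the diagonal, so that points on the diagonal contribute nothing. Assuming that identification, the Python sketch below evaluates its closed form directly; the universal variant proved in this paper is not reproduced, and `pss_kernel` is a hypothetical name.

```python
import math


def pss_kernel(diag_f, diag_g, sigma=1.0):
    """Persistence scale-space kernel between two lists of (birth, death) pairs.

    k(F, G) = 1 / (8 pi sigma) * sum over p in F, q in G of
              exp(-|p - q|^2 / (8 sigma)) - exp(-|p - q_bar|^2 / (8 sigma)),
    where q_bar = (death, birth) is q reflected across the diagonal.
    """
    total = 0.0
    for b1, d1 in diag_f:
        for b2, d2 in diag_g:
            direct = (b1 - b2) ** 2 + (d1 - d2) ** 2
            mirrored = (b1 - d2) ** 2 + (d1 - b2) ** 2  # distance to q_bar
            total += math.exp(-direct / (8 * sigma)) - math.exp(-mirrored / (8 * sigma))
    return total / (8 * math.pi * sigma)
```

The resulting Gram matrix over a set of diagrams can be passed to any kernel method that accepts precomputed kernels, such as a support vector machine or a kernel two-sample test.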
A Probabilistic Topological Approach to Feature Identification Using a Stochastic Robotic Swarm (2018)
Ragesh K. Ramachandran, Sean Wilson, Spring Berman
This paper presents a novel automated approach to quantifying the topological features of an unknown environment using a swarm of robots with local sensing and limited or no access to global position information. The robots randomly explore the environment and record a time series of their estimated position and the covariance matrix associated with this estimate. After the robots’ deployment, a point cloud indicating the free space of the environment is extracted from their aggregated data. Tools from topological data analysis, in particular the concept of persistent homology, are applied to a subset of the point cloud to construct barcode diagrams, which are used to determine the numbers of different types of features in the domain. We demonstrate that our approach can correctly identify the number of topological features in simulations with zero to four features and in multi-robot experiments with one to three features.

Topological Analysis of Population Activity in Visual Cortex (2008)
Gurjeet Singh, Facundo Memoli, Tigran Ishkhanov, Guillermo Sapiro, Gunnar Carlsson, Dario L. Ringach
Information in the cortex is thought to be represented by the joint activity of neurons. Here we describe how fundamental questions about neural representation can be cast in terms of the topological structure of population activity. A new method, based on the concept of persistent homology, is introduced and applied to the study of population activity in primary visual cortex (V1). We found that the topological structure of activity patterns when the cortex is spontaneously active is similar to that of patterns evoked by natural image stimulation and consistent with the topology of a two-sphere. We discuss how this structure could emerge from the functional organization of orientation and spatial frequency maps and their mutual relationship. Our findings extend prior results on the relationship between spontaneous and evoked activity in V1 and illustrate how computational topology can help tackle elementary questions about the representation of information in the nervous system.

Topological Data Analysis of Biological Aggregation Models (2015)
Chad M. Topaz, Lori Ziegelmeier, Tom Halverson
We apply tools from topological data analysis to two mathematical models inspired by biological aggregations such as bird flocks, fish schools, and insect swarms. Our data consists of numerical simulation output from the models of Vicsek and D'Orsogna. These models are dynamical systems describing the movement of agents who interact via alignment, attraction, and/or repulsion. Each simulation time frame is a point cloud in position-velocity space. We analyze the topological structure of these point clouds, interpreting the persistent homology by calculating the first few Betti numbers. These Betti numbers count connected components, topological circles, and trapped volumes present in the data. To interpret our results, we introduce a visualization that displays Betti numbers over simulation time and topological persistence scale. We compare our topological results to order parameters typically used to quantify the global behavior of aggregations, such as polarization and angular momentum. The topological calculations reveal events and structure not captured by the order parameters.

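The Betti-numbers-over-time-and-scale view described above can be sketched for Betti-0 alone, since counting connected components needs only a union-find; Betti-1 and Betti-2 (circles and trapped volumes) require a full persistent homology library. This Python sketch assumes each frame is a list of coordinate tuples (for example in position-velocity space) and joins two points whenever their Euclidean distance is at most the scale; `betti0_curve` and `_count_components` are hypothetical names.

```python
import itertools
import math


def _count_components(points, eps):
    """Betti-0 of the Vietoris-Rips complex of `points` at scale eps."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = len(points)
    for i, j in itertools.combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1  # two components merged into one
    return components


def betti0_curve(frames, scales):
    """Betti-0 for every (frame, scale) pair: rows are frames, columns are scales."""
    return [[_count_components(frame, eps) for eps in scales] for frame in frames]
```

Each row of the returned grid corresponds to one simulation frame and each column to one scale, which is the layout such a time-versus-scale visualization plots.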
A Topological Perspective on Regimes in Dynamical Systems (2021)
Kristian Strommen, Matthew Chantry, Joshua Dorrington, Nina Otter
The existence and behaviour of so-called 'regimes' has been extensively studied in dynamical systems ranging from simple toy models to the atmosphere itself, due to their potential to drastically simplify complex and chaotic dynamics. Nevertheless, no agreed-upon and clear-cut definition of a 'regime' or a 'regime system' exists in the literature. We argue here for a definition which equates the existence of regimes in a system with the existence of non-trivial topological structure. We show, using persistent homology, a tool in topological data analysis, that this definition is computationally tractable and practically informative, and that it accounts for a variety of different examples. We further show that alternative, stricter definitions based on clustering and/or temporal persistence criteria fail to account for one or more examples of dynamical systems typically thought of as having regimes. We finally discuss how our methodology can shed light on regime behaviour in the atmosphere, and discuss future prospects.

The Extended Persistent Homology Transform of Manifolds With Boundary (2022)
Katharine Turner, Vanessa Robins, James Morgan
The Extended Persistent Homology Transform (XPHT) is a topological transform which takes as input a shape embedded in Euclidean space, and to each unit vector assigns the extended persistence module of the height function over that shape with respect to that direction. We can define a distance between two shapes by integrating over the sphere the distance between their respective extended persistence modules. By using extended persistence we get finite distances between shapes even when they have different Betti numbers. We use Morse theory to show that the extended persistence of a height function over a manifold with boundary can be deduced from the extended persistence for that height function restricted to the boundary, alongside labels on the critical points as positive or negative critical. We study the application of the XPHT to binary images, outlining an algorithm for efficient calculation of the XPHT that exploits relationships between the PHT of the boundary curves and the extended persistence of the foreground.

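The building block of the (extended) persistent homology transform is the persistence of a height function in one direction. The Python sketch below computes only ordinary 0-dimensional sublevel-set persistence of the height h(v) = <x_v, u> on a graph, using a lower-star filtration and the elder rule; the extended persistence, higher homological degrees, and integration over the sphere used by the XPHT are all omitted, and `height_h0_persistence` is a hypothetical name.

```python
import math


def height_h0_persistence(coords, edges, direction):
    """0-dimensional sublevel-set persistence of h(v) = <coords[v], direction>.

    Vertices enter at their height and edges at the height of their higher
    endpoint (lower-star filtration). When an edge joins two components, the
    younger one dies (elder rule). Zero-length bars are dropped; one bar per
    connected component of the graph never dies (death = math.inf).
    """
    heights = [sum(c * u for c, u in zip(x, direction)) for x in coords]
    parent = list(range(len(coords)))
    birth = heights[:]  # birth height of the component rooted at each vertex

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for i, j in sorted(edges, key=lambda e: max(heights[e[0]], heights[e[1]])):
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # edge closes a loop; irrelevant for H0
        level = max(heights[i], heights[j])
        if birth[ri] < birth[rj]:
            ri, rj = rj, ri  # make ri the younger component
        if level > birth[ri]:
            pairs.append((birth[ri], level))
        parent[ri] = rj  # the older root survives
    roots = {find(v) for v in range(len(coords))}
    pairs.extend((birth[r], math.inf) for r in roots)
    return sorted(pairs)
```

For a path with vertices at (0, 0), (1, 2), (2, 1) and direction (0, 1), the result is `[(0.0, inf), (1.0, 2.0)]`: the local minimum at height 1 is born there and merges into the older component at height 2. Repeating the call over a sample of unit directions gives a discrete, degree-0 approximation of the ordinary persistent homology transform.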
Topological Data Analysis on Simple English Wikipedia Articles (2020)
Matthew Wright, Xiaojun Zheng
Single-parameter persistent homology, a key tool in topological data analysis, has been widely applied to data problems, with statistical techniques that quantify the significance of the results. In contrast, statistical techniques for two-parameter persistence, while highly desirable for real-world applications, have scarcely been considered. We present three statistical approaches for comparing geometric data using two-parameter persistent homology, and we demonstrate the applicability of these approaches on high-dimensional point-cloud data obtained from Simple English Wikipedia articles. These approaches rely on the Hilbert function, matching distance, and barcodes obtained from two-parameter persistence modules computed from the point-cloud data. We demonstrate the applicability of our methods by distinguishing certain subsets of the Wikipedia data, and by comparison with random data. Results include insights into the construction of null distributions and stability of our methods with respect to noisy data. Our statistical methods are broadly applicable for analysis of geometric data indexed by a real-valued parameter.

Topological Detection of Phenomenological Bifurcations With Unreliable Kernel Density Estimates (2024)
Sunia Tanweer, Firas A. Khasawneh
Phenomenological (P-type) bifurcations are qualitative changes in stochastic dynamical systems whereby the stationary probability density function (PDF) changes its topology. The current state of the art for detecting these bifurcations requires reliable kernel density estimates computed from an ensemble of system realizations. However, for several real-world signals, such as Big Data, only a single system realization is available, making it impossible to estimate a reliable kernel density. This study presents an approach for detecting P-type bifurcations using unreliable density estimates. The approach creates an ensemble of objects from Topological Data Analysis (TDA) called persistence diagrams from the system’s sole realization and statistically analyzes the resulting set. We compare several methods for replicating the original persistence diagram, including Gibbs point process modelling, Pairwise Interaction Point Modelling, and subsampling. We show that, for the purpose of predicting a bifurcation, the simple method of subsampling outperforms the other two methods, which are based on point process modelling.

Raw Material Flow Optimization as a Capacitated Vehicle Routing Problem: A Visual Benchmarking Approach for Sustainable Manufacturing (2017)
Michele Dassisti, Yasamin Eslami, Matin Mohaghegh
Optimising material flows to increase efficiency while reducing relative resource consumption is one of the most pressing problems today. The focus of this study is to propose a new visual benchmarking approach to select the best material-flow path from the depot to the production lines, referring to the well-known Capacitated Vehicle Routing Problem (CVRP). An industrial case study is considered as an example. Two different solution techniques (Mixed Integer Linear Programming and Ant Colony Optimization) were adopted to search for optimal solutions to the CVRP. The proposed visual benchmarking, based on the persistent homology approach, supported the comparison of the optimal solutions based on the entropy of the output in different scenarios. Finally, based on the non-standard measurement of Crossing Length Percentage (CLP), the visual benchmarking procedure makes it possible to find the most practical and applicable solution to the CVRP by considering the visual attractiveness and the quality of the routes.

A Novel Approach for Wafer Defect Pattern Classification Based on Topological Data Analysis (2023)
Seungchan Ko, Dowan Koo
Abstract
In semiconductor manufacturing, wafer map defect patterns provide critical information for facility maintenance and yield management, so classifying defect patterns is one of the most important tasks in the manufacturing process. In this paper, we propose a novel way to represent the shape of a defect pattern as a finite-dimensional vector, which is then used as input to a neural network classifier. The main idea is to extract the topological features of each pattern using the theory of persistent homology from topological data analysis (TDA). Through experiments on a simulated dataset, we show that, compared with convolutional neural networks (CNNs), the most common approach to wafer map defect pattern classification, the proposed method trains faster and more efficiently and achieves higher accuracy. Moreover, our method outperforms the CNN-based method when the training data are scarce and imbalanced. -
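The abstract does not specify which finite-dimensional vectorization is used. As a hedged stand-in, the sketch below uses a Betti curve: the number of bars alive at each point of a fixed grid, which turns any persistence diagram into a vector of the same length. The function name and grid are hypothetical.

```python
import numpy as np


def betti_curve(diagram, grid):
    """Number of bars alive at each grid value: a fixed-length feature vector."""
    pts = np.asarray(diagram, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    # A bar [birth, death) is counted at t when birth <= t < death.
    alive = (pts[None, :, 0] <= grid[:, None]) & (grid[:, None] < pts[None, :, 1])
    return alive.sum(axis=1)
```

Concatenating, for example, `betti_curve(h0, grid)` and `betti_curve(h1, grid)` gives one input vector per wafer map for a small neural network, whatever the number of points in each diagram.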
Analysis of Kolmogorov Flow and Rayleigh–Bénard Convection Using Persistent Homology (2016)
Miroslav Kramár, Rachel Levanger, Jeffrey Tithof, Balachandra Suri, Mu Xu, Mark Paul, Michael F. Schatz, Konstantin Mischaikow
Abstract
We use persistent homology to build a quantitative understanding of large complex systems that are driven far from equilibrium. In particular, we analyze image time series of flow field patterns from numerical simulations of two important problems in fluid dynamics: Kolmogorov flow and Rayleigh–Bénard convection. For each image we compute a persistence diagram to yield a reduced description of the flow field; by applying different metrics to the space of persistence diagrams, we relate characteristic features in persistence diagrams to the geometry of the corresponding flow patterns. We also examine the dynamics of the flow patterns by a second application of persistent homology to the time series of persistence diagrams. We demonstrate that persistent homology provides an effective method both for quotienting out symmetries in families of solutions and for identifying multiscale recurrent dynamics. Our approach is general and is expected to apply to a broad range of open problems that exhibit complex spatio-temporal behavior. -
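The abstract mentions applying different metrics to the space of persistence diagrams without naming them here. As an assumption, the sketch below computes one standard choice, the 1-Wasserstein distance with an L-infinity ground metric, by augmenting each diagram with the diagonal and solving the optimal matching with SciPy's `linear_sum_assignment`. Infinite bars are assumed to have been removed, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def wasserstein_1(dgm_a, dgm_b):
    """1-Wasserstein distance between finite persistence diagrams (L-infinity ground metric)."""
    a = np.asarray(dgm_a, dtype=float).reshape(-1, 2)
    b = np.asarray(dgm_b, dtype=float).reshape(-1, 2)
    n, m = len(a), len(b)
    # L-infinity distance from each point to the diagonal.
    diag_a = (a[:, 1] - a[:, 0]) / 2
    diag_b = (b[:, 1] - b[:, 0]) / 2
    # Augmented (n + m) x (m + n) cost matrix:
    #   top-left: point-to-point, top-right: a-point to the diagonal,
    #   bottom-left: diagonal to b-point, bottom-right: diagonal to diagonal (free).
    cost = np.zeros((n + m, m + n))
    cost[:n, :m] = np.max(np.abs(a[:, None, :] - b[None, :, :]), axis=2)
    cost[:n, m:] = diag_a[:, None]
    cost[n:, :m] = diag_b[None, :]
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum()
```

Matching a single point (0, 2) against an empty diagram costs 1, its L-infinity distance to the diagonal. The cubic-time assignment is fine for the small diagrams of a single image but would need approximation for large ones.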
Rule Generation for Classifying SLT Failed Parts (2022)
Ho-Chieh Hsu, Cheng-Che Lu, Shih-Wei Wang, Kelly Jones, Kai-Chiang Wu, Mango C.-T. Chao
Abstract
System-level test (SLT) has recently gained visibility as integrated circuits become increasingly difficult to test fully because of growing transistor density and circuit design complexity. Although SLT is effective for reducing test escapes, it yields little diagnostic information for product improvement. In this paper, we propose an unsupervised learning (UL) method that addresses this issue by discovering correlated, potentially systematic defects during the SLT phase. To this end, HDBSCAN [1] is used to cluster SLT failed devices in a low-dimensional space created by UMAP [2]. Decision trees are then applied to explain the HDBSCAN results by generating explainable quantitative rules (e.g., inequality constraints), giving domain experts additional information for advanced diagnosis. Experiments on industrial data demonstrate that the proposed methodology can effectively cluster SLT failed devices and then explain the clustering results with an accuracy above 90%. Our methodology is also scalable and fast, with runtimes two to five orders of magnitude lower than that of the method presented in [3]. -
The Accumulated Persistence Function, a New Useful Functional Summary Statistic for Topological Data Analysis, With a View to Brain Artery Trees and Spatial Point Process Applications (2019)
C.A.N. Biscio, J. Møller
Abstract
We start with a simple introduction to topological data analysis, whose most popular tool is the persistence diagram. Briefly, a persistence diagram is a multiset of points in the plane describing the persistence of topological features of a compact set as a scale parameter varies. Since statistical methods are difficult to apply directly to persistence diagrams, various alternative functional summary statistics have been suggested, but they either do not contain the full information of the persistence diagram or are two-dimensional functions. We suggest a new functional summary statistic that is one-dimensional, and hence easier to handle, and which under mild conditions contains the full information of the persistence diagram. Its usefulness is illustrated in statistical settings concerned with point clouds and brain artery trees. The supplementary materials include additional methods and examples, technical details, and the R code used for all examples. © 2019 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. -
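Our reading of the accumulated persistence function (APF) is sketched below: each diagram point (b, d) is rotated to meanage m_i = (b + d)/2 and lifetime l_i = d - b, and APF(m) sums the lifetimes of all points whose meanage is at most m. This makes it a one-dimensional, non-decreasing step function, as the abstract describes. Consult the paper for the exact conventions (for example, one APF per homology dimension); the helper name is hypothetical.

```python
import numpy as np


def accumulated_persistence(diagram, m_grid):
    """APF(m) = sum of lifetimes of diagram points whose meanage is <= m."""
    pts = np.asarray(diagram, dtype=float).reshape(-1, 2)
    meanage = (pts[:, 0] + pts[:, 1]) / 2
    lifetime = pts[:, 1] - pts[:, 0]
    m_grid = np.asarray(m_grid, dtype=float)
    # Indicator matrix (grid value x point) weighted by lifetime, summed per grid value.
    return (lifetime[None, :] * (meanage[None, :] <= m_grid[:, None])).sum(axis=1)
```

For the diagram {(0, 2), (1, 2)} evaluated at m = 0.5, 1.0, 1.5 and 2.0, it returns 0, 2, 3 and 3. Because each diagram becomes a function on a common grid, sample means and pointwise envelopes across many diagrams are straightforward.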
Persistent Homology of Geospatial Data: A Case Study With Voting (2021)
Michelle Feng, Mason A. Porter
Abstract
A crucial step in the analysis of persistent homology is the transformation of data into an appropriate topological object (which, in our case, is a simplicial complex). Software packages for computing persistent homology typically construct Vietoris–Rips or other distance-based simplicial complexes on point clouds because they are relatively easy to compute. We investigate alternative methods of constructing simplicial complexes and the effects of making associated choices during simplicial-complex construction on the output of persistent-homology algorithms. We present two new methods for constructing simplicial complexes from two-dimensional geospatial data (such as maps). We apply these methods to a California precinct-level voting data set, and we thereby demonstrate that our new constructions can capture geometric characteristics that are missed by distance-based constructions. Our new constructions can thus yield more interpretable persistence modules and barcodes for geospatial data. In particular, they are able to distinguish short-persistence features that occur only for a narrow range of distance scales (e.g., voting patterns in densely populated cities) from short-persistence noise by incorporating information about other spatial relationships between regions. -
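For contrast with the new constructions, here is the distance-based baseline the abstract refers to, reduced to its simplest case: 0-dimensional persistence of a Vietoris–Rips filtration, computed with union-find over edges sorted by length. It is a sketch, not Feng and Porter's code, and uses the convention that an edge enters the filtration at the distance between its endpoints.

```python
import itertools
import math


def rips_h0(points):
    """0-dimensional persistence of the Vietoris-Rips filtration of a finite point set.

    All vertices are born at scale 0; a component dies at the length of the edge
    that first merges it into another (Kruskal's algorithm on the complete graph).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in itertools.combinations(enumerate(points), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]
```

For the collinear points 0, 1 and 5 it returns bars dying at 1.0 and 4.0, plus one infinite bar. Only pairwise distances enter the computation, which is exactly why such constructions miss the adjacency and other spatial relationships between precincts that the abstract emphasizes.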
Using Persistent Homology and Dynamical Distances to Analyze Protein Binding (2016)
Violeta Kovacev-Nikolic, Peter Bubenik, Dragan Nikolić, Giseon Heo
Abstract
Persistent homology captures the evolution of topological features of a model as a parameter changes. The most commonly used summary statistics of persistent homology are the barcode and the persistence diagram. Another summary statistic, the persistence landscape, was recently introduced by Bubenik. It is a functional summary, so it is easy to calculate sample means and variances, and it is straightforward to construct various test statistics. Using a permutation test, we detect conformational changes between the closed and open forms of the maltose-binding protein, a large biomolecule consisting of 370 amino acid residues. Furthermore, persistence landscapes can be used in machine learning methods: a hyperplane from a support vector machine clearly separates the closed and open protein conformations. Moreover, because our approach captures dynamical properties of the protein, our results may help in identifying residues susceptible to ligand binding; we show that the majority of active site residues and allosteric pathway residues are located in the vicinity of the most persistent loop in the corresponding filtered Vietoris–Rips complex. This finding was not observed in the classical anisotropic network model. -
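The landscape-based analysis described above can be sketched as follows, assuming Bubenik's definition (lambda_k(t) is the k-th largest value of max(0, min(t - b, d - t)) over the diagram points) and a permutation test on the Euclidean distance between group-mean landscapes sampled on a grid. The test statistic and the function names are our assumptions; the paper's exact statistic may differ.

```python
import numpy as np


def landscape(diagram, grid, k_max):
    """Persistence landscape layers lambda_1..lambda_k_max sampled on a grid."""
    grid = np.asarray(grid, dtype=float)
    pts = np.asarray(diagram, dtype=float).reshape(-1, 2)
    # Tent function of each (birth, death) pair: max(0, min(t - birth, death - t)).
    tents = np.maximum(0.0, np.minimum(grid[None, :] - pts[:, :1], pts[:, 1:] - grid[None, :]))
    tents = -np.sort(-tents, axis=0)  # descending at each t: row k is lambda_{k+1}
    out = np.zeros((k_max, len(grid)))
    rows = min(k_max, len(pts))
    out[:rows] = tents[:rows]
    return out


def permutation_test(group_a, group_b, n_perm=999, seed=0):
    """p-value for the distance between mean landscapes of two groups of arrays."""
    rng = np.random.default_rng(seed)
    data = np.array([x.ravel() for x in list(group_a) + list(group_b)])
    n_a = len(group_a)

    def stat(d):
        # Euclidean distance between group means (an L2 norm up to the grid spacing).
        return np.linalg.norm(d[:n_a].mean(axis=0) - d[n_a:].mean(axis=0))

    observed = stat(data)
    count = sum(stat(data[rng.permutation(len(data))]) >= observed for _ in range(n_perm))
    return (count + 1) / (n_perm + 1)
```

Because landscapes live in a vector space, group means are simple averages, which is the property the abstract highlights. The same flattened vectors can also be fed to a support vector machine.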
From Trees to Barcodes and Back Again: Theoretical and Statistical Perspectives (2020)
Lida Kanari, Adélie Garin, Kathryn Hess
Abstract
Methods of topological data analysis have been successfully applied in a wide range of fields to provide useful summaries of the structure of complex data sets in terms of topological descriptors, such as persistence diagrams. While there are many powerful techniques for computing topological descriptors, the inverse problem, i.e., recovering the input data from topological descriptors, has proved to be challenging. In this article we study in detail the Topological Morphology Descriptor (TMD), which assigns a persistence diagram to any tree embedded in Euclidean space, and a kind of stochastic inverse of the TMD, the Topological Neuron Synthesis (TNS) algorithm, gaining both theoretical and computational insights into the relation between the two. We propose a new approach to classify barcodes using symmetric groups, which provides a concrete language to formulate our results. We investigate to what extent the TNS recovers a geometric tree from its TMD and describe the effect of different types of noise on the process of tree generation from persistence diagrams. Moreover, we prove that the TNS algorithm is stable with respect to specific types of noise. -
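As we understand it, the TMD processes a rooted tree from the leaves inwards. Each leaf starts a bar at its function value (for example, its radial distance from the root). At each branch point, the subtree reaching the largest value survives and the others end at the branch point's value. The sketch below implements that elder rule on a small dictionary-based tree; the data layout and the function name are hypothetical.

```python
def tmd_barcode(children, f, root):
    """Elder-rule barcode of a rooted tree, in the spirit of the TMD.

    children: dict node -> list of child nodes; f: dict node -> function value
    (e.g. radial distance from the root). Returns (leaf value, branch value) bars.
    """
    bars = []

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            return f[node]  # a leaf starts a bar at its own value
        values = sorted((visit(c) for c in kids), reverse=True)
        for v in values[1:]:  # all but the largest subtree end at this branch point
            bars.append((v, f[node]))
        return values[0]  # the largest continues towards the root

    bars.append((visit(root), f[root]))
    return bars
```

For a root at 0, one branch point at 1, and leaves at 3 and 2, it returns the bars (2, 1) and (3, 0). Recursion depth grows with tree depth, so very deep trees would need an iterative traversal.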
Multidimensional Persistence in Biomolecular Data (2015)
Kelin Xia, Guo-Wei Wei
Abstract
Persistent homology has emerged as a popular technique for the topological simplification of big data, including biomolecular data. Multidimensional persistence bears considerable promise to bridge the gap between geometry and topology. However, its practical and robust construction has been a challenge. We introduce two families of multidimensional persistence, namely pseudo-multidimensional persistence and multiscale multidimensional persistence. The former is generated via repeated applications of persistent homology filtration to high-dimensional data, such as results from molecular dynamics or partial differential equations. The latter is constructed via isotropic and anisotropic scales that create new simplicial complexes and associated topological spaces. The utility, robustness and efficiency of the proposed topological methods are demonstrated via protein folding, protein flexibility analysis, the topological denoising of cryo-electron microscopy data, and the scale dependence of nanoparticles. A topological transition between partially folded and unfolded proteins is observed in multidimensional persistence. The separation between noise topological signatures and molecular topological fingerprints is achieved by the Laplace–Beltrami flow. The multiscale multidimensional persistent homology reveals relatively local features in Betti-0 invariants and relatively global characteristics in Betti-1 and Betti-2 invariants. -
Geometric Anomaly Detection in Data (2020)
Bernadette J. Stolz, Jared Tanner, Heather A. Harrington, Vidit Nanda
Abstract
The quest for low-dimensional models that approximate high-dimensional data is pervasive across the physical, natural, and social sciences. The dominant paradigm underlying most standard modeling techniques assumes that the data are concentrated near a single unknown manifold of relatively small intrinsic dimension. Here, we present a systematic framework for detecting interfaces and related anomalies in data that may fail to satisfy the manifold hypothesis. By computing the local topology of small regions around each data point, we are able to partition a given dataset into disjoint classes, each of which can be individually approximated by a single manifold. Since these manifolds may have different intrinsic dimensions, local topology discovers singular regions in data even when none of the points have been sampled precisely from the singularities. We showcase this method by identifying the intersection of two surfaces in the 24-dimensional space of cyclo-octane conformations and by locating all of the self-intersections of a Henneberg minimal surface immersed in 3-dimensional space. Because the topological computations are local, the work of performing such data stratification is readily distributed across several processors. -
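The local-topology idea can be illustrated with a much-simplified sketch. Instead of computing persistent homology of a neighborhood of each point, as the paper does, it only counts connected components of the points in an annulus around a chosen point, linking two annulus points when they are closer than `eps`. On a curve, a regular point sees 2 components, while the crossing of two curves sees 4. The radii, `eps`, and the function name are hypothetical choices.

```python
import numpy as np


def annulus_components(points, idx, r_in, r_out, eps):
    """Connected components of the points lying in an annulus around points[idx]."""
    pts = np.asarray(points, dtype=float)
    dist = np.linalg.norm(pts - pts[idx], axis=1)
    ann = pts[(dist > r_in) & (dist < r_out)]
    parent = list(range(len(ann)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link annulus points closer than eps (an eps-neighborhood graph).
    pair_d = np.linalg.norm(ann[:, None, :] - ann[None, :, :], axis=2)
    for i, j in zip(*np.nonzero(np.triu(pair_d < eps, k=1))):
        parent[find(i)] = find(j)
    return len({find(i) for i in range(len(ann))})
```

Running this for every point and flagging those whose count differs from the majority is a crude version of the stratification described above; since each call only touches one neighborhood, the loop parallelizes trivially.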
HERMES: Persistent Spectral Graph Software (2020)
Rui Wang, Rundong Zhao, Emily Ribando-Gros, Jiahui Chen, Yiying Tong, Guo-Wei Wei
Abstract
Persistent homology (PH) is one of the most popular tools in topological data analysis (TDA), while graph theory has had a significant impact on data science. Our earlier work introduced the persistent spectral graph (PSG) theory as a unified multiscale paradigm to encompass TDA and geometric analysis. In PSG theory, families of persistent Laplacians (PLs) corresponding to various topological dimensions are constructed via a filtration to sample a given dataset at multiple scales. The harmonic spectra from the null spaces of PLs give the same topological invariants as PH, namely persistent Betti numbers in various dimensions, while the non-harmonic spectra of PLs provide additional geometric analysis of the shape of the data. In this work, we develop an open-source software package, called highly efficient robust multidimensional evolutionary spectra (HERMES), to enable broad applications of PSGs in science, engineering, and technology. To ensure the reliability and robustness of HERMES, we have validated the software with simple geometric shapes and complex datasets from three-dimensional (3D) protein structures. We found that the smallest non-zero eigenvalues are very sensitive to data abnormalities. -
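The link between Laplacian spectra and Betti numbers can be seen at a single scale with the graph (0th combinatorial) Laplacian L0 = D - A of an eps-neighborhood graph: its number of zero eigenvalues equals the number of connected components, and the smallest non-zero eigenvalue carries geometric information. This is a non-persistent simplification; a true persistent Laplacian relates two filtration scales and is not shown. Nothing here reflects HERMES's actual interface, and the function names are hypothetical.

```python
import numpy as np


def laplacian_spectrum(points, eps):
    """Eigenvalues of the 0th combinatorial Laplacian L0 = D - A of the eps-graph."""
    pts = np.asarray(points, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    adj = ((dist <= eps) & ~np.eye(len(pts), dtype=bool)).astype(float)
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.eigvalsh(lap)  # L0 is symmetric: real eigenvalues, ascending


def betti0_and_fiedler(eigs, tol=1e-9):
    """Harmonic part -> beta_0; smallest non-zero eigenvalue -> geometric summary."""
    zero = eigs < tol
    nonzero = eigs[~zero]
    return int(zero.sum()), (float(nonzero[0]) if nonzero.size else None)
```

Sweeping `eps` and recording both numbers gives a crude multiscale picture; for two well-separated pairs of points at `eps = 1.5`, it reports beta_0 = 2 and a smallest non-zero eigenvalue of 2.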
Persistent Homology Based Graph Convolution Network for Fine-Grained 3D Shape Segmentation (2021)
Chi-Chong Wong, Chi-Man Vong
Abstract
Fine-grained 3D segmentation is an important task in 3D object understanding, especially in applications such as intelligent manufacturing or parts analysis for 3D objects. However, many challenges in this problem remain unsolved, such as (i) interpreting the complex structures located in different regions of 3D objects and (ii) capturing fine-grained structures with sufficient topological correctness. Current deep learning and graph machine learning methods fail to address these challenges and thus perform poorly in fine-grained 3D analysis. In this work, methods from topological data analysis are incorporated into a geometric deep learning model for the fine-grained segmentation of 3D objects. We propose a novel neural network model called Persistent Homology based Graph Convolution Network (PHGCN), which (i) integrates persistent homology into a graph convolution network to capture multi-scale structural information that can accurately represent complex structures of 3D objects and (ii) applies a novel Persistence Diagram Loss (ℒPD) that provides sufficient topological correctness for segmentation of fine-grained structures. Extensive experiments on fine-grained 3D segmentation validate the effectiveness of the proposed PHGCN model and show significant improvements over current state-of-the-art methods. -
CD8 T-Cell Reactivity to Islet Antigens Is Unique to Type 1 While CD4 T-Cell Reactivity Exists in Both Type 1 and Type 2 Diabetes (2014)
Ghanashyam Sarikonda, Jeremy Pettus, Sonal Phatak, Sowbarnika Sachithanantham, Jacqueline F. Miller, Johnna D. Wesley, Eithon Cadag, Ji Chae, Lakshmi Ganesan, Ronna Mallios, Steve Edelman, Bjoern Peters, Matthias von Herrath
Abstract
Previous cross-sectional analyses demonstrated that CD8+ and CD4+ T-cell reactivity to islet-specific antigens was more prevalent in subjects with type 1 diabetes (T1D) than in healthy donors (HD). Here, we examined T1D-associated epitope-specific CD4+ T-cell cytokine production and autoreactive CD8+ T-cell frequency on a monthly basis for one year in 10 HD, 33 subjects with T1D, and 15 subjects with type 2 diabetes (T2D). Autoreactive CD4+ T-cells from both T1D and T2D subjects produced more IFN-γ when stimulated than cells from HD. In contrast, higher frequencies of islet antigen-specific CD8+ T-cells were detected only in T1D. These observations support the hypothesis that general beta-cell stress drives autoreactive CD4+ T-cell activity, while the islet over-expression of MHC class I commonly seen in T1D mediates amplification of CD8+ T-cells and more rapid beta-cell loss. In conclusion, CD4+ T-cell autoreactivity appears to be present in both T1D and T2D, while autoreactive CD8+ T-cells are unique to T1D. Thus, autoreactive CD8+ cells may serve as a more T1D-specific biomarker. -
Data-Driven and Automatic Surface Texture Analysis Using Persistent Homology (2021)
Melih C. Yesilli, Firas A. Khasawneh
Abstract
Surface roughness plays an important role in analyzing engineering surfaces. It quantifies the surface topography and can be used to determine whether the resulting surface finish is acceptable. However, although several tools and standards are available for computing surface roughness, these methods rely heavily on user input, which slows down the analysis and increases manufacturing costs. Fast and automatic determination of the roughness level is therefore essential to avoid the costs of surfaces with unacceptable finish and of user-intensive analysis. In this study, we propose a Topological Data Analysis (TDA) based approach to classify the roughness level of synthetic surfaces using both their areal images and profiles. We use persistent homology from TDA to generate persistence diagrams that encapsulate information on the shape of the surface. We then obtain feature matrices for each surface or profile using Carlsson coordinates, persistence images, and template functions. We compare our results to two widely used methods in the literature: Fast Fourier Transform (FFT) and Gaussian filtering. The results show that our approach yields mean accuracies as high as 97%. We also show that, in contrast to existing surface analysis tools, our TDA-based approach is fully automatable and provides adaptive feature extraction. -
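Of the three featurizations named above, persistence images are the easiest to sketch. The version below is a simplification under our own assumptions: points are mapped to (birth, persistence) coordinates, weighted linearly by persistence, and spread with an unnormalized Gaussian that is evaluated at grid points rather than integrated over each pixel. The resolution, `sigma`, extent, and function name are hypothetical; Carlsson coordinates and template functions are not shown.

```python
import numpy as np


def persistence_image(diagram, res=20, sigma=0.1, extent=(0.0, 1.0, 0.0, 1.0)):
    """Simplified persistence image of a finite diagram (rows: persistence, cols: birth)."""
    pts = np.asarray(diagram, dtype=float).reshape(-1, 2)
    birth = pts[:, 0]
    pers = pts[:, 1] - pts[:, 0]  # (birth, persistence) coordinates
    b_min, b_max, p_min, p_max = extent
    B, P = np.meshgrid(np.linspace(b_min, b_max, res), np.linspace(p_min, p_max, res))
    # Linear weight in persistence, so points near the diagonal contribute little.
    scale = pers.max() if pers.size and pers.max() > 0 else 1.0
    img = np.zeros((res, res))
    for b, p in zip(birth, pers):
        img += (p / scale) * np.exp(-((B - b) ** 2 + (P - p) ** 2) / (2 * sigma ** 2))
    return img
```

Flattening `img` with `img.ravel()` gives a fixed-length feature vector per surface or profile, suitable for any standard classifier.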
Relational Persistent Homology for Multispecies Data With Application to the Tumor Microenvironment (2023)
Bernadette J. Stolz, Jagdeep Dhesi, Joshua A. Bull, Heather A. Harrington, Helen M. Byrne, Iris H. R. Yoon
Abstract
Topological data analysis (TDA) is an active field of mathematics for quantifying shape in complex data. Standard methods in TDA, such as persistent homology (PH), typically focus on the analysis of data consisting of a single entity (e.g., cells or molecular species). However, state-of-the-art data collection techniques now generate exquisitely detailed multispecies data, prompting a need for methods that can examine and quantify the relations among them. Such heterogeneous data types arise in many contexts, ranging from biomedical imaging and geospatial analysis to species ecology. Here, we propose two methods for encoding spatial relations among different data types that are based on Dowker complexes and Witness complexes. We apply the methods to synthetic multispecies data of a tumor microenvironment and analyze topological features that capture relations between different cell types, e.g., blood vessels, macrophages, tumor cells, and necrotic cells. We demonstrate that relational topological features can extract biological insight, including the dominant immune cell phenotype (an important predictor of patient prognosis) and the parameter regimes of a data-generating model. The methods provide a quantitative perspective on the relational analysis of multispecies spatial data, overcome the limits of traditional PH, and are readily computable. -
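A fixed-radius Dowker complex, one of the two constructions named above, can be sketched as follows. Take two species A and B (say, tumor cells and macrophages), relate an A-point to a B-point when they lie within `radius` of each other, and declare a set of A-points a simplex when some B-point is related to all of them. The paper varies the relation to build filtrations and also uses Witness complexes; this sketch, its function name, and the `max_dim` cutoff are assumptions for illustration.

```python
import itertools

import numpy as np


def dowker_simplices(species_a, species_b, radius, max_dim=2):
    """Simplices (up to dimension max_dim) of the Dowker complex of 'within radius'.

    A set of A-points forms a simplex when some B-point (a witness) lies within
    `radius` of every point in the set.
    """
    a = np.asarray(species_a, dtype=float)
    b = np.asarray(species_b, dtype=float)
    related = np.linalg.norm(b[:, None, :] - a[None, :, :], axis=2) <= radius
    simplices = set()
    for row in related:  # one witness at a time
        verts = np.flatnonzero(row).tolist()
        # Every subset of a witnessed vertex set is also a simplex (faces included).
        for k in range(1, min(len(verts), max_dim + 1) + 1):
            simplices.update(itertools.combinations(verts, k))
    return simplices
```

With A-points at (0, 0), (2, 0) and (1, 5) and a single B-point at (1, 0), a radius of 1.1 yields the vertices 0 and 1 and the edge between them, because that B-point witnesses both; the third A-point is not in the complex at all.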
Exploring Surface Texture Quantification in Piezo Vibration Striking Treatment (PVST) Using Topological Measures (2022)
Melih C. Yesilli, Max M. Chumley, Jisheng Chen, Firas A. Khasawneh, Yang Guo
Abstract
Surface texture influences wear and tribological properties of manufactured parts, and it plays a critical role in end-user products. Therefore, quantifying the order or structure of a manufactured surface provides important information on the quality and life expectancy of the product. Although texture can be intentionally introduced to enhance aesthetics or to satisfy a design function, sometimes it is an inevitable byproduct of surface treatment processes such as Piezo Vibration Striking Treatment (PVST). Measures of order for surfaces have been characterized using statistical, spectral, and geometric approaches. For nearly hexagonal lattices, topological tools have also been used to measure the surface order. This paper explores utilizing tools from Topological Data Analysis for measuring surface texture. We compute measures of order based on optical digital microscope images of surfaces treated using PVST. These measures are applied to the grid obtained from estimating the centers of tool impacts, and they quantify the grid’s deviations from the nominal one. Our results show that TDA provides a convenient framework for characterization of pattern type that bypasses some limitations of existing tools such as difficult manual processing of the data and the need for an expert user to analyze and interpret the surface images. -
Persistent Homology Analysis of Protein Structure, Flexibility, and Folding (2014)
Kelin Xia, Guo-Wei Wei
Abstract
Proteins are the most important biomolecules for living organisms. The understanding of protein structure, function, dynamics, and transport is one of the most challenging tasks in biological science. In the present work, persistent homology is, for the first time, introduced for extracting molecular topological fingerprints (MTFs) based on the persistence of molecular topological invariants. MTFs are utilized for protein characterization, identification, and classification. The method of slicing is proposed to track the geometric origin of protein topological invariants. Both all-atom and coarse-grained representations of MTFs are constructed. A new cutoff-like filtration is proposed to shed light on the optimal cutoff distance in elastic network models. On the basis of the correlation between protein compactness, rigidity, and connectivity, we propose an accumulated bar length generated from persistent topological invariants for the quantitative modeling of protein flexibility. To this end, a correlation matrix-based filtration is developed. This approach gives rise to an accurate prediction of the optimal characteristic distance used in protein B-factor analysis. Finally, MTFs are employed to characterize protein topological evolution during protein folding and quantitatively predict the protein folding stability. Excellent agreement between our persistent homology prediction and molecular dynamics simulation is found. This work reveals the topology–function relationship of proteins. Copyright © 2014 John Wiley & Sons, Ltd. -
Multiresolution Persistent Homology for Excessively Large Biomolecular Datasets (2015)
Kelin Xia, Zhixiong Zhao, Guo-Wei Wei
Abstract
Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large-scale datasets with appropriate resolution. We use the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated on a hexagonal fractal image that has three distinct scales. We further demonstrate the proposed method by extracting topological fingerprints from DNA molecules. In particular, we successfully analyze the topological persistence of a virus capsid with 273 780 atoms, which would otherwise be inaccessible to the standard point cloud method and unreliable with coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification, which, to our knowledge, is the first time that persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications to arbitrary data sets, such as social networks, biological networks, and graphs. -
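The multiresolution idea can be sketched in one dimension. Assuming the flexibility-rigidity index form of the rigidity density, mu(r) = sum_j exp(-(|r - r_j| / eta)^kappa), the resolution parameter eta decides which scale the superlevel sets resolve. For brevity, the sketch counts the components of a single superlevel set rather than running a full filtration; the atom positions, threshold, and function names are hypothetical.

```python
import numpy as np


def rigidity_density(grid, atoms, eta, kappa=2.0):
    """Rigidity density mu(r) = sum_j exp(-(|r - r_j| / eta) ** kappa) on a 1D grid."""
    d = np.abs(np.asarray(grid, dtype=float)[:, None] - np.asarray(atoms, dtype=float)[None, :])
    return np.exp(-(d / eta) ** kappa).sum(axis=1)


def superlevel_components(values, threshold):
    """Number of connected runs of grid samples with values >= threshold."""
    above = np.asarray(values) >= threshold
    # A new component starts wherever 'above' switches from False to True.
    return int(above[0]) + int(np.sum(above[1:] & ~above[:-1]))
```

On a grid over [-5, 25], with three clusters of three unit-spaced atoms centered at 0, 10, and 20 and a threshold of 0.5, this gives 9 components at eta = 0.2 (individual atoms), 3 at eta = 1.5 (clusters), and 1 at eta = 6 (the whole structure): the "topological lens" moves with the resolution.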
Branching and Circular Features in High Dimensional Data (2011)
B. Wang, B. Summa, V. Pascucci, M. Vejdemo-Johansson
Abstract
Large-scale observations and simulations in scientific research give rise to high-dimensional data sets that present many challenges and opportunities in data analysis and visualization. Researchers in application domains such as engineering, computational biology, climate study, imaging and motion capture face the problem of how to discover compact representations of high-dimensional data while preserving their intrinsic structure. In many applications, the original data are projected onto a low-dimensional space via dimensionality reduction techniques prior to modeling. One problem with this approach is that the projection step can fail to preserve structure in the data that is only apparent in high dimensions. Conversely, such techniques may create structural illusions in the projection, implying structure not present in the original high-dimensional data. Our solution is to use topological techniques to recover important structures in high-dimensional data that contain non-trivial topology. Specifically, we are interested in high-dimensional branching structures. We construct local circle-valued coordinate functions to represent such features. Subsequently, we perform dimensionality reduction on the data while ensuring that such structures are visually preserved. Additionally, we study the effects of global circular structures on visualizations. Our results reveal previously unseen structures in real-world data sets from a variety of applications. -
Topological Data Analysis of Financial Time Series: Landscapes of Crashes (2017)
Marian Gidea, Yuri Katz
Abstract
We explore the evolution of daily returns of four major US stock market indices during the technology crash of 2000 and the financial crisis of 2007–2009. Our methodology is based on topological data analysis (TDA). We use persistent homology to detect and quantify topological patterns that appear in multidimensional time series. Using a sliding window, we extract time-dependent point cloud data sets, to which we associate a topological space. We detect transient loops that appear in this space, and we measure their persistence. This is encoded in real-valued functions referred to as persistence landscapes. We quantify the temporal changes in persistence landscapes via their $L^p$-norms. We test this procedure on multidimensional time series generated by various non-linear and non-equilibrium models. We find that, in the vicinity of financial meltdowns, the $L^p$-norms exhibit strong growth prior to the primary peak, which ascends during a crash. Remarkably, the average spectral density at low frequencies of the time series of $L^p$-norms of the persistence landscapes demonstrates a strong rising trend for 250 trading days prior to either the dotcom crash on 03/10/2000 or the Lehman bankruptcy on 09/15/2008. Our study suggests that TDA provides a new type of econometric analysis, which goes beyond the standard statistical measures. The method can be used to detect early warning signals of imminent market crashes. We believe that this approach can be used beyond the analysis of financial time series presented here. -
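The two computational steps described above can be sketched as follows, with assumptions of ours. Returns arrive as a (days x indices) array, and each window of `w` consecutive days forms one point cloud. The H1 diagram of each cloud is assumed to have been computed already with a persistent homology library (that step is omitted). The landscape norm uses the fact that sum_k lambda_k(t)^p equals the sum of all tent functions raised to the power p, and it integrates with the trapezoid rule on a grid. Function names are hypothetical.

```python
import numpy as np


def sliding_windows(returns, w):
    """Point clouds of w consecutive days; each row of `returns` is one trading day."""
    r = np.asarray(returns, dtype=float)
    return [r[t:t + w] for t in range(len(r) - w + 1)]


def landscape_lp_norm(diagram, p, grid):
    """L^p norm of a persistence landscape, using the trapezoid rule on `grid`."""
    grid = np.asarray(grid, dtype=float)
    pts = np.asarray(diagram, dtype=float).reshape(-1, 2)
    # Tent function of each (birth, death) pair: max(0, min(t - birth, death - t)).
    tents = np.maximum(0.0, np.minimum(grid[None, :] - pts[:, :1], pts[:, 1:] - grid[None, :]))
    # The layers lambda_k(t) are the tents sorted at each t, so the sum over k of
    # lambda_k(t)^p equals the sum over all tents; no sorting is needed.
    integrand = (tents ** p).sum(axis=0)
    integral = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(grid))
    return integral ** (1.0 / p)
```

For a single bar (0, 2), the L1 norm on a fine grid that contains t = 1 is 1.0, the area under one tent. Computing one norm per window produces the time series whose behavior before a crash the abstract describes.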
A Topological Framework for Identifying Phenomenological Bifurcations in Stochastic Dynamical Systems (2024)
Sunia Tanweer, Firas A. Khasawneh, Elizabeth Munch, Joshua R. Tempelman
Abstract
Changes in the parameters of dynamical systems can cause the state of the system to shift between different qualitative regimes. These shifts, known as bifurcations, are critical to study as they can indicate when the system is about to undergo harmful changes in its behavior. In stochastic dynamical systems, there is particular interest in P-type (phenomenological) bifurcations, which can include transitions from a monostable state to multi-stable states, the appearance of stochastic limit cycles, and other features in the probability density function (PDF) of the system’s state. Current practices are limited to systems with small state spaces, cannot detect all possible behaviors of the PDFs, and require human intervention to visually identify the change in the PDF. In contrast, this study presents a new approach based on Topological Data Analysis that uses superlevel persistence to mathematically quantify P-type bifurcations in stochastic systems through a “homological bifurcation plot”, which shows the changing ranks of the 0th and 1st homology groups through Betti vectors. Using these plots, we demonstrate the successful detection of P-bifurcations on the stochastic Duffing, Rayleigh–van der Pol and quintic oscillators given their analytical PDFs, and elaborate on how to generate an estimated homological bifurcation plot given a kernel density estimate (KDE) of these systems by employing a tool for finding topological consistency between PDFs and KDEs. -
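A toy version of the homological bifurcation plot can be built from a one-dimensional stochastic system with a known stationary PDF. Assuming dx = -V'(x) dt + sigma dW with V(x) = -a x^2/2 + x^4/4, the stationary density is proportional to exp(-2 V(x) / sigma^2); it is unimodal for a < 0 and bimodal for a > 0. For each a, the sketch records the largest beta_0 of the superlevel sets over a range of thresholds. This system and the max-over-thresholds summary are our simplifications: the paper works with oscillators such as the Duffing oscillator and also tracks beta_1. Function names are hypothetical.

```python
import numpy as np


def double_well_pdf(x, a, sigma=1.0):
    """Stationary PDF of dx = -V'(x) dt + sigma dW with V(x) = -a x^2 / 2 + x^4 / 4."""
    v = -a * x**2 / 2 + x**4 / 4
    p = np.exp(-2 * v / sigma**2)
    return p / (p.sum() * (x[1] - x[0]))  # normalize on the uniform grid


def betti0_curve(pdf, levels):
    """beta_0 of the superlevel set {pdf >= c} for each threshold c (a Betti vector)."""
    counts = []
    for c in levels:
        above = pdf >= c
        # Each run of consecutive grid points above c is one connected component.
        counts.append(int(above[0]) + int(np.sum(above[1:] & ~above[:-1])))
    return np.array(counts)


def homological_bifurcation(params, x, n_levels=200):
    """Largest beta_0 over superlevel thresholds, for each parameter value a."""
    ranks = []
    for a in params:
        pdf = double_well_pdf(x, a)
        levels = np.linspace(0.0, pdf.max(), n_levels + 2)[1:-1]  # skip 0 and the maximum
        ranks.append(int(betti0_curve(pdf, levels).max()))
    return ranks
```

On a grid over [-4, 4], a = -1 gives 1 and a = 1 gives 2, marking the P-bifurcation between them. Plotting the rank against a on a finer parameter sweep gives the one-dimensional analogue of the plot.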
Time-Inhomogeneous Diffusion Geometry and Topology (2022)
Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy
Abstract
Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic diffusion homology. We use this intrinsic topology as well as an ambient topology to study how the data changes over diffusion time. We demonstrate both homologies in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis. -
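Here is a minimal sketch of the time-inhomogeneous process described above. At each step, a Gaussian affinity matrix is recomputed on the current positions, row-normalized into a Markov operator, and applied to the data. Since every updated point is a weighted average of the current points, the diameter of the point set can never increase. The fixed bandwidth `eps` and the absence of any merging rule are our simplifications of diffusion condensation, and the function name is hypothetical.

```python
import numpy as np


def diffusion_condensation(x, eps, n_steps):
    """Repeatedly recompute and apply a row-stochastic Gaussian diffusion operator."""
    x = np.asarray(x, dtype=float)
    history = [x]
    for _ in range(n_steps):
        sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2)
        k = np.exp(-sq / eps)  # Gaussian affinities on the current positions
        p = k / k.sum(axis=1, keepdims=True)  # Markov (row-stochastic) normalization
        x = p @ x  # each point moves to a weighted mean of the current points
        history.append(x)
    return history
```

Because the operator is rebuilt from the moved points at every step, the process is time-inhomogeneous. Tracking which points have collapsed together across `history` gives the hierarchical-clustering view mentioned in the abstract.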
Topological Data Analysis for the Characterization of Atomic Scale Morphology From Atom Probe Tomography Images (2018)
Tianmu Zhang, Scott R. Broderick, Krishna Rajan
Abstract
Atom probe tomography (APT) represents a revolutionary characterization tool for materials that combines atomic imaging with a time-of-flight (TOF) mass spectrometer to provide direct-space, three-dimensional, atomic-scale resolution images of materials with the chemical identities of hundreds of millions of atoms. It involves the controlled removal of atoms from a specimen’s surface by field evaporation and then sequentially analyzing them with a position-sensitive detector and TOF mass spectrometer. A paradox of APT is that, although it provides an unprecedented level of imaging resolution in three dimensions, it is very difficult to obtain an accurate perspective of the morphology or shape outlined by atoms of similar chemistry and microstructure. The origins of this problem are numerous, including incomplete detection of atoms and the complexity of the evaporation fields of atoms at or near interfaces. Hence, unlike in scattering techniques such as electron microscopy, interfaces appear diffuse rather than sharp. This, in turn, makes it challenging to visualize and quantitatively interpret the microstructure at the “meso” scale, where one is interested in the shape and form of the interfaces and their associated chemical gradients. It is here that the application of informatics at the nanoscale and statistical learning methods plays a critical role, both in defining the level of uncertainty and in helping to make quantitative, statistically objective interpretations where heuristics often dominate. In this chapter, we show how Topological Data Analysis provides new and powerful tools in the field of nanoinformatics for materials characterization. -
Topological Data Analysis as a Morphometric Method: Using Persistent Homology to Demarcate a Leaf Morphospace (2018)
Mao Li, Hong An, Ruthie Angelovici, Clement Bagaza, Albert Batushansky, Lynn Clark, Viktoriya Coneva, Michael J. Donoghue, Erika Edwards, Diego Fajardo, Hui Fang, Margaret H. Frank, Timothy Gallaher, Sarah Gebken, Theresa Hill, Shelley Jansky, Baljinder Kaur, Phillip C. Klahs, Laura L. Klein, Vasu Kuraparthy, Jason Londo, Zoë Migicovsky, Allison Miller, Rebekah Mohn, Sean Myles, Wagner C. Otoni, J. C. Pires, Edmond Rieffer, Sam Schmerler, Elizabeth Spriggs, Christopher N. Topp, Allen Van Deynze, Kuang Zhang, Linglong Zhu, Braden M. Zink, Daniel H. Chitwood
Abstract
Current morphometric methods that comprehensively measure shape cannot compare the disparate leaf shapes found in seed plants and are sensitive to processing artifacts. We explore the use of persistent homology, a topological method applied as a filtration across simplicial complexes (or more simply, a method to measure topological features of spaces across different spatial resolutions), to overcome these limitations. The described method isolates subsets of shape features and measures the spatial relationship of neighboring pixel densities in a shape. We apply the method to the analysis of 182,707 leaves, both published and unpublished, representing 141 plant families collected from 75 sites throughout the world. By measuring leaves from throughout the seed plants using persistent homology, a defined morphospace comparing all leaves is demarcated. Clear differences in shape between major phylogenetic groups are detected and estimates of leaf shape diversity within plant families are made. The approach predicts plant family above chance. The application of a persistent homology method, using topological features, to measure leaf shape allows for a unified morphometric framework to measure plant form, including shapes, textures, patterns, and branching architectures. -
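Persistent homology, as described in the leaf-morphometry abstract above, records when topological features appear and disappear across a filtration. The sketch below is a deliberately simplified illustration, not the authors' pipeline, which works on leaf images. It computes 0-dimensional persistence of the superlevel-set filtration of a 1D signal and assumes that a shape has already been summarized as such a profile, for example a density sampled along a line. The function name is hypothetical.

```python
from typing import List, Tuple


def superlevel_persistence_1d(values: List[float]) -> List[Tuple[float, float]]:
    """0-dimensional persistence of the superlevel-set filtration of a 1D signal.

    Sweeping a threshold from high to low, each local maximum starts a
    connected component. When two components meet, the one with the lower
    peak dies (the "elder rule"). Returns (birth, death) pairs in decreasing
    order of birth; the highest peak never dies and gets death = -inf.
    """
    n = len(values)
    parent = list(range(n))
    peak = list(values)  # peak[root] = highest value in that component
    added = [False] * n
    pairs = []

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add samples from the highest value to the lowest.
    for i in sorted(range(n), key=lambda k: values[k], reverse=True):
        added[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the lower peak dies at values[i].
                young, old = (ri, rj) if peak[ri] < peak[rj] else (rj, ri)
                if peak[young] > values[i]:  # skip zero-length pairs
                    pairs.append((peak[young], values[i]))
                parent[young] = old
    if n:
        # On a connected 1D grid exactly one component survives: the global maximum.
        pairs.append((max(values), float("-inf")))
    return sorted(pairs, reverse=True)
```

Each pair records the height of a peak (birth) and the level at which it merges into a taller peak (death). Long-lived pairs correspond to prominent features, while short-lived pairs are more likely to be noise.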
A Topological Approach to Selecting Models of Biological Experiments (2019)
M. Ulmer, Lori Ziegelmeier, Chad M. Topaz
Abstract
We use topological data analysis as a tool to analyze the fit of mathematical models to experimental data. This study is built on data obtained from motion tracking of groups of aphids in [Nilsen et al., PLOS One, 2013] and two random walk models that were proposed to describe the data. One model incorporates social interactions between the insects via a functional dependence on an aphid’s distance to its nearest neighbor. The second model is a control model that ignores this dependence. We compare data from each model to the experimental data by performing statistical tests based on three different sets of measures. First, we use time series of order parameters commonly used in collective motion studies. These order parameters measure the overall polarization and angular momentum of the group, and do not rely on a priori knowledge of the models that produced the data. Second, we use order parameter time series that do rely on a priori knowledge, namely average distance to nearest neighbor and percentage of aphids moving. Third, we use computational persistent homology to calculate topological signatures of the data. Analysis of the a priori order parameters indicates that the interactive model describes the experimental data better than the control model does. The topological approach performs as well as these a priori order parameters and better than the other order parameters, suggesting the utility of the topological approach in the absence of specific knowledge of mechanisms underlying the data. -
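The aphid study compares models using order parameters such as group polarization, angular momentum, and average distance to nearest neighbor. The sketch below computes these three quantities for a single frame of 2D tracking data; evaluating it frame by frame yields the time series the abstract mentions. The normalizations follow one common convention in the collective-motion literature and are an assumption here, so the paper's exact definitions may differ. The percentage of aphids moving is omitted because it needs a speed threshold that the abstract does not give.

```python
import numpy as np


def order_parameters(positions, velocities):
    """Group-level order parameters for one time step of 2D tracking data.

    positions, velocities: array-likes of shape (N, 2). Assumes N >= 2,
    no agent has zero velocity, and no agent sits exactly at the center of
    mass. The normalizations are one common convention (an assumption).
    Returns (polarization, angular_momentum, mean_nearest_neighbor_distance).
    """
    x = np.asarray(positions, dtype=float)
    v = np.asarray(velocities, dtype=float)
    speeds = np.linalg.norm(v, axis=1)

    # Polarization: about 1 when all agents head the same way, about 0 when headings cancel.
    polarization = np.linalg.norm(v.sum(axis=0)) / speeds.sum()

    # Angular momentum about the center of mass (z-component of the 2D cross product).
    r = x - x.mean(axis=0)
    cross = r[:, 0] * v[:, 1] - r[:, 1] * v[:, 0]
    angular_momentum = abs(cross.sum()) / (np.linalg.norm(r, axis=1) * speeds).sum()

    # Mean distance from each agent to its nearest neighbor.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each agent's zero distance to itself
    mean_nn = d.min(axis=1).mean()

    return polarization, angular_momentum, mean_nn
```

For four agents circling the origin, polarization is 0 and angular momentum is 1; for agents all moving in the same direction, polarization is 1.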
Multivariate Data Analysis Using Persistence-Based Filtering and Topological Signatures (2012)
B. Rieck, H. Mara, H. Leitte
Abstract
The extraction of significant structures in arbitrary high-dimensional data sets is a challenging task. Moreover, classifying data points as noise in order to reduce a data set is especially relevant in many application domains. Standard methods such as clustering reduce problem complexity by providing the user with classes of similar entities. However, they usually do not highlight relations between different entities and require a stopping criterion, e.g. the number of clusters to be detected. In this paper, we present a visualization pipeline based on recent advances in algebraic topology. More precisely, we employ methods from persistent homology that enable topological data analysis on high-dimensional data sets. Our pipeline inherently copes with noisy data and data sets of arbitrary dimension. It extracts central structures of a data set in a hierarchical manner by using a persistence-based filtering algorithm that is theoretically well-founded. We furthermore introduce persistence rings, a novel visualization technique for a class of topological features of large data sets, namely the persistence intervals. Persistence rings provide a unique topological signature of a data set, which helps in recognizing similarities. In addition, we provide interactive visualization techniques that assist the user in evaluating the parameter space of our method in order to extract relevant structures. We describe and evaluate our analysis pipeline using two very distinct classes of data sets. First, a class of synthetic data sets containing topological objects is employed to highlight the interaction capabilities of our method. Second, to affirm the utility of our technique, we analyse a class of high-dimensional real-world data sets arising from current research in cultural heritage. -
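The abstract above relies on persistence intervals and a persistence-based filtering step. The specific filtering algorithm of Rieck et al. is not reproduced here. As a generic illustration under that assumption, the sketch below computes the 0-dimensional persistence intervals of a Vietoris-Rips filtration using union-find, then keeps only the intervals whose lifetime exceeds a threshold. Both function names are hypothetical.

```python
import numpy as np


def rips_h0_intervals(points):
    """0-dimensional persistence intervals of the Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the edge length that
    merges it into another component (Kruskal's algorithm with union-find).
    One component survives forever and is reported with death = inf.
    """
    X = np.asarray(points, dtype=float)
    n = len(X)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the filtration order).
    edges = sorted(
        (float(np.linalg.norm(X[i] - X[j])), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    intervals = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            intervals.append((0.0, length))  # one component dies at this scale
    if n:
        intervals.append((0.0, float("inf")))
    return intervals


def filter_by_persistence(intervals, threshold):
    """Keep only intervals whose lifetime (death - birth) exceeds threshold."""
    return [(b, d) for b, d in intervals if d - b > threshold]
```

In this sketch, counting the intervals that survive the threshold replaces a fixed cluster count with a persistence threshold, which relates to the stopping-criterion issue the abstract raises for standard clustering.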
Applications of Persistent Homology to Time Varying Systems (2013)
Elizabeth Munch
Abstract
This dissertation extends the theory of persistent homology to time varying systems. Most of the previous work has been dedicated to using this powerful tool in topological data analysis to study static point clouds. In particular, given a point cloud, we can construct its persistence diagram. Since the diagram varies continuously as the point cloud varies continuously, we study the space of time varying persistence diagrams, called vineyards when they were introduced by Cohen-Steiner, Edelsbrunner, and Morozov.

We will first show that, with a good choice of metric, these vineyards are stable for small perturbations of their associated point clouds. We will also define a new mean for a set of persistence diagrams, based on the work of Mileyko et al., which, unlike the previously defined mean, is continuous for geodesic vineyards.

Next, we study the sensor network problem posed by Ghrist and de Silva, and their application of persistent homology to understand when a set of sensors covers a given region. Giving each of these sensors a probability of failure over time, we show that an exact computation of the probability of failure of the whole system is NP-hard, but give an algorithm which can predict failure in the case of a monitored system.

Finally, we apply these methods to an automated system which can cluster agents moving in aerial images by their behaviors. We build a data structure for storing and querying the information in real time, and define behavior vectors which quantify behaviors of interest. This clustering by behavior can be used to find groups of interest, for which we can also quantify behaviors in order to determine whether the group is working together to achieve a common goal, and we speculate that this work can be extended to improving tracking algorithms as well as behavioral predictors. -
Identification of Copy Number Aberrations in Breast Cancer Subtypes Using Persistence Topology (2015)
Javier Arsuaga, Tyler Borrman, Raymond Cavalcante, Georgina Gonzalez, Catherine Park
Abstract
DNA copy number aberrations (CNAs) are of biological and medical interest because they help identify regulatory mechanisms underlying tumor initiation and evolution. Identification of tumor-driving CNAs (driver CNAs), however, remains a challenging task, because they are frequently hidden by CNAs that are the product of random events that take place during tumor evolution. Experimental detection of CNAs is commonly accomplished through array comparative genomic hybridization (aCGH) assays followed by supervised and/or unsupervised statistical methods that combine the segmented profiles of all patients to identify driver CNAs. Here, we extend a previously presented supervised algorithm for the identification of CNAs that is based on a topological representation of the data. Our method associates a two-dimensional (2D) point cloud with each aCGH profile and generates a sequence of simplicial complexes, mathematical objects that generalize the concept of a graph. This representation of the data permits segmenting the data at different resolutions and identifying CNAs by interrogating the topological properties of these simplicial complexes. We tested our approach on a published dataset with the goal of identifying specific breast cancer CNAs associated with specific molecular subtypes. CNAs associated with each subtype were identified by analyzing that subtype separately, with the remaining subtypes taken as the control. We found a new amplification in 11q at the location of the progesterone receptor in the Luminal A subtype. Aberrations in the Luminal B subtype were found only upon removal of the basal-like subtype from the control set. Under those conditions, all regions found in the original publication, except for 17q, were confirmed; in the basal-like subtype, all aberrations except those in chromosome arms 8q and 12q were confirmed. These two chromosome arms, however, were detected only upon removal of three patients with exceedingly large copy number values. More importantly, we detected 10 and 21 additional regions in the Luminal B and basal-like subtypes, respectively. Most of the additional regions were validated on an independent dataset, with GISTIC, or both. Furthermore, we found three new CNAs in the basal-like subtype: a combination of gains and losses in 1p, a gain in 2p, and a loss in 14q. Based on these results, we suggest that topological approaches that incorporate multiresolution analyses and interrogate topological properties of the data can help in the identification of copy number changes in cancer.
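The copy-number abstract describes associating a 2D point cloud with each aCGH profile and examining its topology at several resolutions. The exact embedding is not given in the abstract. The sketch below uses a hypothetical one in which probe k becomes the point (k, y_k), where y_k is its copy-number value. It then labels the connected components (0-dimensional homology) of the graph that joins points lying within distance `epsilon` of each other. It illustrates only the multiresolution segmentation idea, not the authors' supervised algorithm or its statistical testing. The function name is hypothetical.

```python
import numpy as np


def segments_at_scale(copy_number, epsilon):
    """Label probes by connected component of a 2D point cloud at scale epsilon.

    Hypothetical embedding: probe k becomes the point (k, copy_number[k]).
    Points within distance epsilon are joined, so a run of probes whose copy
    number jumps by enough relative to epsilon splits off as its own component.
    Returns an integer label per probe (labels numbered in order of first probe).
    """
    y = np.asarray(copy_number, dtype=float)
    pts = np.column_stack([np.arange(len(y), dtype=float), y])
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    labels = -np.ones(len(y), dtype=int)  # -1 means "not yet visited"
    current = 0
    for start in range(len(y)):
        if labels[start] >= 0:
            continue
        # Flood-fill (depth-first) the component containing `start`.
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in np.nonzero((d[i] <= epsilon) & (labels < 0))[0]:
                labels[j] = current
                stack.append(j)
        current += 1
    return labels
```

For a toy profile `[0, 0, 0, 2, 2, 2, 0, 0]`, a scale of 1.5 separates the raised block of probes from the baseline, while at a scale of 3.0 everything merges into a single component.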