🍩 Database of Original & Non-Theoretical Uses of Topology

  1. Hierarchical Clustering and Zeroth Persistent Homology (2020)

    İsmail Güzel, Atabey Kaygun
    Abstract In this article, we show that hierarchical clustering and the zeroth persistent homology deliver the same topological information about a given data set. We show this fact using cophenetic matrices constructed out of the filtered Vietoris-Rips complex of the data set at hand. As in any cophenetic matrix, one can also display the inter-relations of zeroth homology classes via a rooted tree, also known as a dendrogram. Since homological cophenetic matrices can be calculated for higher homologies, one can also sketch similar dendrograms for higher persistent homology classes.
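    The correspondence described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' construction: it assumes Euclidean points, single-linkage clustering, and the convention that an edge enters the Vietoris-Rips filtration at the distance between its endpoints. The function names and sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def h0_death_times(points):
    """Finite death times of H0 classes in the Vietoris-Rips filtration.

    Assumed convention: an edge enters the filtration at the Euclidean
    distance between its endpoints. Whenever an edge joins two connected
    components, one H0 class dies (this is Kruskal's algorithm).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths


def single_linkage_heights(points):
    """Merge heights of naive agglomerative single-linkage clustering."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest minimum distance.
        h, a, b = min(
            (min(dist(p, q) for p in clusters[a] for q in clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        clusters[a] += clusters[b]
        del clusters[b]  # safe because a < b
        heights.append(h)
    return heights


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 2.0)]
deaths = h0_death_times(points)
heights = single_linkage_heights(points)
```

    On this small example the two lists coincide, which is the zeroth-degree case of the statement in the abstract. Other linkage rules (complete, average) would generally give different heights.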
  2. Path Homology as a Stronger Analogue of Cyclomatic Complexity (2020)

    Steve Huntsman
    Abstract Cyclomatic complexity is an incompletely specified but mathematically principled software metric that can be usefully applied to both source and binary code. We consider the application of path homology as a stronger analogue of cyclomatic complexity. We have implemented an algorithm to compute path homology in arbitrary dimension and applied it to several classes of relevant flow graphs, including randomly generated flow graphs representing structured and unstructured control flow. We also compared path homology and cyclomatic complexity on a set of disassembled binaries obtained from the grep utility. There exist control flow graphs realizable at the assembly level with nontrivial path homology in arbitrary dimension. We exhibit several classes of examples in this vein while also experimentally demonstrating that path homology gives identical results to cyclomatic complexity for at least one detailed notion of structured control flow. We also experimentally demonstrate that the two notions differ on disassembled binaries, and we highlight an example of extreme disagreement. Path homology empirically generalizes cyclomatic complexity for an elementary notion of structured code and appears to identify more structurally relevant features of control flow in general. Path homology therefore has the potential to substantially improve upon cyclomatic complexity.
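    For readers unfamiliar with the baseline metric, the sketch below computes McCabe's cyclomatic complexity, M = E - N + 2P, for a flow graph given as an edge list. It does not implement path homology, which is the paper's contribution. The function name and the two sample graphs are hypothetical.

```python
def cyclomatic_complexity(nodes, edges):
    """McCabe's M = E - N + 2P for a control flow graph.

    nodes: iterable of hashable node labels
    edges: iterable of (source, target) pairs
    P is the number of weakly connected components.
    """
    nodes = list(nodes)
    edges = list(edges)
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Union endpoints, ignoring edge direction, to count components.
    for u, v in edges:
        parent[find(u)] = find(v)

    components = len({find(n) for n in nodes})
    return len(edges) - len(nodes) + 2 * components


# Hypothetical flow graph of a single if/else statement.
if_else_nodes = ["entry", "then", "else", "exit"]
if_else_edges = [("entry", "then"), ("entry", "else"),
                 ("then", "exit"), ("else", "exit")]

# Straight-line code with no branching.
linear_nodes = ["a", "b", "c"]
linear_edges = [("a", "b"), ("b", "c")]

m_if_else = cyclomatic_complexity(if_else_nodes, if_else_edges)
m_linear = cyclomatic_complexity(linear_nodes, linear_edges)
```

    For a graph with a single component, M - 1 = E - N + 1 is the first Betti number of the underlying undirected graph, which is why a homological generalization is a natural thing to look for.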
  3. Coordinate-Free Coverage in Sensor Networks With Controlled Boundaries via Homology (2006)

    V. de Silva, R. Ghrist
    Abstract Tools from computational homology are introduced to verify coverage in an idealized sensor network. These methods are unique in that, while they are coordinate-free and assume no localization or orientation capabilities for the nodes, there are also no probabilistic assumptions. The key ingredient is the theory of homology from algebraic topology. The robustness of these tools is demonstrated by adapting them to a variety of settings, including static planar coverage, 3-D barrier coverage, and time-dependent sweeping coverage. Results are also given on hole repair, error tolerance, optimal coverage, and variable radii. An overview of implementation is given.
  4. Coverage Criterion in Sensor Networks Stable Under Perturbation (2014)

    Yasuaki Hiraoka, Genki Kusano
    Abstract To the coverage problem of sensor networks, V. de Silva and R. Ghrist (2007) developed several approaches based on (persistent) homology theory. Their criteria for the coverage are formulated on the Rips complexes constructed by the sensors, in which their locations are supposed to be fixed. However, the sensors are in general affected by perturbations (e.g., natural phenomena), and hence the stability of the coverage criteria should be also discussed. In this paper, we present a coverage theorem stable under perturbation. Furthermore, we also introduce a method of eliminating redundant cover after perturbation. The coverage theorem is derived by extending the Rips interleaving theorem studied by F. Chazal, V. de Silva, and S. Oudot (2013) into an appropriate relative version.
  5. Topological Machine Learning for Multivariate Time Series (2020)

    Chengyuan Wu, Carol Anne Hargreaves
    Abstract We develop a framework for analyzing multivariate time series using topological data analysis (TDA) methods. The proposed methodology involves converting the multivariate time series to point cloud data, calculating Wasserstein distances between the persistence diagrams and using the $k$-nearest neighbors algorithm ($k$-NN) for supervised machine learning. Two methods (symmetry-breaking and anchor points) are also introduced to enable TDA to better analyze data with heterogeneous features that are sensitive to translation, rotation, or choice of coordinates. We apply our methods to room occupancy detection based on 5 time-dependent variables (temperature, humidity, light, CO2 and humidity ratio). Experimental results show that topological methods are effective in predicting room occupancy during a time window. We also apply our methods to an Activity Recognition dataset and obtained good results.
  6. TopoGAN: A Topology-Aware Generative Adversarial Network (2020)

    Fan Wang, Huidong Liu, Dimitris Samaras, Chao Chen
    Abstract Existing generative adversarial networks (GANs) focus on generating realistic images based on CNN-derived image features, but fail to preserve the structural properties of real images. This can be fatal in applications where the underlying structure (e.g., neurons, vessels, membranes, and road networks) of the image carries crucial semantic meaning. In this paper, we propose a novel GAN model that learns the topology of real images, i.e., connectedness and loopy-ness. In particular, we introduce a new loss that bridges the gap between synthetic image distribution and real image distribution in the topological feature space. By optimizing this loss, the generator produces images with the same structural topology as real images. We also propose new GAN evaluation metrics that measure the topological realism of the synthetic images. We show in experiments that our method generates synthetic images with realistic topology. We also highlight the increased performance that our method brings to downstream tasks such as segmentation.

  7. Weighted Persistent Homology for Osmolyte Molecular Aggregation and Hydrogen-Bonding Network Analysis (2020)

    D. Vijay Anand, Zhenyu Meng, Kelin Xia, Yuguang Mu
    Abstract It has long been observed that trimethylamine N-oxide (TMAO) and urea demonstrate dramatically different properties in a protein folding process. Even with the enormous theoretical and experimental research work on these two osmolytes, various aspects of their underlying mechanisms still remain largely elusive. In this paper, we propose to use the weighted persistent homology to systematically study osmolyte molecular aggregation and the associated hydrogen-bonding network from a local topological perspective. We consider two weighted models, i.e., localized persistent homology (LPH) and interactive persistent homology (IPH). Boltzmann persistent entropy (BPE) is proposed to quantitatively characterize the topological features from LPH and IPH, together with persistent Betti number (PBN). More specifically, from the localized persistent homology models, we have found that TMAO and urea have very different local topology. TMAO is found to exhibit a local network structure. As the concentration increases, the circle elements in these networks show a clear increase in their total numbers and a decrease in their relative sizes. In contrast, urea shows two types of local topological patterns, i.e., local clusters around 6 Å and a few global circle elements at around 12 Å. From the interactive persistent homology models, it has been found that our persistent radial distribution function (PRDF) from the global-scale IPH has the same physical properties as the traditional radial distribution function. Moreover, PRDFs from the local-scale IPH can also be generated and used to characterize the local interaction information. Other than the clear difference of the first peak value of PRDFs at filtration size 4 Å, TMAO and urea also show very different behaviors at the second peak region from filtration size 5 Å to 10 Å. These differences are also reflected in the PBNs and BPEs of the local-scale IPH. This localized topological information has never been revealed before. Since graphs can be transferred into simplicial complexes by the clique complex, our weighted persistent homology models can be used in the analysis of various networks and graphs from any molecular structures and aggregation systems.
  8. Evasion Paths in Mobile Sensor Networks (2015)

    Henry Adams, Gunnar Carlsson
    Abstract Suppose that ball-shaped sensors wander in a bounded domain. A sensor does not know its location but does know when it overlaps a nearby sensor. We say that an evasion path exists in this sensor network if a moving intruder can avoid detection. In 'Coordinate-free coverage in sensor networks with controlled boundaries via homology', Vin de Silva and Robert Ghrist give a necessary condition, depending only on the time-varying connectivity data of the sensors, for an evasion path to exist. Using zigzag persistent homology, we provide an equivalent condition that moreover can be computed in a streaming fashion. However, no method with time-varying connectivity data as input can give necessary and sufficient conditions for the existence of an evasion path. Indeed, we show that the existence of an evasion path depends not only on the fibrewise homotopy type of the region covered by sensors but also on its embedding in spacetime. For planar sensors that also measure weak rotation and distance information, we provide necessary and sufficient conditions for the existence of an evasion path.
  9. Stable Topological Summaries for Analyzing the Organization of Cells in a Packed Tissue (2021)

    Nieves Atienza, Maria-Jose Jimenez, Manuel Soriano-Trigueros
    Abstract We use topological data analysis tools for studying the inner organization of cells in segmented images of epithelial tissues. More specifically, for each segmented image, we compute different persistence barcodes, which codify the lifetime of homology classes (persistent homology) along different filtrations (increasing nested sequences of simplicial complexes) that are built from the regions representing the cells in the tissue. We use a complete and well-grounded set of numerical variables over those persistence barcodes, also known as topological summaries. A novel combination of normalization methods for both the set of input segmented images and the produced barcodes allows for the proven stability results for those variables with respect to small changes in the input, as well as invariance to image scale. Our study provides new insights to this problem, such as a possible novel indicator for the development of the drosophila wing disc tissue or the importance of centroids’ distribution to differentiate some tissues from their CVT-path counterpart (a mathematical model of epithelia based on Voronoi diagrams). We also show how the use of topological summaries may improve the classification accuracy of epithelial images using a Random Forest algorithm.
  10. Transfer Learning for Autonomous Chatter Detection in Machining (2022)

    Melih C. Yesilli, Firas A. Khasawneh, Brian P. Mann
    Abstract Large-amplitude chatter vibrations are one of the most important phenomena in machining processes. They are often detrimental in cutting operations, causing a poor surface finish and decreased tool life. Therefore, chatter detection using machine learning has been an active research area over the last decade. Three challenges can be identified in applying machine learning for chatter detection at large in industry: an insufficient understanding of the universality of chatter features across different processes, the need for automating feature extraction, and the existence of limited data for each specific workpiece-machine tool combination, e.g., when machining one-off products. These three challenges can be grouped under the umbrella of transfer learning, which is concerned with studying how knowledge gained from one setting can be leveraged to obtain information in new settings. This paper studies automating chatter detection by evaluating transfer learning of prominent as well as novel chatter detection methods. We investigate chatter classification accuracy using a variety of features extracted from turning and milling experiments with different cutting configurations. The studied methods include Fast Fourier Transform (FFT), Power Spectral Density (PSD), the Auto-correlation Function (ACF), and decomposition-based tools such as Wavelet Packet Transform (WPT) and Ensemble Empirical Mode Decomposition (EEMD). We also examine more recent approaches based on Topological Data Analysis (TDA) and similarity measures of time series based on Dynamic Time Warping (DTW). We evaluate transfer learning potential of each approach by training and testing both within and across the turning and milling data sets. Four supervised classification algorithms are explored: support vector machine (SVM), logistic regression, random forest classification, and gradient boosting. 
In addition to accuracy, we also comment on the automation potential of feature extraction for each approach which is integral to creating autonomous manufacturing centers. Our results show that carefully chosen time-frequency features can lead to high classification accuracies albeit at the cost of requiring manual pre-processing and the tagging of an expert user. On the other hand, we found that the TDA and DTW approaches can provide accuracies and F1-scores on par with the time-frequency methods without the need for manual preprocessing via completely automatic pipelines. Further, we discovered that the DTW approach outperforms all other methods when trained using the milling data and tested on the turning data. Therefore, TDA and DTW approaches may be preferred over the time-frequency-based approaches for fully automated chatter detection schemes. DTW and TDA also can be more advantageous when pooling data from either limited workpiece-machine tool combinations, or from small data sets of one-off processes.
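    As a point of reference for the DTW similarity measure mentioned in this abstract, here is a minimal sketch of the textbook dynamic-programming recurrence for dynamic time warping (the usual expansion of DTW) between two scalar sequences. It is not the authors' pipeline: the absolute-difference local cost, the absence of a warping window, and the function name are all assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses |x - y| as the local cost and allows the three standard moves
    (insertion, deletion, match). No warping-window constraint is applied.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# A sequence and a time-stretched copy of it align at zero cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# A constant offset cannot be warped away.
offset = dtw_distance([0, 0], [1, 1])
```

    The resulting pairwise distances could then feed a nearest-neighbor classifier, which is one common way such similarity measures are used for time series classification.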
  11. The Emergence of Higher-Order Structure in Scientific and Technological Knowledge Networks (2020)

    Thomas Gebhart, Russell J. Funk
    Abstract The growth of science and technology is primarily a recombinative process, wherein new discoveries and inventions are generally built from prior knowledge. While the recent past has seen rapid growth in scientific and technological knowledge, relatively little is known about the manner in which science and technology develop and coalesce knowledge into larger structures that enable or constrain future breakthroughs. Network science has recently emerged as a framework for measuring the structure and dynamics of knowledge. While helpful, these existing approaches struggle to capture the global structural properties of the underlying networks, leading to conflicting observations about the nature of scientific and technological progress. We bridge this methodological gap using tools from algebraic topology to characterize the higher-order structure of knowledge networks in science and technology across scale. We observe rapid and varied growth in the high-dimensional structure in many fields of science and technology, and find this high-dimensional growth coincides with decline in lower-dimensional structure. This higher-order growth in knowledge networks has historically far outpaced the growth in scientific and technological collaboration networks. We also characterize the relationship between higher-order structure and the nature of the science and technology produced within these structural environments and find a positive relationship between the abstractness of language used within fields and increasing high-dimensional structure. We also find a robust relationship between high-dimensional structure and number of metrics for publication success, implying this high-dimensional structure may be linked to discovery and invention.
  12. Loops Abound in the Cosmic Microwave Background: A $4\sigma$ Anomaly on Super-Horizon Scales (2021)

    Pratyush Pranav
    Abstract We present a topological analysis of the temperature fluctuation maps from the Planck 2020 Data Release 4 (DR4), based on the NPIPE data processing pipeline. For comparison, we also present the topological characteristics of the maps from the Planck 2018 Data Release 3 (DR3). We perform our analysis in terms of the homology characteristics of the maps, invoking relative homology to account for analysis in the presence of masks. We perform our analysis for a range of smoothing scales spanning sub- and super-horizon scales corresponding to $FWHM = 5', 10', 20', 40', 80', 160', 320', 640'$. Our main result indicates a significantly anomalous behavior of the loops in the observed maps compared to simulations that are modeled as isotropic and homogeneous Gaussian random fields. Specifically, we observe a $4\sigma$ deviation between the observation and simulations in the number of loops at $FWHM = 320'$ and $FWHM = 640'$, corresponding to super-horizon scales of $5$ degrees and larger. In addition, we also notice a mildly significant deviation at $2\sigma$ for all the topological descriptors for almost all the scales analyzed. Our results show a consistency across different data releases, and therefore, the anomalous behavior deserves a careful consideration regarding its origin and ramifications. Disregarding the unlikely source of the anomaly being instrumental systematics, the origin of the anomaly may be genuinely astrophysical -- perhaps due to a yet unresolved foreground, or truly primordial in nature. Given the nature of the topological descriptors, which potentially encode information of all orders, non-Gaussianities, of either primordial or late-type nature, may be potential candidates. Alternate possibilities include the Universe admitting a non-trivial global topology, including effects induced by large-scale topological defects.
  13. Quantifying Genetic Innovation: Mathematical Foundations for the Topological Study of Reticulate Evolution (2020)

    Michael Lesnick, Raúl Rabadán, Daniel I. S. Rosenbloom
    Abstract A topological approach to the study of genetic recombination, based on persistent homology, was introduced by Chan, Carlsson, and Rabadán in 2013. This associates a sequence of signatures called barcodes to genomic data sampled from an evolutionary history. In this paper, we develop theoretical foundations for this approach. First, we present a novel formulation of the underlying inference problem. Specifically, we introduce and study the novelty profile, a simple, stable statistic of an evolutionary history which not only counts recombination events but also quantifies how recombination creates genetic diversity. We propose that the (hitherto implicit) goal of the topological approach to recombination is the estimation of novelty profiles. We then study the problem of obtaining a lower bound on the novelty profile using barcodes. We focus on a low-recombination regime, where the evolutionary history can be described by a directed acyclic graph called a galled tree, which differs from a tree only by isolated topological defects. We show that in this regime, under a complete sampling assumption, the $1^{\mathrm{st}}$ barcode yields a lower bound on the novelty profile, and hence on the number of recombination events. For $i > 1$, the $i^{\mathrm{th}}$ barcode is empty. In addition, we use a stability principle to strengthen these results to ones which hold for any subsample of an arbitrary evolutionary history. To establish these results, we describe the topology of the Vietoris-Rips filtrations arising from evolutionary histories indexed by galled trees. As a step towards a probabilistic theory, we also show that for a random history indexed by a fixed galled tree and satisfying biologically reasonable conditions, the intervals of the $1^{\mathrm{st}}$ barcode are independent random variables. Using simulations, we explore the sensitivity of these intervals to recombination.
  14. Applications of Persistent Homology to Time Varying Systems (2013)

    Elizabeth Munch
    Abstract This dissertation extends the theory of persistent homology to time varying systems. Most of the previous work has been dedicated to using this powerful tool in topological data analysis to study static point clouds. In particular, given a point cloud, we can construct its persistence diagram. Since the diagram varies continuously as the point cloud varies continuously, we study the space of time varying persistence diagrams, called vineyards when they were introduced by Cohen-Steiner, Edelsbrunner, and Morozov. We will first show that with a good choice of metric, these vineyards are stable for small perturbations of their associated point clouds. We will also define a new mean for a set of persistence diagrams based on the work of Mileyko et al. which, unlike the previously defined mean, is continuous for geodesic vineyards. Next, we study the sensor network problem posed by Ghrist and de Silva, and their application of persistent homology to understand when a set of sensors covers a given region. Giving each of these sensors a probability of failure over time, we show that an exact computation of the probability of failure of the whole system is NP-hard, but give an algorithm which can predict failure in the case of a monitored system. Finally, we apply these methods to an automated system which can cluster agents moving in aerial images by their behaviors. We build a data structure for storing and querying the information in real-time, and define behavior vectors which quantify behaviors of interest. This clustering by behavior can be used to find groups of interest, for which we can also quantify behaviors in order to determine whether the group is working together to achieve a common goal, and we speculate that this work can be extended to improving tracking algorithms as well as behavioral predictors.
  15. Sheaves Are the Canonical Data Structure for Sensor Integration (2017)

    Michael Robinson
    Abstract A sensor integration framework should be sufficiently general to accurately represent many sensor modalities, and also be able to summarize information in a faithful way that emphasizes important, actionable information. Few approaches adequately address these two discordant requirements. The purpose of this expository paper is to explain why sheaves are the canonical data structure for sensor integration and how the mathematics of sheaves satisfies our two requirements. We outline some of the powerful inferential tools that are not available to other representational frameworks.
  16. A Barcode Shape Descriptor for Curve Point Cloud Data (2004)

    Anne Collins, Afra Zomorodian, Gunnar Carlsson, Leonidas J. Guibas
    Abstract In this paper, we present a complete computational pipeline for extracting a compact shape descriptor for curve point cloud data (PCD). Our shape descriptor, called a barcode, is based on a blend of techniques from differential geometry and algebraic topology. We also provide a metric over the space of barcodes, enabling fast comparison of PCDs for shape recognition and clustering. To demonstrate the feasibility of our approach, we implement our pipeline and provide experimental evidence in shape classification and parametrization.
  17. Simplicial Neural Networks (2020)

    Stefania Ebli, Michaël Defferrard, Gard Spreemann
    Abstract We present simplicial neural networks (SNNs), a generalization of graph neural networks to data that live on a class of topological spaces called simplicial complexes. These are natural multi-dimensional extensions of graphs that encode not only pairwise relationships but also higher-order interactions between vertices - allowing us to consider richer data, including vector fields and $n$-fold collaboration networks. We define an appropriate notion of convolution that we leverage to construct the desired convolutional neural networks. We test the SNNs on the task of imputing missing data on coauthorship complexes.
  18. Investigation of Flash Crash via Topological Data Analysis (2020)

    Wonse Kim, Younng-Jin Kim, Gihyun Lee, Woong Kook
    Abstract Topological data analysis has been acknowledged as one of the most successful mathematical data analytic methodologies in various fields including medicine, genetics, and image analysis. In this paper, we explore the potential of this methodology in finance by applying persistence landscape and dynamic time series analysis to analyze an extreme event in the stock market, known as Flash Crash. We will provide results of our empirical investigation to confirm the effectiveness of our new method not only for the characterization of this extreme event but also for its prediction purposes.
  19. Microscopic Description of Yielding in Glass Based on Persistent Homology (2019)

    Tatsuhiko Shirai, Takenobu Nakamura
    Abstract Persistent homology (PH) was applied to probe the structural changes of glasses under shear. PH associates each local atomistic structure in an atomistic configuration to a geometric object, namely, a hole, and evaluates the robustness of these holes against noise. We found that the microscopic structures were qualitatively different before and after yielding. The structures before yielding contained robust holes, the number of which decreased after yielding. We also observed that the structures after yielding approached those of quickly quenched glass. This work demonstrates the crucial role of robust holes in yielding and provides an interpretation based on geometry.
  20. Persistent Betti Numbers for a Noise Tolerant Shape-Based Approach to Image Retrieval (2011)

    Patrizio Frosini, Claudia Landi
    Abstract In content-based image retrieval a major problem is the presence of noisy shapes. It is well known that persistent Betti numbers are a shape descriptor that admits a dissimilarity distance, the matching distance, stable under continuous shape deformations. In this paper we focus on the problem of dealing with noise that changes the topology of the studied objects. We present a general method to turn persistent Betti numbers into stable descriptors also in the presence of topological changes. Retrieval tests on the Kimia-99 database show the effectiveness of the method.
  21. Structural Insight Into RNA Hairpin Folding Intermediates (2008)

    Gregory R. Bowman, Xuhui Huang, Yuan Yao, Jian Sun, Gunnar Carlsson, Leonidas J. Guibas, Vijay S. Pande
    Abstract Hairpins are a ubiquitous secondary structure motif in RNA molecules. Despite their simple structure, there is some debate over whether they fold in a two-state or multi-state manner. We have studied the folding of a small tetraloop hairpin using a serial version of replica exchange molecular dynamics on a distributed computing environment. On the basis of these simulations, we have identified a number of intermediates that are consistent with experimental results. We also find that folding is not simply the reverse of high-temperature unfolding and suggest that this may be a general feature of biomolecular folding.
  22. Classification of Skin Lesions by Topological Data Analysis Alongside With Neural Network (2020)

    Naiereh Elyasi, Mehdi Hosseini Moghadam
    Abstract In this paper we use TDA Mapper alongside deep convolutional neural networks in the classification of 7 major skin diseases. First we apply Kepler Mapper, with a neural network as one of its filter steps, to classify the dataset HAM10000. Mapper visualizes the classification result as a simplicial complex, which a neural network cannot do alone, but as a filter step the neural network helps to classify the data better. Furthermore, we apply TDA Mapper and persistent homology to understand the weights of the layers of the MobileNet network in different training epochs on HAM10000. We also use persistence diagrams to visualize the results of the analysis of the layers of the MobileNet network.
  23. Topology of Force Networks in Granular Media Under Impact (2017)

    M. X. Lim, R. P. Behringer
    Abstract We investigate the evolution of the force network in experimental systems of two-dimensional granular materials under impact. We use the first Betti number, $\beta_1$, and persistence diagrams, as measures of the topological properties of the force network. We show that the structure of the network has a complex, hysteretic dependence on both the intruder acceleration and the total force response of the granular material. $\beta_1$ can also distinguish between the nonlinear formation and relaxation of the force network. In addition, using the persistence diagram of the force network, we show that the size of the loops in the force network has a Poisson-like distribution, the characteristic size of which changes over the course of the impact.
  24. Topological Singularity Detection at Multiple Scales (2023)

    Julius von Rohrscheidt, Bastian Rieck
    Abstract The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e., singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness' of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.
  25. A Machine-Learning-Based Early Warning System Boosted by Topological Data Analysis (2019)

    Devraj Basu, Tieqiang Li
    Abstract We propose a novel early warning system for detecting financial market crashes that utilizes the information extracted from the shape of financial market movement. Our system incorporates Topological Data Analysis (TDA), a new set of data analytics techniques specialised in profiling the shape of data, into a more traditional machine learning framework. Incorporating TDA leads to substantial improvements in timely detecting the onset of a sharp market decline. Our framework is both able to generate new features and also unlock more value from existing factors. Our results illustrate the importance of understanding the shape of financial market data and suggest that incorporating TDA into a machine learning framework could be beneficial in a number of financial market settings.
  26. Multiscale Projective Coordinates via Persistent Cohomology of Sparse Filtrations (2018)

    Jose A. Perea
    Abstract We present a framework which leverages the underlying topology of a data set, in order to produce appropriate coordinate representations. In particular, we show how to construct maps to real and complex projective spaces, given appropriate persistent cohomology classes. An initial map is obtained in two steps: First, the persistent cohomology of a sparse filtration is used to compute systems of transition functions for (real and complex) line bundles over neighborhoods of the data. Next, the transition functions are used to produce explicit classifying maps for the induced bundles. A framework for dimensionality reduction in projective space (Principal Projective Components) is also developed, aimed at decreasing the target dimension of the original map. Several examples are provided as well as theorems addressing choices in the construction.
  27. Complexes of Tournaments, Directionality Filtrations and Persistent Homology (2020)

    Dejan Govc, Ran Levi, Jason P. Smith
    Abstract Complete digraphs are referred to in the combinatorics literature as tournaments. We consider a family of semi-simplicial complexes, that we refer to as "tournaplexes", whose simplices are tournaments. In particular, given a digraph $\mathcal{G}$, we associate with it a "flag tournaplex" which is a tournaplex containing the directed flag complex of $\mathcal{G}$, but also the geometric realisation of cliques that are not directed. We define several types of filtrations on tournaplexes, and exploiting persistent homology, we observe that flag tournaplexes provide finer means of distinguishing graph dynamics than the directed flag complex. We then demonstrate the power of these ideas by applying them to graph data arising from the Blue Brain Project's digital reconstruction of a rat's neocortex.
  28. Spatial Embedding Imposes Constraints on Neuronal Network Architectures (2018)

    Jennifer Stiso, Danielle S. Bassett
    Abstract Recent progress towards understanding circuit function has capitalized on tools from network science to parsimoniously describe the spatiotemporal architecture of neural systems. Such tools often address systems topology divorced from its physical instantiation. Nevertheless, for embedded systems such as the brain, physical laws directly constrain the processes of network growth, development, and function. We review here the rules imposed by the space and volume of the brain on the development of neuronal networks, and show that these rules give rise to a specific set of complex topologies. These rules also affect the repertoire of neural dynamics that can emerge from the system, and thereby inform our understanding of network dysfunction in disease. We close by discussing new tools and models to delineate the effects of spatial embedding.
  29. Robust Crossings Detection in Noisy Signals Using Topological Signal Processing (2024)

    Sunia Tanweer, Firas A. Khasawneh, Elizabeth Munch
    Abstract This article explores a novel method of bracketing zero-crossings for both 1-D functions and discretely sampled time series by the application of 0-D persistent homology from algebraic topology. We introduce an algorithm and demonstrate its capability of detecting crossings in noisy signals across various sampling frequencies. Compared to other software-based methods for crossing detection in signals, our approach is typically faster, shows a higher accuracy, and has the unique ability to identify all roots within the provided interval rather than only one of them. We also discuss different options for mathematically estimating the persistence threshold, a parameter which impacts and controls the correct bracketing of roots. Finally, we explore the potential of extending our algorithm to higher dimensions.
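    As a rough illustration of the ingredient named in this abstract, the sketch below computes 0-D sublevel-set persistence for a sampled 1-D signal with a union-find pass and then filters the pairs by a persistence threshold. It is not the authors' bracketing algorithm; the function names `sublevel_persistence_0d` and `significant_pairs` are hypothetical, and the threshold is simply passed in rather than estimated.

```python
def sublevel_persistence_0d(values):
    """Return (birth, death) pairs of 0-D sublevel-set persistence.

    Samples are added in increasing order of value. Each local minimum
    starts a component; when two components meet, the younger one (the
    one with the higher birth value) dies (elder rule). The component
    of the global minimum never dies and is reported with death = inf.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    parent = [None] * n   # None means "sample not added yet"
    birth = {}            # sample index -> value at which it was added

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                # Skip zero-persistence pairs (a sample joining a neighbour).
                if values[i] > birth[young]:
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    if n:
        pairs.append((values[order[0]], float("inf")))
    return pairs


def significant_pairs(pairs, threshold):
    """Keep only pairs whose persistence (death - birth) reaches the threshold."""
    return [(b, d) for b, d in pairs if d - b >= threshold]


# Usage: small wiggles in a noisy signal yield low-persistence pairs,
# which the threshold removes.
signal = [0, 2, 1, 3, -1, 4]
stable = significant_pairs(sublevel_persistence_0d(signal), threshold=2)
```

    Raising the threshold discards more of the short-lived minima that noise creates, which is why the abstract singles it out as the parameter controlling correct bracketing.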
  30. Practical Joint Human-Machine Exploration of Industrial Time Series Using the Matrix Profile (2023)

    Felix Nilsson, Mohamed-Rafik Bouguelia, Thorsteinn Rögnvaldsson
    Abstract Technological advancements and widespread adaptation of new technology in industry have made industrial time series data more available than ever before. With this development grows the need for versatile methods for mining industrial time series data. This paper introduces a practical approach for joint human-machine exploration of industrial time series data using the Matrix Profile, and presents some challenges involved. The approach is demonstrated on three real-life industrial data sets to show how it enables the user to quickly extract semantic information, detect cycles, find deviating patterns, and gain a deeper understanding of the time series. A benchmark test is also presented on ECG (electrocardiogram) data, showing that the approach works well in comparison to previously suggested methods for extracting relevant time series motifs.
  31. Topological Regularization for Dense Prediction (2021)

    Deqing Fu, Bradley J. Nelson
    Abstract Dense prediction tasks such as depth perception and semantic segmentation are important applications in computer vision that have a concrete topological description in terms of partitioning an image into connected components or estimating a function with a small number of local extrema corresponding to objects in the image. We develop a form of topological regularization based on persistent homology that can be used in dense prediction tasks with these topological descriptions. Experimental results show that the output topology can also appear in the internal activations of trained neural networks which allows for a novel use of topological regularization to the internal states of neural networks during training, reducing the computational cost of the regularization. We demonstrate that this topological regularization of internal activations leads to improved convergence and test benchmarks on several problems and architectures.
  32. Topological Data Analysis: Concepts, Computation, and Applications in Chemical Engineering (2021)

    Alexander D. Smith, Paweł Dłotko, Victor M. Zavala
    Abstract A primary hypothesis that drives scientific and engineering studies is that data has structure. The dominant paradigms for describing such structure are statistics (e.g., moments, correlation functions) and signal processing (e.g., convolutional neural nets, Fourier series). Topological Data Analysis (TDA) is a field of mathematics that analyzes data from a fundamentally different perspective. TDA represents datasets as geometric objects and provides dimensionality reduction techniques that project such objects onto low-dimensional descriptors. The key properties of these descriptors (also known as topological features) are that they provide multiscale information and that they are stable under perturbations (e.g., noise, translation, and rotation). In this work, we review the key mathematical concepts and methods of TDA and present different applications in chemical engineering.
  33. Protein-Folding Analysis Using Features Obtained by Persistent Homology (2020)

    Takashi Ichinomiya, Ippei Obayashi, Yasuaki Hiraoka
    Abstract Understanding the protein-folding process is an outstanding issue in biophysics; recent developments in molecular dynamics simulation have provided insights into this phenomenon. However, the large freedom of atomic motion hinders the understanding of this process. In this study, we applied persistent homology, an emerging method to analyze topological features in a data set, to reveal protein-folding dynamics. We developed a new, to our knowledge, method to characterize the protein structure based on persistent homology and applied this method to molecular dynamics simulations of chignolin. Using principal component analysis or nonnegative matrix factorization, our analysis method revealed two stable states and one saddle state, corresponding to the native, misfolded, and transition states, respectively. We also identified an unfolded state with slow dynamics in the reduced space. Our method serves as a promising tool to understand the protein-folding process.
  34. Persistent Homology Machine Learning for Fingerprint Classification (2019)

    N. Giansiracusa, R. Giansiracusa, C. Moon
    Abstract The fingerprint classification problem is to sort fingerprints into predetermined groups, such as arch, loop, and whorl. It was asserted in the literature that minutiae points, which are commonly used for fingerprint matching, are not useful for classification. We show that, to the contrary, near state-of-the-art classification accuracy rates can be achieved when applying topological data analysis (TDA) to 3-dimensional point clouds of oriented minutiae points. We also apply TDA to fingerprint ink-roll images, which yields a lower accuracy rate but still shows promise; moreover, combining the two approaches outperforms each one individually. These methods use supervised learning applied to persistent homology and allow us to explore feature selection on barcodes, an important topic at the interface between TDA and machine learning. We test our classification algorithms on the NIST fingerprint database SD-27.
  35. The Weighted Euler Curve Transform for Shape and Image Analysis (2020)

    Qitong Jiang, Sebastian Kurtek, Tom Needham
    Abstract The Euler Curve Transform (ECT) of Turner et al. is a complete invariant of an embedded simplicial complex, which is amenable to statistical analysis. We generalize the ECT to provide a similarly convenient representation for weighted simplicial complexes, objects which arise naturally, for example, in certain medical imaging applications. We leverage work of Ghrist et al. on Euler integral calculus to prove that this invariant—dubbed the Weighted Euler Curve Transform (WECT)—is also complete. We explain how to transform a segmented region of interest in a grayscale image into a weighted simplicial complex and then into a WECT representation. This WECT representation is applied to study Glioblastoma Multiforme brain tumor shape and texture data. We show that the WECT representation is effective at clustering tumors based on qualitative shape and texture features and that this clustering correlates with patient survival time.
  36. Crystallographic Interacting Topological Phases and Equivariant Cohomology: To Assume or Not to Assume (2020)

    Daniel Sheinbaum, Omar Antolín Camarena
    Abstract For symmorphic crystalline interacting gapped systems we derive a classification under adiabatic evolution. This classification is complete for non-degenerate ground states. For the degenerate case we discuss some invariants given by equivariant characteristic classes. We do not assume an emergent relativistic field theory nor that phases form a topological spectrum. We also do not assume short-range entanglement nor the existence of quasi-particles as is done in SPT and SET classifications respectively. Using a slightly generalized Bloch decomposition and Grassmannians made out of ground state spaces, we show that the $P$-equivariant cohomology of a $d$-dimensional torus gives rise to different interacting phases. We compare our results to bosonic symmorphic crystallographic SPT phases and to non-interacting fermionic crystallographic phases in class A. Finally we discuss the relation of our assumptions to those made for crystallographic SPT and SET phases.
  37. Persistent Homology Advances Interpretable Machine Learning for Nanoporous Materials (2020)

    Aditi S. Krishnapriyan, Joseph Montoya, Jens Hummelshøj, Dmitriy Morozov
    Abstract Machine learning for nanoporous materials design and discovery has emerged as a promising alternative to more time-consuming experiments and simulations. The challenge with this approach is the selection of features that enable universal and interpretable materials representations across multiple prediction tasks. We use persistent homology to construct holistic representations of the materials structure. We show that these representations can also be augmented with other generic features such as word embeddings from natural language processing to capture chemical information. We demonstrate our approach on multiple metal-organic framework datasets by predicting a variety of gas adsorption targets. Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from commonly used manually curated features. Persistent homology features allow us to locate the pores that correlate best to adsorption at different pressures, contributing to understanding atomic level structure-property relationships for materials design.
  38. Towards a Philological Metric Through a Topological Data Analysis Approach (2020)

    Eduardo Paluzo-Hidalgo, Rocio Gonzalez-Diaz, Miguel A. Gutiérrez-Naranjo
    Abstract The canon of the baroque Spanish literature has been thoroughly studied with philological techniques. The major representatives of the poetry of this epoch are Francisco de Quevedo and Luis de Góngora y Argote. They are commonly classified by the literary experts in two different streams: Quevedo belongs to the Conceptismo and Góngora to the Culteranismo. Besides, traditionally, even if Quevedo is considered the most representative of the Conceptismo, Lope de Vega is also considered to be, at least, closely related to this literary trend. In this paper, we use Topological Data Analysis techniques to provide a first approach to a metric distance between the literary style of these poets. As a consequence, we reach results that agree with the literary experts' criteria, locating the literary style of Lope de Vega closer to that of Quevedo than to that of Góngora.
  39. Topological Data Analysis of Contagion Maps for Examining Spreading Processes on Networks (2015)

    Dane Taylor, Florian Klimm, Heather A. Harrington, Miroslav Kramár, Konstantin Mischaikow, Mason A. Porter, Peter J. Mucha
    Abstract Social and biological contagions are influenced by the spatial embeddedness of networks. Historically, many epidemics spread as a wave across part of the Earth’s surface; however, in modern contagions long-range edges—for example, due to airline transportation or communication media—allow clusters of a contagion to appear in distant locations. Here we study the spread of contagions on networks through a methodology grounded in topological data analysis and nonlinear dimension reduction. We construct ‘contagion maps’ that use multiple contagions on a network to map the nodes as a point cloud. By analysing the topology, geometry and dimensionality of manifold structure in such point clouds, we reveal insights to aid in the modelling, forecast and control of spreading processes. Our approach highlights contagion maps also as a viable tool for inferring low-dimensional structure in networks.
  40. Visual Detection of Structural Changes in Time-Varying Graphs Using Persistent Homology (2018)

    Mustafa Hajij, Bei Wang, Carlos Scheidegger, Paul Rosen
    Abstract Topological data analysis is an emerging area in exploratory data analysis and data mining. Its main tool, persistent homology, has become a popular technique to study the structure of complex, high-dimensional data. In this paper, we propose a novel method using persistent homology to quantify structural changes in time-varying graphs. Specifically, we transform each instance of the time-varying graph into a metric space, extract topological features using persistent homology, and compare those features over time. We provide a visualization that assists in time-varying graph exploration and helps to identify patterns of behavior within the data. To validate our approach, we conduct several case studies on real-world datasets and show how our method can find cyclic patterns, deviations from those patterns, and one-time events in time-varying graphs. We also examine whether a persistence-based similarity measure satisfies a set of well-established, desirable properties for graph metrics.
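    The first step of the pipeline described above, turning each graph snapshot into a metric space, can be sketched as follows. The abstract does not say which metric is used; the shortest-path distance below is an assumption chosen for illustration, and `shortest_path_metric` is a hypothetical name.

```python
import math


def shortest_path_metric(n, weighted_edges):
    """All-pairs shortest-path distances of an undirected weighted graph.

    n is the number of nodes (labelled 0..n-1) and weighted_edges is an
    iterable of (u, v, weight) triples. Uses the Floyd-Warshall algorithm,
    so it runs in O(n^3) time and suits small snapshots only.
    Unreachable pairs keep distance math.inf.
    """
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in weighted_edges:
        # Keep the lightest edge if the same pair appears more than once.
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


# Usage: one distance matrix per time step of the time-varying graph.
snapshot = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 3, 5.0)]
distances = shortest_path_metric(4, snapshot)
```

    The resulting matrix is the kind of input a Vietoris-Rips persistence computation accepts, so one matrix per time step gives one set of topological features per time step to compare.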
  41. Knowledge Gaps in the Early Growth of Semantic Feature Networks (2018)

    Ann E. Sizemore, Elisabeth A. Karuza, Chad Giusti, Danielle S. Bassett
    Abstract Understanding language learning and more general knowledge acquisition requires the characterization of inherently qualitative structures. Recent work has applied network science to this task by creating semantic feature networks, in which words correspond to nodes and connections correspond to shared features, and then by characterizing the structure of strongly interrelated groups of words. However, the importance of sparse portions of the semantic network—knowledge gaps—remains unexplored. Using applied topology, we query the prevalence of knowledge gaps, which we propose manifest as cavities in the growing semantic feature network of toddlers. We detect topological cavities of multiple dimensions and find that, despite word order variation, the global organization remains similar. We also show that nodal network measures correlate with filling cavities better than basic lexical properties. Finally, we discuss the importance of semantic feature network topology in language learning and speculate that the progression through knowledge gaps may be a robust feature of knowledge acquisition.
  42. Extracting Insights From the Shape of Complex Data Using Topology (2013)

    P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. Carlsson
    Abstract This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find. Our method also permits the analysis of individual data sets as well as the analysis of relationships between related data sets. We illustrate the use of our method by applying it to three very different kinds of data, namely gene expression from breast tumors, voting data from the United States House of Representatives and player performance data from the NBA, in each case finding stratifications of the data which are more refined than those produced by standard methods.
  43. Identifying Repeating Patterns in IEC 61499 Systems Using Feature-Based Embeddings (2022)

    Markus Unterdechler, Antonio M. Gutiérrez, Lisa Sonnleithner, Rick Rabiser, Alois Zoitl
    Abstract Cyber-Physical Production Systems (CPPSs) are highly variable systems of systems comprised of software and hardware interacting with each other and the environment. The increasing integration of technologies and devices has brought an unprecedented level of automation and customization. At the same time, it has also increased the efforts to maintain highly complex and heterogeneous systems. Although engineering practices support the reuse of common components to ease the development and maintenance of the systems in different projects, the identification of common components is still manually performed, which is a time-consuming, error-prone task. In this paper, a novel approach identifying repeating patterns in CPPSs based on artificial intelligence techniques is presented. This approach allows finding exact and similar components to support the CPPS design. Furthermore, it enables the maintenance of common components by reusing predefined types thereby reducing development effort. We implemented and evaluated our approach in an industry case study on developing CPPS control software with IEC 61499.
  44. Quantitative and Interpretable Order Parameters for Phase Transitions From Persistent Homology (2020)

    Alex Cole, Gregory J. Loges, Gary Shiu
    Abstract We apply modern methods in computational topology to the task of discovering and characterizing phase transitions. As illustrations, we apply our method to four two-dimensional lattice spin models: the Ising, square ice, XY, and fully-frustrated XY models. In particular, we use persistent homology, which computes the births and deaths of individual topological features as a coarse-graining scale or sublevel threshold is increased, to summarize multiscale and high-point correlations in a spin configuration. We employ vector representations of this information called persistence images to formulate and perform the statistical task of distinguishing phases. For the models we consider, a simple logistic regression on these images is sufficient to identify the phase transition. Interpretable order parameters are then read from the weights of the regression. This method suffices to identify magnetization, frustration, and vortex-antivortex structure as relevant features for phase transitions in our models. We also define "persistence" critical exponents and study how they are related to those critical exponents usually considered.
  45. Unsupervised Topological Learning Approach of Crystal Nucleation (2022)

    Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse
    Abstract Nucleation phenomena commonly observed in our everyday life are of fundamental, technological and societal importance in many areas, but some of their most intimate mechanisms remain to be unravelled. Crystal nucleation, the early stages where the liquid-to-solid transition occurs upon undercooling, initiates at the atomic level on nanometre length and sub-picosecond time scales and involves complex multidimensional mechanisms with local symmetry breaking that can hardly be observed experimentally in full detail. To reveal their structural features in simulations without a priori, an unsupervised learning approach founded on topological descriptors loaned from persistent homology concepts is proposed. Applied here to monatomic metals, it shows that both translational and orientational ordering always come into play simultaneously as a result of the strong bonding when homogeneous nucleation starts in regions with low five-fold symmetry. It also reveals the specificity of the nucleation pathways depending on the element considered, with features beyond the hypothesis of Classical Nucleation Theory.
  46. Analysis of Kolmogorov Flow and Rayleigh–Bénard Convection Using Persistent Homology (2016)

    Miroslav Kramár, Rachel Levanger, Jeffrey Tithof, Balachandra Suri, Mu Xu, Mark Paul, Michael F. Schatz, Konstantin Mischaikow
    Abstract We use persistent homology to build a quantitative understanding of large complex systems that are driven far-from-equilibrium. In particular, we analyze image time series of flow field patterns from numerical simulations of two important problems in fluid dynamics: Kolmogorov flow and Rayleigh–Bénard convection. For each image we compute a persistence diagram to yield a reduced description of the flow field; by applying different metrics to the space of persistence diagrams, we relate characteristic features in persistence diagrams to the geometry of the corresponding flow patterns. We also examine the dynamics of the flow patterns by a second application of persistent homology to the time series of persistence diagrams. We demonstrate that persistent homology provides an effective method both for quotienting out symmetries in families of solutions and for identifying multiscale recurrent dynamics. Our approach is quite general and it is anticipated to be applicable to a broad range of open problems exhibiting complex spatio-temporal behavior.
  47. Improving Health Care Management Through Persistent Homology of Time-Varying Variability of Emergency Department Patient Flow (2018)

    Mael Dugast, Guillaume Bouleux, Olivier Mory, Eric Marcon
    Abstract Excessive admissions at the Emergency Department (ED) is a phenomenon very closely linked to the propagation of viruses. It is a cause of overcrowding for EDs and a public health problem. The aim of this work is to give EDs’ leaders more time for decision making during this period. Based on the admissions time series associated with specific clinical diagnoses, we will first perform a Detrended Fluctuation Analysis (DFA) to obtain the corresponding variability time series. Next, we will embed this time series on a manifold to obtain a point cloud representation and use Topological Data Analysis (TDA) through persistent homology techniques to propose two early real-time indicators. One is the early indicator of abnormal arrivals at the ED whereas the second gives the information on the time index of the maximum number of arrivals. The performance of the detectors is parameter dependent and it can evolve each year. That is why we also propose to solve a bi-objective optimization problem to track the variations of this parameter.
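    One common way to turn a scalar time series into a point cloud is a sliding-window (time-delay) embedding. The abstract does not state which embedding the authors use, so the sketch below is an assumption for illustration only; `delay_embedding`, `dimension`, and `delay` are hypothetical names.

```python
def delay_embedding(series, dimension, delay):
    """Map a scalar series to points (x[i], x[i+delay], ..., x[i+(dimension-1)*delay]).

    Returns a list of tuples, one per valid starting index. The list is
    empty when the series is too short for the requested window.
    """
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    n_points = len(series) - (dimension - 1) * delay
    if n_points <= 0:
        return []
    return [
        tuple(series[i + k * delay] for k in range(dimension))
        for i in range(n_points)
    ]


# Usage: a 3-dimensional embedding with delay 2 of a short series.
cloud = delay_embedding([0, 1, 2, 3, 4, 5], dimension=3, delay=2)
```

    Periodic behaviour in the series tends to show up as loops in such a point cloud, which is the kind of feature persistent homology can then measure.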
  48. Rule Generation for Classifying SLT Failed Parts (2022)

    Ho-Chieh Hsu, Cheng-Che Lu, Shih-Wei Wang, Kelly Jones, Kai-Chiang Wu, Mango C.-T. Chao
    Abstract System-level test (SLT) has recently gained visibility as integrated circuits become harder and harder to be fully tested due to increasing transistor density and circuit design complexity. Although SLT is effective for reducing test escapes, little diagnostic information can be obtained for product improvement. In this paper, we propose an unsupervised learning (UL) method to resolve the aforementioned issue by discovering correlative, potentially systematic defects during the SLT phase. Toward this end, HDBSCAN [1] is used for clustering SLT failed devices in a low-dimensional space created by UMAP [2]. Decision trees are subsequently applied to explain the HDBSCAN results based on generating explainable quantitative rules, e.g., inequality constraints, providing domain experts additional information for advanced diagnosis. Experiments on industrial data demonstrate that the proposed methodology can effectively cluster SLT failed devices and then explain the clustering results with a promising accuracy of above 90%. Our methodology is also scalable and fast, requiring two to five orders of magnitude lower runtime than the method presented in [3].
  49. Topology of Viral Evolution (2013)

    Joseph Minhow Chan, Gunnar Carlsson, Raul Rabadan
    Abstract The tree structure is currently the accepted paradigm to represent evolutionary relationships between organisms, species or other taxa. However, horizontal, or reticulate, genomic exchanges are pervasive in nature and confound characterization of phylogenetic trees. Drawing from algebraic topology, we present a unique evolutionary framework that comprehensively captures both clonal and reticulate evolution. We show that whereas clonal evolution can be summarized as a tree, reticulate evolution exhibits nontrivial topology of dimension greater than zero. Our method effectively characterizes clonal evolution, reassortment, and recombination in RNA viruses. Beyond detecting reticulate evolution, we succinctly recapitulate the history of complex genetic exchanges involving more than two parental strains, such as the triple reassortment of H7N9 avian influenza and the formation of circulating HIV-1 recombinants. In addition, we identify recurrent, large-scale patterns of reticulate evolution, including frequent PB2-PB1-PA-NP cosegregation during avian influenza reassortment. Finally, we bound the rate of reticulate events (i.e., 20 reassortments per year in avian influenza). Our method provides an evolutionary perspective that not only captures reticulate events precluding phylogeny, but also indicates the evolutionary scales where phylogenetic inference could be accurate.
  50. Inferring COVID-19 Biological Pathways From Clinical Phenotypes via Topological Analysis (2021)

    Negin Karisani, Daniel E. Platt, Saugata Basu, Laxmi Parida
    Abstract COVID-19 has caused thousands of deaths around the world and also resulted in a large international economic disruption. Identifying the pathways associated with this illness can help medical researchers to better understand the properties of the condition. This process can be carried out by analyzing the medical records. It is crucial to develop tools and models that can aid researchers with this process in a timely manner. However, medical records are often unstructured clinical notes, and this poses significant challenges to developing the automated systems. In this article, we propose a pipeline to aid practitioners in analyzing clinical notes and revealing the pathways associated with this disease. Our pipeline relies on topological properties and consists of three steps: 1) pre-processing the clinical notes to extract the salient concepts, 2) constructing a feature space of the patients to characterize the extracted concepts, and finally, 3) leveraging the topological properties to distill the available knowledge and visualize the result. Our experiments on a publicly available dataset of COVID-19 clinical notes testify that our pipeline can indeed extract meaningful pathways.
  51. Topological Persistence for Relating Microstructure and Capillary Fluid Trapping in Sandstones (2019)

    A. L. Herring, V. Robins, A. P. Sheppard
    Abstract Results from a series of two-phase fluid flow experiments in Leopard, Berea, and Bentheimer sandstones are presented. Fluid configurations are characterized using laboratory-based and synchrotron-based 3-D X-ray computed tomography. All flow experiments are conducted under capillary-dominated conditions. We conduct geometry-topology analysis via persistent homology and compare this to standard topological and watershed-partition-based pore-network statistics. Metrics identified as predictors of nonwetting fluid trapping are calculated from the different analytical methods and are compared to levels of trapping measured during drainage-imbibition cycles in the experiments. Metrics calculated from pore networks (i.e., pore body-throat aspect ratio and coordination number) and topological analysis (Euler characteristic) do not correlate well with trapping in these samples. In contrast, a new metric derived from the persistent homology analysis, which incorporates counts of topological features as well as their length scale and spatial distribution, correlates very well ($R^2 = 0.97$) to trapping for all systems. This correlation encompasses a wide range of porous media and initial fluid configurations, and also applies to data sets of different imaging and image processing protocols.
  52. Revealing Key Structural Features Hidden in Liquids and Glasses (2019)

    Hajime Tanaka, Hua Tong, Rui Shi, John Russo
    Abstract A great success of solid state physics comes from the characterization of crystal structures in the reciprocal (wave vector) space. The power of structural characterization in Fourier space originates from the breakdown of translational and rotational symmetries. However, unlike crystals, liquids and amorphous solids possess continuous translational and rotational symmetries on a macroscopic scale, which makes Fourier space analysis much less effective. Lately, several studies have revealed local breakdown of translational and rotational symmetries even for liquids and glasses. Here, we review several mathematical methods used to characterize local structural features of apparently disordered liquids and glasses in real space. We distinguish two types of local ordering in liquids and glasses: energy-driven and entropy-driven. The former, which is favoured energetically by symmetry-selective directional bonding, is responsible for anomalous behaviours commonly observed in water-type liquids such as water, silicon, germanium and silica. The latter, which is often favoured entropically, shows connections with the heterogeneous, slow dynamics found in hard-sphere-like glass-forming liquids. We also discuss the relationship between such local ordering and crystalline structures and its impact on glass-forming ability.
  53. Persistent Homology Analysis of Brain Transcriptome Data in Autism (2019)

    Daniel Shnier, Mircea A. Voineagu, Irina Voineagu
    Abstract Persistent homology methods have found applications in the analysis of multiple types of biological data, particularly imaging data or data with a spatial and/or temporal component. However, few studies have assessed the use of persistent homology for the analysis of gene expression data. Here we apply persistent homology methods to investigate the global properties of gene expression in post-mortem brain tissue (cerebral cortex) of individuals with autism spectrum disorders (ASD) and matched controls. We observe a significant difference in the geometry of inter-sample relationships between autism and healthy controls as measured by the sum of the death times of zero-dimensional components and the Euler characteristic. This observation is replicated across two distinct datasets, and we interpret it as evidence for an increased heterogeneity of gene expression in autism. We also assessed the topology of gene-level point clouds and did not observe significant differences between ASD and control transcriptomes, suggesting that the overall transcriptome organization is similar in ASD and healthy cerebral cortex. Overall, our study provides a novel framework for persistent homology analyses of gene expression data for genetically complex disorders.
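    The summary statistic mentioned above, the sum of the death times of zero-dimensional components, can be illustrated for a Vietoris-Rips filtration on a Euclidean point cloud. There, each merge of two components happens at the length of a minimum-spanning-tree edge, so Kruskal's algorithm yields the death times directly. This is a minimal sketch, not the authors' code: `h0_death_times` is a hypothetical name, and the convention that a class dies at the full pairwise distance (some libraries use half of it) is an assumption.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Finite death times of 0-D Vietoris-Rips classes, in increasing order.

    Every point is born at scale 0. Processing pairwise distances in
    increasing order, each edge that joins two distinct components kills
    one class at that distance (Kruskal's minimum-spanning-tree algorithm).
    One class survives forever and is not listed.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths


# Usage: a more spread-out cloud gives a larger sum of death times.
samples = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
total_death_time = sum(h0_death_times(samples))
```

    Because the statistic grows as samples sit farther apart, a larger value is consistent with the increased heterogeneity the abstract reports.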
  54. Morphometrics Reveals Complex and Heritable Apple Leaf Shapes (2018)

    Zoë Migicovsky, Mao Li, Daniel H. Chitwood, Sean Myles
    Abstract Apple (Malus spp.) is a widely grown and valuable fruit crop. Leaf shape is important for flowering in apple and may also be an early indicator for other agriculturally valuable traits. We examined 9,000 leaves from 869 unique apple accessions using linear measurements and comprehensive morphometric techniques. We identified allometric variation as the result of differing length-to-width aspect ratios between accessions and species of apple. The allometric variation was due to variation in the width of the leaf blade, not the length. Aspect ratio was highly correlated with the first principal component (PC1) of morphometric variation quantified using elliptical Fourier descriptors (EFDs) and persistent homology (PH). While the primary source of variation was aspect ratio, subsequent PCs corresponded to complex shape variation not captured by linear measurements. After linking the morphometric information with over 122,000 genome-wide single nucleotide polymorphisms (SNPs), we found high SNP heritability values even at later PCs, indicating that comprehensive morphometrics can capture complex, heritable phenotypes. Thus, techniques such as EFDs and PH are capturing heritable biological variation that would be missed using linear measurements alone.
  55. Uncovering the Topology of Time-Varying fMRI Data Using Cubical Persistence (2020)

    Bastian Rieck, Tristan Yates, Christian Bock, Karsten Borgwardt, Guy Wolf, Nicholas Turk-Browne, Smita Krishnaswamy
    Abstract Functional magnetic resonance imaging (fMRI) is a crucial technology for gaining insights into cognitive processes in humans. Data amassed from fMRI measurements result in volumetric data sets that vary over time. However, analysing such data presents a challenge due to the large degree of noise and person-to-person variation in how information is represented in the brain. To address this challenge, we present a novel topological approach that encodes each time point in an fMRI data set as a persistence diagram of topological features, i.e. high-dimensional voids present in the data. This representation naturally does not rely on voxel-by-voxel correspondence and is robust to noise. We show that these time-varying persistence diagrams can be clustered to find meaningful groupings between participants, and that they are also useful in studying within-subject brain state trajectories of subjects performing a particular task. Here, we apply both clustering and trajectory analysis techniques to a group of participants watching the movie 'Partly Cloudy'. We observe significant differences in both brain state trajectories and overall topological activity between adults and children watching the same movie.
  56. Molecular Phenotyping Using Networks, Diffusion, and Topology: Soft Tissue Sarcoma (2019)

    James C. Mathews, Maryam Pouryahya, Caroline Moosmüller, Yannis G. Kevrekidis, Joseph O. Deasy, Allen Tannenbaum
    Abstract Many biological datasets are high-dimensional yet manifest an underlying order. In this paper, we describe an unsupervised data analysis methodology that operates in the setting of a multivariate dataset and a network which expresses influence between the variables of the given set. The technique involves network geometry employing the Wasserstein distance, global spectral analysis in the form of diffusion maps, and topological data analysis using the Mapper algorithm. The prototypical application is to gene expression profiles obtained from RNA-Seq experiments on a collection of tissue samples, considering only genes whose protein products participate in a known pathway or network of interest. Employing the technique, we discern several coherent states or signatures displayed by the gene expression profiles of the sarcomas in the Cancer Genome Atlas along the TP53 (p53) signaling network. The signatures substantially recover the leiomyosarcoma, dedifferentiated liposarcoma (DDLPS), and synovial sarcoma histological subtype diagnoses, and they also include a new signature defined by activation and inactivation of about a dozen genes, including activation of serine endopeptidase inhibitor SERPINE1 and inactivation of TP53-family tumor suppressor gene TP73.
  57. When Remote Sensing Meets Topological Data Analysis (2018)

    Ludovic Duponchel
    Abstract Author Summary: Hyperspectral remote sensing plays an increasingly important role in many scientific domains and everyday life problems. Indeed, this imaging concept ends up in applications as varied as catching tax-evaders red-handed by locating new construction and building alterations, searching for aircraft and saving lives after fatal crashes, detecting oil spills for marine life and environmental preservation, spying on enemies with reconnaissance satellites, watching algae grow as an indicator of environmental health, forecasting weather to warn about natural disasters and much more. From an instrumental point of view, we can say that the actual spectrometers have rather good characteristics, even if we can always increase spatial resolution and spectral range. In order to extract ever more information from such experiments and develop new applications, we must, therefore, propose multivariate data analysis tools able to capture the shape of data sets and their specific features. Nevertheless, actual methods often impose a data model which implicitly defines the geometry of the data set. The aim of the paper is thus to introduce the concept of topological data analysis in the framework of remote sensing, making no assumptions about the global shape of the data set, but also allowing the capture of its local features.
  58. Musical Stylistic Analysis: A Study of Intervallic Transition Graphs via Persistent Homology (2022)

    Martín Mijangos, Alessandro Bravetti, Pablo Padilla
    Abstract Topological data analysis has been recently applied to investigate stylistic signatures and trends in musical compositions. A useful tool in this area is Persistent Homology. In this paper, we develop a novel method to represent a weighted directed graph as a finite metric space and then use persistent homology to extract useful features. We apply this method to weighted directed graphs obtained from pitch transitions information of a given musical fragment and use these techniques to the study of stylistic trends. In particular, we are interested in using these tools to make quantitative stylistic comparisons. As a first illustration, we analyze a selection of string quartets by Haydn, Mozart and Beethoven and discuss possible implications of our results in terms of different approaches by these composers to stylistic exploration and variety. We observe that Haydn is stylistically the most conservative, followed by Mozart, while Beethoven is the most innovative, expanding and modifying the string quartet as a musical form. Finally we also compare the variability of different genres, namely minuets, allegros, prestos and adagios, by a given composer and conclude that the minuet is the most stable form of the string quartet movements.
  59. Chatter Diagnosis in Milling Using Supervised Learning and Topological Features Vector (2019)

    Melih C. Yesilli, Sarah Tymochko, Firas A. Khasawneh, Elizabeth Munch
    Abstract Chatter detection has become a prominent subject of interest due to its effect on cutting tool life, surface finish and spindle of machine tool. Most of the existing methods in chatter detection literature are based on signal processing and signal decomposition. In this study, we use topological features of data simulating cutting tool vibrations, combined with four supervised machine learning algorithms to diagnose chatter in the milling process. Persistence diagrams, a method of representing topological features, are not easily used in the context of machine learning, so they must be transformed into a form that is more amenable. Specifically, we will focus on two different methods for featurizing persistence diagrams, Carlsson coordinates and template functions. In this paper, we provide classification results for simulated data from various cutting configurations, including upmilling and downmilling, in addition to the same data with some added noise. Our results show that Carlsson Coordinates and Template Functions yield accuracies as high as 96% and 95%, respectively. We also provide evidence that these topological methods are noise robust descriptors for chatter detection.
  60. Persistent Homology in Cosmic Shear - II. A Tomographic Analysis of DES-Y1 (2022)

    Sven Heydenreich, Benjamin Brück, Pierre Burger, Joachim Harnois-Déraps, Sandra Unruh, Tiago Castro, Klaus Dolag, Nicolas Martinet
    Abstract We demonstrate how to use persistent homology for cosmological parameter inference in a tomographic cosmic shear survey. We obtain the first cosmological parameter constraints from persistent homology by applying our method to the first-year data of the Dark Energy Survey. To obtain these constraints, we analyse the topological structure of the matter distribution by extracting persistence diagrams from signal-to-noise maps of aperture masses. This presents a natural extension to the widely used peak count statistics. Extracting the persistence diagrams from the cosmo-SLICS, a suite of N-body simulations with variable cosmological parameters, we interpolate the signal using Gaussian processes and marginalise over the most relevant systematic effects, including intrinsic alignments and baryonic effects. For the structure growth parameter, we find , which is in full agreement with other late-time probes. We also constrain the intrinsic alignment parameter to A = 1.54 ± 0.52, which constitutes a detection of the intrinsic alignment effect at almost 3σ.
  61. A Sheaf and Topology Approach to Generating Local Branch Numbers in Digital Images (2020)

    Chuan-Shen Hu, Yu-Min Chung
    Abstract This paper concerns a theoretical approach that combines topological data analysis (TDA) and sheaf theory. Topological data analysis, a rising field in mathematics and computer science, concerns the shape of the data and has been proven effective in many scientific disciplines. Sheaf theory, a mathematics subject in algebraic geometry, provides a framework for describing the local consistency in geometric objects. Persistent homology (PH) is one of the main driving forces in TDA, and the idea is to track changes of geometric objects at different scales. The persistence diagram (PD) summarizes the information of PH in the form of a multi-set. While PD provides useful information about the underlying objects, it lacks fine relations about the local consistency of specific pairs of generators in PD, such as the merging relation between two connected components in the PH. The sheaf structure provides a novel point of view for describing the merging relation of local objects in PH. It is the goal of this paper to establish a theoretic framework that utilizes the sheaf theory to uncover finer information from the PH. We also show that the proposed theory can be applied to identify the branch numbers of local objects in digital images.
  62. Feature Detection and Hypothesis Testing for Extremely Noisy Nanoparticle Images Using Topological Data Analysis (2023)

    Andrew M. Thomas, Peter A. Crozier, Yuchen Xu, David S. Matteson
    Abstract We propose a flexible algorithm for feature detection and hypothesis testing in images with ultra-low signal-to-noise ratio using cubical persistent homology. Our main application is in the identification of atomic columns and other features in Transmission Electron Microscopy (TEM). Cubical persistent homology is used to identify local minima and their size in subregions in the frames of nanoparticle videos, which are hypothesized to correspond to relevant atomic features. We compare the performance of our algorithm to other employed methods for the detection of columns and their intensity. Additionally, Monte Carlo goodness-of-fit testing using real-valued summaries of persistence diagrams derived from smoothed images (generated from pixels residing in the vacuum region of an image) is developed and employed to identify whether or not the proposed atomic features generated by our algorithm are due to noise. Using these summaries derived from the generated persistence diagrams, one can produce univariate time series for the nanoparticle videos, thus, providing a means for assessing fluxional behavior. A guarantee on the false discovery rate for multiple Monte Carlo testing of identical hypotheses is also established.
  63. Ghrist Barcoded Video Frames. Application in Detecting Persistent Visual Scene Surface Shapes Captured in Videos (2019)

    Arjuna P. H. Don, James F. Peters
    Abstract This article introduces an application of Ghrist barcodes in the study of persistent Betti numbers derived from vortex nerve complexes found in triangulations of video frames. A Ghrist barcode (also called a persistence barcode) is a topology of data pictograph useful in representing the persistence of the features of changing shapes. The basic approach is to introduce a free Abelian group representation of intersecting filled polygons on the barycenters of the triangles of Alexandroff nerves. An Alexandroff nerve is a maximal collection of triangles of a common vertex in the triangulation of a finite, bounded planar region. In our case, the planar region is a video frame. A Betti number is a count of the number of generators in a finite Abelian group. The focus here is on the persistent Betti numbers across sequences of triangulated video frames. Each Betti number is mapped to an entry in a Ghrist barcode. Two main results are given, namely, vortex nerves are Edelsbrunner-Harer nerve complexes and the Betti number of a vortex nerve equals k + 2 for a vortex nerve containing k edges attached between a pair of vortex cycles in the nerve.
  64. Vibration Sensors for Detecting Critical Events: A Case Study in Ferrosilicon Production (2024)

    Maryna Waszak, Terje Moen, Anders H. Hansen, Grégory Bouquet, Antoine Pultier, Xiang Ma, Dumitru Roman
    Abstract The mining and metal processing industries are undergoing a transformation through digitization, with sensors and data analysis playing a crucial role in modernization and increased efficiency. Vibration sensors are particularly important in monitoring production infrastructure in metal processing plants. This paper presents the installation of vibration sensors in an actual industrial environment and the results of spectral vibration data analysis. The study demonstrates that vibration sensors can be installed in challenging environments such as metal processing plants and that analyzing vibration patterns can provide valuable insights into predicting machine failures and different machine states. By utilizing dimensionality reduction and dominant frequency observation, we analyzed vibration data and identified patterns that are indicative of potential machine states and critical events that reduce production throughput. This information can be used to improve maintenance, minimize downtime, and ultimately enhance the production process’s overall efficiency. This study highlights the importance of digitization and data analysis in the mining and metal processing industries, particularly the capability not only to predict critical events before they impact production throughput and take action accordingly but also to identify machine states for legacy equipment and be part of retrofitting strategies.
  65. Data-Driven and Automatic Surface Texture Analysis Using Persistent Homology (2021)

    Melih C. Yesilli, Firas A. Khasawneh
    Abstract Surface roughness plays an important role in analyzing engineering surfaces. It quantifies the surface topography and can be used to determine whether the resulting surface finish is acceptable or not. Nevertheless, while several existing tools and standards are available for computing surface roughness, these methods rely heavily on user input thus slowing down the analysis and increasing manufacturing costs. Therefore, fast and automatic determination of the roughness level is essential to avoid costs resulting from surfaces with unacceptable finish, and user-intensive analysis. In this study, we propose a Topological Data Analysis (TDA) based approach to classify the roughness level of synthetic surfaces using both their areal images and profiles. We utilize persistent homology from TDA to generate persistence diagrams that encapsulate information on the shape of the surface. We then obtain feature matrices for each surface or profile using Carlsson coordinates, persistence images, and template functions. We compare our results to two widely used methods in the literature: Fast Fourier Transform (FFT) and Gaussian filtering. The results show that our approach yields mean accuracies as high as 97%. We also show that, in contrast to existing surface analysis tools, our TDA-based approach is fully automatable and provides adaptive feature extraction.
  66. Persistent Voids: A New Structural Metric for Membrane Fusion (2007)

    Peter M. Kasson, Afra Zomorodian, Sanghyun Park, Nina Singhal, Leonidas J. Guibas, Vijay S. Pande
    Abstract Motivation: Membrane fusion constitutes a key stage in cellular processes such as synaptic neurotransmission and infection by enveloped viruses. Current experimental assays for fusion have thus far been unable to resolve early fusion events in fine structural detail. We have previously used molecular dynamics simulations to develop mechanistic models of fusion by small lipid vesicles. Here, we introduce a novel structural measurement of vesicle topology and fusion geometry: persistent voids. Results: Persistent voids calculations enable systematic measurement of structural changes in vesicle fusion by assessing fusion stalk widths. They also constitute a generally applicable technique for assessing lipid topological change. We use persistent voids to compute dynamic relationships between hemifusion neck widening and formation of a full fusion pore in our simulation data. We predict that a tightly coordinated process of hemifusion neck expansion and pore formation is responsible for the rapid vesicle fusion mechanism, while isolated enlargement of the hemifusion diaphragm leads to the formation of a metastable hemifused intermediate. These findings suggest that rapid fusion between small vesicles proceeds via a small hemifusion diaphragm rather than a fully expanded one. Availability: Software available upon request pending public release. Contact: kasson@cmgm.stanford.edu or pande@stanford.edu. Supplementary information: Supplementary data are available on Bioinformatics online.
  67. Optimizing Porosity Detection in Wire Laser Metal Deposition Processes Through Data-Driven AI Classification Techniques (2023)

    Meritxell Gomez-Omella, Jon Flores, Basilio Sierra, Susana Ferreiro, Nicolas Hascoët, Francisco Chinesta
    Abstract Additive manufacturing (AM) is an attractive solution for many companies that produce geometrically complex parts. This process consists of depositing material layer by layer following a sliced CAD geometry. It brings several benefits to manufacturing capabilities, such as design freedom, reduced material waste, and short-run customization. However, one of the current challenges faced by users of the process, mainly in wire laser metal deposition (wLMD), is to avoid defects in the manufactured part, especially the porosity. This defect is caused by extreme conditions and metallurgical transformations of the process. And not only does it directly affect the mechanical performance of the parts, especially the fatigue properties, but it also means an increase in costs due to the inspection tasks to which the manufactured parts must be subjected. This work compares three operational solution approaches, product-centric, based on signal-based feature extraction and Topological Data Analysis together with statistical and Machine Learning (ML) techniques, for the early detection and prediction of porosity failure in a wLMD process. The different forecasting and validation strategies demonstrate the variety of conclusions that can be drawn with different objectives in the analysis of the monitored data in AM problems.
  68. Weighted-Persistent-Homology-Based Machine Learning for RNA Flexibility Analysis (2020)

    Chi Seng Pun, Brandon Yung Sin Yong, Kelin Xia
    Abstract With the great significance of biomolecular flexibility in biomolecular dynamics and functional analysis, various experimental and theoretical models are developed. Experimentally, Debye-Waller factor, also known as B-factor, measures atomic mean-square displacement and is usually considered as an important measurement for flexibility. Theoretically, elastic network models, Gaussian network model, flexibility-rigidity model, and other computational models have been proposed for flexibility analysis by shedding light on the biomolecular inner topological structures. Recently, a topology-based machine learning model has been proposed. By using the features from persistent homology, this model achieves a remarkably high Pearson correlation coefficient (PCC) in protein B-factor prediction. Motivated by its success, we propose weighted-persistent-homology (WPH)-based machine learning (WPHML) models for RNA flexibility analysis. Our WPH is a newly-proposed model, which incorporates physical, chemical and biological information into topological measurements using a weight function. In particular, we use local persistent homology (LPH) to focus on the topological information of local regions. Our WPHML model is validated on a well-established RNA dataset, and numerical experiments show that our model can achieve a PCC of up to 0.5822. The comparison with the previous sequence-information-based learning models shows that a consistent improvement in performance by at least 10% is achieved in our current model.
  69. Hierarchical Structures of Amorphous Solids Characterized by Persistent Homology (2016)

    Yasuaki Hiraoka, Takenobu Nakamura, Akihiko Hirata, Emerson G. Escolar, Kaname Matsue, Yasumasa Nishiura
    Abstract This article proposes a topological method that extracts hierarchical structures of various amorphous solids. The method is based on the persistence diagram (PD), a mathematical tool for capturing shapes of multiscale data. The input to the PDs is given by an atomic configuration and the output is expressed as 2D histograms. Then, specific distributions such as curves and islands in the PDs identify meaningful shape characteristics of the atomic configuration. Although the method can be applied to a wide variety of disordered systems, it is applied here to silica glass, the Lennard-Jones system, and Cu-Zr metallic glass as standard examples of continuous random network and random packing structures. In silica glass, the method classified the atomic rings as short-range and medium-range orders and unveiled hierarchical ring structures among them. These detailed geometric characterizations clarified a real space origin of the first sharp diffraction peak and also indicated that PDs contain information on elastic response. Even in the Lennard-Jones system and Cu-Zr metallic glass, the hierarchical structures in the atomic configurations were derived in a similar way using PDs, although the glass structures and properties substantially differ from silica glass. These results suggest that the PDs provide a unified method that extracts greater depth of geometric information in amorphous solids than conventional methods.
  70. Exploring Surface Texture Quantification in Piezo Vibration Striking Treatment (PVST) Using Topological Measures (2022)

    Melih C. Yesilli, Max M. Chumley, Jisheng Chen, Firas A. Khasawneh, Yang Guo
    Abstract Surface texture influences wear and tribological properties of manufactured parts, and it plays a critical role in end-user products. Therefore, quantifying the order or structure of a manufactured surface provides important information on the quality and life expectancy of the product. Although texture can be intentionally introduced to enhance aesthetics or to satisfy a design function, sometimes it is an inevitable byproduct of surface treatment processes such as Piezo Vibration Striking Treatment (PVST). Measures of order for surfaces have been characterized using statistical, spectral, and geometric approaches. For nearly hexagonal lattices, topological tools have also been used to measure the surface order. This paper explores utilizing tools from Topological Data Analysis for measuring surface texture. We compute measures of order based on optical digital microscope images of surfaces treated using PVST. These measures are applied to the grid obtained from estimating the centers of tool impacts, and they quantify the grid’s deviations from the nominal one. Our results show that TDA provides a convenient framework for characterization of pattern type that bypasses some limitations of existing tools such as difficult manual processing of the data and the need for an expert user to analyze and interpret the surface images.
  71. Representations of Energy Landscapes by Sublevelset Persistent Homology: An Example With N-Alkanes (2020)

    Joshua Mirth, Yanqin Zhai, Johnathan Bush, Enrique G. Alvarado, Howie Jordan, Mark Heim, Bala Krishnamoorthy, Markus Pflaum, Aurora Clark, Y. Z, Henry Adams
    Abstract Encoding the complex features of an energy landscape is a challenging task, and often chemists pursue the most salient features (minima and barriers) along a highly reduced space, i.e. 2- or 3-dimensions. Even though disconnectivity graphs or merge trees summarize the connectivity of the local minima of an energy landscape via the lowest-barrier pathways, there is more information to be gained by also considering the topology of each connected component at different energy thresholds (or sublevelsets). We propose sublevelset persistent homology as an appropriate tool for this purpose. Our computations on the configuration phase space of n-alkanes from butane to octane allow us to conjecture, and then prove, a complete characterization of the sublevelset persistent homology of the alkane $C_m H_{2m+2}$ potential energy landscapes, for all $m$, and in all homological dimensions. We further compare both the analytical configurational potential energy landscapes and sampled data from molecular dynamics simulation, using the united and all-atom descriptions of the intramolecular interactions. In turn, this supports the application of distance metrics to quantify sampling fidelity and lays the foundation for future work regarding new metrics that quantify differences between the topological features of high-dimensional energy landscapes.
  72. Hepatic Tumor Classification Using Texture and Topology Analysis of Non-Contrast-Enhanced Three-Dimensional T1-Weighted MR Images With a Radiomics Approach (2019)

    Asuka Oyama, Yasuaki Hiraoka, Ippei Obayashi, Yusuke Saikawa, Shigeru Furui, Kenshiro Shiraishi, Shinobu Kumagai, Tatsuya Hayashi, Jun’ichi Kotoku
    Abstract The purpose of this study is to evaluate the accuracy for classification of hepatic tumors by characterization of T1-weighted magnetic resonance (MR) images using two radiomics approaches with machine learning models: texture analysis and topological data analysis using persistent homology. This study assessed non-contrast-enhanced fat-suppressed three-dimensional (3D) T1-weighted images of 150 hepatic tumors. The lesions included 50 hepatocellular carcinomas (HCCs), 50 metastatic tumors (MTs), and 50 hepatic hemangiomas (HHs) found respectively in 37, 23, and 33 patients. For classification, texture features were calculated, and also persistence images of three types (degree 0, degree 1 and degree 2) were obtained for each lesion from the 3D MR imaging data. We used three classification models. In the classification of HCC and MT (resp. HCC and HH, HH and MT), we obtained accuracy of 92% (resp. 90%, 73%) by texture analysis, and the highest accuracy of 85% (resp. 84%, 74%) when degree 1 (resp. degree 1, degree 2) persistence images were used. Our methods using texture analysis or topological data analysis allow for classification of the three hepatic tumors with considerable accuracy, and thus might be useful when applied for computer-aided diagnosis with MR images.
  73. Multiresolution Persistent Homology for Excessively Large Biomolecular Datasets (2015)

    Kelin Xia, Zhixiong Zhao, Guo-Wei Wei
    Abstract Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
  74. WDR76 Co-Localizes With Heterochromatin Related Proteins and Rapidly Responds to DNA Damage (2016)

    Joshua M. Gilmore, Mihaela E. Sardiu, Brad D. Groppe, Janet L. Thornton, Xingyu Liu, Gerald Dayebgadoh, Charles A. Banks, Brian D. Slaughter, Jay R. Unruh, Jerry L. Workman, Laurence Florens, Michael P. Washburn
    Abstract Proteins that respond to DNA damage play critical roles in normal and diseased states in human biology. Studies have suggested that the S. cerevisiae protein CMR1/YDL156w is associated with histones and is possibly associated with DNA repair and replication processes. Through a quantitative proteomic analysis of affinity purifications here we show that the human homologue of this protein, WDR76, shares multiple protein associations with the histones H2A, H2B, and H4. Furthermore, our quantitative proteomic analysis of WDR76 associated proteins demonstrated links to proteins in the DNA damage response like PARP1 and XRCC5 and heterochromatin related proteins like CBX1, CBX3, and CBX5. Co-immunoprecipitation studies validated these interactions. Next, quantitative imaging studies demonstrated that WDR76 was recruited to laser induced DNA damage immediately after induction, and we compared the recruitment of WDR76 to laser induced DNA damage to known DNA damage proteins like PARP1, XRCC5, and RPA1. In addition, WDR76 co-localizes to puncta with the heterochromatin proteins CBX1 and CBX5, which are also recruited to DNA damage but much less intensely than WDR76. This work demonstrates the chromatin and DNA damage protein associations of WDR76 and demonstrates the rapid response of WDR76 to laser induced DNA damage.
  75. Rootstock Effects on Scion Phenotypes in a ‘Chambourcin’ Experimental Vineyard (2019)

    Zoë Migicovsky, Zachary N Harris, Laura L Klein, Mao Li, Adam McDermaid, Daniel H Chitwood, Anne Fennell, Laszlo G Kovacs, Misha Kwasniewski, Jason P Londo, Qin Ma, Allison J Miller
    Abstract Understanding how root systems modulate shoot system phenotypes is a fundamental question in plant biology and will be useful in developing resilient agricultural crops. Grafting is a common horticultural practice that joins the roots (rootstock) of one plant to the shoot (scion) of another, providing an excellent method for investigating how these two organ systems affect each other. In this study, we used the French-American hybrid grapevine ‘Chambourcin’ (Vitis L.) as a model to explore the rootstock–scion relationship. We examined leaf shape, ion concentrations, and gene expression in ‘Chambourcin’ grown ungrafted as well as grafted to three different rootstocks (‘SO4’, ‘1103P’ and ‘3309C’) across 2 years and three different irrigation treatments. We found that a significant amount of the variation in leaf shape could be explained by the interaction between rootstock and irrigation. For ion concentrations, the primary source of variation identified was the position of a leaf in a shoot, although rootstock and rootstock by irrigation interaction also explained a significant amount of variation for most ions. Lastly, we found rootstock-specific patterns of gene expression in grafted plants when compared to ungrafted vines. Thus, our work reveals the subtle and complex effect of grafting on ‘Chambourcin’ leaf morphology, ionomics, and gene expression.
  76. Cosmic Web Reconstruction Through Density Ridges: Method and Algorithm (2015)

    Yen-Chi Chen, Shirley Ho, Peter E. Freeman, Christopher R. Genovese, Larry Wasserman
    Abstract The detection and characterization of filamentary structures in the cosmic web allows cosmologists to constrain parameters that dictate the evolution of the Universe. While many filament estimators have been proposed, they generally lack estimates of uncertainty, reducing their inferential power. In this paper, we demonstrate how one may apply the subspace constrained mean shift (SCMS) algorithm (Ozertem & Erdogmus 2011; Genovese et al. 2014) to uncover filamentary structure in galaxy data. The SCMS algorithm is a gradient ascent method that models filaments as density ridges, one-dimensional smooth curves that trace high-density regions within the point cloud. We also demonstrate how augmenting the SCMS algorithm with bootstrap-based methods of uncertainty estimation allows one to place uncertainty bands around putative filaments. We apply the SCMS first to the data set generated from the Voronoi model. The density ridges show strong agreement with the filaments from Voronoi method. We then apply the SCMS method to data sets sampled from a P3M N-body simulation, with galaxy number densities consistent with SDSS and WFIRST-AFTA, and to LOWZ and CMASS data from the Baryon Oscillation Spectroscopic Survey (BOSS). To further assess the efficacy of SCMS, we compare the relative locations of BOSS filaments with galaxy clusters in the redMaPPer catalogue, and find that redMaPPer clusters are significantly closer (with p-values $< 10^{-9}$) to SCMS-detected filaments than to randomly selected galaxies.
  77. Persistent Homology of Time-Dependent Functional Networks Constructed From Coupled Time Series (2017)

    Bernadette J. Stolz, Heather A. Harrington, Mason A. Porter
    Abstract We use topological data analysis to study “functional networks” that we construct from time-series data from both experimental and synthetic sources. We use persistent homology with a weight rank clique filtration to gain insights into these functional networks, and we use persistence landscapes to interpret our results. Our first example uses time-series output from networks of coupled Kuramoto oscillators. Our second example consists of biological data in the form of functional magnetic resonance imaging data that were acquired from human subjects during a simple motor-learning task in which subjects were monitored for three days during a five-day period. With these examples, we demonstrate that (1) using persistent homology to study functional networks provides fascinating insights into their properties and (2) the position of the features in a filtration can sometimes play a more vital role than persistence in the interpretation of topological features, even though conventionally the latter is used to distinguish between signal and noise. We find that persistent homology can detect differences in synchronization patterns in our data sets over time, giving insight both on changes in community structure in the networks and on increased synchronization between brain regions that form loops in a functional network during motor learning. For the motor-learning data, persistence landscapes also reveal that on average the majority of changes in the network loops take place on the second of the three days of the learning process.
  78. Prediction in Cancer Genomics Using Topological Signatures and Machine Learning (2020)

    Georgina Gonzalez, Arina Ushakova, Radmila Sazdanovic, Javier Arsuaga
    Abstract Copy Number Aberrations, gains and losses of genomic regions, are a hallmark of cancer and can be experimentally detected using microarray comparative genomic hybridization (aCGH). In previous works, we developed a topology based method to analyze aCGH data whose output are regions of the genome where copy number is altered in patients with a predetermined cancer phenotype. We call this method Topological Analysis of array CGH (TAaCGH). Here we combine TAaCGH with machine learning techniques to build classifiers using copy number aberrations. We chose logistic regression on two different binary phenotypes related to breast cancer to illustrate this approach. The first case consists of patients with over-expression of the ERBB2 gene. Over-expression of ERBB2 is commonly regulated by a copy number gain in chromosome arm 17q. TAaCGH found the region 17q11-q22 associated with the phenotype and using logistic regression we reduced this region to 17q12-q21.31 correctly classifying 78% of the ERBB2 positive individuals (sensitivity) in a validation data set. We also analyzed over-expression in Estrogen Receptor (ER), a second phenotype commonly observed in breast cancer patients and found that the region 5p14.3-12 together with six full arms were associated with the phenotype. Our method identified 4p, 6p and 16q as the strongest predictors correctly classifying 76% of ER positives in our validation data set. However, for this set there was a significant increase in the false positive rate (specificity). We suggest that topological and machine learning methods can be combined for prediction of phenotypes using genetic data.
  79. Topological Data Analysis: A Promising Big Data Exploration Tool in Biology, Analytical Chemistry and Physical Chemistry (2016)

    Marc Offroy, Ludovic Duponchel
    Abstract An important feature of experimental science is that data of various kinds is being produced at an unprecedented rate. This is mainly due to the development of new instrumental concepts and experimental methodologies. It is also clear that the nature of acquired data is significantly different. Indeed in every area of science, data take the form of always bigger tables, where all but a few of the columns (i.e. variables) turn out to be irrelevant to the questions of interest, and further that we do not necessarily know which coordinates are the interesting ones. Big data in our lab of biology, analytical chemistry or physical chemistry is a future that might be closer than any of us suppose. It is in this sense that new tools have to be developed in order to explore and valorize such data sets. Topological data analysis (TDA) is one of these. It was developed recently by topologists who discovered that topological concepts could be useful for data analysis. The main objective of this paper is to answer the question why topology is well suited for the analysis of big data sets in many areas and even more efficient than conventional data analysis methods. Raman analysis of single bacteria should be providing a good opportunity to demonstrate the potential of TDA for the exploration of various spectroscopic data sets considering different experimental conditions (with high noise level, with/without spectral preprocessing, with wavelength shift, with different spectral resolution, with missing data).
  80. Signal Enrichment With Strain-Level Resolution in Metagenomes Using Topological Data Analysis (2019)

    Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida
    Abstract Background A metagenome is a collection of genomes, usually in a micro-environment, and sequencing a metagenomic sample en masse is a powerful means for investigating the community of the constituent microorganisms. One of the challenges is in distinguishing between similar organisms due to rampant multiple possible assignments of sequencing reads, resulting in false positive identifications. We map the problem to a topological data analysis (TDA) framework that extracts information from the geometric structure of data. Here the structure is defined by multi-way relationships between the sequencing reads using a reference database. Results Based primarily on the patterns of co-mapping of the reads to multiple organisms in the reference database, we use two models: one a subcomplex of a Barycentric subdivision complex and the other a Čech complex. The Barycentric subcomplex allows a natural mapping of the reads along with their coverage of organisms while the Čech complex takes simply the number of reads into account to map the problem to homology computation. Using simulated genome mixtures we show not just enrichment of signal but also microbe identification with strain-level resolution. Conclusions In particular, in the most refractory of cases where alternative algorithms that exploit unique reads (i.e., mapped to unique organisms) fail, we show that the TDA approach continues to show consistent performance. The Čech model that uses less information is equally effective, suggesting that even partial information when augmented with the appropriate structure is quite powerful.
  81. A Visual Analytics Approach for the Diagnosis of Heterogeneous and Multidimensional Machine Maintenance Data (2021)

    Xiaoyu Zhang, Takanori Fujiwara, Senthil Chandrasegaran, Michael P. Brundage, Thurston Sexton, Alden Dima, Kwan-Liu Ma
    Abstract Analysis of large, high-dimensional, and heterogeneous datasets is challenging as no one technique is suitable for visualizing and clustering such data in order to make sense of the underlying information. For instance, heterogeneous logs detailing machine repair and maintenance in an organization often need to be analyzed to diagnose errors and identify abnormal patterns, formalize root-cause analyses, and plan preventive maintenance. Such real-world datasets are also beset by issues such as inconsistent and/or missing entries. To conduct an effective diagnosis, it is important to extract and understand patterns from the data with support from analytic algorithms (e.g., finding that certain kinds of machine complaints occur more in the summer) while involving the human-in-the-loop. To address these challenges, we adopt existing techniques for dimensionality reduction (DR) and clustering of numerical, categorical, and text data dimensions, and introduce a visual analytics approach that uses multiple coordinated views to connect DR + clustering results across each kind of the data dimension stated. To help analysts label the clusters, each clustering view is supplemented with techniques and visualizations that contrast a cluster of interest with the rest of the dataset. Our approach assists analysts to make sense of machine maintenance logs and their errors. Then the gained insights help them carry out preventive maintenance. We illustrate and evaluate our approach through use cases and expert studies respectively, and discuss generalization of the approach to other heterogeneous data.
  82. Learning Representations of Persistence Barcodes (2019)

    Christoph D. Hofer, Roland Kwitt, Marc Niethammer
    Abstract We consider the problem of supervised learning with summary representations of topological features in data. In particular, we focus on persistent homology, the prevalent tool used in topological data analysis. As the summary representations, referred to as barcodes or persistence diagrams, come in the unusual format of multi sets, equipped with computationally expensive metrics, they can not readily be processed with conventional learning techniques. While different approaches to address this problem have been proposed, either in the context of kernel-based learning, or via carefully designed vectorization techniques, it remains an open problem how to leverage advances in representation learning via deep neural networks. Appropriately handling topological summaries as input to neural networks would address the disadvantage of previous strategies which handle this type of data in a task-agnostic manner. In particular, we propose an approach that is designed to learn a task-specific representation of barcodes. In other words, we aim to learn a representation that adapts to the learning problem while, at the same time, preserving theoretical properties (such as stability). This is done by projecting barcodes into a finite dimensional vector space using a collection of parametrized functionals, so called structure elements, for which we provide a generic construction scheme. A theoretical analysis of this approach reveals sufficient conditions to preserve stability, and also shows that different choices of structure elements lead to great differences with respect to their suitability for numerical optimization. When implemented as a neural network input layer, our approach demonstrates compelling performance on various types of problems, including graph classification and eigenvalue prediction, the classification of 2D/3D object shapes and recognizing activities from EEG signals.
  83. A Data-Driven Workflow for Evaporation Performance Degradation Analysis: A Full-Scale Case Study in the Herbal Medicine Manufacturing Industry (2023)

    Sheng Zhang, Xinyuan Xie, Haibin Qu
    Abstract The evaporation process is a common step in herbal medicine manufacturing and often lasts for a long time. The degradation of evaporation performance is inevitable, leading to more consumption of steam and electricity, and it may also have an impact on the content of thermosensitive components. Recently, a vast amount of evaporation process data is collected with the aid of industrial information systems, and process knowledge is hidden behind the data. But currently, these data are seldom deeply analyzed. In this work, an exploratory data analysis workflow is proposed to evaluate the evaporation performance and to identify the root causes of the performance degradation. The workflow consists of 6 steps: data collecting, preprocessing, characteristic stage identification, feature extraction, model development and interpretation, and decision making. In the model development and interpretation step, the workflow employs the HDBSCAN clustering algorithm for data annotation and then uses the ccPCA method to compare the differences between clusters for root cause analysis. A full-scale case is presented to verify the effectiveness of the workflow. The evaporation process data of 192 batches in 2018 were collected in the case. Through the steps of the workflow, the features of each batch were extracted, and the batches were clustered into 6 groups. The root causes of the performance degradation were determined as the high Pv,II and high LI by ccPCA. Recommended suggestions for future manufacturing were given according to the results. The proposed workflow can determine the root causes of the evaporation performance degradation.
  84. Persistent Homology Analysis of Osmolyte Molecular Aggregation and Their Hydrogen-Bonding Networks (2019)

    Kelin Xia, D. Vijay Anand, Saxena Shikhar, Yuguang Mu
    Abstract Dramatically different properties have been observed for two types of osmolytes, i.e., trimethylamine N-oxide (TMAO) and urea, in a protein folding process. Great progress has been made in revealing the potential underlying mechanism of these two osmolyte systems. However, many problems still remain unsolved. In this paper, we propose to use the persistent homology to systematically study the osmolytes’ molecular aggregation and their hydrogen-bonding network from a global topological perspective. It has been found that, for the first time, TMAO and urea show two extremely different topological behaviors, i.e., an extensive network and local clusters, respectively. In general, TMAO forms highly consistent large loop or circle structures in high concentrations. In contrast, urea is more tightly aggregated locally. Moreover, the resulting hydrogen-bonding networks also demonstrate distinguishable features. With a concentration increase, TMAO hydrogen-bonding networks vary greatly in their total number of loop structures and large-sized loop structures consistently increase. In contrast, urea hydrogen-bonding networks remain relatively stable with slight reduction of the total loop number. Moreover, the persistent entropy (PE) is, for the first time, used in characterization of the topological information of the aggregation and hydrogen-bonding networks. The average PE systematically increases with the concentration for both TMAO and urea, and decreases in their hydrogen-bonding networks. But their PE variances have totally different behaviors. Finally, topological features of the hydrogen-bonding networks are found to be highly consistent with those from the ion aggregation systems, indicating that our topological invariants can characterize intrinsic features of the “structure making” and “structure breaking” systems.
  85. Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials (2020)

    Aditi S. Krishnapriyan, Maciej Haranczyk, Dmitriy Morozov
    Abstract Machine learning has emerged as an attractive alternative to experiments and simulations for predicting material properties. Usually, such an approach relies on specific domain knowledge for feature design: each learning target requires careful selection of features that an expert recognizes as important for the specific task. The major drawback of this approach is that computation of only a few structural features has been implemented so far, and it is difficult to tell a priori which features are important for a particular application. The latter problem has been empirically observed for predictors of guest uptake in nanoporous materials: local and global porosity features become dominant descriptors at low and high pressures, respectively. We investigate a feature representation of materials using tools from topological data analysis. Specifically, we use persistent homology to describe the geometry of nanoporous materials at various scales. We combine our topological descriptor with traditional structural features and investigate the relative importance of each to the prediction tasks. We demonstrate an application of this feature representation by predicting methane adsorption in zeolites, for pressures in the range of 1-200 bar. Our results not only show a considerable improvement compared to the baseline, but they also highlight that topological features capture information complementary to the structural features: this is especially important for the adsorption at low pressure, a task particularly difficult for the traditional features. Furthermore, by investigation of the importance of individual topological features in the adsorption model, we are able to pinpoint the location of the pores that correlate best to adsorption at different pressure, contributing to our atom-level understanding of structure-property relationships.
  86. Steinhaus Filtration and Stable Paths in the Mapper (2020)

    Dustin L. Arendt, Matthew Broussard, Bala Krishnamoorthy, Nathaniel Saul
    Abstract Two central concepts from topological data analysis are persistence and the Mapper construction. Persistence employs a sequence of objects built on data called a filtration. A Mapper produces insightful summaries of data, and has found widespread applications in diverse areas. We define a new filtration called the cover filtration built from a single cover based on a generalized Steinhaus distance, which is a generalization of Jaccard distance. We prove a stability result: the cover filtrations of two covers are $\alpha/m$ interleaved, where $\alpha$ is a bound on bottleneck distance between covers and $m$ is the size of smallest set in either cover. We also show our construction is equivalent to the Čech filtration under certain settings, and the Vietoris-Rips filtration completely determines the cover filtration in all cases. We then develop a theory for stable paths within this filtration. Unlike standard results on stability in topological persistence, our definition of path stability aligns exactly with the above result on stability of cover filtration. We demonstrate how our framework can be employed in a variety of applications where a metric is not obvious but a cover is readily available. First we present a new model for recommendation systems using cover filtration. For an explicit example, stable paths identified on a movies data set represent sequences of movies constituting gentle transitions from one genre to another. As a second application in explainable machine learning, we apply the Mapper for model induction, providing explanations in the form of paths between subpopulations. Stable paths in the Mapper from a supervised machine learning model trained on the FashionMNIST data set provide improved explanations of relationships between subpopulations of images.
  87. Understanding Diffraction Patterns of Glassy, Liquid and Amorphous Materials via Persistent Homology Analyses (2019)

    Yohei Onodera, Shinji Kohara, Shuta Tahara, Atsunobu Masuno, Hiroyuki Inoue, Motoki Shiga, Akihiko Hirata, Koichi Tsuchiya, Yasuaki Hiraoka, Ippei Obayashi, Koji Ohara, Akitoshi Mizuno, Osami Sakata
    Abstract The structure of glassy, liquid, and amorphous materials is still not well understood, due to the insufficient structural information from diffraction data. In this article, attempts are made to understand the origin of diffraction peaks, particularly of the first sharp diffraction peak (FSDP, Q1), the principal peak (PP, Q2), and the third peak (Q3), observed in the measured diffraction patterns of disordered materials whose structure contains tetrahedral motifs. It is confirmed that the FSDP (Q1) is not a signature of the formation of a network, because an FSDP is observed in tetrahedral molecular liquids. It is found that the PP (Q2) reflects orientational correlations of tetrahedra. Q3, that can be observed in all disordered materials, even in common liquid metals, stems from simple pair correlations. Moreover, information on the topology of disordered materials was revealed by utilizing persistent homology analyses. The persistence diagram of silica (SiO2) glass suggests that the shape of rings in the glass is similar not only to those in the crystalline phase with comparable density (α-cristobalite), but also to rings present in crystalline phases with higher density (α-quartz and coesite); this is thought to be the signature of disorder. Furthermore, we have succeeded in revealing the differences, in terms of persistent homology, between tetrahedral networks and tetrahedral molecular liquids, and the difference/similarity between liquid and amorphous (glassy) states. Our series of analyses demonstrated that a combination of diffraction data and persistent homology analyses is a useful tool for allowing us to uncover structural features hidden in halo pattern of disordered materials.
  88. Felix: A Topology Based Framework for Visual Exploration of Cosmic Filaments (2016)

    Nithin Shivshankar, Pratyush Pranav, Vijay Natarajan, Rien van de Weygaert, E. G. Patrick Bos, Steven Rieder
    Abstract The large-scale structure of the universe is comprised of virialized blob-like clusters, linear filaments, sheet-like walls and huge near empty three-dimensional voids. Characterizing the large scale universe is essential to our understanding of the formation and evolution of galaxies. The density range of clusters, walls and voids are relatively well separated, when compared to filaments, which span a relatively larger range. The large scale filamentary network thus forms an intricate part of the cosmic web. In this paper, we describe Felix, a topology based framework for visual exploration of filaments in the cosmic web. The filamentary structure is represented by the ascending manifold geometry of the 2-saddles in the Morse-Smale complex of the density field. We generate a hierarchy of Morse-Smale complexes and query for filaments based on the density ranges at the end points of the filaments. The query is processed efficiently over the entire hierarchical Morse-Smale complex, allowing for interactive visualization. We apply Felix to computer simulations based on the heuristic Voronoi kinematic model and the standard $\Lambda$CDM cosmology, and demonstrate its usefulness through two case studies. First, we extract cosmic filaments within and across cluster like regions in Voronoi kinematic simulation datasets. We demonstrate that we produce similar results to existing structure finders. Filaments that form the spine of the cosmic web, which exist in high density regions in the current epoch, are isolated using Felix. Also, filaments present in void-like regions are isolated and visualized. These filamentary structures are often over shadowed by higher density range filaments and are not easily characterizable and extractable using other filament extraction methodologies.
  89. Stable Signatures for Dynamic Graphs and Dynamic Metric Spaces via Zigzag Persistence (2018)

    Woojin Kim, Facundo Memoli
    Abstract When studying flocking/swarming behaviors in animals one is interested in quantifying and comparing the dynamics of the clustering induced by the coalescence and disbanding of animals in different groups. In a similar vein, studying the dynamics of social networks leads to the problem of characterizing groups/communities as they form and disperse throughout time. Motivated by this, we study the problem of obtaining persistent homology based summaries of time-dependent data. Given a finite dynamic graph (DG), we first construct a zigzag persistence module arising from linearizing the dynamic transitive graph naturally induced from the input DG. Based on standard results, we then obtain a persistence diagram or barcode from this zigzag persistence module. We prove that these barcodes are stable under perturbations in the input DG under a suitable distance between DGs that we identify. More precisely, our stability theorem can be interpreted as providing a lower bound for the distance between DGs. Since it relies on barcodes, and their bottleneck distance, this lower bound can be computed in polynomial time from the DG inputs. Since DGs can be given rise by applying the Rips functor (with a fixed threshold) to dynamic metric spaces, we are also able to derive related stable invariants for these richer class of dynamic objects. Along the way, we propose a summarization of dynamic graphs that captures their time-dependent clustering features which we call formigrams. These set-valued functions generalize the notion of dendrogram, a prevalent tool for hierarchical clustering. In order to elucidate the relationship between our distance between two DGs and the bottleneck distance between their associated barcodes, we exploit recent advances in the stability of zigzag persistence due to Botnan and Lesnick, and to Bjerkevik.
  90. Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology (2021)

    Nicole Bussola, Bruno Papa, Ombretta Melaiu, Aurora Castellano, Doriana Fruci, Giuseppe Jurman
    Abstract We introduce here a novel machine learning (ML) framework to address the issue of the quantitative assessment of the immune content in neuroblastoma (NB) specimens. First, the EUNet, a U-Net with an EfficientNet encoder, is trained to detect lymphocytes on tissue digital slides stained with the CD3 T-cell marker. The training set consists of 3782 images extracted from an original collection of 54 whole slide images (WSIs), manually annotated for a total of 73,751 lymphocytes. Resampling strategies, data augmentation, and transfer learning approaches are adopted to warrant reproducibility and to reduce the risk of overfitting and selection bias. Topological data analysis (TDA) is then used to define activation maps from different layers of the neural network at different stages of the training process, described by persistence diagrams (PD) and Betti curves. TDA is further integrated with the uniform manifold approximation and projection (UMAP) dimensionality reduction and the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm for clustering, by the deep features, the relevant subgroups and structures, across different levels of the neural network. Finally, the recent TwoNN approach is leveraged to study the variation of the intrinsic dimensionality of the U-Net model. As the main task, the proposed pipeline is employed to evaluate the density of lymphocytes over the whole tissue area of the WSIs. The model achieves good results with mean absolute error 3.1 on test set, showing significant agreement between densities estimated by our EUNet model and by trained pathologists, thus indicating the potentialities of a promising new strategy in the quantification of the immune content in NB specimens. Moreover, the UMAP algorithm unveiled interesting patterns compatible with pathological characteristics, also highlighting novel insights into the dynamics of the intrinsic dataset dimensionality at different stages of the training process. All the experiments were run on the Microsoft Azure cloud platform.
  91. Feasibility of Topological Data Analysis for Event-Related fMRI (2019)

    Cameron T. Ellis, Michael Lesnick, Gregory Henselman-Petrusek, Bryn Keller, Jonathan D. Cohen
    Abstract Recent fMRI research shows that perceptual and cognitive representations are instantiated in high-dimensional multivoxel patterns in the brain. However, the methods for detecting these representations are limited. Topological data analysis (TDA) is a new approach, based on the mathematical field of topology, that can detect unique types of geometric features in patterns of data. Several recent studies have successfully applied TDA to study various forms of neural data; however, to our knowledge, TDA has not been successfully applied to data from event-related fMRI designs. Event-related fMRI is very common but limited in terms of the number of events that can be run within a practical time frame and the effect size that can be expected. Here, we investigate whether persistent homology—a popular TDA tool that identifies topological features in data and quantifies their robustness—can identify known signals given these constraints. We use fmrisim, a Python-based simulator of realistic fMRI data, to assess the plausibility of recovering a simple topological representation under a variety of conditions. Our results suggest that persistent homology can be used under certain circumstances to recover topological structure embedded in realistic fMRI data simulations. How do we represent the world? In cognitive neuroscience it is typical to think representations are points in high-dimensional space. In order to study these kinds of spaces it is necessary to have tools that capture the organization of high-dimensional data. Topological data analysis (TDA) holds promise for detecting unique types of geometric features in patterns of data. Although potentially useful, TDA has not been applied to event-related fMRI data. Here we utilized a popular tool from TDA, persistent homology, to recover topological signals from event-related fMRI data. We simulated realistic fMRI data and explored the parameters under which persistent homology can successfully extract signal. We also provided extensive code and recommendations for how to make the most out of TDA for fMRI analysis.
  92. Using Multidimensional Topological Data Analysis to Identify Traits of Hip Osteoarthritis (2018)

    Jasmine Rossi‐deVries, Valentina Pedoia, Michael A. Samaan, Adam R. Ferguson, Richard B. Souza, Sharmila Majumdar
    Abstract Background: Osteoarthritis (OA) is a multifaceted disease with many variables affecting diagnosis and progression. Topological data analysis (TDA) is a state-of-the-art big data analytics tool that can combine all variables into multidimensional space. TDA is used to simultaneously analyze imaging and gait analysis techniques. Purpose: To identify biochemical and biomechanical biomarkers able to classify different disease progression phenotypes in subjects with and without radiographic signs of hip OA. Study Type: Longitudinal study for comparison of progressive and nonprogressive subjects. Population: In all, 102 subjects with and without radiographic signs of hip osteoarthritis. Field Strength/Sequence: 3T, SPGR 3D MAPSS T1ρ/T2, intermediate-weighted fat-suppressed fast spin-echo (FSE). Assessment: Multidimensional data analysis including cartilage composition, bone shape, Kellgren–Lawrence (KL) classification of osteoarthritis, scoring hip osteoarthritis with MRI (SHOMRI), hip disability and osteoarthritis outcome score (HOOS). Statistical Tests: Analysis done using TDA, Kolmogorov–Smirnov (KS) testing, and Benjamini-Hochberg to rank P-value results to correct for multiple comparisons. Results: Subjects in the later stages of the disease had an increased SHOMRI score (P < 0.0001), increased KL (P = 0.0012), and older age (P < 0.0001). Subjects in the healthier group showed intact cartilage and less pain. Subjects found between these two groups had a range of symptoms. Analysis of this subgroup identified knee biomechanics (P < 0.0001) as an initial marker of the disease that is noticeable before the morphological progression and degeneration. Further analysis of an OA subgroup with femoroacetabular impingement (FAI) showed anterior labral tears to be the most significant marker (P = 0.0017) between those FAI subjects with and without OA symptoms. 
Data Conclusion: The data-driven analysis obtained with TDA proposes new phenotypes of these subjects that partially overlap with the radiographic-based classical disease status classification and also shows the potential for further examination of an early onset biomechanical intervention. Level of Evidence: 2. Technical Efficacy: Stage 2. J. Magn. Reson. Imaging 2018;48:1046–1058.
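The Benjamini-Hochberg correction named under Statistical Tests in the abstract above can be illustrated with a minimal sketch. This is the textbook step-up procedure for adjusted p-values, not the authors' code; the function name `benjamini_hochberg` and the input p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest. The raw adjustment
    # is p * m / rank; the running minimum keeps the adjusted values
    # monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values (not taken from the study).
raw = [0.0001, 0.0012, 0.0017, 0.03, 0.2]
adjusted = benjamini_hochberg(raw)
# Tests whose adjusted p-value is below the chosen false discovery rate
# (0.05 here, an assumed threshold) are reported as significant.
significant = [q < 0.05 for q in adjusted]
```

Comparing the adjusted values against a false discovery rate is equivalent to the usual formulation that rejects every hypothesis up to the largest rank k with p(k) <= k * alpha / m.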
  93. Continuous Indexing of Fibrosis (CIF): Improving the Assessment and Classification of MPN Patients (2022)

    Hosuk Ryou, Korsuk Sirinukunwattana, Alan Aberdeen, Gillian Grindstaff, Bernadette Stolz, Helen Byrne, Heather A. Harrington, Nikolaos Sousos, Anna L. Godfrey, Claire N. Harrison, Bethan Psaila, Adam J. Mead, Gabrielle Rees, Gareth D. H. Turner, Jens Rittscher, Daniel Royston
    Abstract The detection and grading of fibrosis in myeloproliferative neoplasms (MPN) is an important component of disease classification, prognostication and disease monitoring. However, current fibrosis grading systems are only semi-quantitative and fail to capture sample heterogeneity. To improve the detection, quantitation and representation of reticulin fibrosis, we developed a machine learning (ML) approach using bone marrow trephine (BMT) samples (n = 107) from patients diagnosed with MPN or a reactive / nonneoplastic marrow. The resulting Continuous Indexing of Fibrosis (CIF) enhances the detection and monitoring of fibrosis within BMTs, and aids the discrimination of MPN subtypes. When combined with megakaryocyte feature analysis, CIF discriminates between the frequently challenging differential diagnosis of essential thrombocythemia (ET) and pre-fibrotic myelofibrosis (pre-PMF) with high predictive accuracy [area under the curve = 0.94]. CIF also shows significant promise in the identification of MPN patients at risk of disease progression; analysis of samples from 35 patients diagnosed with ET and enrolled in the Primary Thrombocythemia-1 (PT-1) trial identified features predictive of post-ET myelofibrosis (area under the curve = 0.77). In addition to these clinical applications, automated analysis of fibrosis has clear potential to further refine disease classification boundaries and inform future studies of the micro-environmental factors driving disease initiation and progression in MPN and other stem cell disorders. The image analysis methods used to generate CIF can be readily integrated with those of other key morphological features in MPNs, including megakaryocyte morphology, that lie beyond the scope of conventional histological assessment. 
Key Points: Machine learning enables an objective and quantitative description of reticulin fibrosis within the bone marrow of patients with myeloproliferative neoplasms (MPN). Automated analysis and Continuous Indexing of Fibrosis (CIF) captures heterogeneity within MPN samples and has utility in refined classification and disease monitoring. Quantitative fibrosis assessment combined with topological data analysis may help to predict patients at increased risk of progression to post-ET myelofibrosis, and assist in the discrimination of ET and pre-fibrotic PMF (pre-PMF).
  94. The Architecture of the Endoplasmic Reticulum Is Regulated by the Reversible Lipid Modification of the Shaping Protein CLIMP-63 (2018)

    Patrick A. Sandoz, Robin A. Denhardt-Eriksson, Laurence Abrami, Luciano Abriata, Gard Spreemann, Catherine Maclachlan, Sylvia Ho, Béatrice Kunz, Kathryn Hess, Graham Knott, Vassily Hatzimanikatis, F. Gisou van der Goot
    Abstract The endoplasmic reticulum (ER) has a complex morphology generated and maintained by membrane-shaping proteins and membrane energy minimization, though not much is known about how it is regulated. The architecture of this intracellular organelle is balanced between large, thin sheets that are densely packed in the perinuclear region and a connected network of branched, elongated tubules that extend throughout the cytoplasm. Sheet formation is known to involve the cytoskeleton-linking membrane protein 63 (CLIMP-63), though its regulation and the depth of its involvement remain unknown. Here we show that the post-translational modification of CLIMP-63 by the palmitoyltransferase ZDHHC6 controls the relative distribution of CLIMP-63 between the ER and the plasma membrane. By combining data-driven mathematical modeling, predictions, and experimental validation, we found that the attachment of a medium chain fatty acid, so-called S-palmitoylation, to the unique CLIMP-63 cytoplasmic cysteine residue drastically reduces its turnover rate, and thereby controls its abundance. Light microscopy and focused ion beam electron microscopy further revealed that enhanced CLIMP-63 palmitoylation leads to strong ER-sheet proliferation. Altogether, we show that ZDHHC6-mediated S-palmitoylation regulates the cellular localization of CLIMP-63, the morphology of the ER, and the interconversion of ER structural elements in mammalian cells through its action on the CLIMP-63 protein. Significance Statement: Eukaryotic cells subcompartmentalize their various functions into organelles, the shape of each being specific and necessary for its proper role. However, how these shapes are generated and controlled is poorly understood. 
The endoplasmic reticulum is the largest membrane-bound intracellular compartment, accounting for more than 50% of all cellular membranes. We found that the shape and quantity of its sheet-like structures are controlled by a specific protein, cytoskeleton-linking membrane protein 63, through the acquisition of a lipid chain attached by an enzyme called ZDHHC6. Thus, by modifying the ZDHHC6 amounts, a cell can control the shape of its ER. The modeling and prediction technique used herein also provides a method for studying the interconnected function of other post-translational modifications in organelles.
  95. Segmentation of Biomedical Images by a Computational Topology Framework (2017)

    Rodrigo Rojas Moraleda, Wei Xiong, Niels Halama, Katja Breitkopf-Heinlein, Steven Dooley, Luis Salinas, Dieter W. Heermann, Nektarios A. Valous
    Abstract The segmentation of cell nuclei is an important step towards the automated analysis of histological images. The presence of a large number of nuclei in whole-slide images necessitates methods that are computationally tractable in addition to being effective. In this work, a method is developed for the robust segmentation of cell nuclei in histological images based on the principles of persistent homology. More specifically, an abstract simplicial homology approach for image segmentation is established. Essentially, the approach deals with the persistence of disconnected sets in the image, thus identifying salient regions that express patterns of persistence. By introducing an image representation based on topological features, the task of segmentation is less dependent on variations of color or texture. This results in a novel approach that generalizes well and provides stable performance. The method conceptualizes regions of interest (cell nuclei) pertinent to their topological features in a successful manner. The time cost of the proposed approach is lower-bounded by an almost linear behavior and upper-bounded by O(n^2) in a worst-case scenario. Time complexity matches a quasilinear behavior which is O(n^(1+ε)) for ε < 1. Images acquired from histological sections of liver tissue are used as a case study to demonstrate the effectiveness of the approach. The histological landscape consists of hepatocytes and non-parenchymal cells. The accuracy of the proposed methodology is verified against an automated workflow created by the output of a conventional filter bank (validated by experts) and the supervised training of a random forest classifier. The results are obtained on a per-object basis. The proposed workflow successfully detected both hepatocyte and non-parenchymal cell nuclei with an accuracy of 84.6%, and hepatocyte cell nuclei only with an accuracy of 86.2%. 
A public histological dataset with supplied ground-truth data is also used for evaluating the performance of the proposed approach (accuracy: 94.5%). Further validations are carried out with a publicly available dataset and ground-truth data from the Gland Segmentation in Colon Histology Images Challenge (GlaS) contest. The proposed method is useful for obtaining unsupervised robust initial segmentations that can be further integrated in image/data processing and management pipelines. The development of a fully automated system supporting a human expert provides tangible benefits in the context of clinical decision-making.
  96. Using Persistent Homology as a New Approach for Super-Resolution Localization Microscopy Data Analysis and Classification of γH2AX Foci/Clusters (2018)

    Andreas Hofmann, Matthias Krufczik, Dieter W. Heermann, Michael Hausmann
    Abstract DNA double strand breaks (DSB) are the most severe damages in chromatin induced by ionizing radiation. In response to such environmentally determined stress situations, cells have developed repair mechanisms. Although many investigations have contributed to a detailed understanding of repair processes, e.g., homologous recombination repair or non-homologous end-joining, the question is not sufficiently answered, how a cell decides to apply a certain repair process at a certain damage site, since all different repair pathways could simultaneously occur in the same cell nucleus. One of the first processes after DSB induction is phosphorylation of the histone variant H2AX to γH2AX in the given surroundings of the damaged locus. Since the spatial organization of chromatin is not random, it may be conclusive that the spatial organization of γH2AX foci is also not random, and rather, contributes to accessibility of special repair proteins to the damaged site, and thus, to the following repair pathway at this given site. The aim of this article is to demonstrate a new approach to analyze repair foci by their topology in order to obtain a cell independent method of categorization. During the last decade, novel super-resolution fluorescence light microscopic techniques have enabled new insights into genome structure and spatial organization on the nano-scale in the order of 10 nm. One of these techniques is single molecule localization microscopy (SMLM) with which the spatial coordinates of single fluorescence molecules can precisely be determined and density and distance distributions can be calculated. This method is an appropriate tool to quantify complex changes of chromatin and to describe repair foci on the single molecule level. 
Based on the pointillist information obtained by SMLM from specifically labeled heterochromatin and γH2AX foci reflecting the chromatin morphology and repair foci topology, we have developed a new analytical methodology of foci or foci cluster characterization, respectively, by means of persistent homology. This method allows, for the first time, a cell independent comparison of two point distributions (here the point distributions of two γH2AX clusters) with each other of a selected ensemble and to give a mathematical measure of their similarity. In order to demonstrate the feasibility of this approach, cells were irradiated by low LET (linear energy transfer) radiation with different doses and the heterochromatin and γH2AX foci were fluorescently labeled by antibodies for SMLM. By means of our new analysis method, we were able to show that the topology of clusters of γH2AX foci can be categorized depending on the distance to heterochromatin. This method opens up new possibilities to categorize spatial organization of point patterns by parameterization of topological similarity.
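As a rough illustration of what persistent homology extracts from a point pattern such as a cluster of localizations, the sketch below computes the zero-dimensional part of a Vietoris-Rips barcode with a union-find structure. It is not the authors' pipeline or their similarity measure. The function name `h0_death_times`, the toy coordinates, and the convention that an edge enters the filtration at scale equal to its length (some libraries use half the length) are all assumptions.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Scales at which connected components merge in a Vietoris-Rips filtration.

    Every point is born at scale 0. Each returned value is the death time of
    one component; one further component never dies (the infinite bar).
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the representative of i, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # The edge joins two components, so one of them dies here.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Toy 2-D coordinates (hypothetical, not SMLM data).
cluster = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
bars = h0_death_times(cluster)
```

The death times coincide with the edge lengths of a minimum spanning tree, which is why this dimension can be computed without building the full simplicial complex. Higher-dimensional features (loops, voids) need a dedicated persistent homology library.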
  97. Unexpected Topology of the Temperature Fluctuations in the Cosmic Microwave Background (2019)

    Pratyush Pranav, Robert J. Adler, Thomas Buchert, Herbert Edelsbrunner, Bernard J. T. Jones, Armin Schwartzman, Hubert Wagner, Rien van de Weygaert
    Abstract We study the topology generated by the temperature fluctuations of the cosmic microwave background (CMB) radiation, as quantified by the number of components and holes, formally given by the Betti numbers, in the growing excursion sets. We compare CMB maps observed by the Planck satellite with a thousand simulated maps generated according to the ΛCDM paradigm with Gaussian distributed fluctuations. The comparison is multi-scale, being performed on a sequence of degraded maps with mean pixel separation ranging from 0.05 to 7.33°. The survey of the CMB over 𝕊² is incomplete due to obfuscation effects by bright point sources and other extended foreground objects like our own galaxy. To deal with such situations, where analysis in the presence of “masks” is of importance, we introduce the concept of relative homology. The parametric χ²-test shows differences between observations and simulations, yielding p-values at percent to less than permil levels roughly between 2 and 7°, with the difference in the number of components and holes peaking at more than 3σ sporadically at these scales. The highest observed deviation between the observations and simulations for b₀ and b₁ is approximately between 3σ and 4σ at scales of 3–7°. 
There are reports of mildly unusual behaviour of the Euler characteristic at 3.66° in the literature, computed from independent measurements of the CMB temperature fluctuations by Planck’s predecessor, the Wilkinson Microwave Anisotropy Probe (WMAP) satellite. The mildly anomalous behaviour of the Euler characteristic is phenomenologically related to the strongly anomalous behaviour of components and holes, or the zeroth and first Betti numbers, respectively. Further, since these topological descriptors show consistent anomalous behaviour over independent measurements of Planck and WMAP, instrumental and systematic errors may be an unlikely source. These are also the scales at which the observed maps exhibit low variance compared to the simulations, and approximately the range of scales at which the power spectrum exhibits a dip with respect to the theoretical model. Non-parametric tests show even stronger differences at almost all scales. Crucially, Gaussian simulations based on power-spectrum matching the characteristics of the observed dipped power spectrum are not able to resolve the anomaly. Understanding the origin of the anomalies in the CMB, whether cosmological in nature or arising due to late-time effects, is an extremely challenging task. Regardless, beyond the trivial possibility that this may still be a manifestation of an extreme Gaussian case, these observations, along with the super-horizon scales involved, may motivate the study of primordial non-Gaussianity. Alternative scenarios worth exploring may be models with non-trivial topology, including topological defect models.
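The relation between components, holes and the Euler characteristic mentioned in the abstract above can be made concrete on a toy example. The sketch below counts b0 and b1 of an excursion set on a small flat pixel grid, using the identity χ = b0 − b1 for a planar region. It is a simplification under stated assumptions: the actual analysis is on the sphere with masks and relative homology, which this does not attempt. The function name `betti_numbers`, the choice of closed pixel squares (so diagonal neighbours count as connected), and the sample grid are hypothetical.

```python
def betti_numbers(grid, threshold):
    """Return (b0, b1) of the excursion set {pixels with value >= threshold}.

    Each selected pixel is treated as a closed unit square, so squares that
    touch only at a corner belong to the same component.
    """
    cells = {
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value >= threshold
    }

    # Collect the distinct vertices and edges of the selected squares.
    vertices, edges = set(), set()
    for r, c in cells:
        vertices.update([(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)])
        edges.update([
            ((r, c), (r, c + 1)),          # top edge
            ((r + 1, c), (r + 1, c + 1)),  # bottom edge
            ((r, c), (r + 1, c)),          # left edge
            ((r, c + 1), (r + 1, c + 1)),  # right edge
        ])
    euler = len(vertices) - len(edges) + len(cells)

    # Count connected components by flood fill over the 8 neighbours.
    unseen = set(cells)
    b0 = 0
    while unseen:
        b0 += 1
        stack = [unseen.pop()]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbour = (r + dr, c + dc)
                    if neighbour in unseen:
                        unseen.remove(neighbour)
                        stack.append(neighbour)

    # In the plane b2 = 0, so euler = b0 - b1.
    return b0, b0 - euler


# Hypothetical "temperature" map: a warm ring around a cold centre.
ring = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
# Lowering the threshold grows the excursion set.
curve = [betti_numbers(ring, t) for t in (2, 1, 0)]
```

Sweeping the threshold and recording (b0, b1) at each level gives the kind of Betti curves that can then be compared between an observed map and simulated ones.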
  98. Blind Swarms for Coverage in 2-D (2005)

    V. D. Silva, R. Ghrist, A. Muhammad
    Abstract We consider coverage problems in robot sensor networks with minimal sensing capabilities. In particular, we demonstrate that a “blind” swarm of robots with no localization and only a weak form of distance estimation can rigorously determine coverage in a bounded planar domain of unknown size and shape. The methods we introduce come from algebraic topology. I. COVERAGE PROBLEMS Many of the potential applications of robot swarms require information about coverage in a given domain. For example, using a swarm of robot sensors for surveillance and security applications carries with it the charge to maximize, or, preferably, guarantee coverage. Such applications include networks of security cameras, mine field sweeping via networked robots [18], and oceanographic sampling [4]. In these contexts, each robot has some coverage domain, and one wishes to know about the union of these coverage domains. Such problems are also crucial in applications not involving robots directly, e.g., communication networks. As a preliminary analysis, we consider the static “field” coverage problem, in which robots are assumed stationary and the goal is to verify blanket coverage of a given domain. There is a large literature on this subject; see, e.g., [7], [1], [16]. In addition, there are variants on these problems involving “barrier” coverage to separate regions. Dynamic or “sweeping” coverage [3] is a common and challenging task with applications ranging from security to vacuuming. Although a sensor network composed of robots will have dynamic capabilities, we restrict attention in this brief paper to the static case in order to lay the groundwork for future inquiry. There are two primary approaches to static coverage problems in the literature. The first uses computational geometry tools applied to exact node coordinates. This typically involves ‘ruler-and-compass’ style geometry [10] or Delaunay triangulations of the domain [16], [14], [20]. 
Such approaches are very rigid with regard to inputs: one must know exact node coordinates and one must know the geometry of the domain precisely to determine the Delaunay complex. To alleviate the former requirement, many authors have turned to probabilistic tools. For example, in [13], the author assumes a randomly and uniformly distributed collection of nodes in a domain with a fixed geometry and proves expected area coverage. Other approaches [15], [19] give percolation-type results about coverage and network integrity for randomly distributed nodes. The drawback of these methods is the need for strong assumptions about the exact shape of the domain, as well as the need for a uniform distribution of nodes. In the sensor networks community, there is a compelling interest (and corresponding burgeoning literature) in determining properties of a network in which the nodes do not possess coordinate data. One example of a coordinate-free approach is in [17], which gives a heuristic method for geographic routing without coordinate data: among the large literature arising from this paper, we note in particular the mathematical analysis of this approach in [11]. To our knowledge, no one has treated the coverage problem in a coordinate-free setting. In this note, we introduce a new set of tools for answering coverage problems in robotics and sensor networks with minimal assumptions about domain geometry and node localization. We provide a sufficiency criterion for coverage. We do not answer the problem of how the nodes should be placed in order to maximize coverage, nor the minimum number of such nodes necessary; neither do we address how to reallocate nodes to fill coverage holes.