Modeling the Spread of the Zika Virus Using Topological Data Analysis (2018)

Acridine Derivatives as Inhibitors of the IRE1α–XBP1 Pathway Are Cytotoxic to Human Multiple Myeloma (2016)

Dadi Jiang, Arvin B. Tam, Muthuraman Alagappan, Michael P. Hay, Aparna Gupta, Margaret M. Kozak, David E. Solow-Cordero, Pek Y. Lum, Nicholas C. Denko, Amato J. Giaccia

Biochemical Association of Metabolic Profile and Microbiome in Chronic Pressure Ulcer Wounds (2015)

Mary Cloud B. Ammons, Kathryn Morrissey, Brian P. Tripet, James T. Van Leuven, Anne Han, Gerald S. Lazarus, Jonathan M. Zenilman, Philip S. Stewart, Garth A. James, Valérie Copié

Persistent Homology for the Quantitative Evaluation of Architectural Features in Prostate Cancer Histology (2019)

Peter Lawson, Andrew B. Sholl, J. Quincy Brown, Brittany Terese Fasy, Carola Wenk

Protein Classification With Improved Topological Data Analysis (2018)

Tamal K. Dey, Sayan Mandal

Topological Feature Tracking for Submesoscale Eddies (2022)

Sam Voisin, Jay Hineman, James B. Polly, Gary Koplik, Ken Ball, Paul Bendich, Joseph D'Addezio, Gregg A. Jacobs, Tamay Özgökmen

High-Throughput Screening Approach for Nanoporous Materials Genome Using Topological Data Analysis: Application to Zeolites (2018)

Yongjin Lee, Senja D. Barthel, Paweł Dłotko, Seyed Mohamad Moosavi, Kathryn Hess, Berend Smit

Alteration in the Local and Global Functional Connectivity of Resting State Networks in Parkinson’s Disease (2018)

Maryam Ghahremani, Jaejun Yoo, Sun Ju Chung, Kwangsun Yoo, Jong C. Ye, Yong Jeong

Non-Empirical Identification of Trigger Sites in Image Data Using Persistent Homology: Crack Formation During Heterogeneous Reduction of Iron-Ore Sinters (2018)

M. Kimura, I. Obayashi, Y. Takeichi, R. Murao, Y. Hiraoka

Reconstructing Linearly Embedded Graphs: A First Step to Stratified Space Learning (2021)

Yossi Bokor, Christopher Williams, Katharine Turner

Multiscale Topology Characterizes Dynamic Tumor Vascular Networks (2022)

Bernadette J. Stolz, Jakob Kaeppler, Bostjan Markelc, Franziska Braun, Florian Lipsmeier, Ruth J. Muschel, Helen M. Byrne, Heather A. Harrington

A New Approach to Investigate the Association Between Brain Functional Connectivity and Disease Characteristics of Attention-Deficit/Hyperactivity Disorder: Topological Neuroimaging Data Analysis (2015)

Sunghyon Kyeong, Seonjeong Park, Keun-Ah Cheon, Jae-Jin Kim, Dong-Ho Song, Eunjoo Kim

Deep Learning With Topological Signatures (2017)

Christoph Hofer, Roland Kwitt, Marc Niethammer, Andreas Uhl

Topological Data Analysis of Task-Based fMRI Data From Experiments on Schizophrenia (2021)

Bernadette J. Stolz, Tegan Emerson, Satu Nahkuri, Mason A. Porter, Heather A. Harrington

Interdisciplinary Approaches to Automated Obstructive Sleep Apnea Diagnosis Through High-Dimensional Multiple Scaled Data Analysis (2019)

Giseon Heo, Kathryn Leonard, Xu Wang, Yi Zhou

Skyler (2023)

Yossi Bokor Bleile

Abstract

Julia package for recovering stratified spaces underlying point clouds.

Homological Analysis of Multi-Qubit Entanglement (2018)

Alessandra di Pierro, Stefano Mancini, Laleh Memarzadeh, Riccardo Mengoni

Abstract

We propose the usage of persistent homologies to characterize multipartite entanglement. On a multi-qubit data set we introduce metric-like measures defined in terms of bipartite entanglement and then we derive barcodes. We show that, depending on the distance, they are able to produce different classifications. In one case, it is possible to obtain the standard separability classes. In the other case, a new classification of entangled states of three and four qubits is provided.

Sheaves Are the Canonical Data Structure for Sensor Integration (2017)

Michael Robinson

Abstract

A sensor integration framework should be sufficiently general to accurately represent many sensor modalities, and also be able to summarize information in a faithful way that emphasizes important, actionable information. Few approaches adequately address these two discordant requirements. The purpose of this expository paper is to explain why sheaves are the canonical data structure for sensor integration and how the mathematics of sheaves satisfies our two requirements. We outline some of the powerful inferential tools that are not available to other representational frameworks.

Graph Filtration Learning (2020)

Christoph Hofer, Florian Graf, Bastian Rieck, Marc Niethammer, Roland Kwitt

Abstract

We propose an approach to learning with graph-structured data in the problem domain of graph classification. In particular, we present a novel type of readout operation to aggregate node features into a graph-level representation. To this end, we leverage persistent homology computed via a real-valued, learnable, filter function. We establish the theoretical foundation for differentiating through the persistent homology computation. Empirically, we show that this type of readout operation compares favorably to previous techniques, especially when the graph connectivity structure is informative for the learning problem.
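
As a rough illustration of the persistent homology computation this abstract refers to, the sketch below computes 0-dimensional persistence pairs for a graph whose vertices carry real filter values. It is not the authors' implementation: the filter values are fixed rather than learned, the function name `zero_dim_persistence` is hypothetical, and the rule that an edge enters at the larger of its two endpoint values is an assumed (though common) convention.

```python
def zero_dim_persistence(filter_values, edges):
    """Return sorted (birth, death) pairs of connected components in the
    sublevel-set filtration of a graph. Each vertex enters at its filter
    value; each edge enters at the max of its endpoint values (assumed)."""
    parent = list(range(len(filter_values)))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def edge_value(edge):
        return max(filter_values[edge[0]], filter_values[edge[1]])

    pairs = []
    for u, v in sorted(edges, key=edge_value):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle; no component dies
        # Elder rule: the component born later dies at this edge.
        if filter_values[ru] > filter_values[rv]:
            ru, rv = rv, ru
        pairs.append((filter_values[rv], edge_value((u, v))))
        parent[rv] = ru  # the older root stays the representative

    # Components that never merge persist forever.
    roots = {find(v) for v in range(len(filter_values))}
    pairs.extend((filter_values[r], float("inf")) for r in roots)
    return sorted(pairs)


# Toy example: a path graph on five vertices with hand-picked filter values.
values = [0.0, 2.0, 1.0, 3.0, 0.5]
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
pairs = zero_dim_persistence(values, path_edges)
```

Pairs with equal birth and death carry no topological information and are usually discarded; the remaining pairs are what a readout operation of this kind would aggregate.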

Topological Autoencoders (2020)

Michael Moor, Max Horn, Bastian Rieck, Karsten Borgwardt

Abstract

We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we construct this loss in a differentiable manner, such that the encoding learns to retain multi-scale connectivity information. We show that our approach is theoretically well-founded and that it exhibits favourable latent representations on a synthetic manifold as well as on real-world image data sets, while preserving low reconstruction errors.

Coordinate-Free Coverage in Sensor Networks With Controlled Boundaries via Homology (2006)

V. de Silva, R. Ghrist

Abstract

Tools from computational homology are introduced to verify coverage in an idealized sensor network. These methods are unique in that, while they are coordinate-free and assume no localization or orientation capabilities for the nodes, there are also no probabilistic assumptions. The key ingredient is the theory of homology from algebraic topology. The robustness of these tools is demonstrated by adapting them to a variety of settings, including static planar coverage, 3-D barrier coverage, and time-dependent sweeping coverage. Results are also given on hole repair, error tolerance, optimal coverage, and variable radii. An overview of implementation is given.

Investigation of Flash Crash via Topological Data Analysis (2020)

Wonse Kim, Younng-Jin Kim, Gihyun Lee, Woong Kook

Abstract

Topological data analysis has been acknowledged as one of the most successful mathematical data analytic methodologies in various fields including medicine, genetics, and image analysis. In this paper, we explore the potential of this methodology in finance by applying persistence landscape and dynamic time series analysis to analyze an extreme event in the stock market, known as Flash Crash. We will provide results of our empirical investigation to confirm the effectiveness of our new method not only for the characterization of this extreme event but also for its prediction purposes.

A Persistent Weisfeiler-Lehman Procedure for Graph Classification (2019)

Bastian Rieck, Christian Bock, Karsten Borgwardt

Abstract

The Weisfeiler–Lehman graph kernel exhibits competitive performance in many graph classification tasks. However, its subtree features are not able to capture connected components and cycles, topological features known for characterising graphs. To extract such features, we leverage propagated node label information and transform unweighted graphs into metric ones. This permits us to augment the subtree features with topological information obtained using persistent homology, a concept from topological data analysis. Our method, which we formalise as a generalisation of Weisfeiler–Lehman subtree features, exhibits favourable classification accuracy and its improvements in predictive performance are mainly driven by including cycle information.

Topologically Densified Distributions (2020)

Christoph Hofer, Florian Graf, Marc Niethammer, Roland Kwitt

Abstract

We study regularization in the context of small sample-size learning with over-parametrized neural networks. Specifically, we shift focus from architectural properties, such as norms on the network weights, to properties of the internal representations before a linear classifier. Specifically, we impose a topological constraint on samples drawn from the probability measure induced in that space. This provably leads to mass concentration effects around the representations of training instances, i.e., a property beneficial for generalization. By leveraging previous work to impose topological constraints in a neural network setting, we provide empirical evidence (across various vision benchmarks) to support our claim for better generalization.

Topological Graph Neural Networks (2021)

Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten Borgwardt

Abstract

Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures, such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive in terms of the Weisfeiler–Lehman test of isomorphism. Augmenting GNNs with our layer leads to beneficial predictive performance, both on synthetic data sets, which can be trivially classified by humans but not by ordinary GNNs, and on real-world data.

Unsupervised Topological Learning for Identification of Atomic Structures (2022)

Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse

Abstract

We propose an unsupervised learning methodology with descriptors based on topological data analysis (TDA) concepts to describe the local structural properties of materials at the atomic scale. Based only on atomic positions and without a priori knowledge, our method allows for an autonomous identification of clusters of atomic structures through a Gaussian mixture model. We successfully apply this approach to the analysis of elemental Zr in the crystalline and liquid states as well as homogeneous nucleation events under deep undercooling conditions. This opens the way to deeper and autonomous study of complex phenomena in materials at the atomic scale.

Topological Singularity Detection at Multiple Scales (2023)

Julius von Rohrscheidt, Bastian Rieck

Abstract

The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness' of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.

Optimal Topological Cycles and Their Application in Cardiac Trabeculae Restoration (2017)

Pengxiang Wu, Chao Chen, Yusu Wang, Shaoting Zhang, Changhe Yuan, Zhen Qian, Dimitris Metaxas, Leon Axel

Abstract

In cardiac image analysis, it is important yet challenging to reconstruct the trabeculae, namely, fine muscle columns whose ends are attached to the ventricular walls. To extract these fine structures, traditional image segmentation methods are insufficient. In this paper, we propose a novel method to jointly detect salient topological handles and compute the optimal representations of them. The detected handles are considered hypothetical trabeculae structures. They are further screened using a classifier and are then included in the final segmentation. We show in experiments the significance of our contribution compared with previous standard segmentation methods without topological priors, as well as with a previous topological method in which non-optimal representations of topological handles are used.

Topological Biomarkers for Real-Time Detection of Epileptic Seizures (2022)

Ximena Fernández, Diego Mateos

Abstract

Automated seizure detection is a fundamental problem in computational neuroscience, a step toward improving the diagnosis and treatment of epileptic disease. We propose a real-time computational method for automated tracking and detection of epileptic seizures from raw neurophysiological recordings. Our mechanism is based on the topological analysis of the sliding-window embedding of the time series derived from simultaneously recorded channels. We extract topological biomarkers from the signals via the computation of the persistent homology of time-evolving topological spaces. Remarkably, the proposed biomarkers robustly capture the change in the brain dynamics during the ictal state. We apply our methods to different types of signals, including scalp and intracranial EEG and MEG, in patients during interictal and ictal states, showing high accuracy in a range of clinical situations.

Multiphase Mixing Quantification by Computational Homology and Imaging Analysis (2011)

Jianxin Xu, Hua Wang, Hui Fang

Abstract

The purpose of this study is to introduce a new technique for quantifying the efficiency of multiphase mixing. This technique based on algebraic topology is illustrated by using the hydraulic modeling of gas agitated reactors stirred by top lance gas injection and image analysis. The zeroth Betti numbers are used to estimate the numbers of pieces in the patterns, leading to a useful parameter to characterize the mixture homogeneity. The first Betti numbers are introduced to characterize the nonhomogeneity of the mixture. The mixing efficiency can be characterized by the Betti numbers for binary images of the patterns. This novel method may be applied for studying a variety of multiphase mixing problems in which multiphase components or tracers are visually distinguishable.
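
To make the two quantities in this abstract concrete, here is a minimal sketch of counting Betti numbers on a binary image: the zeroth Betti number as the number of foreground pieces, the first as the number of enclosed holes. It is an assumption-laden toy, not the authors' pipeline: foreground pixels are taken to be 8-connected and background pixels 4-connected, the image is treated as if surrounded by background, and the names `count_components` and `betti_numbers` are hypothetical.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, value, neighbors):
    """Count connected regions of cells equal to `value`, and how many of
    those regions touch the image border."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    total = touching_border = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            total += 1
            on_border = False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                if y in (0, rows - 1) or x in (0, cols - 1):
                    on_border = True
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            touching_border += on_border
    return total, touching_border


def betti_numbers(grid):
    """Return (b0, b1) for a binary image given as a list of 0/1 rows."""
    b0, _ = count_components(grid, 1, N8)
    background, background_on_border = count_components(grid, 0, N4)
    # Holes are background regions that never reach the border (assumed
    # convention: everything outside the image is background).
    return b0, background - background_on_border


ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(betti_numbers(ring))  # (1, 1): one piece enclosing one hole
```

In a mixing study of the kind described, these counts would be tracked frame by frame on thresholded images; the thresholding step is not shown here.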

Decoding of Neural Data Using Cohomological Feature Extraction (2019)

Erik Rybakken, Nils Baas, Benjamin Dunn

Abstract

We introduce a novel data-driven approach to discover and decode features in the neural code coming from large population neural recordings with minimal assumptions, using cohomological feature extraction. We apply our approach to neural recordings of mice moving freely in a box, where we find a circular feature. We then observe that the decoded value corresponds well to the head direction of the mouse. Thus, we capture head direction cells and decode the head direction from the neural population activity without having to process the mouse's behavior. Interestingly, the decoded values convey more information about the neural activity than the tracked head direction does, with differences that have some spatial organization. Finally, we note that the residual population activity, after the head direction has been accounted for, retains some low-dimensional structure that is correlated with the speed of the mouse.

Lung Topology Characteristics in Patients With Chronic Obstructive Pulmonary Disease (2018)

Francisco Belchi, Mariam Pirashvili, Joy Conway, Michael Bennett, Ratko Djukanovic, Jacek Brodzki

Abstract

Quantitative features that can currently be obtained from medical imaging do not provide a complete picture of Chronic Obstructive Pulmonary Disease (COPD). In this paper, we introduce a novel analytical tool based on persistent homology that extracts quantitative features from chest CT scans to describe the geometric structure of the airways inside the lungs. We show that these new radiomic features stratify COPD patients in agreement with the GOLD guidelines for COPD and can distinguish between inspiratory and expiratory scans. These CT measurements are very different to those currently in use and we demonstrate that they convey significant medical information. The results of this study are a proof of concept that topological methods can enhance the standard methodology to create a finer classification of COPD and increase the possibilities of more personalized treatment.

Unsupervised Topological Learning Approach of Crystal Nucleation in Pure Tantalum (2021)

Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse

Abstract

Nucleation phenomena commonly observed in our everyday life are of fundamental, technological and societal importance in many areas, but some of their most intimate mechanisms remain to be unraveled. Crystal nucleation, the early stages where the liquid-to-solid transition occurs upon undercooling, initiates at the atomic level on nanometer length and sub-picosecond time scales and involves complex multidimensional mechanisms with local symmetry breaking that can hardly be observed experimentally in full detail. To reveal their structural features in simulations without a priori knowledge, an unsupervised learning approach founded on topological descriptors borrowed from persistent homology concepts is proposed. Applied here to a monatomic metal, namely Tantalum (Ta), it shows that both translational and orientational ordering always come into play simultaneously when homogeneous nucleation starts in regions with low five-fold symmetry.

Finding Universal Structures in Quantum Many-Body Dynamics via Persistent Homology (2020)

Daniel Spitz, Jürgen Berges, Markus K. Oberthaler, Anna Wienhard

Abstract

Inspired by topological data analysis techniques, we introduce persistent homology observables and apply them in a geometric analysis of the dynamics of quantum field theories. As a prototype application, we consider simulated data of a two-dimensional Bose gas far from equilibrium. We discover a continuous spectrum of dynamical scaling exponents, which provides a refined classification of nonequilibrium universal phenomena. A possible explanation of the underlying processes is provided in terms of mixing wave turbulence and vortex kinetics components in point clouds. We find that the persistent homology scaling exponents are inherently linked to the geometry of the system, as the derivation of a packing relation reveals. The approach opens new ways of analyzing quantum many-body dynamics in terms of robust topological structures beyond standard field theoretic techniques.

Statistical Topological Data Analysis - A Kernel Perspective (2015)

Roland Kwitt, Stefan Huber, Marc Niethammer, Weili Lin, Ulrich Bauer

Abstract

We consider the problem of statistical computations with persistence diagrams, a summary representation of topological features in data. These diagrams encode persistent homology, a widely used invariant in topological data analysis. While several avenues towards a statistical treatment of the diagrams have been explored recently, we follow an alternative route that is motivated by the success of methods based on the embedding of probability measures into reproducing kernel Hilbert spaces. In fact, a positive definite kernel on persistence diagrams has recently been proposed, connecting persistent homology to popular kernel-based learning techniques such as support vector machines. However, important properties of that kernel enabling a principled use in the context of probability measure embeddings remain to be explored. Our contribution is to close this gap by proving universality of a variant of the original kernel, and to demonstrate its effective use in two-sample hypothesis testing on synthetic as well as real-world data.

Topological Analysis of Population Activity in Visual Cortex (2008)

Gurjeet Singh, Facundo Memoli, Tigran Ishkhanov, Guillermo Sapiro, Gunnar Carlsson, Dario L. Ringach

Abstract

Information in the cortex is thought to be represented by the joint activity of neurons. Here we describe how fundamental questions about neural representation can be cast in terms of the topological structure of population activity. A new method, based on the concept of persistent homology, is introduced and applied to the study of population activity in primary visual cortex (V1). We found that the topological structure of activity patterns when the cortex is spontaneously active is similar to those evoked by natural image stimulation and consistent with the topology of a two-sphere. We discuss how this structure could emerge from the functional organization of orientation and spatial frequency maps and their mutual relationship. Our findings extend prior results on the relationship between spontaneous and evoked activity in V1 and illustrate how computational topology can help tackle elementary questions about the representation of information in the nervous system.

Topological Attention for Time Series Forecasting (2021)

Sebastian Zeng, Florian Graf, Christoph Hofer, Roland Kwitt

Abstract

The problem of (point) forecasting univariate time series is considered. Most approaches, ranging from traditional statistical methods to recent learning-based techniques with neural networks, directly operate on raw time series observations. As an extension, we study whether local topological properties, as captured via persistent homology, can serve as a reliable signal that provides complementary information for learning to forecast. To this end, we propose topological attention, which allows attending to local topological features within a time horizon of historical data. Our approach easily integrates into existing end-to-end trainable forecasting models, such as N-BEATS, and, in combination with the latter exhibits state-of-the-art performance on the large-scale M4 benchmark dataset of 100,000 diverse time series from different domains. Ablation experiments, as well as a comparison to recent techniques in a setting where only a single time series is available for training, corroborate the beneficial nature of including local topological information through an attention mechanism.

Topological Data Analysis for Electric Motor Eccentricity Fault Detection (2022)

Bingnan Wang, Chungwei Lin, Hiroshi Inoue, Makoto Kanemaru

Abstract

In this paper, we develop a topological data analysis (TDA) method for motor current signature analysis (MCSA), and apply it to induction motor eccentricity fault detection. We introduce TDA and present the procedure of extracting topological features from time-domain data that will be represented using persistence diagrams and vectorized Betti sequences. The procedure is applied to induction machine phase current signal analysis, and shown to be highly effective in differentiating signals from different eccentricity levels. With TDA, we are able to use a simple regression model that can predict the fault levels with reasonable accuracy, even for the data of eccentricity levels that are not seen in the training data. The proposed method is model-free, and only requires a small segment of time-domain data to make a prediction. These advantages make it attractive for a wide range of fault detection applications.
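
One common way to vectorize a persistence diagram into a Betti sequence is to count, at each threshold on a fixed grid, how many intervals are alive. The sketch below shows that idea only; the half-open interval convention, the grid, the toy diagram, and the name `betti_sequence` are assumptions and are not taken from the paper.

```python
def betti_sequence(diagram, thresholds):
    """For each threshold t, count the (birth, death) intervals of the
    diagram that contain t, using the half-open convention [birth, death)."""
    return [
        sum(1 for birth, death in diagram if birth <= t < death)
        for t in thresholds
    ]


# Toy diagram with three intervals; one never dies.
diagram = [(0.0, 1.0), (0.2, 0.5), (0.4, float("inf"))]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
curve = betti_sequence(diagram, thresholds)
print(curve)  # [1, 2, 2, 2, 1]
```

The resulting fixed-length vector is the kind of feature that could be passed to a simple regression model.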

(Quasi)Periodicity Quantification in Video Data, Using Topology (2018)

Christopher J. Tralie, Jose A. Perea

Abstract

This work introduces a novel framework for quantifying the presence and strength of recurrent dynamics in video data. Specifically, we provide continuous measures of periodicity (perfect repetition) and quasiperiodicity (superposition of periodic modes with noncommensurate periods), in a way which does not require segmentation, training, object tracking, or 1-dimensional surrogate signals. Our methodology operates directly on video data. The approach combines ideas from nonlinear time series analysis (delay embeddings) and computational topology (persistent homology) by translating the problem of finding recurrent dynamics in video data into the problem of determining the circularity or toroidality of an associated geometric space. Through extensive testing, we show the robustness of our scores with respect to several noise models/levels; we show that our periodicity score is superior to other methods when compared to human-generated periodicity rankings; and furthermore, we show that our quasiperiodicity score clearly indicates the presence of biphonation in videos of vibrating vocal folds, which has never before been accomplished quantitatively end to end.
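
The delay-embedding idea mentioned in this abstract can be illustrated on a scalar signal, which is a deliberate simplification: the paper operates on video frames directly, whereas this toy uses a single sampled cosine. The function name `sliding_window` and the parameters `dim` and `tau` are illustrative choices. With a delay of a quarter period, the embedded points of a pure cosine fall on a circle, which is the kind of circularity persistent homology would then detect.

```python
import math


def sliding_window(signal, dim, tau):
    """Return the delay vectors [x[t], x[t + tau], ..., x[t + (dim-1)*tau]]
    for every t at which the whole window fits inside the signal."""
    span = (dim - 1) * tau
    return [
        [signal[t + k * tau] for k in range(dim)]
        for t in range(len(signal) - span)
    ]


# A cosine sampled with a period of 40 samples.
period = 40
signal = [math.cos(2 * math.pi * t / period) for t in range(200)]

# Two-dimensional embedding with a quarter-period delay: each point is
# (cos(theta), cos(theta + pi/2)), which lies on the unit circle.
cloud = sliding_window(signal, dim=2, tau=period // 4)
radii = [math.hypot(x, y) for x, y in cloud]
```

A quasiperiodic signal with two incommensurate frequencies would, in a high enough embedding dimension, trace out a torus rather than a circle, which is the distinction the abstract draws.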

Knowledge Gaps in the Early Growth of Semantic Feature Networks (2018)

Ann E. Sizemore, Elisabeth A. Karuza, Chad Giusti, Danielle S. Bassett

Abstract

Understanding language learning and more general knowledge acquisition requires the characterization of inherently qualitative structures. Recent work has applied network science to this task by creating semantic feature networks, in which words correspond to nodes and connections correspond to shared features, and then by characterizing the structure of strongly interrelated groups of words. However, the importance of sparse portions of the semantic network—knowledge gaps—remains unexplored. Using applied topology, we query the prevalence of knowledge gaps, which we propose manifest as cavities in the growing semantic feature network of toddlers. We detect topological cavities of multiple dimensions and find that, despite word order variation, the global organization remains similar. We also show that nodal network measures correlate with filling cavities better than basic lexical properties. Finally, we discuss the importance of semantic feature network topology in language learning and speculate that the progression through knowledge gaps may be a robust feature of knowledge acquisition.

Visualizing Nanoparticle Surface Dynamics and Instabilities Enabled by Deep Denoising (2025)

Peter A. Crozier, Matan Leibovich, Piyush Haluai, Mai Tan, Andrew M. Thomas, Joshua Vincent, Sreyas Mohan, Adria Marcos Morales, Shreyas A. Kulkarni, David S. Matteson, Yifan Wang, Carlos Fernandez-Granda

Abstract

Materials functionalities may be associated with atomic-level structural dynamics occurring on the millisecond timescale. However, the capability of electron microscopy to image structures with high spatial resolution and millisecond temporal resolution is often limited by poor signal-to-noise ratios. With an unsupervised deep denoising framework, we observed metal nanoparticle surfaces (platinum nanoparticles on cerium oxide) in a gas environment with time resolutions down to 10 milliseconds at a moderate electron dose. On this timescale, many nanoparticle surfaces continuously transition between ordered and disordered configurations. Stress fields can penetrate below the surface, leading to defect formation and destabilization, thus making the nanoparticle fluxional. Combining this unsupervised denoiser with in situ electron microscopy greatly improves spatiotemporal characterization, opening a new window for the exploration of atomic-level structural dynamics in materials.

A Novel Approach for Wafer Defect Pattern Classification Based on Topological Data Analysis (2023)

Seungchan Ko, Dowan Koo

Abstract

In semiconductor manufacturing, the wafer map defect pattern provides critical information for facility maintenance and yield management, so the classification of defect patterns is one of the most important tasks in the manufacturing process. In this paper, we propose a novel way to represent the shape of the defect pattern as a finite-dimensional vector, which will be used as an input for a neural network algorithm for classification. The main idea is to extract the topological features of each pattern by using the theory of persistent homology from topological data analysis (TDA). Through some experiments with a simulated dataset, we show that the proposed method is faster and much more efficient in training with higher accuracy, compared with the method using convolutional neural networks (CNN), which is the most common approach for wafer map defect pattern classification. Moreover, it was shown that our method outperforms the CNN-based method when the training data are scarce and imbalanced.

Alzheimer Disease Detection From Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning (2023)

Francesco Conti, Martina Banchelli, Valentina Bessi, Cristina Cecchi, Fabrizio Chiti, Sara Colantonio, Cristiano D’Andrea, Marella de Angelis, Davide Moroni, Benedetta Nacmias, Maria Antonietta Pascali, Sandro Sorbi, Paolo Matteini

Abstract

The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer’s disease (AD) as well as of 5 pathological controls was collected and analyzed by Raman spectroscopy (RS). We investigated whether the raw and preprocessed Raman spectra could be used to distinguish AD from controls. First, we applied standard Machine Learning (ML) methods obtaining unsatisfactory results. Then, we applied ML to a set of topological descriptors extracted from raw spectra, achieving a very good classification accuracy (>87%). Although our results are preliminary, they indicate that RS and topological analysis may provide an effective combination to confirm or disprove a clinical diagnosis of AD. The next steps include enlarging the dataset of CSF samples to validate the proposed method better and, possibly, to investigate whether topological data analysis could support the characterization of AD subtypes.

Unsupervised Topological Learning Approach of Crystal Nucleation (2022)

Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse

Abstract

Nucleation phenomena commonly observed in our everyday life are of fundamental, technological and societal importance in many areas, but some of their most intimate mechanisms remain to be unravelled. Crystal nucleation, the early stages where the liquid-to-solid transition occurs upon undercooling, initiates at the atomic level on nanometre length and sub-picosecond time scales and involves complex multidimensional mechanisms with local symmetry breaking that can hardly be observed experimentally in full detail. To reveal their structural features in simulations without a priori knowledge, an unsupervised learning approach founded on topological descriptors borrowed from persistent homology concepts is proposed. Applied here to monatomic metals, it shows that both translational and orientational ordering always come into play simultaneously as a result of the strong bonding when homogeneous nucleation starts in regions with low five-fold symmetry. It also reveals the specificity of the nucleation pathways depending on the element considered, with features beyond the hypothesis of Classical Nucleation Theory.

TDA-Net: Fusion of Persistent Homology and Deep Learning Features for COVID-19 Detection From Chest X-Ray Images (2021)

Mustafa Hajij, Ghada Zamzmi, Fawwaz Batayneh

Abstract

Topological Data Analysis (TDA) has emerged recently as a robust tool to extract and compare the structure of datasets. TDA identifies features in data (e.g., connected components and holes) and assigns a quantitative measure to these features. Several studies reported that topological features extracted by TDA tools provide unique information about the data, discover new insights, and determine which feature is more related to the outcome. On the other hand, the overwhelming success of deep neural networks in learning patterns and relationships has been proven on various data applications including images. To capture the characteristics of both worlds, we propose TDA-Net, a novel ensemble network that fuses topological and deep features for the purpose of enhancing model generalizability and accuracy. We apply the proposed TDA-Net to a critical application, which is the automated detection of COVID-19 from CXR images. Experimental results showed that the proposed network achieved excellent performance and suggested the applicability of our method in practice.

Confinement in Non-Abelian Lattice Gauge Theory via Persistent Homology (2022)

Daniel Spitz, Julian M. Urban, Jan M. Pawlowski

Abstract

We investigate the structure of confining and deconfining phases in SU(2) lattice gauge theory via persistent homology, which gives us access to the topology of a hierarchy of combinatorial objects constructed from given data. Specifically, we use filtrations by traced Polyakov loops, topological densities, holonomy Lie algebra fields, as well as electric and magnetic fields. This allows for a comprehensive picture of confinement. In particular, topological densities form spatial lumps which show signatures of the classical probability distribution of instanton-dyons. Signatures of well-separated dyons located at random positions are encoded in holonomy Lie algebra fields, following the semi-classical temperature dependence of the instanton appearance probability. Debye screening discriminating between electric and magnetic fields is visible in persistent homology and pronounced at large gauge coupling. All employed constructions are gauge-invariant without a priori assumptions on the configurations under study. This work showcases the versatility of persistent homology for statistical and quantum physics studies, barely explored to date.

Topology of Viral Evolution (2013)

Joseph Minhow Chan, Gunnar Carlsson, Raul Rabadan

Abstract

The tree structure is currently the accepted paradigm to represent evolutionary relationships between organisms, species or other taxa. However, horizontal, or reticulate, genomic exchanges are pervasive in nature and confound characterization of phylogenetic trees. Drawing from algebraic topology, we present a unique evolutionary framework that comprehensively captures both clonal and reticulate evolution. We show that whereas clonal evolution can be summarized as a tree, reticulate evolution exhibits nontrivial topology of dimension greater than zero. Our method effectively characterizes clonal evolution, reassortment, and recombination in RNA viruses. Beyond detecting reticulate evolution, we succinctly recapitulate the history of complex genetic exchanges involving more than two parental strains, such as the triple reassortment of H7N9 avian influenza and the formation of circulating HIV-1 recombinants. In addition, we identify recurrent, large-scale patterns of reticulate evolution, including frequent PB2-PB1-PA-NP cosegregation during avian influenza reassortment. Finally, we bound the rate of reticulate events (i.e., 20 reassortments per year in avian influenza). Our method provides an evolutionary perspective that not only captures reticulate events precluding phylogeny, but also indicates the evolutionary scales where phylogenetic inference could be accurate.

Topological Data Analysis Distinguishes Parameter Regimes in the Anderson-Chaplain Model of Angiogenesis (2021)

John T. Nardini, Bernadette J. Stolz, Kevin B. Flores, Heather A. Harrington, Helen M. Byrne

Abstract

Angiogenesis is the process by which blood vessels form from pre-existing vessels. It plays a key role in many biological processes, including embryonic development and wound healing, and contributes to many diseases including cancer and rheumatoid arthritis. The structure of the resulting vessel networks determines their ability to deliver nutrients and remove waste products from biological tissues. Here we simulate the Anderson-Chaplain model of angiogenesis at different parameter values and quantify the vessel architectures of the resulting synthetic data. Specifically, we propose a topological data analysis (TDA) pipeline for systematic analysis of the model. TDA is a vibrant and relatively new field of computational mathematics for studying the shape of data. We compute topological and standard descriptors of model simulations generated by different parameter values. We show that TDA of model simulation data stratifies parameter space into regions with similar vessel morphology. The methodologies proposed here are widely applicable to other synthetic and experimental data including wound healing, development, and plant biology.

Identification of Key Features Using Topological Data Analysis for Accurate Prediction of Manufacturing System Outputs (2017)

Wei Guo, Ashis G. Banerjee

Abstract

Topological data analysis (TDA) has emerged as one of the most promising approaches to extract insights from high-dimensional data of varying types such as images, point clouds, and meshes, in an unsupervised manner. To the best of our knowledge, we provide here the first successful application of TDA in the manufacturing systems domain. We apply a widely used TDA method, known as the Mapper algorithm, on two benchmark data sets for chemical process yield prediction and semiconductor wafer fault detection, respectively. The algorithm yields topological networks that capture the intrinsic clusters and connections among the clusters present in the data sets, which are difficult to detect using traditional methods. We select key process variables or features that impact the system outcomes by analyzing the network shapes. We then use predictive models to evaluate the impact of the selected features. Results show that the models achieve at least the same level of high prediction accuracy as with all the process variables, thereby providing a way to carry out process monitoring and control in a more cost-effective manner.

Capturing Dynamics of Time-Varying Data via Topology (2020)

Lu Xian, Henry Adams, Chad M. Topaz, Lori Ziegelmeier

Abstract

One approach to understanding complex data is to study its shape through the lens of algebraic topology. While the early development of topological data analysis focused primarily on static data, in recent years, theoretical and applied studies have turned to data that varies in time. A time-varying collection of metric spaces as formed, for example, by a moving school of fish or flock of birds, can contain a vast amount of information. There is often a need to simplify or summarize the dynamic behavior. We provide an introduction to topological summaries of time-varying metric spaces including vineyards [17], crocker plots [52], and multiparameter rank functions [34]. We then introduce a new tool to summarize time-varying metric spaces: a crocker stack. Crocker stacks are convenient for visualization, amenable to machine learning, and satisfy a desirable stability property which we prove. We demonstrate the utility of crocker stacks for a parameter identification task involving an influential model of biological aggregations [54]. Altogether, we aim to bring the broader applied mathematics community up-to-date on topological summaries of time-varying metric spaces.
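
As a rough illustration of the kind of summary a crocker plot displays, the sketch below tabulates the 0th Betti number (number of connected components) as a function of both scale and time for a sequence of point clouds. It covers only Betti-0 and is not the crocker stack construction from the paper; the helper names `betti0` and `crocker_betti0` and the toy trajectory are assumptions made for this example.

```python
from itertools import combinations
from math import dist


def betti0(points, eps):
    """Number of connected components when points within distance eps are joined."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = len(points)
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
    return components


def crocker_betti0(frames, scales):
    """Matrix of Betti-0 values: rows index scales, columns index time steps."""
    return [[betti0(frame, eps) for frame in frames] for eps in scales]


# Hypothetical toy data: two pairs of agents that drift toward each other.
frames = [
    [(0.0, 0.0), (1.0, 0.0), (gap, 0.0), (gap + 1.0, 0.0)]
    for gap in (9.0, 4.0, 2.0)
]
scales = [0.5, 1.5, 3.5]
crocker = crocker_betti0(frames, scales)
for eps, row in zip(scales, crocker):
    print(eps, row)
```

Plotting this matrix as a contour or heat map, with time on one axis and scale on the other, gives a static picture of how the group's connectivity changes as the agents move.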

Statistical Inference for Persistent Homology Applied to Simulated fMRI Time Series Data (2023)

Hassan Abdallah, Adam Regalski, Mohammad Behzad Kang, Maria Berishaj, Nkechi Nnadi, Asadur Chowdury, Vaibhav A. Diwadkar, Andrew Salch

Abstract

Time-series data are amongst the most widely used in biomedical sciences, including domains such as functional Magnetic Resonance Imaging (fMRI). Structure within time-series data can be captured by the tools of topological data analysis (TDA). Persistent homology is the most commonly used data-analytic tool in TDA, and can effectively summarize complex high-dimensional data into an interpretable 2-dimensional representation called a persistence diagram. Existing methods for statistical inference for persistent homology of data depend on an independence assumption being satisfied. While persistent homology can be computed for each time index in a time series, time-series data often fail to satisfy the independence assumption. This paper develops a statistical test that obviates the independence assumption by implementing a multi-level block sampled Monte Carlo test with sets of persistence diagrams. Its efficacy for detecting task-dependent topological organization is then demonstrated on simulated fMRI data. This new statistical test is therefore suitable for analyzing persistent homology of fMRI data, and of non-independent data in general.

Capturing Shape Information With Multi-Scale Topological Loss Terms For 3D Reconstruction (2022)

Dominik J. E. Waibel, Scott Atwell, Matthias Meier, Carsten Marr, Bastian Rieck

Abstract

Reconstructing 3D objects from 2D images is challenging for both our brains and machine learning algorithms. To support this spatial reasoning task, contextual information about the overall shape of an object is critical. However, such information is not captured by established loss terms (e.g. Dice loss). We propose to complement geometrical shape information by including multi-scale topological features, such as connected components, cycles, and voids, in the reconstruction loss. Our method uses cubical complexes to calculate topological features of 3D volume data and employs an optimal transport distance to guide the reconstruction process. This topology-aware loss is fully differentiable, computationally efficient, and can be added to any neural network. We demonstrate the utility of our loss by incorporating it into SHAPR, a model for predicting the 3D cell shape of individual cells based on 2D microscopy images. Using a hybrid loss that leverages both geometrical and topological information of single objects to assess their shape, we find that topological information substantially improves the quality of reconstructions, thus highlighting its ability to extract more relevant features from image datasets.

Histopathological Cancer Detection With Topological Signatures (2023)

Ankur Yadav, Faisal Ahmed, Ovidiu Daescu, Reyhan Gedik, Baris Coskunuzer

Abstract

We present a transformative approach to histopathological cancer detection and grading by introducing a very powerful feature extraction method based on the latest topological data analysis tools. By analyzing the evolution of topological patterns in different color channels, we discovered that every tumor class leaves its own topological footprint in histopathological images, allowing us to extract feature vectors that can be used to reliably identify tumor classes. Our topological signatures, even when combined with traditional machine learning methods, provide very fast and highly accurate results in various settings. While most DL models work well for one type of cancer, our model easily adapts to different scenarios, and consistently gives results that are highly competitive with state-of-the-art models on benchmark datasets across multiple cancer types including bone, colon, breast, cervical (cytopathology), and prostate cancer. Unlike most DL models, our proposed Topo-ML model does not need any data augmentation or pre-processing steps and works perfectly on small datasets. The model is computationally very efficient, with end-to-end processing taking only a few hours for datasets consisting of thousands of images.

Imaging-Based Representation and Stratification of Intra-Tumor Heterogeneity via Tree-Edit Distance (2022)

Lara Cavinato, Matteo Pegoraro, Alessandra Ragni, Francesca Ieva

Abstract

Personalized medicine is the future of medical practice. In oncology, tumor heterogeneity assessment represents a pivotal step for effective treatment planning and prognosis prediction. Despite new procedures for DNA sequencing and analysis, non-invasive methods for tumor characterization are needed to have an impact on daily routine. To this end, imaging texture analysis is rapidly scaling, holding the promise to serve as a surrogate for histopathological assessment of tumor lesions. In this work, we propose a tree-based representation strategy for describing intra-tumor heterogeneity of patients affected by metastatic cancer. We leverage radiomics information extracted from PET/CT imaging and we provide an exhaustive and easily readable summary of the disease spreading. We exploit this novel patient representation to perform cancer subtyping by means of hierarchical clustering. For this purpose, a new heterogeneity-based distance between trees is defined and applied to a case study of prostate cancer. Cluster interpretation is explored in terms of concordance with severity status, tumor burden and biological characteristics. Results are promising, as the proposed method outperforms current literature approaches. Ultimately, the proposed method outlines a general analysis framework that would allow knowledge to be extracted from routinely acquired imaging data of patients and provide insights for effective treatment planning.

A Functional Data-Driven Approach to Monitor and Analyze Equipment Degradation in Multiproduct Batch Processes (2023)

Joel Sansana, Ricardo Rendall, Mark N. Joswiak, Ivan Castillo, Gloria Miller, Leo H. Chiang, Marco S. Reis

Abstract

Equipment degradation is ubiquitous in the Chemical Process Industry (CPI), causing significant losses in efficiency, controllability, and plant economy, as well as an increased environmental fingerprint and additional operational safety risks. The case of fouling in heat exchangers, in particular, is well-known and pervasive but still hard to cope with, given the complexity of the underlying mechanisms and the difficulty of assessing its extension in real-time. This problem becomes even more complex in batch processes producing different products, where multiple recipes are used, bringing additional variability and new challenges to the analysis. In this work, we propose a functional data-driven approach for streamlining the analysis and monitoring of the progression of fouling taking place in heat exchangers in multiproduct batch processes. With the approach developed and presented in this paper, process analysis can be efficiently conducted by integrating historical data with engineering knowledge. Furthermore, a surrogate measure of fouling extension in heat exchangers is proposed, that can be readily implemented as an equipment health indicator (EHI) leading to a safer operation of the heat exchanger.

Vibration Sensors for Detecting Critical Events: A Case Study in Ferrosilicon Production (2024)

Maryna Waszak, Terje Moen, Anders H. Hansen, Grégory Bouquet, Antoine Pultier, Xiang Ma, Dumitru Roman

Abstract

The mining and metal processing industries are undergoing a transformation through digitization, with sensors and data analysis playing a crucial role in modernization and increased efficiency. Vibration sensors are particularly important in monitoring production infrastructure in metal processing plants. This paper presents the installation of vibration sensors in an actual industrial environment and the results of spectral vibration data analysis. The study demonstrates that vibration sensors can be installed in challenging environments such as metal processing plants and that analyzing vibration patterns can provide valuable insights into predicting machine failures and different machine states. By utilizing dimensionality reduction and dominant frequency observation, we analyzed vibration data and identified patterns that are indicative of potential machine states and critical events that reduce production throughput. This information can be used to improve maintenance, minimize downtime, and ultimately enhance the production process’s overall efficiency. This study highlights the importance of digitization and data analysis in the mining and metal processing industries, particularly the capability not only to predict critical events before they impact production throughput and take action accordingly but also to identify machine states for legacy equipment and be part of retrofitting strategies.

Data-Driven and Automatic Surface Texture Analysis Using Persistent Homology (2021)

Melih C. Yesilli, Firas A. Khasawneh

Abstract

Surface roughness plays an important role in analyzing engineering surfaces. It quantifies the surface topography and can be used to determine whether the resulting surface finish is acceptable or not. Nevertheless, while several existing tools and standards are available for computing surface roughness, these methods rely heavily on user input thus slowing down the analysis and increasing manufacturing costs. Therefore, fast and automatic determination of the roughness level is essential to avoid costs resulting from surfaces with unacceptable finish, and user-intensive analysis. In this study, we propose a Topological Data Analysis (TDA) based approach to classify the roughness level of synthetic surfaces using both their areal images and profiles. We utilize persistent homology from TDA to generate persistence diagrams that encapsulate information on the shape of the surface. We then obtain feature matrices for each surface or profile using Carlsson coordinates, persistence images, and template functions. We compare our results to two widely used methods in the literature: Fast Fourier Transform (FFT) and Gaussian filtering. The results show that our approach yields mean accuracies as high as 97%. We also show that, in contrast to existing surface analysis tools, our TDA-based approach is fully automatable and provides adaptive feature extraction.

Induction Motor Eccentricity Fault Detection and Quantification Using Topological Data Analysis (2024)

Bingnan Wang, Chungwei Lin, Hiroshi Inoue, Makoto Kanemaru

Abstract

In this paper, we propose a topological data analysis (TDA) method for the processing of induction motor stator current data, and apply it to the detection and quantification of eccentricity faults. Traditionally, physics-based models and involved signal processing techniques are required to identify and extract the subtle frequency components in current data related to a particular fault. We show that TDA offers an alternative way to extract fault-related features and effectively distinguish data from different fault conditions. We introduce the TDA method and the procedure for extracting topological features from time-domain data, and apply it to induction motor current data measured under different eccentricity fault conditions. We show that while the raw time-domain data are very challenging to distinguish, the topological features extracted from these data are distinct and highly associated with eccentricity fault level. With TDA-processed data, we can effectively train machine learning models to predict fault levels with good accuracy, even for new data from eccentricity levels that are not seen in the training data. The proposed method is model-free, and only requires a small segment of time-domain data to make a prediction. These advantages make it attractive for a wide range of data-driven fault detection applications.

Stable Topological Summaries for Analyzing the Organization of Cells in a Packed Tissue (2021)

Nieves Atienza, Maria-Jose Jimenez, Manuel Soriano-Trigueros

Abstract

We use topological data analysis tools for studying the inner organization of cells in segmented images of epithelial tissues. More specifically, for each segmented image, we compute different persistence barcodes, which codify the lifetime of homology classes (persistent homology) along different filtrations (increasing nested sequences of simplicial complexes) that are built from the regions representing the cells in the tissue. We use a complete and well-grounded set of numerical variables over those persistence barcodes, also known as topological summaries. A novel combination of normalization methods for both the set of input segmented images and the produced barcodes allows us to prove stability results for those variables with respect to small changes in the input, as well as invariance to image scale. Our study provides new insights into this problem, such as a possible novel indicator for the development of the Drosophila wing disc tissue or the importance of centroids’ distribution to differentiate some tissues from their CVT-path counterpart (a mathematical model of epithelia based on Voronoi diagrams). We also show how the use of topological summaries may improve the classification accuracy of epithelial images using a Random Forest algorithm.

Efficient Planning of Multi-Robot Collective Transport Using Graph Reinforcement Learning With Higher Order Topological Abstraction (2023)

Steve Paul, Wenyuan Li, Brian Smyth, Yuzhou Chen, Yulia Gel, Souma Chowdhury

Abstract

Efficient multi-robot task allocation (MRTA) is fundamental to various time-sensitive applications such as disaster response, warehouse operations, and construction. This paper tackles a particular class of these problems that we call MRTA-collective transport or MRTA-CT, in which tasks present varying workloads and deadlines, and robots are subject to flight range, communication range, and payload constraints. For large instances of these problems involving 100s-1000s of tasks and 10s-100s of robots, traditional non-learning solvers are often time-inefficient, and emerging learning-based policies do not scale well to larger-sized problems without costly retraining. To address this gap, we use a recently proposed encoder-decoder graph neural network involving Capsule networks and a multi-head attention mechanism, and innovatively add topological descriptors (TD) as new features to improve transferability to unseen problems of similar and larger size. Persistent homology is used to derive the TD, and proximal policy optimization is used to train our TD-augmented graph neural network. The resulting policy model compares favorably to state-of-the-art non-learning baselines while being much faster. The benefit of using TD is readily evident when scaling to test problems of size larger than those used in training.

Exploring Surface Texture Quantification in Piezo Vibration Striking Treatment (PVST) Using Topological Measures (2022)

Melih C. Yesilli, Max M. Chumley, Jisheng Chen, Firas A. Khasawneh, Yang Guo

Abstract

Surface texture influences wear and tribological properties of manufactured parts, and it plays a critical role in end-user products. Therefore, quantifying the order or structure of a manufactured surface provides important information on the quality and life expectancy of the product. Although texture can be intentionally introduced to enhance aesthetics or to satisfy a design function, sometimes it is an inevitable byproduct of surface treatment processes such as Piezo Vibration Striking Treatment (PVST). Measures of order for surfaces have been characterized using statistical, spectral, and geometric approaches. For nearly hexagonal lattices, topological tools have also been used to measure the surface order. This paper explores utilizing tools from Topological Data Analysis for measuring surface texture. We compute measures of order based on optical digital microscope images of surfaces treated using PVST. These measures are applied to the grid obtained from estimating the centers of tool impacts, and they quantify the grid’s deviations from the nominal one. Our results show that TDA provides a convenient framework for characterization of pattern type that bypasses some limitations of existing tools such as difficult manual processing of the data and the need for an expert user to analyze and interpret the surface images.

Machine Learning and Topological Data Analysis Identify Unique Features of Human Papillae in 3D Scans (2023)

Rayna Andreeva, Anwesha Sarkar, Rik Sarkar

Abstract

The tongue surface houses a range of papillae that are integral to the mechanics and chemistry of taste and textural sensation. Although the gustatory function of papillae is well investigated, the uniqueness of papillae within and across individuals remains elusive. Here, we present the first machine learning framework on 3D microscopic scans of human papillae (n = 2092), uncovering the uniqueness of geometric and topological features of papillae. The finer differences in shapes of papillae are investigated computationally based on a number of features derived from discrete differential geometry and computational topology. Interpretable machine learning techniques show that persistent homology features of the papillae shape are the most effective in predicting the biological variables. Models trained on these features with small volumes of data samples predict the type of papillae with an accuracy of 85%. The papillae type classification models can map the spatial arrangement of filiform and fungiform papillae on a surface. Remarkably, the papillae are found to be distinctive across individuals, and an individual can be identified with an accuracy of 48% among the 15 participants from a single papilla. Collectively, this is the first evidence demonstrating that tongue papillae can serve as a unique identifier, inspiring new research directions for food preferences and oral diagnostics.

Evolutionary Homology on Coupled Dynamical Systems With Applications to Protein Flexibility Analysis (2020)

Zixuan Cang, Elizabeth Munch, Guo-Wei Wei

Abstract

While the spatial topological persistence is naturally constructed from a radius-based filtration, it has hardly been derived from a temporal filtration. Most topological models are designed for the global topology of a given object as a whole. There is no method reported in the literature for the topology of an individual component in an object to the best of our knowledge. For many problems in science and engineering, the topology of an individual component is important for describing its properties. We propose evolutionary homology (EH) constructed via a time evolution-based filtration and topological persistence. Our approach couples a set of dynamical systems or chaotic oscillators by the interactions of a physical system, such as a macromolecule. The interactions are approximated by weighted graph Laplacians. Simplices, simplicial complexes, algebraic groups and topological persistence are defined on the coupled trajectories of the chaotic oscillators. The resulting EH gives rise to time-dependent topological invariants or evolutionary barcodes for an individual component of the physical system, revealing its topology-function relationship. In conjunction with Wasserstein metrics, the proposed EH is applied to protein flexibility analysis, an important problem in computational biophysics. Numerical results for the B-factor prediction of a benchmark set of 364 proteins indicate that the proposed EH outperforms all the other state-of-the-art methods in the field.

A Topological Framework for Identifying Phenomenological Bifurcations in Stochastic Dynamical Systems (2024)

Sunia Tanweer, Firas A. Khasawneh, Elizabeth Munch, Joshua R. Tempelman

Abstract

Changes in the parameters of dynamical systems can cause the state of the system to shift between different qualitative regimes. These shifts, known as bifurcations, are critical to study as they can indicate when the system is about to undergo harmful changes in its behavior. In stochastic dynamical systems, there is particular interest in P-type (phenomenological) bifurcations, which can include transitions from a monostable state to multi-stable states, the appearance of stochastic limit cycles and other features in the probability density function (PDF) of the system’s state. Current practices are limited to systems with small state spaces, cannot detect all possible behaviors of the PDFs and mandate human intervention for visually identifying the change in the PDF. In contrast, this study presents a new approach based on Topological Data Analysis that uses superlevel persistence to mathematically quantify P-type bifurcations in stochastic systems through a “homological bifurcation plot”, which shows the changing ranks of the 0th and 1st homology groups through Betti vectors. Using these plots, we demonstrate the successful detection of P-bifurcations on the stochastic Duffing, Rayleigh-Van der Pol and Quintic Oscillators given their analytical PDFs, and elaborate on how to generate an estimated homological bifurcation plot given a kernel density estimate (KDE) of these systems by employing a tool for finding topological consistency between PDFs and KDEs.
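
The role of superlevel persistence in detecting a monostable-to-bistable transition can be sketched in one dimension. The code below computes 0-dimensional superlevel-set persistence of a sampled density and counts the peaks whose persistence exceeds a threshold. It is a simplified stand-in, not the authors' method: it handles only 1D PDFs and the 0th homology group, and the names `superlevel_persistence`, `count_modes`, and the unnormalized two-bump `mixture` density are assumptions made for illustration.

```python
from math import exp


def superlevel_persistence(values):
    """0-dimensional superlevel-set persistence of a sampled 1D function.

    Returns (birth, death) pairs with birth >= death. The component of the
    global maximum never merges into another, so it is paired with the
    global minimum by convention.
    """
    n = len(values)
    parent = [None] * n   # None marks samples not yet in the superlevel set
    peak = {}             # component root -> height at which it was born

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Lower the threshold: add samples from highest to lowest value.
    for i in sorted(range(n), key=lambda k: -values[k]):
        parent[i] = i
        peak[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # The component with the lower peak dies at the current level.
                young, old = (ri, rj) if peak[ri] < peak[rj] else (rj, ri)
                if peak[young] > values[i]:
                    pairs.append((peak[young], values[i]))
                parent[young] = old
    pairs.append((max(values), min(values)))
    return sorted(pairs, reverse=True)


def count_modes(values, min_persistence):
    """Number of peaks whose persistence exceeds min_persistence."""
    return sum(1 for b, d in superlevel_persistence(values)
               if b - d > min_persistence)


def mixture(x, sep):
    # Hypothetical unnormalized density: two Gaussian bumps centred at +/- sep.
    return 0.5 * exp(-(x - sep) ** 2 / 2) + 0.5 * exp(-(x + sep) ** 2 / 2)


xs = [i / 10 for i in range(-60, 61)]
unimodal = [mixture(x, 0.0) for x in xs]
bimodal = [mixture(x, 3.0) for x in xs]
print(count_modes(unimodal, 0.1), count_modes(bimodal, 0.1))
```

Sweeping the separation parameter and recording the mode count at each value would give a crude one-dimensional analogue of tracking the rank of the 0th homology group across a bifurcation parameter.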

Signal Enrichment With Strain-Level Resolution in Metagenomes Using Topological Data Analysis (2019)

Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida

Abstract

Background: A metagenome is a collection of genomes, usually in a micro-environment, and sequencing a metagenomic sample en masse is a powerful means for investigating the community of the constituent microorganisms. One of the challenges is in distinguishing between similar organisms due to rampant multiple possible assignments of sequencing reads, resulting in false positive identifications. We map the problem to a topological data analysis (TDA) framework that extracts information from the geometric structure of data. Here the structure is defined by multi-way relationships between the sequencing reads using a reference database. Results: Based primarily on the patterns of co-mapping of the reads to multiple organisms in the reference database, we use two models: one a subcomplex of a Barycentric subdivision complex and the other a Čech complex. The Barycentric subcomplex allows a natural mapping of the reads along with their coverage of organisms while the Čech complex takes simply the number of reads into account to map the problem to homology computation. Using simulated genome mixtures we show not just enrichment of signal but also microbe identification with strain-level resolution. Conclusions: In particular, in the most refractory of cases where alternative algorithms that exploit unique reads (i.e., mapped to unique organisms) fail, we show that the TDA approach continues to show consistent performance. The Čech model that uses less information is equally effective, suggesting that even partial information when augmented with the appropriate structure is quite powerful.

Time-Inhomogeneous Diffusion Geometry and Topology (2022)

Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy

Abstract

Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic diffusion homology. We use this intrinsic topology as well as an ambient topology to study how the data changes over diffusion time. We demonstrate both homologies in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.

The Shape of Cancer Relapse: Topological Data Analysis Predicts Recurrence in Paediatric Acute Lymphoblastic Leukaemia (2021)

Salvador Chulián, Bernadette J. Stolz, Álvaro Martínez-Rubio, Cristina Blázquez Goñi, Juan F. Rodríguez Gutiérrez, Teresa Caballero Velázquez, Águeda Molinos Quintana, Manuel Ramírez Orellana, Ana Castillo Robleda, José Luis Fuster Soler, Alfredo Minguela Puras, María Victoria Martínez Sánchez, María Rosa, Víctor M. Pérez-García, Helen Byrne

Abstract

Acute Lymphoblastic Leukaemia (ALL) is the most frequent paediatric cancer. Modern therapies have improved survival rates, but approximately 15-20% of patients relapse. At present, patients’ risk of relapse is assessed by projecting high-dimensional flow cytometry data onto a subset of biomarkers and manually estimating the shape of this reduced data. Here, we apply methods from topological data analysis (TDA), which quantify shape in data via features such as connected components and loops, to pre-treatment ALL datasets with known outcomes. We combine these fully unsupervised analyses with machine learning to identify features in the pre-treatment data that are prognostic for risk of relapse. We find significant topological differences between relapsing and non-relapsing patients and confirm the predictive power of CD10, CD20, CD38, and CD45. Further, we are able to use the TDA descriptors to predict patients who relapsed. We propose three prognostic pipelines that readily extend to other haematological malignancies.

Teaser: Topology reveals features in flow cytometry data which predict relapse of patients with acute lymphoblastic leukemia.

Weighted-Persistent-Homology-Based Machine Learning for RNA Flexibility Analysis (2020)

Chi Seng Pun, Brandon Yung Sin Yong, Kelin Xia

Abstract

Given the great significance of biomolecular flexibility in biomolecular dynamics and functional analysis, various experimental and theoretical models have been developed. Experimentally, the Debye-Waller factor, also known as the B-factor, measures atomic mean-square displacement and is usually considered an important measurement for flexibility. Theoretically, elastic network models, the Gaussian network model, the flexibility-rigidity model, and other computational models have been proposed for flexibility analysis by shedding light on the biomolecular inner topological structures. Recently, a topology-based machine learning model has been proposed. By using the features from persistent homology, this model achieves a remarkably high Pearson correlation coefficient (PCC) in protein B-factor prediction. Motivated by its success, we propose weighted-persistent-homology (WPH)-based machine learning (WPHML) models for RNA flexibility analysis. Our WPH is a newly proposed model, which incorporates physical, chemical and biological information into topological measurements using a weight function. In particular, we use local persistent homology (LPH) to focus on the topological information of local regions. Our WPHML model is validated on a well-established RNA dataset, and numerical experiments show that our model can achieve a PCC of up to 0.5822. The comparison with the previous sequence-information-based learning models shows that a consistent improvement in performance by at least 10% is achieved in our current model.

Raman Spectroscopy and Topological Machine Learning for Cancer Grading (2023)

Francesco Conti, Mario D’Acunto, Claudia Caudai, Sara Colantonio, Raffaele Gaeta, Davide Moroni, Maria Antonietta Pascali

Abstract

In the last decade, Raman Spectroscopy has been establishing itself as a highly promising technique for the classification of tumour tissues, as it allows one to obtain the biochemical maps of the tissues under investigation, making it possible to observe changes among different tissues in terms of biochemical constituents (proteins, lipid structures, DNA, vitamins, and so on). In this paper, we aim to show that techniques emerging from the cross-fertilization of persistent homology and machine learning can support the classification of Raman spectra extracted from cancerous tissues for tumour grading. In more detail, topological features of Raman spectra and machine learning classifiers are trained in combination as an automatic classification pipeline in order to select the best-performing pair. The case study is the grading of chondrosarcoma in four classes: cross and leave-one-patient-out validations have been used to assess the classification accuracy of the method. The binary classification achieves a validation accuracy of 81% and a test accuracy of 90%. Moreover, the test dataset has been collected at a different time and with different equipment. Such results are achieved by a support vector classifier trained with the Betti Curve representation of the topological features extracted from the Raman spectra, and are excellent compared with the existing literature. The added value of such results is that the model for the prediction of the chondrosarcoma grading could easily be implemented in clinical practice, possibly integrated into the acquisition system.
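
To make the "Betti Curve representation" mentioned above concrete, here is a minimal sketch for a one-dimensional signal such as a spectrum. It assumes a sublevel-set filtration and 0-dimensional homology only, and the function names `sublevel_persistence_0d` and `betti_curve` are hypothetical; the filtration and homology dimensions actually used in the paper may differ. The signal is assumed to be non-empty.

```python
def sublevel_persistence_0d(signal):
    """0-dimensional sublevel-set persistence pairs (birth, death) of a 1D signal."""
    n = len(signal)
    order = sorted(range(n), key=lambda i: signal[i])
    parent = [None] * n   # None marks samples not yet added to the filtration
    birth = {}            # sample index -> value at which its component was born

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        birth[i] = signal[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at the current level.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < signal[i]:
                    pairs.append((birth[young], signal[i]))
                parent[young] = old
    # The component born at the global minimum never dies.
    pairs.append((min(signal), float("inf")))
    return pairs

def betti_curve(pairs, thresholds):
    """Number of intervals alive (birth <= t < death) at each threshold t."""
    return [sum(1 for b, d in pairs if b <= t < d) for t in thresholds]
```

Sampling the curve on a fixed grid of thresholds gives a fixed-length vector per signal, which is the kind of input a support vector classifier expects.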

Learning Representations of Persistence Barcodes (2019)

Christoph D. Hofer, Roland Kwitt, Marc Niethammer

Abstract

We consider the problem of supervised learning with summary representations of topological features in data. In particular, we focus on persistent homology, the prevalent tool used in topological data analysis. As the summary representations, referred to as barcodes or persistence diagrams, come in the unusual format of multisets, equipped with computationally expensive metrics, they cannot readily be processed with conventional learning techniques. While different approaches to address this problem have been proposed, either in the context of kernel-based learning, or via carefully designed vectorization techniques, it remains an open problem how to leverage advances in representation learning via deep neural networks. Appropriately handling topological summaries as input to neural networks would address the disadvantage of previous strategies which handle this type of data in a task-agnostic manner. In particular, we propose an approach that is designed to learn a task-specific representation of barcodes. In other words, we aim to learn a representation that adapts to the learning problem while, at the same time, preserving theoretical properties (such as stability). This is done by projecting barcodes into a finite-dimensional vector space using a collection of parametrized functionals, so-called structure elements, for which we provide a generic construction scheme. A theoretical analysis of this approach reveals sufficient conditions to preserve stability, and also shows that different choices of structure elements lead to great differences with respect to their suitability for numerical optimization. When implemented as a neural network input layer, our approach demonstrates compelling performance on various types of problems, including graph classification and eigenvalue prediction, the classification of 2D/3D object shapes and recognizing activities from EEG signals.

A Data-Driven Workflow for Evaporation Performance Degradation Analysis: A Full-Scale Case Study in the Herbal Medicine Manufacturing Industry (2023)

Sheng Zhang, Xinyuan Xie, Haibin Qu

Abstract

The evaporation process is a common step in herbal medicine manufacturing and often lasts for a long time. The degradation of evaporation performance is inevitable, leading to more consumption of steam and electricity, and it may also have an impact on the content of thermosensitive components. Recently, a vast amount of evaporation process data is collected with the aid of industrial information systems, and process knowledge is hidden behind the data. But currently, these data are seldom deeply analyzed. In this work, an exploratory data analysis workflow is proposed to evaluate the evaporation performance and to identify the root causes of the performance degradation. The workflow consists of 6 steps: data collecting, preprocessing, characteristic stage identification, feature extraction, model development and interpretation, and decision making. In the model development and interpretation step, the workflow employs the HDBSCAN clustering algorithm for data annotation and then uses the ccPCA method to compare the differences between clusters for root cause analysis. A full-scale case is presented to verify the effectiveness of the workflow. The evaporation process data of 192 batches in 2018 were collected in the case. Through the steps of the workflow, the features of each batch were extracted, and the batches were clustered into 6 groups. The root causes of the performance degradation were determined as the high Pv,II and high LI by ccPCA. Recommended suggestions for future manufacturing were given according to the results. The proposed workflow can determine the root causes of the evaporation performance degradation.

Pattern Characterization Using Topological Data Analysis: Application to Piezo Vibration Striking Treatment (2023)

Max M. Chumley, Melih C. Yesilli, Jisheng Chen, Firas A. Khasawneh, Yang Guo

Abstract

Quantifying patterns in visual or tactile textures provides important information about the process or phenomena that generated these patterns. In manufacturing, these patterns can be intentionally introduced as a design feature, or they can be a byproduct of a specific process. Since surface texture has significant impact on the mechanical properties and the longevity of the workpiece, it is important to develop tools for quantifying surface patterns and, when applicable, comparing them to their nominal counterparts. While existing tools may be able to indicate the existence of a pattern, they typically do not provide more information about the pattern structure, or how much it deviates from a nominal pattern. Further, prior works do not provide automatic or algorithmic approaches for quantifying other pattern characteristics such as depths’ consistency, and variations in the pattern motifs at different level sets. This paper leverages persistent homology from Topological Data Analysis (TDA) to derive noise-robust scores for quantifying motifs’ depth and roundness in a pattern. Specifically, sublevel persistence is used to derive scores that quantify the consistency of indentation depths at any level set in Piezo Vibration Striking Treatment (PVST) surfaces. Moreover, we combine sublevel persistence with the distance transform to quantify the consistency of the indentation radii, and to compare them with the nominal ones. Although the tool in our PVST experiments had a semi-spherical profile, we present a generalization of our approach to tools/motifs of arbitrary shapes thus making our method applicable to other pattern-generating manufacturing processes.

Topology Based Data Analysis Identifies a Subgroup of Breast Cancers With a Unique Mutational Profile and Excellent Survival (2011)

Monica Nicolau, Arnold J. Levine, Gunnar Carlsson

Abstract

High-throughput biological data, whether generated as sequencing, transcriptional microarrays, proteomic, or other means, continues to require analytic methods that address its high dimensional aspects. Because the computational part of data analysis ultimately identifies shape characteristics in the organization of data sets, the mathematics of shape recognition in high dimensions continues to be a crucial part of data analysis. This article introduces a method that extracts information from high-throughput microarray data and, by using topology, provides greater depth of information than current analytic techniques. The method, termed Progression Analysis of Disease (PAD), first identifies robust aspects of cluster analysis, then goes deeper to find a multitude of biologically meaningful shape characteristics in these data. Additionally, because PAD incorporates a visualization tool, it provides a simple picture or graph that can be used to further explore these data. Although PAD can be applied to a wide range of high-throughput data types, it is used here as an example to analyze breast cancer transcriptional data. This identified a unique subgroup of Estrogen Receptor-positive (ER+) breast cancers that express high levels of c-MYB and low levels of innate inflammatory genes. These patients exhibit 100% survival and no metastasis. No supervised step beyond distinction between tumor and healthy patients was used to identify this subtype. The group has a clear and distinct, statistically significant molecular signature, it highlights coherent biology but is invisible to cluster methods, and does not fit into the accepted classification of Luminal A/B, Normal-like subtypes of ER+ breast cancers. We denote the group as c-MYB+ breast cancer.

Topology Identifies Emerging Adaptive Mutations in SARS-CoV-2 (2021)

Michael Bleher, Lukas Hahn, Juan Angel Patino-Galindo, Mathieu Carriere, Ulrich Bauer, Raul Rabadan, Andreas Ott

Abstract

The COVID-19 pandemic has led to a worldwide effort to characterize its evolution through the mapping of mutations in the genome of the coronavirus SARS-CoV-2. Ideally, one would like to quickly identify new mutations that could confer adaptive advantages (e.g. higher infectivity or immune evasion) by leveraging the large number of genomes. One way of identifying adaptive mutations is by looking at convergent mutations, mutations in the same genomic position that occur independently. However, the large number of currently available genomes precludes the efficient use of phylogeny-based techniques. Here, we establish a fast and scalable Topological Data Analysis approach for the early warning and surveillance of emerging adaptive mutations based on persistent homology. It identifies convergent events merely by their topological footprint and thus overcomes limitations of current phylogenetic inference techniques. This allows for an unbiased and rapid analysis of large viral datasets. We introduce a new topological measure for convergent evolution and apply it to the GISAID dataset as of February 2021, comprising 303,651 high-quality SARS-CoV-2 isolates collected since the beginning of the pandemic. We find that topologically salient mutations on the receptor-binding domain appear in several variants of concern and are linked with an increase in infectivity and immune escape, and for many adaptive mutations the topological signal precedes an increase in prevalence. We show that our method effectively identifies emerging adaptive mutations at an early stage. By localizing topological signals in the dataset, we extract geo-temporal information about the early occurrence of emerging adaptive mutations. The identification of these mutations can help to develop an alert system to monitor mutations of concern and guide experimentalists to focus the study of specific circulating variants.

Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, Wlodek Zadrozny

Abstract

While the strength of Topological Data Analysis has been explored in many studies on high dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in the text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extraction of topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as high dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe that adding topological features to the conventional features in ensemble models improves the classification results (up to 5%). On the other hand, as expected, topological features by themselves may not be sufficient for effective classification. It is an open problem to see whether TDA features from word embeddings might be sufficient, as they seem to perform within a range of few points from top results obtained with a linear support vector classifier.

Novel Production Prediction Model of Gasoline Production Processes for Energy Saving and Economic Increasing Based on AM-GRU Integrating the UMAP Algorithm (2023)

Jintao Liu, Liangchao Chen, Wei Xu, Mingfei Feng, Yongming Han, Tao Xia, Zhiqiang Geng

Abstract

Gasoline, as an extremely important petroleum product, is of great significance to ensure people's living standards and maintain national energy security. In the actual gasoline industrial production environment, the point information collected by industrial devices usually has the characteristics of high dimension, high noise and time series because of the instability of manual operation and equipment operation. Therefore, it is difficult to use the traditional method to predict and optimize gasoline production. In this paper, a novel production prediction model using an attention mechanism (AM) based gated recurrent unit (GRU) (AM-GRU) integrating the uniform manifold approximation and projection (UMAP) is proposed. The data collected in the industrial plant are processed by the box plot to remove the data outside the quartile. Then, the UMAP is used to remove the strong correlation between the data, which can improve the running speed and the performance of the AM-GRU. Compared with the existing time series data prediction method, the superiority of the AM-GRU is verified based on University of California Irvine (UCI) benchmark datasets. Finally, the production prediction model of actual complex gasoline production processes for energy saving and economic increasing based on the proposed method is built. The experiment results show that compared with other time series data prediction models, the proposed model has better stability and higher accuracy with reaching 0.4171, 0.9969, 0.2538 and 0.5038 in terms of the mean squared error, the average absolute accuracy, the mean squared error and the root mean square error. Moreover, according to the optimal scheme of the raw material, the inefficiency production points can be expected to increase about 0.69 tons of the gasoline yield and between about $645.1 and $925.6 of economic benefits of industrial production.
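
The box-plot preprocessing step mentioned above can be sketched as follows. The abstract does not state the exact rule, so this assumes the common Tukey convention of discarding values more than 1.5 interquartile ranges beyond the first and third quartiles; the function name `boxplot_filter` and the `whisker` parameter are hypothetical.

```python
import numpy as np

def boxplot_filter(values, whisker=1.5):
    """Keep only the values inside the box-plot whiskers (Tukey fences)."""
    values = np.asarray(values, dtype=float)
    # First and third quartiles, using NumPy's default linear interpolation.
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    # Boolean mask of the samples that fall inside the fences.
    mask = (values >= lower) & (values <= upper)
    return values[mask]
```

For multivariate sensor data, a filter like this would typically be applied per variable before any dimensionality reduction such as UMAP.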

CCF-GNN: A Unified Model Aggregating Appearance, Microenvironment, and Topology for Pathology Image Classification (2023)

Hongxiao Wang, Gang Huang, Zhuo Zhao, Liang Cheng, Anna Juncker-Jensen, Máté Levente Nagy, Xin Lu, Xiangliang Zhang, Danny Z. Chen

Abstract

Pathology images contain rich information of cell appearance, microenvironment, and topology features for cancer analysis and diagnosis. Among such features, topology becomes increasingly important in analysis for cancer immunotherapy. By analyzing geometric and hierarchically structured cell distribution topology, oncologists can identify densely-packed and cancer-relevant cell communities (CCs) for making decisions. Compared to commonly-used pixel-level Convolution Neural Network (CNN) features and cell-instance-level Graph Neural Network (GNN) features, CC topology features are at a higher level of granularity and geometry. However, topological features have not been well exploited by recent deep learning (DL) methods for pathology image classification due to lack of effective topological descriptors for cell distribution and gathering patterns. In this paper, inspired by clinical practice, we analyze and classify pathology images by comprehensively learning cell appearance, microenvironment, and topology in a fine-to-coarse manner. To describe and exploit topology, we design Cell Community Forest (CCF), a novel graph that represents the hierarchical formulation process of big-sparse CCs from small-dense CCs. Using CCF as a new geometric topological descriptor of tumor cells in pathology images, we propose CCF-GNN, a GNN model that successively aggregates heterogeneous features (e.g., appearance, microenvironment) from cell-instance-level, cell-community-level, into image-level for pathology image classification. Extensive cross-validation experiments show that our method significantly outperforms alternative methods on H&E-stained and immunofluorescence images for disease grading tasks with multiple cancer types. Our proposed CCF-GNN establishes a new topological data analysis (TDA) based method, which facilitates integrating multi-level heterogeneous features of point clouds (e.g., for cells) into a unified DL framework.

Persistent Topology for Cryo-EM Data Analysis (2015)

Kelin Xia, Guo-Wei Wei

Abstract

In this work, we introduce persistent homology for the analysis of cryo-electron microscopy (cryo-EM) density maps. We identify the topological fingerprint or topological signature of noise, which is widespread in cryo-EM data. For low signal-to-noise ratio (SNR) volumetric data, intrinsic topological features of biomolecular structures are indistinguishable from noise. To remove noise, we employ geometric flows that are found to preserve the intrinsic topological fingerprints of cryo-EM structures and diminish the topological signature of noise. In particular, persistent homology enables us to visualize the gradual separation of the topological fingerprints of cryo-EM structures from those of noise during the denoising process, which gives rise to a practical procedure for prescribing a noise threshold to extract cryo-EM structure information from noise contaminated data after certain iterations of the geometric flow equation. To further demonstrate the utility of persistent homology for cryo-EM data analysis, we consider a microtubule intermediate structure Electron Microscopy Data (EMD 1129). Three helix models, an alpha-tubulin monomer model, an alpha-tubulin and beta-tubulin model, and an alpha-tubulin and beta-tubulin dimer model, are constructed to fit the cryo-EM data. The least square fitting leads to similarly high correlation coefficients, which indicates that structure determination via optimization is an ill-posed inverse problem. However, these models have dramatically different topological fingerprints. In particular, linkages or connectivities that discriminate one model from another play little role in the traditional density fitting or optimization but are very sensitive and crucial to topological fingerprints. The intrinsic topological features of the microtubule data are identified after topological denoising. By a comparison of the topological fingerprints of the original data and those of three models, we found that the third model is topologically favored. The present work offers persistent homology based new strategies for topological denoising and for resolving ill-posed inverse problems. Copyright © 2015 John Wiley & Sons, Ltd.

Continuous Indexing of Fibrosis (CIF): Improving the Assessment and Classification of MPN Patients (2022)

Hosuk Ryou, Korsuk Sirinukunwattana, Alan Aberdeen, Gillian Grindstaff, Bernadette Stolz, Helen Byrne, Heather A. Harrington, Nikolaos Sousos, Anna L. Godfrey, Claire N. Harrison, Bethan Psaila, Adam J. Mead, Gabrielle Rees, Gareth D. H. Turner, Jens Rittscher, Daniel Royston

Abstract

The detection and grading of fibrosis in myeloproliferative neoplasms (MPN) is an important component of disease classification, prognostication and disease monitoring. However, current fibrosis grading systems are only semi-quantitative and fail to capture sample heterogeneity. To improve the detection, quantitation and representation of reticulin fibrosis, we developed a machine learning (ML) approach using bone marrow trephine (BMT) samples (n = 107) from patients diagnosed with MPN or a reactive / nonneoplastic marrow. The resulting Continuous Indexing of Fibrosis (CIF) enhances the detection and monitoring of fibrosis within BMTs, and aids the discrimination of MPN subtypes. When combined with megakaryocyte feature analysis, CIF discriminates between the frequently challenging differential diagnosis of essential thrombocythemia (ET) and pre-fibrotic myelofibrosis (pre-PMF) with high predictive accuracy [area under the curve = 0.94]. CIF also shows significant promise in the identification of MPN patients at risk of disease progression; analysis of samples from 35 patients diagnosed with ET and enrolled in the Primary Thrombocythemia-1 (PT-1) trial identified features predictive of post-ET myelofibrosis (area under the curve = 0.77). In addition to these clinical applications, automated analysis of fibrosis has clear potential to further refine disease classification boundaries and inform future studies of the micro-environmental factors driving disease initiation and progression in MPN and other stem cell disorders. The image analysis methods used to generate CIF can be readily integrated with those of other key morphological features in MPNs, including megakaryocyte morphology, that lie beyond the scope of conventional histological assessment. 
Key Points: Machine learning enables an objective and quantitative description of reticulin fibrosis within the bone marrow of patients with myeloproliferative neoplasms (MPN). Automated analysis and Continuous Indexing of Fibrosis (CIF) captures heterogeneity within MPN samples and has utility in refined classification and disease monitoring. Quantitative fibrosis assessment combined with topological data analysis may help to predict patients at increased risk of progression to post-ET myelofibrosis, and assist in the discrimination of ET and pre-fibrotic PMF (pre-PMF).

Transfer Learning for Autonomous Chatter Detection in Machining (2022)

Melih C. Yesilli, Firas A. Khasawneh, Brian P. Mann

Abstract

Large-amplitude chatter vibrations are one of the most important phenomena in machining processes. They are often detrimental in cutting operations, causing a poor surface finish and decreased tool life. Therefore, chatter detection using machine learning has been an active research area over the last decade. Three challenges can be identified in applying machine learning for chatter detection at large in industry: an insufficient understanding of the universality of chatter features across different processes, the need for automating feature extraction, and the existence of limited data for each specific workpiece-machine tool combination, e.g., when machining one-off products. These three challenges can be grouped under the umbrella of transfer learning, which is concerned with studying how knowledge gained from one setting can be leveraged to obtain information in new settings. This paper studies automating chatter detection by evaluating transfer learning of prominent as well as novel chatter detection methods. We investigate chatter classification accuracy using a variety of features extracted from turning and milling experiments with different cutting configurations. The studied methods include Fast Fourier Transform (FFT), Power Spectral Density (PSD), the Auto-correlation Function (ACF), and decomposition based tools such as Wavelet Packet Transform (WPT) and Ensemble Empirical Mode Decomposition (EEMD). We also examine more recent approaches based on Topological Data Analysis (TDA) and similarity measures of time series based on Dynamic Time Warping (DTW). We evaluate transfer learning potential of each approach by training and testing both within and across the turning and milling data sets. Four supervised classification algorithms are explored: support vector machine (SVM), logistic regression, random forest classification, and gradient boosting. 
In addition to accuracy, we also comment on the automation potential of feature extraction for each approach which is integral to creating autonomous manufacturing centers. Our results show that carefully chosen time-frequency features can lead to high classification accuracies albeit at the cost of requiring manual pre-processing and the tagging of an expert user. On the other hand, we found that the TDA and DTW approaches can provide accuracies and F1-scores on par with the time-frequency methods without the need for manual preprocessing via completely automatic pipelines. Further, we discovered that the DTW approach outperforms all other methods when trained using the milling data and tested on the turning data. Therefore, TDA and DTW approaches may be preferred over the time-frequency-based approaches for fully automated chatter detection schemes. DTW and TDA also can be more advantageous when pooling data from either limited workpiece-machine tool combinations, or from small data sets of one-off processes.
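
As an illustration of the DTW similarity measure referred to in the abstract above, the following is a minimal sketch of the classic dynamic-programming form of dynamic time warping for two one-dimensional sequences. It uses the absolute difference as local cost and no warping window; both choices, and the function name `dtw_distance`, are assumptions rather than details taken from the paper.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Pairwise distances of this kind can feed a nearest-neighbour classifier directly, which is one reason such pipelines need no hand-tuned feature extraction. The quadratic cost in sequence length is the main practical limitation of this plain form.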

Unexpected Topology of the Temperature Fluctuations in the Cosmic Microwave Background (2019)

Pratyush Pranav, Robert J. Adler, Thomas Buchert, Herbert Edelsbrunner, Bernard J. T. Jones, Armin Schwartzman, Hubert Wagner, Rien van de Weygaert

Abstract

We study the topology generated by the temperature fluctuations of the cosmic microwave background (CMB) radiation, as quantified by the number of components and holes, formally given by the Betti numbers, in the growing excursion sets. We compare CMB maps observed by the Planck satellite with a thousand simulated maps generated according to the ΛCDM paradigm with Gaussian distributed fluctuations. The comparison is multi-scale, being performed on a sequence of degraded maps with mean pixel separation ranging from 0.05 to 7.33°. The survey of the CMB over 𝕊² is incomplete due to obfuscation effects by bright point sources and other extended foreground objects like our own galaxy. To deal with such situations, where analysis in the presence of “masks” is of importance, we introduce the concept of relative homology. The parametric χ²-test shows differences between observations and simulations, yielding p-values at percent to less than permil levels roughly between 2 and 7°, with the difference in the number of components and holes peaking at more than 3σ sporadically at these scales. The highest observed deviation between the observations and simulations for b₀ and b₁ is approximately between 3σ and 4σ at scales of 3–7°. There are reports of mildly unusual behaviour of the Euler characteristic at 3.66° in the literature, computed from independent measurements of the CMB temperature fluctuations by Planck’s predecessor, the Wilkinson Microwave Anisotropy Probe (WMAP) satellite. The mildly anomalous behaviour of the Euler characteristic is phenomenologically related to the strongly anomalous behaviour of components and holes, or the zeroth and first Betti numbers, respectively. Further, since these topological descriptors show consistent anomalous behaviour over independent measurements of Planck and WMAP, instrumental and systematic errors may be an unlikely source. 
These are also the scales at which the observed maps exhibit low variance compared to the simulations, and approximately the range of scales at which the power spectrum exhibits a dip with respect to the theoretical model. Non-parametric tests show even stronger differences at almost all scales. Crucially, Gaussian simulations based on power-spectrum matching the characteristics of the observed dipped power spectrum are not able to resolve the anomaly. Understanding the origin of the anomalies in the CMB, whether cosmological in nature or arising due to late-time effects, is an extremely challenging task. Regardless, beyond the trivial possibility that this may still be a manifestation of an extreme Gaussian case, these observations, along with the super-horizon scales involved, may motivate the study of primordial non-Gaussianity. Alternative scenarios worth exploring may be models with non-trivial topology, including topological defect models.

🍩 Database of Original & Non-Theoretical Uses of Topology

Multiscale Topology Characterizes Dynamic Tumor Vascular Networks (2022)

A New Approach to Investigate the Association Between Brain Functional Connectivity and Disease Characteristics of Attention-Deficit/Hyperactivity Disorder: Topological Neuroimaging Data Analysis (2015)

Deep Learning With Topological Signatures (2017)

Topological Data Analysis of Task-Based fMRI Data From Experiments on Schizophrenia (2021)

Interdisciplinary Approaches to Automated Obstructive Sleep Apnea Diagnosis Through High-Dimensional Multiple Scaled Data Analysis (2019)

Skyler (2023)

Homological Analysis of Multi-Qubit Entanglement (2018)

Sheaves Are the Canonical Data Structure for Sensor Integration (2017)

Graph Filtration Learning (2020)

Topological Autoencoders (2020)

Coordinate-Free Coverage in Sensor Networks With Controlled Boundaries via Homology (2006)

Investigation of Flash Crash via Topological Data Analysis (2020)

A Persistent Weisfeiler-Lehman Procedure for Graph Classification (2019)

Topologically Densified Distributions (2020)

Topological Graph Neural Networks (2021)

Unsupervised Topological Learning for Identification of Atomic Structures (2022)

Topological Singularity Detection at Multiple Scales (2023)

Optimal Topological Cycles and Their Application in Cardiac Trabeculae Restoration (2017)

Topological Biomarkers for Real-Time Detection of Epileptic Seizures (2022)

Multiphase Mixing Quantification by Computational Homology and Imaging Analysis (2011)

Decoding of Neural Data Using Cohomological Feature Extraction (2019)

Lung Topology Characteristics in Patients With Chronic Obstructive Pulmonary Disease (2018)

Unsupervised Topological Learning Approach of Crystal Nucleation in Pure Tantalum (2021)

Finding Universal Structures in Quantum Many-Body Dynamics via Persistent Homology (2020)

Statistical Topological Data Analysis - A Kernel Perspective (2015)

Topological Analysis of Population Activity in Visual Cortex (2008)

Topological Attention for Time Series Forecasting (2021)

Topological Data Analysis for Electric Motor Eccentricity Fault Detection (2022)

(Quasi)Periodicity Quantification in Video Data, Using Topology (2018)

Knowledge Gaps in the Early Growth of Semantic Feature Networks (2018)

Visualizing Nanoparticle Surface Dynamics and Instabilities Enabled by Deep Denoising (2025)

A Novel Approach for Wafer Defect Pattern Classification Based on Topological Data Analysis (2023)

Alzheimer Disease Detection From Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning (2023)

Unsupervised Topological Learning Approach of Crystal Nucleation (2022)

TDA-Net: Fusion of Persistent Homology and Deep Learning Features for COVID-19 Detection From Chest X-Ray Images (2021)

Confinement in Non-Abelian Lattice Gauge Theory via Persistent Homology (2022)

Topology of Viral Evolution (2013)

Topological Data Analysis Distinguishes Parameter Regimes in the Anderson-Chaplain Model of Angiogenesis (2021)

Identification of Key Features Using Topological Data Analysis for Accurate Prediction of Manufacturing System Outputs (2017)

Capturing Dynamics of Time-Varying Data via Topology (2020)

Statistical Inference for Persistent Homology Applied to Simulated fMRI Time Series Data (2023)

Capturing Shape Information With Multi-Scale Topological Loss Terms For 3D Reconstruction (2022)

Histopathological Cancer Detection With Topological Signatures (2023)

Imaging-Based Representation and Stratification of Intra-Tumor Heterogeneity via Tree-Edit Distance (2022)

A Functional Data-Driven Approach to Monitor and Analyze Equipment Degradation in Multiproduct Batch Processes (2023)

Vibration Sensors for Detecting Critical Events: A Case Study in Ferrosilicon Production (2024)

Data-Driven and Automatic Surface Texture Analysis Using Persistent Homology (2021)

Induction Motor Eccentricity Fault Detection and Quantification Using Topological Data Analysis (2024)

Stable Topological Summaries for Analyzing the Organization of Cells in a Packed Tissue (2021)

Efficient Planning of Multi-Robot Collective Transport Using Graph Reinforcement Learning With Higher Order Topological Abstraction (2023)

Exploring Surface Texture Quantification in Piezo Vibration Striking Treatment (PVST) Using Topological Measures (2022)

Machine Learning and Topological Data Analysis Identify Unique Features of Human Papillae in 3D Scans (2023)

Evolutionary Homology on Coupled Dynamical Systems With Applications to Protein Flexibility Analysis (2020)

A Topological Framework for Identifying Phenomenological Bifurcations in Stochastic Dynamical Systems (2024)

Signal Enrichment With Strain-Level Resolution in Metagenomes Using Topological Data Analysis (2019)

Time-Inhomogeneous Diffusion Geometry and Topology (2022)

The Shape of Cancer Relapse: Topological Data Analysis Predicts Recurrence in Paediatric Acute Lymphoblastic Leukaemia (2021)

Weighted-Persistent-Homology-Based Machine Learning for RNA Flexibility Analysis (2020)

Raman Spectroscopy and Topological Machine Learning for Cancer Grading (2023)

Learning Representations of Persistence Barcodes (2019)

A Data-Driven Workflow for Evaporation Performance Degradation Analysis: A Full-Scale Case Study in the Herbal Medicine Manufacturing Industry (2023)

Pattern Characterization Using Topological Data Analysis: Application to Piezo Vibration Striking Treatment (2023)

Topology Based Data Analysis Identifies a Subgroup of Breast Cancers With a Unique Mutational Profile and Excellent Survival (2011)

Topology Identifies Emerging Adaptive Mutations in SARS-CoV-2 (2021)

Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

Novel Production Prediction Model of Gasoline Production Processes for Energy Saving and Economic Increasing Based on AM-GRU Integrating the UMAP Algorithm (2023)

CCF-GNN: A Unified Model Aggregating Appearance, Microenvironment, and Topology for Pathology Image Classification (2023)

Persistent Topology for Cryo-Em Data Analysis (2015)