Toward Automated Prediction of Manufacturing Productivity Based on Feature Selection Using Topological Data Analysis (2016)

Abstract

In this paper, we extend the application of topological data analysis (TDA) to the field of manufacturing for the first time to the best of our knowledge. We apply a particular TDA method, known as the Mapper algorithm, on a benchmark chemical processing data set. The algorithm yields a topological network that captures the intrinsic clusters and connections among the clusters present in the high-dimensional data set, which are difficult to detect using traditional methods. We select key process variables or features that impact the final product yield by analyzing the shape of this network. We then use three prediction models to evaluate the impact of the selected features. Results show that the models achieve the same level of high prediction accuracy as with all the process variables, thereby, providing a way to carry out process monitoring and control in a more cost-effective manner.

An Industry Case of Large-Scale Demand Forecasting of Hierarchical Components (2019)

Rodrigo Rivera-Castro, Ivan Nazarov, Yuke Xiang, Ivan Maksimov, Aleksandr Pletnev, Evgeny Burnaev

Abstract

Demand forecasting of hierarchical components is essential in manufacturing. However, its discussion in the machine-learning literature has been limited, and judgemental forecasts remain pervasive in the industry. Demand planners require easy-to-understand tools capable of delivering state-of-the-art results. This work presents an industry case of demand forecasting at one of the largest manufacturers of electronics in the world. It seeks to support practitioners with five contributions: (1) A benchmark of fourteen demand forecast methods applied to a relevant data set, (2) A data transformation technique yielding comparable results with state of the art, (3) An alternative to ARIMA based on matrix factorization, (4) A model selection technique based on topological data analysis for time series and (5) A novel data set. Organizations seeking to up-skill existing personnel and increase forecast accuracy will find value in this work.

Using Topological Data Analysis for Text Classification (2018)

Pratik Doshi

Community Resources

Code

Export citation

A Topological Data Analysis Based Classification Method for Multiple Measurements (2019)

Henri Riihimäki, Wojciech Chachólski, Jakob Theorell, Jan Hillert, Ryan Ramanujam

Abstract

\textlessh3\textgreaterAbstract\textless/h3\textgreater \textlessh3\textgreaterBackground\textless/h3\textgreater \textlessp\textgreaterMachine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements which samples from the data space and builds a network graph based on the data topology. When applying this to two case studies, accuracy exceeds alternative models with additional benefits such as reporting data subsets with high purity along with feature values.\textless/p\textgreater\textlessh3\textgreaterResults\textless/h3\textgreater \textlessp\textgreaterFor 300 examples of 3 tree species, the accuracy reached 80% after 30 datapoints, which was improved to 90% after increased sampling to 400 datapoints. Using data from 100 examples of each of 6 point processes, the classifier achieved 96.8% accuracy. In both datasets, the TDA classifier outperformed an alternative model.\textless/p\textgreater\textlessh3\textgreaterConclusions\textless/h3\textgreater \textlessp\textgreaterThis algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool.\textless/p\textgreater

Fibers of Failure: Classifying Errors in Predictive Processes (2020)

Leo S. Carlsson, Mikael Vejdemo-Johansson, Gunnar Carlsson, Pär G. Jönsson

Abstract

Predictive models are used in many different fields of science and engineering and are always prone to make faulty predictions. These faulty predictions can be more or less malignant depending on the model application. We describe fibers of failure (FiFa), a method to classify failure modes of predictive processes. Our method uses Mapper, an algorithm from topological data analysis (TDA), to build a graphical model of input data stratified by prediction errors. We demonstrate two ways to use the failure mode groupings: either to produce a correction layer that adjusts predictions by similarity to the failure modes; or to inspect members of the failure modes to illustrate and investigate what characterizes each failure mode. We demonstrate FiFa on two scenarios: a convolutional neural network (CNN) predicting MNIST images with added noise, and an artificial neural network (ANN) predicting the electrical energy consumption of an electric arc furnace (EAF). The correction layer on the CNN model improved its prediction accuracy significantly while the inspection of failure modes for the EAF model provided guiding insights into the domain-specific reasons behind several high-error regions.

A Topological Framework for Deep Learning (2020)

Mustafa Hajij, Kyle Istvan

Abstract

We utilize classical facts from topology to show that the classification problem in machine learning is always solvable under very mild conditions. Furthermore, we show that a softmax classification network acts on an input topological space by a finite sequence of topological moves to achieve the classification task. Moreover, given a training dataset, we show how topological formalism can be used to suggest the appropriate architectural choices for neural networks designed to be trained as classifiers on the data. Finally, we show how the architecture of a neural network cannot be chosen independently from the shape of the underlying data. To demonstrate these results, we provide example datasets and show how they are acted upon by neural nets from this topological perspective.

Text Classification via Network Topology: A Case Study on the Holy Quran (2019)

Mehmet Emin Aktas, Esra Akbas

Abstract

Due to the growth in the number of texts and documents available online, machine learning based text classification systems are getting more popular recently. Feature extraction, converting unstructured text into a structured feature space, is one of the essential tasks for text classification. In this paper, we propose a novel feature extraction approach for text classification using the network representation of text, network topology, and machine learning techniques. We present experimental results on classifying the Holy Quran chapters based on the place each chapter was revealed to illustrate the effectiveness of the approach.

Steinhaus Filtration and Stable Paths in the Mapper (2020)

Dustin L. Arendt, Matthew Broussard, Bala Krishnamoorthy, Nathaniel Saul

Abstract

Two central concepts from topological data analysis are persistence and the Mapper construction. Persistence employs a sequence of objects built on data called a filtration. A Mapper produces insightful summaries of data, and has found widespread applications in diverse areas. We define a new filtration called the cover filtration built from a single cover based on a generalized Steinhaus distance, which is a generalization of Jaccard distance. We prove a stability result: the cover filtrations of two covers are \$\alpha/m\$ interleaved, where \$\alpha\$ is a bound on bottleneck distance between covers and \$m\$ is the size of smallest set in either cover. We also show our construction is equivalent to the Cech filtration under certain settings, and the Vietoris-Rips filtration completely determines the cover filtration in all cases. We then develop a theory for stable paths within this filtration. Unlike standard results on stability in topological persistence, our definition of path stability aligns exactly with the above result on stability of cover filtration. We demonstrate how our framework can be employed in a variety of applications where a metric is not obvious but a cover is readily available. First we present a new model for recommendation systems using cover filtration. For an explicit example, stable paths identified on a movies data set represent sequences of movies constituting gentle transitions from one genre to another. As a second application in explainable machine learning, we apply the Mapper for model induction, providing explanations in the form of paths between subpopulations. Stable paths in the Mapper from a supervised machine learning model trained on the FashionMNIST data set provide improved explanations of relationships between subpopulations of images.

Topological Machine Learning With Persistence Indicator Functions (2019)

Bastian Rieck, Filip Sadlo, Heike Leitte

Abstract

Techniques from computational topology, in particular persistent homology, are becoming increasingly relevant for data analysis. Their stable metrics permit the use of many distance-based data analysis methods, such as multidimensional scaling, while providing a firm theoretical ground. Many modern machine learning algorithms, however, are based on kernels. This paper presents persistence indicator functions (PIFs), which summarize persistence diagrams, i.e., feature descriptors in topological data analysis. PIFs can be calculated and compared in linear time and have many beneficial properties, such as the availability of a kernel-based similarity measure. We demonstrate their usage in common data analysis scenarios, such as confidence set estimation and classification of complex structured data.

Topological Graph Neural Networks (2021)

Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten Borgwardt

Abstract

Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures, such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive in terms of the Weisfeiler--Lehman test of isomorphism. Augmenting GNNs with our layer leads to beneficial predictive performance, both on synthetic data sets, which can be trivially classified by humans but not by ordinary GNNs, and on real-world data.

Topologically Densified Distributions (2020)

Christoph Hofer, Florian Graf, Marc Niethammer, Roland Kwitt

Abstract

We study regularization in the context of small sample-size learning with over-parametrized neural networks. Specifically, we shift focus from architectural properties, such as norms on the network weights, to properties of the internal representations before a linear classifier. Specifically, we impose a topological constraint on samples drawn from the probability measure induced in that space. This provably leads to mass concentration effects around the representations of training instances, i.e., a property beneficial for generalization. By leveraging previous work to impose topological constrains in a neural network setting, we provide empirical evidence (across various vision benchmarks) to support our claim for better generalization.

Unsupervised Topological Learning for Identification of Atomic Structures (2022)

Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse

Abstract

We propose an unsupervised learning methodology with descriptors based on topological data analysis (TDA) concepts to describe the local structural properties of materials at the atomic scale. Based only on atomic positions and without a priori knowledge, our method allows for an autonomous identification of clusters of atomic structures through a Gaussian mixture model. We apply successfully this approach to the analysis of elemental Zr in the crystalline and liquid states as well as homogeneous nucleation events under deep undercooling conditions. This opens the way to deeper and autonomous study of complex phenomena in materials at the atomic scale.

CCF-GNN: A Unified Model Aggregating Appearance, Microenvironment, and Topology for Pathology Image Classification (2023)

Hongxiao Wang, Gang Huang, Zhuo Zhao, Liang Cheng, Anna Juncker-Jensen, Máté Levente Nagy, Xin Lu, Xiangliang Zhang, Danny Z. Chen

Abstract

Pathology images contain rich information of cell appearance, microenvironment, and topology features for cancer analysis and diagnosis. Among such features, topology becomes increasingly important in analysis for cancer immunotherapy. By analyzing geometric and hierarchically structured cell distribution topology, oncologists can identify densely-packed and cancer-relevant cell communities (CCs) for making decisions. Compared to commonly-used pixel-level Convolution Neural Network (CNN) features and cell-instance-level Graph Neural Network (GNN) features, CC topology features are at a higher level of granularity and geometry. However, topological features have not been well exploited by recent deep learning (DL) methods for pathology image classification due to lack of effective topological descriptors for cell distribution and gathering patterns. In this paper, inspired by clinical practice, we analyze and classify pathology images by comprehensively learning cell appearance, microenvironment, and topology in a fine-to-coarse manner. To describe and exploit topology, we design Cell Community Forest (CCF), a novel graph that represents the hierarchical formulation process of big-sparse CCs from small-dense CCs. Using CCF as a new geometric topological descriptor of tumor cells in pathology images, we propose CCF-GNN, a GNN model that successively aggregates heterogeneous features (e.g., appearance, microenvironment) from cell-instance-level, cell-community-level, into image-level for pathology image classification. Extensive cross-validation experiments show that our method significantly outperforms alternative methods on H&E-stained; immunofluorescence images for disease grading tasks with multiple cancer types. Our proposed CCF-GNN establishes a new topological data analysis (TDA) based method, which facilitates integrating multi-level heterogeneous features of point clouds (e.g., for cells) into a unified DL framework.

A Primer on Topological Data Analysis to Support Image Analysis Tasks in Environmental Science (2023)

Lander Ver Hoef, Henry Adams, Emily J. King, Imme Ebert-Uphoff

Abstract

Abstract Topological data analysis (TDA) is a tool from data science and mathematics that is beginning to make waves in environmental science. In this work, we seek to provide an intuitive and understandable introduction to a tool from TDA that is particularly useful for the analysis of imagery, namely, persistent homology. We briefly discuss the theoretical background but focus primarily on understanding the output of this tool and discussing what information it can glean. To this end, we frame our discussion around a guiding example of classifying satellite images from the sugar, fish, flower, and gravel dataset produced for the study of mesoscale organization of clouds by Rasp et al. We demonstrate how persistent homology and its vectorization, persistence landscapes, can be used in a workflow with a simple machine learning algorithm to obtain good results, and we explore in detail how we can explain this behavior in terms of image-level features. One of the core strengths of persistent homology is how interpretable it can be, so throughout this paper we discuss not just the patterns we find but why those results are to be expected given what we know about the theory of persistent homology. Our goal is that readers of this paper will leave with a better understanding of TDA and persistent homology, will be able to identify problems and datasets of their own for which persistent homology could be helpful, and will gain an understanding of the results they obtain from applying the included GitHub example code. Significance Statement Information such as the geometric structure and texture of image data can greatly support the inference of the physical state of an observed Earth system, for example, in remote sensing to determine whether wildfires are active or to identify local climate zones. Persistent homology is a branch of topological data analysis that allows one to extract such information in an interpretable way—unlike black-box methods like deep neural networks. The purpose of this paper is to explain in an intuitive manner what persistent homology is and how researchers in environmental science can use it to create interpretable models. We demonstrate the approach to identify certain cloud patterns from satellite imagery and find that the resulting model is indeed interpretable.

Acute Lymphoblastic Leukemia Classification Using Persistent Homology (2024)

Waqar Hussain Shah, Abdullah Baloch, Rider Jaimes-Reátegui, Sohail Iqbal, Syeda Rafia Fatima, Alexander N. Pisarchik

Abstract

Acute Lymphoblastic Leukemia (ALL) is a prevalent form of childhood blood cancer characterized by the proliferation of immature white blood cells that rapidly replace normal cells in the bone marrow. The exponential growth of these leukemic cells can be fatal if not treated promptly. Classifying lymphoblasts and healthy cells poses a significant challenge, even for domain experts, due to their morphological similarities. Automated computer analysis of ALL can provide substantial support in this domain and potentially save numerous lives. In this paper, we propose a novel classification approach that involves analyzing shapes and extracting topological features of ALL cells. We employ persistent homology to capture these topological features. Our technique accurately and efficiently detects and classifies leukemia blast cells, achieving a recall of 98.2% and an F1-score of 94.6%. This approach has the potential to significantly enhance leukemia diagnosis and therapy.

Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology (2021)

Nicole Bussola, Bruno Papa, Ombretta Melaiu, Aurora Castellano, Doriana Fruci, Giuseppe Jurman

Abstract

We introduce here a novel machine learning (ML) framework to address the issue of the quantitative assessment of the immune content in neuroblastoma (NB) specimens. First, the EUNet, a U-Net with an EfficientNet encoder, is trained to detect lymphocytes on tissue digital slides stained with the CD3 T-cell marker. The training set consists of 3782 images extracted from an original collection of 54 whole slide images (WSIs), manually annotated for a total of 73,751 lymphocytes. Resampling strategies, data augmentation, and transfer learning approaches are adopted to warrant reproducibility and to reduce the risk of overfitting and selection bias. Topological data analysis (TDA) is then used to define activation maps from different layers of the neural network at different stages of the training process, described by persistence diagrams (PD) and Betti curves. TDA is further integrated with the uniform manifold approximation and projection (UMAP) dimensionality reduction and the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm for clustering, by the deep features, the relevant subgroups and structures, across different levels of the neural network. Finally, the recent TwoNN approach is leveraged to study the variation of the intrinsic dimensionality of the U-Net model. As the main task, the proposed pipeline is employed to evaluate the density of lymphocytes over the whole tissue area of the WSIs. The model achieves good results with mean absolute error 3.1 on test set, showing significant agreement between densities estimated by our EUNet model and by trained pathologists, thus indicating the potentialities of a promising new strategy in the quantification of the immune content in NB specimens. Moreover, the UMAP algorithm unveiled interesting patterns compatible with pathological characteristics, also highlighting novel insights into the dynamics of the intrinsic dataset dimensionality at different stages of the training process. All the experiments were run on the Microsoft Azure cloud platform.

Differentiable Euler Characteristic Transforms for Shape Classification (2023)

Ernst Röell, Bastian Rieck

Abstract

The _Euler Characteristic Transform_ (ECT) is a powerful invariant, combining geometrical and topological characteristics of shapes and graphs. However, the ECT was hitherto unable to learn task-specific representations. We overcome this issue and develop a novel computational layer that enables learning the ECT in an end-to-end fashion. Our method, the _Differentiable Euler Characteristic Transform_ (DECT) is fast and computationally efficient, while exhibiting performance on a par with more complex models in both graph and point cloud classification tasks. Moreover, we show that this seemingly simple statistic provides the same topological expressivity as more complex topological deep learning layers.

Protein Classification With Improved Topological Data Analysis (2018)

Tamal K. Dey, Sayan Mandal

Deep Learning With Topological Signatures (2017)

Christoph Hofer, Roland Kwitt, Marc Niethammer, Andreas Uhl

Curvature Filtrations for Graph Generative Model Evaluation (2023)

Joshua Southern, Jeremy Wayland, Michael Bronstein, Bastian Rieck

Topology-Informed Machine Learning for Efficient Prediction of Solid Oxide Fuel Cell Electrode Polarization (2025)

Maksym Szemer, Szymon Buchaniec, Tomasz Prokop, Grzegorz Brus

Abstract

Machine learning has emerged as a potent computational tool for expediting research and development in solid oxide fuel cell electrodes. The effective application of machine learning for performance prediction requires transforming electrode microstructure into a format compatible with artificial neural networks. Input data may range from a comprehensive digital material representation of the electrode to a selected set of microstructural parameters. The chosen representation significantly influences the performance and results of the network. Here, we show a novel approach utilizing persistence representation derived from computational topology. Using 500 microstructures and current–voltage characteristics obtained with three-dimensional first-principles simulations, we have prepared an artificial neural network model that can replicate current–voltage characteristics of unseen microstructures based on their persistent image representation. The artificial neural network can accurately predict the polarization curve of solid oxide fuel cell electrodes. The presented method incorporates complex microstructural information from the digital material representation while requiring substantially less computational resources (preprocessing and prediction time ≈1min) compared to our high-fidelity simulations (simulation time ≈1h) to obtain a single current-potential characteristic for one microstructure.

Community Resources

Code

Simplicial Representation Learning With Neural \$K\$-Forms (2023)

Kelly Maggs, Celia Hacker, Bastian Rieck

Abstract

Geometric deep learning extends deep learning to incorporate information about the geometry and topology data, especially in complex domains like graphs. Despite the popularity of message passing in this field, it has limitations such as the need for graph rewiring, ambiguity in interpreting data, and over-smoothing. In this paper, we take a different approach, focusing on leveraging geometric information from simplicial complexes embedded in \$\mathbb\R\\textasciicircumn\$ using node coordinates. We use differential \$k\$-forms in \$\mathbb\R\\textasciicircumn\$ to create representations of simplices, offering interpretability and geometric consistency without message passing. This approach also enables us to apply differential geometry tools and achieve universal approximation. Our method is efficient, versatile, and applicable to various input complexes, including graphs, simplicial complexes, and cell complexes. It outperforms existing message passing neural networks in harnessing information from geometrical graphs with node features serving as coordinates.

A Novel Multi-Task Machine Learning Classifier for Rare Disease Patterning Using Cardiac Strain Imaging Data (2024)

Nanda K. Siva, Yashbir Singh, Quincy A. Hathaway, Partho P. Sengupta, Naveena Yanamala

Abstract

To provide accurate predictions, current machine learning-based solutions require large, manually labeled training datasets. We implement persistent homology (PH), a topological tool for studying the pattern of data, to analyze echocardiography-based strain data and differentiate between rare diseases like constrictive pericarditis (CP) and restrictive cardiomyopathy (RCM). Patient population (retrospectively registered) included those presenting with heart failure due to CP (n = 51), RCM (n = 47), and patients without heart failure symptoms (n = 53). Longitudinal, radial, and circumferential strains/strain rates for left ventricular segments were processed into topological feature vectors using Machine learning PH workflow. In differentiating CP and RCM, the PH workflow model had a ROC AUC of 0.94 (Sensitivity = 92%, Specificity = 81%), compared with the GLS model AUC of 0.69 (Sensitivity = 65%, Specificity = 66%). In differentiating between all three conditions, the PH workflow model had an AUC of 0.83 (Sensitivity = 68%, Specificity = 84%), compared with the GLS model AUC of 0.68 (Sensitivity = 52% and Specificity = 76%). By employing persistent homology to differentiate the “pattern” of cardiac deformations, our machine-learning approach provides reasonable accuracy when evaluating small datasets and aids in understanding and visualizing patterns of cardiac imaging data in clinically challenging disease states.

Community Resources

Code
Code

Representability of Algebraic Topology for Biomolecules in Machine Learning Based Scoring and Virtual Screening (2018)

Zixuan Cang, Lin Mu, Guo-Wei Wei

Abstract

This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.

Unveiling Patterns of International Communities in a Global City Using Mobile Phone Data (2015)

Paolo Bajardi, Matteo Delfino, André Panisson, Giovanni Petri, Michele Tizzoni

Abstract

We analyse a large mobile phone activity dataset provided by Telecom Italia for the Telecom Big Data Challenge contest. The dataset reports the international country codes of every call/SMS made and received by mobile phone users in Milan, Italy, between November and December 2013, with a spatial resolution of about 200 meters. We first show that the observed spatial distribution of international codes well matches the distribution of international communities reported by official statistics, confirming the value of mobile phone data for demographic research. Next, we define an entropy function to measure the heterogeneity of the international phone activity in space and time. By comparing the entropy function to empirical data, we show that it can be used to identify the city’s hotspots, defined by the presence of points of interests. Eventually, we use the entropy function to characterize the spatial distribution of international communities in the city. Adopting a topological data analysis approach, we find that international mobile phone users exhibit some robust clustering patterns that correlate with basic socio-economic variables. Our results suggest that mobile phone records can be used in conjunction with topological data analysis tools to study the geography of migrant communities in a global city.

Community Resources

Code

Filtration Surfaces for Dynamic Graph Classification (2023)

Franz Srambical, Bastian Rieck

Abstract

Existing approaches for classifying dynamic graphs either lift graph kernels to the temporal domain, or use graph neural networks (GNNs). However, current baselines have scalability issues, cannot handle a changing node set, or do not take edge weight information into account. We propose filtration surfaces, a novel method that is scalable and flexible, to alleviate said restrictions. We experimentally validate the efficacy of our model and show that filtration surfaces outperform previous state-of-the-art baselines on datasets that rely on edge weight information. Our method does so while being either completely parameter-free or having at most one parameter, and yielding the lowest overall standard deviation among similarly scalable methods.

Topological Singularity Detection at Multiple Scales (2023)

Julius von Rohrscheidt, Bastian Rieck

Abstract

The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the ’manifoldness’ of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.

Mapping the Multiverse of Latent Representations (2024)

Jeremy Wayland, Corinna Coupette, Bastian Rieck

Abstract

Echoing recent calls to counter reliability and robustness concerns in machine learning via multiverse analysis, we present PRESTO, a principled framework for mapping the multiverse of machine-learning models that rely on latent representations. Although such models enjoy widespread adoption, the variability in their embeddings remains poorly understood, resulting in unnecessary complexity and untrustworthy representations. Our framework uses persistent homology to characterize the latent spaces arising from different combinations of diverse machine-learning methods, (hyper)parameter configurations, and datasets, allowing us to measure their pairwise (dis)similarity and statistically reason about their distributions. As we demonstrate both theoretically and empirically, our pipeline preserves desirable properties of collections of latent representations, and it can be leveraged to perform sensitivity analysis, detect anomalous embeddings, or efficiently and effectively navigate hyperparameter search spaces.

Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, Wlodek Zadrozny

Abstract

While the strength of Topological Data Analysis has been explored in many studies on high dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in the text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extraction of topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as high dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe adding topological features to the conventional features in ensemble models improves the classification results (up to 5\%). On the other hand, as expected, topological features by themselves may be not sufficient for effective classification. It is an open problem to see whether TDA features from word embeddings might be sufficient, as they seem to perform within a range of few points from top results obtained with a linear support vector classifier.

Position: Topological Deep Learning Is the New Frontier for Relational Learning (2024)

Theodore Papamarkou, Tolga Birdal, Michael M. Bronstein, Gunnar E. Carlsson, Justin Curry, Yue Gao, Mustafa Hajij, Roland Kwitt, Pietro Lio, Paolo Di Lorenzo, Vasileios Maroulas, Nina Miolane, Farzana Nasrin, Karthikeyan Natesan Ramamurthy, Bastian Rieck, Simone Scardapane, Michael T. Schaub, Petar Veličković, Bei Wang, Yusu Wang, Guowei Wei, Ghada Zamzmi

Abstract

Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. At the same time, this paper serves as an invitation to the scientific community to actively participate in TDL research to unlock the potential of this emerging field.

Topological Data Analysis for Electric Motor Eccentricity Fault Detection (2022)

Bingnan Wang, Chungwei Lin, Hiroshi Inoue, Makoto Kanemaru

Abstract

In this paper, we develop topological data analysis (TDA) method for motor current signature analysis (MCSA), and apply it to induction motor eccentricity fault detection. We introduce TDA and present the procedure of extracting topological features from time-domain data that will be represented using persistence diagrams and vectorized Betti sequences. The procedure is applied to induction machine phase current signal analysis, and shown to be highly effective in differentiating signals from different eccentricity levels. With TDA, we are able to use a simple regression model that can predict the fault levels with reasonable accuracy, even for the data of eccentricity levels that are not seen in the training data. The proposed method is model-free, and only requires a small segment of time-domain data to make prediction. These advantages make it attractive for a wide range of fault detection applications.

Topological Feature Vectors for Chatter Detection in Turning Processes (2019)

Melih C. Yesilli, Firas A. Khasawneh, Andreas Otto

Export citation

Identification of Coenzyme-Binding Proteins With Machine Learning Algorithms (2019)

Yong Liu, Cristian R. Munteanu, Zhiwei Kong, Tao Ran, Alfredo Sahagún-Ruiz, Zhixiong He, Chuanshe Zhou, Zhiliang Tan

Export citation

Persistent Homology for the Automatic Classification of Prostate Cancer Aggressiveness in Histopathology Images (2019)

Peter Lawson, Jordan Schupbach, Brittany Terese Fasy, John W. Sheppard

Export citation

Cell Complex Neural Networks (2020)

Mustafa Hajij, Kyle Istvan, Ghada Zamzami

Abstract

Cell complexes are topological spaces constructed from simple blocks called cells. They generalize graphs, simplicial complexes, and polyhedral complexes that form important domains for practical applications. We propose a general, combinatorial, and unifying construction for performing neural network-type computations on cell complexes. Furthermore, we introduce inter-cellular message passing schemes, message passing schemes on cell complexes that take the topology of the underlying space into account. In particular, our method generalizes many of the most popular types of graph neural networks.

Filtration Curves for Graph Representation (2021)

Leslie O'Bray, Bastian Rieck, Karsten Borgwardt

Abstract

The two predominant approaches to graph comparison in recent years are based on (i) enumerating matching subgraphs or (ii) comparing neighborhoods of nodes. In this work, we complement these two perspectives with a third way of representing graphs: using filtration curves from topological data analysis that capture both edge weight information and global graph structure. Filtration curves are highly efficient to compute and lead to expressive representations of graphs, which we demonstrate on graph classification benchmark datasets. Our work opens the door to a new form of graph representation in data mining.

The Classification of Endoscopy Images With Persistent Homology (2016)

Olga Dunaeva, Herbert Edelsbrunner, Anton Lukyanov, Michael Machin, Daria Malkova, Roman Kuvaev, Sergey Kashin

Abstract

Aiming at the automatic diagnosis of tumors using narrow band imaging (NBI) magnifying endoscopic (ME) images of the stomach, we combine methods from image processing, topology, geometry, and machine learning to classify patterns into three classes: oval, tubular and irregular. Training the algorithm on a small number of images of each type, we achieve a high rate of correct classifications. The analysis of the learning algorithm reveals that a handful of geometric and topological features are responsible for the overwhelming majority of decisions.

Interpretable Phase Detection and Classification With Persistent Homology (2020)

Alex Cole, Gregory J. Loges, Gary Shiu

Abstract

We apply persistent homology to the task of discovering and characterizing phase transitions, using lattice spin models from statistical physics for working examples. Persistence images provide a useful representation of the homological data for conducting statistical tasks. To identify the phase transitions, a simple logistic regression on these images is sufficient for the models we consider, and interpretable order parameters are then read from the weights of the regression. Magnetization, frustration and vortex-antivortex structure are identified as relevant features for characterizing phase transitions.

Graph Filtration Learning (2020)

Christoph Hofer, Florian Graf, Bastian Rieck, Marc Niethammer, Roland Kwitt

Abstract

We propose an approach to learning with graph-structured data in the problem domain of graph classification. In particular, we present a novel type of readout operation to aggregate node features into a graph-level representation. To this end, we leverage persistent homology computed via a real-valued, learnable, filter function. We establish the theoretical foundation for differentiating through the persistent homology computation. Empirically, we show that this type of readout operation compares favorably to previous techniques, especially when the graph connectivity structure is informative for the learning problem.

Simplicial Neural Networks (2020)

Stefania Ebli, Michaël Defferrard, Gard Spreemann

Abstract

We present simplicial neural networks (SNNs), a generalization of graph neural networks to data that live on a class of topological spaces called simplicial complexes. These are natural multi-dimensional extensions of graphs that encode not only pairwise relationships but also higher-order interactions between vertices - allowing us to consider richer data, including vector fields and \$n\$-fold collaboration networks. We define an appropriate notion of convolution that we leverage to construct the desired convolutional neural networks. We test the SNNs on the task of imputing missing data on coauthorship complexes.

Topological Autoencoders (2020)

Michael Moor, Max Horn, Bastian Rieck, Karsten Borgwardt

Abstract

We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we construct this loss in a differentiable manner, such that the encoding learns to retain multi-scale connectivity information. We show that our approach is theoretically well-founded and that it exhibits favourable latent representations on a synthetic manifold as well as on real-world image data sets, while preserving low reconstruction errors.

Data-Driven and Automatic Surface Texture Analysis Using Persistent Homology (2021)

Melih C. Yesilli, Firas A. Khasawneh

Abstract

Surface roughness plays an important role in analyzing engineering surfaces. It quantifies the surface topography and can be used to determine whether the resulting surface finish is acceptable or not. Nevertheless, while several existing tools and standards are available for computing surface roughness, these methods rely heavily on user input thus slowing down the analysis and increasing manufacturing costs. Therefore, fast and automatic determination of the roughness level is essential to avoid costs resulting from surfaces with unacceptable finish, and user-intensive analysis. In this study, we propose a Topological Data Analysis (TDA) based approach to classify the roughness level of synthetic surfaces using both their areal images and profiles. We utilize persistent homology from TDA to generate persistence diagrams that encapsulate information on the shape of the surface. We then obtain feature matrices for each surface or profile using Carlsson coordinates, persistence images, and template functions. We compare our results to two widely used methods in the literature: Fast Fourier Transform (FFT) and Gaussian filtering. The results show that our approach yields mean accuracies as high as 97%. We also show that, in contrast to existing surface analysis tools, our TDA-based approach is fully automatable and provides adaptive feature extraction.

Can Neural Networks Learn Persistent Homology Features? (2020)

Guido Montúfar, Nina Otter, Yuguang Wang

Abstract

Topological data analysis uses tools from topology -- the mathematical area that studies shapes -- to create representations of data. In particular, in persistent homology, one studies one-parameter families of spaces associated with data, and persistence diagrams describe the lifetime of topological invariants, such as connected components or holes, across the one-parameter family. In many applications, one is interested in working with features associated with persistence diagrams rather than the diagrams themselves. In our work, we explore the possibility of learning several types of features extracted from persistence diagrams using neural networks.

Topological Data Analysis of C. Elegans Locomotion and Behavior (2021)

Ashleigh Thomas, Kathleen Bates, Alex Elchesen, Iryna Hartsock, Hang Lu, Peter Bubenik

Abstract

Video of nematodes/roundworms was analyzed using persistent homology to study locomotion and behavior. In each frame, an organism's body posture was represented by a high-dimensional vector. By concatenating points in fixed-duration segments of this time series, we created a sliding window embedding (sometimes called a time delay embedding) where each point corresponds to a sequence of postures of an organism. Persistent homology on the points in this time series detected behaviors and comparisons of these persistent homology computations detected variation in their corresponding behaviors. We used average persistence landscapes and machine learning techniques to study changes in locomotion and behavior in varying environments.

A Visual Analytics Approach for the Diagnosis of Heterogeneous and Multidimensional Machine Maintenance Data (2021)

Xiaoyu Zhang, Takanori Fujiwara, Senthil Chandrasegaran, Michael P. Brundage, Thurston Sexton, Alden Dima, Kwan-Liu Ma

Abstract

Analysis of large, high-dimensional, and heterogeneous datasets is challenging as no one technique is suitable for visualizing and clustering such data in order to make sense of the underlying information. For instance, heterogeneous logs detailing machine repair and maintenance in an organization often need to be analyzed to diagnose errors and identify abnormal patterns, formalize root-cause analyses, and plan preventive maintenance. Such real-world datasets are also beset by issues such as inconsistent and/or missing entries. To conduct an effective diagnosis, it is important to extract and understand patterns from the data with support from analytic algorithms (e.g., finding that certain kinds of machine complaints occur more in the summer) while involving the human-in-the-loop. To address these challenges, we adopt existing techniques for dimensionality reduction (DR) and clustering of numerical, categorical, and text data dimensions, and introduce a visual analytics approach that uses multiple coordinated views to connect DR + clustering results across each kind of the data dimension stated. To help analysts label the clusters, each clustering view is supplemented with techniques and visualizations that contrast a cluster of interest with the rest of the dataset. Our approach assists analysts to make sense of machine maintenance logs and their errors. Then the gained insights help them carry out preventive maintenance. We illustrate and evaluate our approach through use cases and expert studies respectively, and discuss generalization of the approach to other heterogeneous data.

A Stable Multi-Scale Kernel for Topological Machine Learning (2015)

Jan Reininghaus, Stefan Huber, Ulrich Bauer, Roland Kwitt

Abstract

Topological data analysis offers a rich source of valuable information to study vision problems. Yet, so far we lack a theoretically sound connection to popular kernel-based learning techniques, such as kernel SVMs or kernel PCA. In this work, we establish such a connection by designing a multi-scale kernel for persistence diagrams, a stable summary representation of topological features in data. We show that this kernel is positive definite and prove its stability with respect to the 1-Wasserstein distance. Experiments on two benchmark datasets for 3D shape classification/retrieval and texture recognition show considerable performance gains of the proposed method compared to an alternative approach that is based on the recently introduced persistence landscapes.

Topological Detection of Trojaned Neural Networks (2021)

Songzhu Zheng, Yikai Zhang, Hubert Wagner, Mayank Goswami, Chao Chen

Abstract

Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when the attackers stealthily manipulate the model’s behavior through Trojaned training samples, which can later be exploited. Guided by basic neuroscientific principles, we discover subtle – yet critical – structural deviation characterizing Trojaned models. In our analysis we use topological tools. They allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from shallow to deep layers. Inspired by these observations, we devise a strategy for robust detection of Trojaned models. Compared to standard baselines it displays better performance on multiple benchmarks.

Topological Detection of Trojaned Neural Networks (2021)

Songzhu Zheng, Yikai Zhang, Hubert Wagner, Mayank Goswami, Chao Chen

Abstract

Deep neural networks are known to have security issues. One particular threat is the Trojan attack. It occurs when the attackers stealthily manipulate the model’s behavior through Trojaned training samples, which can later be exploited. Guided by basic neuroscientific principles, we discover subtle – yet critical – structural deviation characterizing Trojaned models. In our analysis we use topological tools. They allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from shallow to deep layers. Inspired by these observations, we devise a strategy for robust detection of Trojaned models. Compared to standard baselines it displays better performance on multiple benchmarks.

A Novel Method of Extracting Topological Features From Word Embeddings (2020)

Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny

Abstract

In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there are only a few work on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representation of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the embedding high-dimensional space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show our defined topological features may outperform conventional text mining features.

Generalized Penalty for Circular Coordinate Representation (2020)

Hengrui Luo, Alice Patania, Jisu Kim, Mikael Vejdemo-Johansson

Abstract

Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization for high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method to adapt the circular coordinate framework to take into account sparsity in high-dimensional applications. We use a generalized penalty function instead of an \$L_\2\\$ penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analysis to support our claim that circular coordinates with generalized penalty will accommodate the sparsity in high-dimensional datasets under different sampling schemes while preserving the topological structures.

Fast and Accurate Tumor Segmentation of Histology Images Using Persistent Homology and Deep Convolutional Features (2019)

Talha Qaiser, Yee-Wah Tsang, Daiki Taniyama, Naoya Sakamoto, Kazuaki Nakane, David Epstein, Nasir Rajpoot

Abstract

Tumor segmentation in whole-slide images of histology slides is an important step towards computer-assisted diagnosis. In this work, we propose a tumor segmentation framework based on the novel concept of persistent homology profiles (PHPs). For a given image patch, the homology profiles are derived by efficient computation of persistent homology, which is an algebraic tool from homology theory. We propose an efficient way of computing topological persistence of an image, alternative to simplicial homology. The PHPs are devised to distinguish tumor regions from their normal counterparts by modeling the atypical characteristics of tumor nuclei. We propose two variants of our method for tumor segmentation: one that targets speed without compromising accuracy and the other that targets higher accuracy. The fast version is based on a selection of exemplar image patches from a convolution neural network (CNN) and patch classification by quantifying the divergence between the PHPs of exemplars and the input image patch. Detailed comparative evaluation shows that the proposed algorithm is significantly faster than competing algorithms while achieving comparable results. The accurate version combines the PHPs and high-level CNN features and employs a multi-stage ensemble strategy for image patch labeling. Experimental results demonstrate that the combination of PHPs and CNN features outperform competing algorithms. This study is performed on two independently collected colorectal datasets containing adenoma, adenocarcinoma, signet, and healthy cases. Collectively, the accurate tumor segmentation produces the highest average patch-level F1-score, as compared with competing algorithms, on malignant and healthy cases from both the datasets. Overall the proposed framework highlights the utility of persistent homology for histopathology image analysis.

Persistent Homology Machine Learning for Fingerprint Classification (2019)

N. Giansiracusa, R. Giansiracusa, C. Moon

Abstract

The fingerprint classification problem is to sort fingerprints into predetermined groups, such as arch, loop, and whorl. It was asserted in the literature that minutiae points, which are commonly used for fingerprint matching, are not useful for classification. We show that, to the contrary, near state-of-the-art classification accuracy rates can be achieved when applying topological data analysis (TDA) to 3-dimensional point clouds of oriented minutiae points. We also apply TDA to fingerprint ink-roll images, which yields a lower accuracy rate but still shows promise; moreover, combining the two approaches outperforms each one individually. These methods use supervised learning applied to persistent homology and allow us to explore feature selection on barcodes, an important topic at the interface between TDA and machine learning. We test our classification algorithms on the NIST fingerprint database SD-27.

Hyperparameter Optimization of Topological Features for Machine Learning Applications (2019)

Francis Motta, Christopher Tralie, Rossella Bedini, Fabiano Bini, Gilberto Bini, Hamed Eramian, Marcio Gameiro, Steve Haase, Hugh Haddox, John Harer, Nick Leiby, Franco Marinozzi, Scott Novotney, Gabe Rocklin, Jed Singer, Devin Strickland, Matt Vaughn

Abstract

This paper describes a general pipeline for generating optimal vector representations of topological features of data for use with machine learning algorithms. This pipeline can be viewed as a costly black-box function defined over a complex configuration space, each point of which specifies both how features are generated and how predictive models are trained on those features. We propose using state-of-the-art Bayesian optimization algorithms to inform the choice of topological vectorization hyperparameters while simultaneously choosing learning model parameters. We demonstrate the need for and effectiveness of this pipeline using two difficult biological learning problems, and illustrate the nontrivial interactions between topological feature generation and learning model hyperparameters.

Contagion Dynamics for Manifold Learning (2020)

Barbara I. Mahler

Abstract

Contagion maps exploit activation times in threshold contagions to assign vectors in high-dimensional Euclidean space to the nodes of a network. A point cloud that is the image of a contagion map reflects both the structure underlying the network and the spreading behaviour of the contagion on it. Intuitively, such a point cloud exhibits features of the network's underlying structure if the contagion spreads along that structure, an observation which suggests contagion maps as a viable manifold-learning technique. We test contagion maps as a manifold-learning tool on a number of different real-world and synthetic data sets, and we compare their performance to that of Isomap, one of the most well-known manifold-learning algorithms. We find that, under certain conditions, contagion maps are able to reliably detect underlying manifold structure in noisy data, while Isomap fails due to noise-induced error. This consolidates contagion maps as a technique for manifold learning.

Persistent Homology Advances Interpretable Machine Learning for Nanoporous Materials (2020)

Aditi S. Krishnapriyan, Joseph Montoya, Jens Hummelshøj, Dmitriy Morozov

Abstract

Machine learning for nanoporous materials design and discovery has emerged as a promising alternative to more time-consuming experiments and simulations. The challenge with this approach is the selection of features that enable universal and interpretable materials representations across multiple prediction tasks. We use persistent homology to construct holistic representations of the materials structure. We show that these representations can also be augmented with other generic features such as word embeddings from natural language processing to capture chemical information. We demonstrate our approach on multiple metal-organic framework datasets by predicting a variety of gas adsorption targets. Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from commonly used manually curated features. Persistent homology features allow us to locate the pores that correlate best to adsorption at different pressures, contributing to understanding atomic level structure-property relationships for materials design.

Topological Machine Learning for Mixed Numeric and Categorical Data (2020)

Chengyuan Wu, Carol Anne Hargreaves

Abstract

Topological data analysis is a relatively new branch of machine learning that excels in studying high dimensional data, and is theoretically known to be robust against noise. Meanwhile, data objects with mixed numeric and categorical attributes are ubiquitous in real-world applications. However, topological methods are usually applied to point cloud data, and to the best of our knowledge there is no available framework for the classification of mixed data using topological methods. In this paper, we propose a novel topological machine learning method for mixed data classification. In the proposed method, we use theory from topological data analysis such as persistent homology, persistence diagrams and Wasserstein distance to study mixed data. The performance of the proposed method is demonstrated by experiments on a real-world heart disease dataset. Experimental results show that our topological method outperforms several state-of-the-art algorithms in the prediction of heart disease.

Explainable Machine Learning Approach to Yield and Quality Improvements Using Deep Topological Data Analytics (2023)

Janhavi Giri, Attila Lengyel

Abstract

Abstract. In wafer fabrication, data is collected and analyzed to prevent process deviations that could affect product quality and wafer yield. However, the high-dimensional, sparse, and imbalanced nature of the data poses significant challenges to yield and quality root cause analysis. Deep Topological Data Analysis (DTDA) is an unsupervised machine learning method that clusters and models the data in the form of geometric objects such as graphs and their higher-dimensional versions. This method reduces the multidimensional dataset to two-dimensional networks or graphs, where each node represents a cluster of samples with similar characteristics, and an edge represents the presence of overlapping characteristics between the connecting nodes. DTDA provides insights into the necessary data elements required to conduct accurate analysis and helps engineers identify the features contributing to yield and quality issues, enabling corrective actions. Moreover, the approach prevents the waste of engineering resources and mitigates the impact on final manufacturing cost.

Quantitative Analysis of Phase Transitions in Two-Dimensional XY Models Using Persistent Homology (2022)

Nicholas Sale, Jeffrey Giansiracusa, Biagio Lucini

Abstract

We use persistent homology and persistence images as an observable of three different variants of the two-dimensional XY model in order to identify and study their phase transitions. We examine models with the classical XY action, a topological lattice action, and an action with an additional nematic term. In particular, we introduce a new way of computing the persistent homology of lattice spin model configurations and, by considering the fluctuations in the output of logistic regression and k-nearest neighbours models trained on persistence images, we develop a methodology to extract estimates of the critical temperature and the critical exponent of the correlation length. We put particular emphasis on finite-size scaling behaviour and producing estimates with quantifiable error. For each model we successfully identify its phase transition(s) and are able to get an accurate determination of the critical temperatures and critical exponents of the correlation length.

Geometric Feature Performance Under Downsampling for EEG Classification Tasks (2021)

Bryan Bischof, Eric Bunch

Abstract

We experimentally investigate a collection of feature engineering pipelines for use with a CNN for classifying eyes-open or eyes-closed from electroencephalogram (EEG) time-series from the Bonn dataset. Using the Takens' embedding--a geometric representation of time-series--we construct simplicial complexes from EEG data. We then compare \$\epsilon\$-series of Betti-numbers and \$\epsilon\$-series of graph spectra (a novel construction)--two topological invariants of the latent geometry from these complexes--to raw time series of the EEG to fill in a gap in the literature for benchmarking. These methods, inspired by Topological Data Analysis, are used for feature engineering to capture local geometry of the time-series. Additionally, we test these feature pipelines' robustness to downsampling and data reduction. This paper seeks to establish clearer expectations for both time-series classification via geometric features, and how CNNs for time-series respond to data of degraded resolution.

Topological Regularization for Dense Prediction (2021)

Deqing Fu, Bradley J. Nelson

Abstract

Dense prediction tasks such as depth perception and semantic segmentation are important applications in computer vision that have a concrete topological description in terms of partitioning an image into connected components or estimating a function with a small number of local extrema corresponding to objects in the image. We develop a form of topological regularization based on persistent homology that can be used in dense prediction tasks with these topological descriptions. Experimental results show that the output topology can also appear in the internal activations of trained neural networks which allows for a novel use of topological regularization to the internal states of neural networks during training, reducing the computational cost of the regularization. We demonstrate that this topological regularization of internal activations leads to improved convergence and test benchmarks on several problems and architectures.

TopoGAN: A Topology-Aware Generative Adversarial Network (2020)

Fan Wang, Huidong Liu, Dimitris Samaras, Chao Chen

Abstract

Existing generative adversarial networks (GANs) focus on generating realistic images based on CNN-derived image features, but fail to preserve the structural properties of real images. This can be fatal in applications where the underlying structure (e.g.., neurons, vessels, membranes, and road networks) of the image carries crucial semantic meaning. In this paper, we propose a novel GAN model that learns the topology of real images, i.e., connectedness and loopy-ness. In particular, we introduce a new loss that bridges the gap between synthetic image distribution and real image distribution in the topological feature space. By optimizing this loss, the generator produces images with the same structural topology as real images. We also propose new GAN evaluation metrics that measure the topological realism of the synthetic images. We show in experiments that our method generates synthetic images with realistic topology. We also highlight the increased performance that our method brings to downstream tasks such as segmentation.

Community Resources

Video

Severe Slugging Flow Identification From Topological Indicators (2022)

Simone Casolo

Abstract

In this work, topological data analysis is used to identify the onset of severe slug flow in offshore petroleum production systems. Severe slugging is a multiphase flow regime known to be very inefficient and potentially harmful to process equipment and it is characterized by large oscillations in the production fluid pressure. Time series from pressure sensors in subsea oil wells are processed by means of Takens embedding to produce point clouds of data. Embedded sensor data is then analyzed using persistent homology to obtain topological indicators capable of revealing the occurrence of severe slugging in a condition-based monitoring approach. A large dataset of well events consisting of both real and simulated data is used to demonstrate the possibilty of authomatizing severe slugging detection from live data via topological data analysis. Methods based on persistence diagrams are shown to accurately identify severe slugging and to classify different flow regimes from pressure signals of producing wells with supervised machine learning.

The Extended Persistent Homology Transform of Manifolds With Boundary (2022)

Katharine Turner, Vanessa Robins, James Morgan

Abstract

The Extended Persistent Homology Transform (XPHT) is a topological transform which takes as input a shape embedded in Euclidean space, and to each unit vector assigns the extended persistence module of the height function over that shape with respect to that direction. We can define a distance between two shapes by integrating over the sphere the distance between their respective extended persistence modules. By using extended persistence we get finite distances between shapes even when they have different Betti numbers. We use Morse theory to show that the extended persistence of a height function over a manifold with boundary can be deduced from the extended persistence for that height function restricted to the boundary, alongside labels on the critical points as positive or negative critical. We study the application of the XPHT to binary images; outlining an algorithm for efficient calculation of the XPHT exploiting relationships between the PHT of the boundary curves to the extended persistence of the foreground.

Towards a New Approach to Reveal Dynamical Organization of the Brain Using Topological Data Analysis (2018)

Manish Saggar, Olaf Sporns, Javier Gonzalez-Castillo, Peter A. Bandettini, Gunnar Carlsson, Gary Glover, Allan L. Reiss

Abstract

Approaches describing how the brain changes to accomplish cognitive tasks tend to rely on collapsed data. Here, authors present a new approach that maintains high dimensionality and use it to describe individual differences in how brain activity is represented and organized across different cognitive tasks.

Analyzing Collective Motion With Machine Learning and Topology (2019)

Dhananjay Bhaskar, Angelika Manhart, Jesse Milzman, John T. Nardini, Kathleen M. Storey, Chad M. Topaz, Lori Ziegelmeier

Abstract

We use topological data analysis and machine learning to study a seminal model of collective motion in biology [M. R. D’Orsogna et al., Phys. Rev. Lett. 96, 104302 (2006)]. This model describes agents interacting nonlinearly via attractive-repulsive social forces and gives rise to collective behaviors such as flocking and milling. To classify the emergent collective motion in a large library of numerical simulations and to recover model parameters from the simulation data, we apply machine learning techniques to two different types of input. First, we input time series of order parameters traditionally used in studies of collective motion. Second, we input measures based on topology that summarize the time-varying persistent homology of simulation data over multiple scales. This topological approach does not require prior knowledge of the expected patterns. For both unsupervised and supervised machine learning methods, the topological approach outperforms the one that is based on traditional order parameters.

Identifying Repeating Patterns in IEC 61499 Systems Using Feature-Based Embeddings (2022)

Markus Unterdechler, Antonio M. Gutiérrez, Lisa Sonnleithner, Rick Rabiser, Alois Zoitl

Abstract

Cyber-Physical Production Systems (CPPSs) are highly variable systems of systems comprised of software and hardware interacting with each other and the environment. The increasing integration of technologies and devices has brought an unprecedented level of automation and customization. At the same time, it has also increased the efforts to maintain highly complex and heterogeneous systems. Although engineering practices support the reuse of common components to ease the development and maintenance of the systems in different projects, the identification of common components is still manually performed, which is a time-consuming, error-prone task. In this paper, a novel approach identifying repeating patterns in CPPSs based on artificial intelligence techniques is presented. This approach allows finding exact and similar components to support the CPPS design. Furthermore, it enables the maintenance of common components by reusing predefined types thereby reducing development effort. We implemented and evaluated our approach in an industry case study on developing CPPS control software with IEC 61499.

A Topological Data Analysis Approach On Predicting Phenotypes From Gene Expression Data (2020)

Sayan Mandal, Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida

Abstract

The goal of this study was to investigate if gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore we devised an approach augmenting machine learning with topological data analysis., We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson’s disease phenotype prediction when measured against standard machine learning methods., This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high dimensional data reveals itself when considering the intricate topological connections between expressed genes.

Alzheimer Disease Detection From Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning (2023)

Francesco Conti, Martina Banchelli, Valentina Bessi, Cristina Cecchi, Fabrizio Chiti, Sara Colantonio, Cristiano D’Andrea, Marella de Angelis, Davide Moroni, Benedetta Nacmias, Maria Antonietta Pascali, Sandro Sorbi, Paolo Matteini

Abstract

The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer’s disease (AD) as well as of 5 pathological controls was collected and analyzed by Raman spectroscopy (RS). We investigated whether the raw and preprocessed Raman spectra could be used to distinguish AD from controls. First, we applied standard Machine Learning (ML) methods obtaining unsatisfactory results. Then, we applied ML to a set of topological descriptors extracted from raw spectra, achieving a very good classification accuracy (\textgreater87%). Although our results are preliminary, they indicate that RS and topological analysis may provide an effective combination to confirm or disprove a clinical diagnosis of AD. The next steps include enlarging the dataset of CSF samples to validate the proposed method better and, possibly, to investigate whether topological data analysis could support the characterization of AD subtypes.

Quantitative and Interpretable Order Parameters for Phase Transitions From Persistent Homology (2020)

Alex Cole, Gregory J. Loges, Gary Shiu

Abstract

We apply modern methods in computational topology to the task of discovering and characterizing phase transitions. As illustrations, we apply our method to four two-dimensional lattice spin models: the Ising, square ice, XY, and fully-frustrated XY models. In particular, we use persistent homology, which computes the births and deaths of individual topological features as a coarse-graining scale or sublevel threshold is increased, to summarize multiscale and high-point correlations in a spin configuration. We employ vector representations of this information called persistence images to formulate and perform the statistical task of distinguishing phases. For the models we consider, a simple logistic regression on these images is sufficient to identify the phase transition. Interpretable order parameters are then read from the weights of the regression. This method suffices to identify magnetization, frustration, and vortex-antivortex structure as relevant features for phase transitions in our models. We also define "persistence" critical exponents and study how they are related to those critical exponents usually considered.

TDAExplore: Quantitative Analysis of Fluorescence Microscopy Images Through Topology-Based Machine Learning (2021)

Parker Edwards, Kristen Skruber, Nikola Milićević, James B. Heidings, Tracy-Ann Read, Peter Bubenik, Eric A. Vitriol

Abstract

Recent advances in machine learning have greatly enhanced automatic methods to extract information from fluorescence microscopy data. However, current machine-learning-based models can require hundreds to thousands of images to train, and the most readily accessible models classify images without describing which parts of an image contributed to classification. Here, we introduce TDAExplore, a machine learning image analysis pipeline based on topological data analysis. It can classify different types of cellular perturbations after training with only 20–30 high-resolution images and performs robustly on images from multiple subjects and microscopy modes. Using only images and whole-image labels for training, TDAExplore provides quantitative, spatial information, characterizing which image regions contribute to classification. Computational requirements to train TDAExplore models are modest and a standard PC can perform training with minimal user input. TDAExplore is therefore an accessible, powerful option for obtaining quantitative information about imaging data in a wide variety of applications.

Topological Analysis of Low Dimensional Phase Space Trajectories of High Dimensional EEG Signals for Classification of Interictal Epileptiform Discharges (2023)

A. Stiehl, M. Flammer, F. Anselstetter, N. Ille, H. Bornfleth, S. Geißelsöder, C. Uhl

Abstract

A new topology based feature extraction method for classification of interictal epileptiform discharges (IEDs) in EEG recordings from patients with epilepsy is proposed. After dimension reduction of the recorded EEG signal, using dynamical component analysis (DyCA) or principal component analysis (PCA), a persistent homology analysis of the resulting phase space trajectories is performed. Features are extracted from the persistent homology analysis and used to train and evaluate a support vector machine (SVM). Classification results based on these persistent features are compared with statistical features of the dimension-reduced signals and combinations of all of these features. Combining the persistent and statistical features improves the results (accuracy 94.7 %) compared to using only statistical feature extraction, whereas applying only persistent features does not achieve sufficient performance. For this classification example the choice of the dimension reduction technique does not significantly influence the classification performance of the algorithm.

Mapping Geometric and Electromagnetic Feature Spaces With Machine Learning for Additively Manufactured RF Devices (2022)

Deanna Sessions, Venkatesh Meenakshisundaram, Andrew Gillman, Alexander Cook, Kazuko Fuchi, Philip R. Buskohl, Gregory H. Huff

Abstract

Multi-material additive manufacturing enables transformative capabilities in customized, low-cost, and multi-functional electromagnetic devices. However, process-specific fabrication anomalies can result in non-intuitive effects on performance; we propose a framework for identifying defect mechanisms and their performance impact by mapping geometric variances to electromagnetic performance metrics. This method can accelerate additive fabrication feedback while avoiding the high computational cost of in-line electromagnetic simulation. We first used dimension reduction to explore the population of geometric manufacturing anomalies and electromagnetic performance. Convolutional neural networks are then trained to predict the electromagnetic performance of the printed geometries. In generating the networks, we explored two inputs: one image-derived geometric description and one using the same description with additional simulated electromagnetic information. Network latent space analysis shows the networks learned both geometric and electromagnetic values even without electromagnetic input. This result demonstrates it is possible to create accelerated additive feedback systems predicting electromagnetic performance without in-line simulation.

A Topological Machine Learning Pipeline for Classification (2022)

Francesco Conti, Davide Moroni, Maria Antonietta Pascali

Abstract

In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered. Using a grid search approach, this pipeline determines optimal representation methods and parameters. The development of such a topological pipeline for Machine Learning involves two crucial steps that strongly affect its performance: firstly, digital data must be represented as an algebraic object with a proper associated filtration in order to compute its topological summary, the Persistence Diagram. Secondly, the persistence diagram must be transformed with suitable representation methods in order to be introduced in a Machine Learning algorithm. We assess the performance of our pipeline, and in parallel, we compare the different representation methods on popular benchmark datasets. This work is a first step toward both an easy and ready-to-use pipeline for data classification using persistent homology and Machine Learning, and to understand the theoretical reasons why, given a dataset and a task to be performed, a pair (filtration, topological representation) is better than another.

Rapid and Precise Topological Comparison With Merge Tree Neural Networks (2024)

Yu Qin, Brittany Terese Fasy, Carola Wenk, Brian Summa

Abstract

Merge trees are a valuable tool in the scientific visualization of scalar fields; however, current methods for merge tree comparisons are computationally expensive, primarily due to the exhaustive matching between tree nodes. To address this challenge, we introduce the Merge Tree Neural Network (MTNN), a learned neural network model designed for merge tree comparison. The MTNN enables rapid and high-quality similarity computation. We first demonstrate how to train graph neural networks, which emerged as effective encoders for graphs, in order to produce embeddings of merge trees in vector spaces for efficient similarity comparison. Next, we formulate the novel MTNN model that further improves the similarity comparisons by integrating the tree and node embeddings with a new topological attention mechanism. We demonstrate the effectiveness of our model on real-world data in different domains and examine our model's generalizability across various datasets. Our experimental analysis demonstrates our approach's superiority in accuracy and efficiency. In particular, we speed up the prior state-of-the-art by more than \$100\times\$ on the benchmark datasets while maintaining an error rate below \$0.1\%\$.

PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction (2020)

Nicolas Swenson, Aditi S. Krishnapriyan, Aydin Buluc, Dmitriy Morozov, Katherine Yelick

Abstract

Understanding protein structure-function relationships is a key challenge in computational biology, with applications across the biotechnology and pharmaceutical industries. While it is known that protein structure directly impacts protein function, many functional prediction tasks use only protein sequence. In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank in order to study the expressiveness of different structure-based prediction schemes. We present PersGNN - an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis to capture a complex set of both local and global structural features. While variations of these techniques have been successfully applied to proteins before, we demonstrate that our hybridized approach, PersGNN, outperforms either method on its own as well as a baseline neural network that learns from the same information. PersGNN achieves a 9.3% boost in area under the precision recall curve (AUPR) compared to the best individual model, as well as high F1 scores across different gene ontology categories, indicating the transferability of this approach.

Capturing Dynamics of Time-Varying Data via Topology (2020)

Lu Xian, Henry Adams, Chad M. Topaz, Lori Ziegelmeier

Abstract

One approach to understanding complex data is to study its shape through the lens of algebraic topology. While the early development of topological data analysis focused primarily on static data, in recent years, theoretical and applied studies have turned to data that varies in time. A time-varying collection of metric spaces as formed, for example, by a moving school of fish or flock of birds, can contain a vast amount of information. There is often a need to simplify or summarize the dynamic behavior. We provide an introduction to topological summaries of time-varying metric spaces including vineyards [17], crocker plots [52], and multiparameter rank functions [34]. We then introduce a new tool to summarize time-varying metric spaces: a crocker stack. Crocker stacks are convenient for visualization, amenable to machine learning, and satisfy a desirable stability property which we prove. We demonstrate the utility of crocker stacks for a parameter identification task involving an influential model of biological aggregations [54]. Altogether, we aim to bring the broader applied mathematics community up-to-date on topological summaries of time-varying metric spaces.

Determining Structural Properties of Artificial Neural Networks Using Algebraic Topology (2021)

David Pérez Fernández, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marta Villegas

Abstract

Artificial Neural Networks (ANNs) are widely used for approximating complex functions. The process that is usually followed to define the most appropriate architecture for an ANN given a specific function is mostly empirical. Once this architecture has been defined, weights are usually optimized according to the error function. On the other hand, we observe that ANNs can be represented as graphs and their topological 'fingerprints' can be obtained using Persistent Homology (PH). In this paper, we describe a proposal focused on designing more principled architecture search procedures. To do this, different architectures for solving problems related to a heterogeneous set of datasets have been analyzed. The results of the evaluation corroborate that PH effectively characterizes the ANN invariants: when ANN density (layers and neurons) or sample feeding order is the only difference, PH topological invariants appear; in the opposite direction in different sub-problems (i.e. different labels), PH varies. This approach based on topological analysis helps towards the goal of designing more principled architecture search procedures and having a better understanding of ANNs.

Persistent Homology Based Graph Convolution Network for Fine-Grained 3D Shape Segmentation (2021)

Chi-Chong Wong, Chi-Man Vong

Abstract

Fine-grained 3D segmentation is an important task in 3D object understanding, especially in applications such as intelligent manufacturing or parts analysis for 3D objects. However, many challenges involved in such problem are yet to be solved, such as i) interpreting the complex structures located in different regions for 3D objects; ii) capturing fine-grained structures with sufficient topology correctness. Current deep learning and graph machine learning methods fail to tackle such challenges and thus provide inferior performance in fine-grained 3D analysis. In this work, methods in topological data analysis are incorporated with geometric deep learning model for the task of fine-grained segmentation for 3D objects. We propose a novel neural network model called Persistent Homology based Graph Convolution Network (PHGCN), which i) integrates persistent homology into graph convolution network to capture multi-scale structural information that can accurately represent complex structures for 3D objects; ii) applies a novel Persistence Diagram Loss (ℒPD) that provides sufficient topology correctness for segmentation over the fine-grained structures. Extensive experiments on fine-grained 3D segmentation validate the effectiveness of the proposed PHGCN model and show significant improvements over current state-of-the-art methods.

Persistence Images: A Stable Vector Representation of Persistent Homology (2017)

Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, Lori Ziegelmeier

Abstract

Many data sets can be viewed as a noisy sampling of an underlying space, and tools from topological data analysis can characterize this structure for the purpose of knowledge discovery. One such tool is persistent homology, which provides a multiscale description of the homological features within a data set. A useful representation of this homological information is a persistence diagram (PD). Efforts have been made to map PDs into spaces with additional structure valuable to machine learning tasks. We convert a PD to a finite-dimensional vector representation which we call a persistence image (PI), and prove the stability of this transformation with respect to small perturbations in the inputs. The discriminatory power of PIs is compared against existing methods, showing significant performance gains. We explore the use of PIs with vector-based machine learning tools, such as linear sparse support vector machines, which identify features containing discriminating topological information. Finally, high accuracy inference of parameter values from the dynamic output of a discrete dynamical system (the linked twist map) and a partial differential equation (the anisotropic Kuramoto-Sivashinsky equation) provide a novel application of the discriminatory power of PIs.

Community Resources

A Novel Multi-Task Machine Learning Classifier for Rare Disease Patterning Using Cardiac Strain Imaging Data (2024)

Nanda K. Siva, Yashbir Singh, Quincy A. Hathaway, Partho P. Sengupta, Naveena Yanamala

Abstract

To provide accurate predictions, current machine learning-based solutions require large, manually labeled training datasets. We implement persistent homology (PH), a topological tool for studying the pattern of data, to analyze echocardiography-based strain data and differentiate between rare diseases like constrictive pericarditis (CP) and restrictive cardiomyopathy (RCM). Patient population (retrospectively registered) included those presenting with heart failure due to CP (n = 51), RCM (n = 47), and patients without heart failure symptoms (n = 53). Longitudinal, radial, and circumferential strains/strain rates for left ventricular segments were processed into topological feature vectors using Machine learning PH workflow. In differentiating CP and RCM, the PH workflow model had a ROC AUC of 0.94 (Sensitivity = 92%, Specificity = 81%), compared with the GLS model AUC of 0.69 (Sensitivity = 65%, Specificity = 66%). In differentiating between all three conditions, the PH workflow model had an AUC of 0.83 (Sensitivity = 68%, Specificity = 84%), compared with the GLS model AUC of 0.68 (Sensitivity = 52% and Specificity = 76%). By employing persistent homology to differentiate the “pattern” of cardiac deformations, our machine-learning approach provides reasonable accuracy when evaluating small datasets and aids in understanding and visualizing patterns of cardiac imaging data in clinically challenging disease states.

Exploring the Geometry and Topology of Neural Network Loss Landscapes (2022)

Stefan Horoi, Jessie Huang, Bastian Rieck, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy

Abstract

Recent work has established clear links between the generalization performance of trained neural networks and the geometry of their loss landscape near the local minima to which they converge. This suggests that qualitative and quantitative examination of the loss landscape geometry could yield insights about neural network generalization performance during training. To this end, researchers have proposed visualizing the loss landscape through the use of simple dimensionality reduction techniques. However, such visualization methods have been limited by their linear nature and only capture features in one or two dimensions, thus restricting sampling of the loss landscape to lines or planes. Here, we expand and improve upon these in three ways. First, we present a novel “jump and retrain” procedure for sampling relevant portions of the loss landscape. We show that the resulting sampled data holds more meaningful information about the network’s ability to generalize. Next, we show that non-linear dimensionality reduction of the jump and retrain trajectories via PHATE, a trajectory and manifold-preserving method, allows us to visualize differences between networks that are generalizing well vs poorly. Finally, we combine PHATE trajectories with a computational homology characterization to quantify trajectory differences.

Topological Echoes of Primordial Physics in the Universe at Large Scales (2020)

Alex Cole, Matteo Biagetti, Gary Shiu

Abstract

We present a pipeline for characterizing and constraining initial conditions in cosmology via persistent homology. The cosmological observable of interest is the cosmic web of large scale structure, and the initial conditions in question are non-Gaussianities (NG) of primordial density perturbations. We compute persistence diagrams and derived statistics for simulations of dark matter halos with Gaussian and non-Gaussian initial conditions. For computational reasons and to make contact with experimental observations, our pipeline computes persistence in sub-boxes of full simulations and simulations are subsampled to uniform halo number. We use simulations with large NG (\$f_\\rm NL\\textasciicircum\\rm loc\=250\$) as templates for identifying data with mild NG (\$f_\\rm NL\\textasciicircum\\rm loc\=10\$), and running the pipeline on several cubic volumes of size \$40~(\textrm\Gpc/h\)\textasciicircum\3\\$, we detect \$f_\\rm NL\\textasciicircum\\rm loc\=10\$ at \$97.5\%\$ confidence on \$\sim 85\%\$ of the volumes for our best single statistic. Throughout we benefit from the interpretability of topological features as input for statistical inference, which allows us to make contact with previous first-principles calculations and make new predictions.

Topological Data Analysis and Machine Learning for Recognizing Atmospheric River Patterns in Large Climate Datasets (2019)

Grzegorz Muszynski, Karthik Kashinath, Vitaliy Kurlin, Michael Wehner, Prabhat

Abstract

Identifying weather patterns that frequently lead to extreme weather events is a crucial first step in understanding how they may vary under different climate change scenarios. Here, we propose an automated method for recognizing atmospheric rivers (ARs) in climate data using topological data analysis and machine learning. The method provides useful information about topological features (shape characteristics) and statistics of ARs. We illustrate this method by applying it to outputs of version 5.1 of the Community Atmosphere Model version 5.1 (CAM5.1) and the reanalysis product of the second Modern-Era Retrospective Analysis for Research and Applications (MERRA-2). An advantage of the proposed method is that it is threshold-free – there is no need to determine any threshold criteria for the detection method – when the spatial resolution of the climate model changes. Hence, this method may be useful in evaluating model biases in calculating AR statistics. Further, the method can be applied to different climate scenarios without tuning since it does not rely on threshold conditions. We show that the method is suitable for rapidly analyzing large amounts of climate model and reanalysis output data.

Efficient Planning of Multi-Robot Collective Transport Using Graph Reinforcement Learning With Higher Order Topological Abstraction (2023)

Steve Paul, Wenyuan Li, Brian Smyth, Yuzhou Chen, Yulia Gel, Souma Chowdhury

Abstract

Efficient multi-robot task allocation (MRTA) is fundamental to various time-sensitive applications such as disaster response, warehouse operations, and construction. This paper tackles a particular class of these problems that we call MRTA-collective transport or MRTA-CT - here tasks present varying workloads and deadlines, and robots are subject to flight range, communication range, and payload constraints. For large instances of these problems involving 100s-1000's of tasks and 10s-100s of robots, traditional non-learning solvers are often time-inefficient, and emerging learning-based policies do not scale well to larger-sized problems without costly retraining. To address this gap, we use a recently proposed encoder-decoder graph neural network involving Capsule networks and multi-head attention mechanism, and innovatively add topological descriptors (TD) as new features to improve transferability to unseen problems of similar and larger size. Persistent homology is used to derive the TD, and proximal policy optimization is used to train our TD-augmented graph neural network. The resulting policy model compares favorably to state-of-the-art non-learning baselines while being much faster. The benefit of using TD is readily evident when scaling to test problems of size larger than those used in training.

Optimizing Porosity Detection in Wire Laser Metal Deposition Processes Through Data-Driven AI Classification Techniques (2023)

Meritxell Gomez-Omella, Jon Flores, Basilio Sierra, Susana Ferreiro, Nicolas Hascoët, Francisco Chinesta

Abstract

Additive manufacturing (AM) is an attractive solution for many companies that produce geometrically complex parts. This process consists of depositing material layer by layer following a sliced CAD geometry. It brings several benefits to manufacturing capabilities, such as design freedom, reduced material waste, and short-run customization. However, one of the current challenges faced by users of the process, mainly in wire laser metal deposition (wLMD), is to avoid defects in the manufactured part, especially the porosity. This defect is caused by extreme conditions and metallurgical transformations of the process. And not only does it directly affect the mechanical performance of the parts, especially the fatigue properties, but it also means an increase in costs due to the inspection tasks to which the manufactured parts must be subjected. This work compares three operational solution approaches, product-centric, based on signal-based feature extraction and Topological Data Analysis together with statistical and Machine Learning (ML) techniques, for the early detection and prediction of porosity failure in a wLMD process. The different forecasting and validation strategies demonstrate the variety of conclusions that can be drawn with different objectives in the analysis of the monitored data in AM problems.

Chatter Classification in Turning Using Machine Learning and Topological Data Analysis (2018)

Firas A. Khasawneh, Elizabeth Munch, Jose A. Perea

Abstract

Chatter identification and detection in machining processes has been an active area of research in the past two decades. Part of the challenge in studying chatter is that machining equations that describe its occurrence are often nonlinear delay differential equations. The majority of the available tools for chatter identification rely on defining a metric that captures the characteristics of chatter, and a threshold that signals its occurrence. The difficulty in choosing these parameters can be somewhat alleviated by utilizing machine learning techniques. However, even with a successful classification algorithm, the transferability of typical machine learning methods from one data set to another remains very limited. In this paper we combine supervised machine learning with Topological Data Analysis (TDA) to obtain a descriptor of the process which can detect chatter. The features we use are derived from the persistence diagram of an attractor reconstructed from the time series via Takens embedding. We test the approach using deterministic and stochastic turning models, where the stochasticity is introduced via the cutting coefficient term. Our results show a 97% successful classification rate on the deterministic model labeled by the stability diagram obtained using the spectral element method. The features gleaned from the deterministic model are then utilized for characterization of chatter in a stochastic turning model where there are very limited analysis methods.

Hepatic Tumor Classification Using Texture and Topology Analysis of Non-Contrast-Enhanced Three-Dimensional T1-Weighted MR Images With a Radiomics Approach (2019)

Asuka Oyama, Yasuaki Hiraoka, Ippei Obayashi, Yusuke Saikawa, Shigeru Furui, Kenshiro Shiraishi, Shinobu Kumagai, Tatsuya Hayashi, Jun’ichi Kotoku

Abstract

The purpose of this study is to evaluate the accuracy for classification of hepatic tumors by characterization of T1-weighted magnetic resonance (MR) images using two radiomics approaches with machine learning models: texture analysis and topological data analysis using persistent homology. This study assessed non-contrast-enhanced fat-suppressed three-dimensional (3D) T1-weighted images of 150 hepatic tumors. The lesions included 50 hepatocellular carcinomas (HCCs), 50 metastatic tumors (MTs), and 50 hepatic hemangiomas (HHs) found respectively in 37, 23, and 33 patients. For classification, texture features were calculated, and also persistence images of three types (degree 0, degree 1 and degree 2) were obtained for each lesion from the 3D MR imaging data. We used three classification models. In the classification of HCC and MT (resp. HCC and HH, HH and MT), we obtained accuracy of 92% (resp. 90%, 73%) by texture analysis, and the highest accuracy of 85% (resp. 84%, 74%) when degree 1 (resp. degree 1, degree 2) persistence images were used. Our methods using texture analysis or topological data analysis allow for classification of the three hepatic tumors with considerable accuracy, and thus might be useful when applied for computer-aided diagnosis with MR images.

Omics-Based Strategies in Precision Medicine: Toward a Paradigm Shift in Inborn Errors of Metabolism Investigations (2016)

Abdellah Tebani, Carlos Afonso, Stéphane Marret, Soumeya Bekri

Abstract

The rise of technologies that simultaneously measure thousands of data points represents the heart of systems biology. These technologies have had a huge impact on the discovery of next-generation diagnostics, biomarkers, and drugs in the precision medicine era. Systems biology aims to achieve systemic exploration of complex interactions in biological systems. Driven by high-throughput omics technologies and the computational surge, it enables multi-scale and insightful overviews of cells, organisms, and populations. Precision medicine capitalizes on these conceptual and technological advancements and stands on two main pillars: data generation and data modeling. High-throughput omics technologies allow the retrieval of comprehensive and holistic biological information, whereas computational capabilities enable high-dimensional data modeling and, therefore, accessible and user-friendly visualization. Furthermore, bioinformatics has enabled comprehensive multi-omics and clinical data integration for insightful interpretation. Despite their promise, the translation of these technologies into clinically actionable tools has been slow. In this review, we present state-of-the-art multi-omics data analysis strategies in a clinical context. The challenges of omics-based biomarker translation are discussed. Perspectives regarding the use of multi-omics approaches for inborn errors of metabolism (IEM) are presented by introducing a new paradigm shift in addressing IEM investigations in the post-genomic era.

PI-Net: A Deep Learning Approach to Extract Topological Persistence Images (2020)

Anirudh Som, Hongjun Choi, Karthikeyan Natesan Ramamurthy, Matthew Buman, Pavan Turaga

Abstract

Topological features such as persistence diagrams and their functional approximations like persistence images (PIs) have been showing substantial promise for machine learning and computer vision applications. This is greatly attributed to the robustness topological representations provide against different types of physical nuisance variables seen in real-world data, such as view-point, illumination, and more. However, key bottlenecks to their large scale adoption are computational expenditure and difﬁculty incorporating them in a differentiable architecture. We take an important step in this paper to mitigate these bottlenecks by proposing a novel one-step approach to generate PIs directly from the input data. We design two separate convolutional neural network architectures, one designed to take in multi-variate time series signals as input and another that accepts multi-channel images as input. We call these networks Signal PI-Net and Image PINet respectively. To the best of our knowledge, we are the ﬁrst to propose the use of deep learning for computing topological features directly from data. We explore the use of the proposed PI-Net architectures on two applications: human activity recognition using tri-axial accelerometer sensor data and image classiﬁcation. We demonstrate the ease of fusion of PIs in supervised deep learning architectures and speed up of several orders of magnitude for extracting PIs from data. Our code is available at https://github.com/anirudhsom/PI-Net.

Prediction in Cancer Genomics Using Topological Signatures and Machine Learning (2020)

Georgina Gonzalez, Arina Ushakova, Radmila Sazdanovic, Javier Arsuaga

Abstract

Copy Number Aberrations, gains and losses of genomic regions, are a hallmark of cancer and can be experimentally detected using microarray comparative genomic hybridization (aCGH). In previous works, we developed a topology based method to analyze aCGH data whose output are regions of the genome where copy number is altered in patients with a predetermined cancer phenotype. We call this method Topological Analysis of array CGH (TAaCGH). Here we combine TAaCGH with machine learning techniques to build classifiers using copy number aberrations. We chose logistic regression on two different binary phenotypes related to breast cancer to illustrate this approach. The first case consists of patients with over-expression of the ERBB2 gene. Over-expression of ERBB2 is commonly regulated by a copy number gain in chromosome arm 17q. TAaCGH found the region 17q11-q22 associated with the phenotype and using logistic regression we reduced this region to 17q12-q21.31 correctly classifying 78% of the ERBB2 positive individuals (sensitivity) in a validation data set. We also analyzed over-expression in Estrogen Receptor (ER), a second phenotype commonly observed in breast cancer patients and found that the region 5p14.3-12 together with six full arms were associated with the phenotype. Our method identified 4p, 6p and 16q as the strongest predictors correctly classifying 76% of ER positives in our validation data set. However, for this set there was a significant increase in the false positive rate (specificity). We suggest that topological and machine learning methods can be combined for prediction of phenotypes using genetic data.

Time-Inhomogeneous Diffusion Geometry and Topology (2022)

Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy

Abstract

Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic diffusion homology. We use this intrinsic topology as well as an ambient topology to study how the data changes over diffusion time. We demonstrate both homologies in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.

The Shape of Cancer Relapse: Topological Data Analysis Predicts Recurrence in Paediatric Acute Lymphoblastic Leukaemia (2021)

Salvador Chulián, Bernadette J. Stolz, Álvaro Martínez-Rubio, Cristina Blázquez Goñi, Juan F. Rodríguez Gutiérrez, Teresa Caballero Velázquez, Águeda Molinos Quintana, Manuel Ramírez Orellana, Ana Castillo Robleda, José Luis Fuster Soler, Alfredo Minguela Puras, María Victoria Martínez Sánchez, María Rosa, Víctor M. Pérez-García, Helen Byrne

Abstract

Acute Lymphoblastic Leukaemia (ALL) is the most frequent paediatric cancer. Modern therapies have improved survival rates, but approximately 15-20 % of patients relapse. At present, patients’ risk of relapse are assessed by projecting high-dimensional flow cytometry data onto a subset of biomarkers and manually estimating the shape of this reduced data. Here, we apply methods from topological data analysis (TDA), which quantify shape in data via features such as connected components and loops, to pre-treatment ALL datasets with known outcomes. We combine these fully unsupervised analyses with machine learning to identify features in the pre-treatment data that are prognostic for risk of relapse. We find significant topological differences between relapsing and non-relapsing patients and confirm the predictive power of CD10, CD20, CD38, and CD45. Further, we are able to use the TDA descriptors to predict patients who relapsed. We propose three prognostic pipelines that readily extend to other haematological malignancies. Teaser Topology reveals features in flow cytometry data which predict relapse of patients with acute lymphoblastic leukemia

Weighted-Persistent-Homology-Based Machine Learning for RNA Flexibility Analysis (2020)

Chi Seng Pun, Brandon Yung Sin Yong, Kelin Xia

Abstract

With the great significance of biomolecular flexibility in biomolecular dynamics and functional analysis, various experimental and theoretical models are developed. Experimentally, Debye-Waller factor, also known as B-factor, measures atomic mean-square displacement and is usually considered as an important measurement for flexibility. Theoretically, elastic network models, Gaussian network model, flexibility-rigidity model, and other computational models have been proposed for flexibility analysis by shedding light on the biomolecular inner topological structures. Recently, a topology-based machine learning model has been proposed. By using the features from persistent homology, this model achieves a remarkable high Pearson correlation coefficient (PCC) in protein B-factor prediction. Motivated by its success, we propose weighted-persistent-homology (WPH)-based machine learning (WPHML) models for RNA flexibility analysis. Our WPH is a newly-proposed model, which incorporate physical, chemical and biological information into topological measurements using a weight function. In particular, we use local persistent homology (LPH) to focus on the topological information of local regions. Our WPHML model is validated on a well-established RNA dataset, and numerical experiments show that our model can achieve a PCC of up to 0.5822. The comparison with the previous sequence-information-based learning models shows that a consistent improvement in performance by at least 10% is achieved in our current model.

Raman Spectroscopy and Topological Machine Learning for Cancer Grading (2023)

Francesco Conti, Mario D’Acunto, Claudia Caudai, Sara Colantonio, Raffaele Gaeta, Davide Moroni, Maria Antonietta Pascali

Abstract

In the last decade, Raman Spectroscopy is establishing itself as a highly promising technique for the classification of tumour tissues as it allows to obtain the biochemical maps of the tissues under investigation, making it possible to observe changes among different tissues in terms of biochemical constituents (proteins, lipid structures, DNA, vitamins, and so on). In this paper, we aim to show that techniques emerging from the cross-fertilization of persistent homology and machine learning can support the classification of Raman spectra extracted from cancerous tissues for tumour grading. In more detail, topological features of Raman spectra and machine learning classifiers are trained in combination as an automatic classification pipeline in order to select the best-performing pair. The case study is the grading of chondrosarcoma in four classes: cross and leave-one-patient-out validations have been used to assess the classification accuracy of the method. The binary classification achieves a validation accuracy of 81% and a test accuracy of 90%. Moreover, the test dataset has been collected at a different time and with different equipment. Such results are achieved by a support vector classifier trained with the Betti Curve representation of the topological features extracted from the Raman spectra, and are excellent compared with the existing literature. The added value of such results is that the model for the prediction of the chondrosarcoma grading could easily be implemented in clinical practice, possibly integrated into the acquisition system.

Manifold Learning for Coherent Design Interpolation Based on Geometrical and Topological Descriptors (2023)

D. Muñoz, O. Allix, F. Chinesta, J. J. Ródenas, E. Nadal

Abstract

In the context of intellectual property in the manufacturing industry, know-how is referred to practical knowledge on how to accomplish a specific task. This know-how is often difficult to be synthesised in a set of rules or steps as it remains in the intuition and expertise of engineers, designers, and other professionals. Today, a new research line in this concern spot-up thanks to the explosion of Artificial Intelligence and Machine Learning algorithms and its alliance with Computational Mechanics and Optimisation tools. However, a key aspect with industrial design is the scarcity of available data, making it problematic to rely on deep-learning approaches. Assuming that the existing designs live in a manifold, in this paper, we propose a synergistic use of existing Machine Learning tools to infer a reduced manifold from the existing limited set of designs and, then, to use it to interpolate between the individuals, working as a generator basis, to create new and coherent designs. For this, a key aspect is to be able to properly interpolate in the reduced manifold, which requires a proper clustering of the individuals. From our experience, due to the scarcity of data, adding topological descriptors to geometrical ones considerably improves the quality of the clustering. Thus, a distance, mixing topology and geometry is proposed. This distance is used both, for the clustering and for the interpolation. For the interpolation, relying on optimal transport appear to be mandatory. Examples of growing complexity are proposed to illustrate the goodness of the method.

Semantic Segmentation of Microscopic Neuroanatomical Data by Combining Topological Priors With Encoder–decoder Deep Networks (2020)

Samik Banerjee, Lucas Magee, Dingkang Wang, Xu Li, Bing-Xing Huo, Jaikishan Jayakumar, Katherine Matho, Meng-Kuan Lin, Keerthi Ram, Mohanasankar Sivaprakasam, Josh Huang, Yusu Wang, Partha P. Mitra

Abstract

Understanding of neuronal circuitry at cellular resolution within the brain has relied on neuron tracing methods that involve careful observation and interpretation by experienced neuroscientists. With recent developments in imaging and digitization, this approach is no longer feasible with the large-scale (terabyte to petabyte range) images. Machine-learning-based techniques, using deep networks, provide an efficient alternative to the problem. However, these methods rely on very large volumes of annotated images for training and have error rates that are too high for scientific data analysis, and thus requires a substantial volume of human-in-the-loop proofreading. Here we introduce a hybrid architecture combining prior structure in the form of topological data analysis methods, based on discrete Morse theory, with the best-in-class deep-net architectures for the neuronal connectivity analysis. We show significant performance gains using our hybrid architecture on detection of topological structure (for example, connectivity of neuronal processes and local intensity maxima on axons corresponding to synaptic swellings) with precision and recall close to 90% compared with human observers. We have adapted our architecture to a high-performance pipeline capable of semantic segmentation of light-microscopic whole-brain image data into a hierarchy of neuronal compartments. We expect that the hybrid architecture incorporating discrete Morse techniques into deep nets will generalize to other data domains.

Topology-Based Kernels With Application to Inference Problems in Alzheimer’s Disease (2011)

Deepti Pachauri, Chris Hinrichs, Moo K. Chung, Sterling C. Johnson, Vikas Singh

Abstract

Alzheimer’s disease (AD) research has recently witnessed a great deal of activity focused on developing new statistical learning tools for automated inference using imaging data. The workhorse for many of these techniques is the Support Vector Machine (SVM) framework (or more generally kernel based methods). Most of these require, as a first step, specification of a kernel matrix between input examples (i.e., images). The inner product between images Ii and Ij in a feature space can generally be written in closed form, and so it is convenient to treat as “given”. However, in certain neuroimaging applications such an assumption becomes problematic. As an example, it is rather challenging to provide a scalar measure of similarity between two instances of highly attributed data such as cortical thickness measures on cortical surfaces. Note that cortical thickness is known to be discriminative for neurological disorders, so leveraging such information in an inference framework, especially within a multi-modal method, is potentially advantageous. But despite being clinically meaningful, relatively few works have successfully exploited this measure for classification or regression. Motivated by these applications, our paper presents novel techniques to compute similarity matrices for such topologically-based attributed data. Our ideas leverage recent developments to characterize signals (e.g., cortical thickness) motivated by the persistence of their topological features, leading to a scheme for simple constructions of kernel matrices. As a proof of principle, on a dataset of 356 subjects from the ADNI study, we report good performance on several statistical inference tasks without any feature selection, dimensionality reduction, or parameter tuning.

Community Resources

Data

Algebraic Topology-Based Machine Learning Using MRI Predicts Outcomes in Primary Sclerosing Cholangitis (2022)

Yashbir Singh, William A. Jons, John E. Eaton, Mette Vesterhus, Tom Karlsen, Ida Bjoerk, Andreas Abildgaard, Kristin Kaasen Jorgensen, Folseraas Trine, Derek Little, Aliya F. Gulamhusein, Kosta Petrovic, Anne Negard, Gian Marco Conte, Joseph D. Sobek, Jaidip Jagtap, Sudhakar K. Venkatesh, Gregory J. Gores, Nicholas F. LaRusso, Konstantinos N. Lazaridis, Bradley J. Erickson

Abstract

Background: Primary sclerosing cholangitis (PSC) is a chronic cholestatic liver disease that can lead to cirrhosis and hepatic decompensation. However, predicting future outcomes in patients with PSC is challenging. Our aim was to extract magnetic resonance imaging (MRI) features that predict the development of hepatic decompensation by applying algebraic topology-based machine learning (ML). Methods: We conducted a retrospective multicenter study among adults with large duct PSC who underwent MRI. A topological data analysis-inspired nonlinear framework was used to predict the risk of hepatic decompensation, which was motivated by algebraic topology theory-based ML. The topological representations (persistence images) were employed as input for classifcation to predict who developed early hepatic decompensation within one year after their baseline MRI. Results: We reviewed 590 patients; 298 were excluded due to poor image quality or inadequate liver coverage, leaving 292 potentially eligible subjects, of which 169 subjects were included in the study. We trained our model using contrast-enhanced delayed phase T1-weighted images on a single center derivation cohort consisting of 54 patients (hepatic decompensation, n = 21; no hepatic decompensation, n = 33) and a multicenter independent validation cohort of 115 individuals (hepatic decompensation, n = 31; no hepatic decompensation, n = 84). When our model was applied in the independent validation cohort, it remained predictive of early hepatic decompensation (area under the receiver operating characteristic curve = 0.84). Conclusions: Algebraic topology-based ML is a methodological approach that can predict outcomes in patients with PSC and has the potential for application in other chronic liver diseases

Continuous Indexing of Fibrosis (CIF): Improving the Assessment and Classification of MPN Patients (2022)

Hosuk Ryou, Korsuk Sirinukunwattana, Alan Aberdeen, Gillian Grindstaff, Bernadette Stolz, Helen Byrne, Heather A. Harrington, Nikolaos Sousos, Anna L. Godfrey, Claire N. Harrison, Bethan Psaila, Adam J. Mead, Gabrielle Rees, Gareth D. H. Turner, Jens Rittscher, Daniel Royston

Abstract

The detection and grading of fibrosis in myeloproliferative neoplasms (MPN) is an important component of disease classification, prognostication and disease monitoring. However, current fibrosis grading systems are only semi-quantitative and fail to capture sample heterogeneity. To improve the detection, quantitation and representation of reticulin fibrosis, we developed a machine learning (ML) approach using bone marrow trephine (BMT) samples (n = 107) from patients diagnosed with MPN or a reactive / nonneoplastic marrow. The resulting Continuous Indexing of Fibrosis (CIF) enhances the detection and monitoring of fibrosis within BMTs, and aids the discrimination of MPN subtypes. When combined with megakaryocyte feature analysis, CIF discriminates between the frequently challenging differential diagnosis of essential thrombocythemia (ET) and pre-fibrotic myelofibrosis (pre-PMF) with high predictive accuracy [area under the curve = 0.94]. CIF also shows significant promise in the identification of MPN patients at risk of disease progression; analysis of samples from 35 patients diagnosed with ET and enrolled in the Primary Thrombocythemia-1 (PT-1) trial identified features predictive of post-ET myelofibrosis (area under the curve = 0.77). In addition to these clinical applications, automated analysis of fibrosis has clear potential to further refine disease classification boundaries and inform future studies of the micro-environmental factors driving disease initiation and progression in MPN and other stem cell disorders. The image analysis methods used to generate CIF can be readily integrated with those of other key morphological features in MPNs, including megakaryocyte morphology, that lie beyond the scope of conventional histological assessment. Key PointsMachine learning enables an objective and quantitative description of reticulin fibrosis within the bone marrow of patients with myeloproliferative neoplasms (MPN),Automated analysis and Continuous Indexing of Fibrosis (CIF) captures heterogeneity within MPN samples and has utility in refined classification and disease monitoringQuantitative fibrosis assessment combined with topological data analysis may help to predict patients at increased risk of progression to post-ET myelofibrosis, and assist in the discrimination of ET and pre-fibrotic PMF (pre-PMF)

Two-Tier Mapper, an Unbiased Topology-Based Clustering Method for Enhanced Global Gene Expression Analysis (2019)

Rachel Jeitziner, Mathieu Carrière, Jacques Rougemont, Steve Oudot, Kathryn Hess, Cathrin Brisken

Abstract

MOTIVATION: Unbiased clustering methods are needed to analyze growing numbers of complex datasets. Currently available clustering methods often depend on parameters that are set by the user, they lack stability, and are not applicable to small datasets. To overcome these shortcomings we used topological data analysis, an emerging field of mathematics that discerns additional feature and discovers hidden insights on datasets and has a wide application range. RESULTS: We have developed a topology-based clustering method called Two-Tier Mapper (TTMap) for enhanced analysis of global gene expression datasets. First, TTMap discerns divergent features in the control group, adjusts for them, and identifies outliers. Second, the deviation of each test sample from the control group in a high-dimensional space is computed, and the test samples are clustered using a new Mapper-based topological algorithm at two levels: a global tier and local tiers. All parameters are either carefully chosen or data-driven, avoiding any user-induced bias. The method is stable, different datasets can be combined for analysis, and significant subgroups can be identified. It outperforms current clustering methods in sensitivity and stability on synthetic and biological datasets, in particular when sample sizes are small; outcome is not affected by removal of control samples, by choice of normalization, or by subselection of data. TTMap is readily applicable to complex, highly variable biological samples and holds promise for personalized medicine. AVAILABILITY AND IMPLEMENTATION: TTMap is supplied as an R package in Bioconductor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Topology Based Data Analysis Identifies a Subgroup of Breast Cancers With a Unique Mutational Profile and Excellent Survival (2011)

Monica Nicolau, Arnold J. Levine, Gunnar Carlsson

Abstract

High-throughput biological data, whether generated as sequencing, transcriptional microarrays, proteomic, or other means, continues to require analytic methods that address its high dimensional aspects. Because the computational part of data analysis ultimately identifies shape characteristics in the organization of data sets, the mathematics of shape recognition in high dimensions continues to be a crucial part of data analysis. This article introduces a method that extracts information from high-throughput microarray data and, by using topology, provides greater depth of information than current analytic techniques. The method, termed Progression Analysis of Disease (PAD), first identifies robust aspects of cluster analysis, then goes deeper to find a multitude of biologically meaningful shape characteristics in these data. Additionally, because PAD incorporates a visualization tool, it provides a simple picture or graph that can be used to further explore these data. Although PAD can be applied to a wide range of high-throughput data types, it is used here as an example to analyze breast cancer transcriptional data. This identified a unique subgroup of Estrogen Receptor-positive (ER+) breast cancers that express high levels of c-MYB and low levels of innate inflammatory genes. These patients exhibit 100% survival and no metastasis. No supervised step beyond distinction between tumor and healthy patients was used to identify this subtype. The group has a clear and distinct, statistically significant molecular signature, it highlights coherent biology but is invisible to cluster methods, and does not fit into the accepted classification of Luminal A/B, Normal-like subtypes of ER+ breast cancers. We denote the group as c-MYB+ breast cancer.

How Many Parameters Does It Take to Describe Disease Tolerance? (2016)

Alexander Louie, Kyung Han Song, Alejandra Hotson, Ann Thomas Tate, David S. Schneider

Export citation

Using Topological Data Analysis for Diagnosis Pulmonary Embolism (2015)

M. Rucco, E. Merelli, D. Herman, D. Ramanan, T. Petrossian, L. Falsetti, C. Nitti, A. Salvi

Community Resources

Code

Zebrafish Behavior: Opportunities and Challenges (2017)

Michael B. Orger, Gonzalo G. de Polavieja

Export citation

Innate and Adaptive T Cells in Asthmatic Patients: Relationship to Severity and Disease Mechanisms (2015)

Timothy SC Hinks, Xiaoying Zhou, Karl J. Staples, Borislav D. Dimitrov, Alexander Manta, Tanya Petrossian, Pek Y. Lum, Caroline G. Smith, Jon A. Ward, Peter H. Howarth

Export citation

Acridine Derivatives as Inhibitors of the IRE1α–XBP1 Pathway Are Cytotoxic to Human Multiple Myeloma (2016)

Dadi Jiang, Arvin B. Tam, Muthuraman Alagappan, Michael P. Hay, Aparna Gupta, Margaret M. Kozak, David E. Solow-Cordero, Pek Y. Lum, Nicholas C. Denko, Amato J. Giaccia

Export citation

A Multimodal Data Analysis Approach for Targeted Drug Discovery Involving Topological Data Analysis (TDA) (2016)

Muthuraman Alagappan, Dadi Jiang, Nicholas Denko, Albert C. Koong

Export citation

Complex Politics: A Quantitative Semantic and Topological Analysis of Uk House of Commons Debates (2015)

Stefano Gurciullo, Michael Smallegan, María Pereda, Federico Battiston, Alice Patania, Sebastian Poledna, Daniel Hedblom, Bahattin Tolga Oztan, Alexander Herzog, Peter John

Export citation

Integrated Detection of Pathogens and Host Biomarkers for Wounds (2014)

Crystal Jaing

Export citation

Disease Model of GATA4 Mutation Reveals Transcription Factor Cooperativity in Human Cardiogenesis (2016)

Yen-Sin Ang, Renee N. Rivas, Alexandre JS Ribeiro, Rohith Srivas, Janell Rivera, Nicole R. Stone, Karishma Pratt, Tamer MA Mohamed, Ji-Dong Fu, C. Ian Spencer

Export citation

Biochemical Association of Metabolic Profile and Microbiome in Chronic Pressure Ulcer Wounds (2015)

Mary Cloud B. Ammons, Kathryn Morrissey, Brian P. Tripet, James T. Van Leuven, Anne Han, Gerald S. Lazarus, Jonathan M. Zenilman, Philip S. Stewart, Garth A. James, Valérie Copié

Export citation

Topological Features in Cancer Gene Expression Data (2014)

S. Lockwood, B. Krishnamoorthy

Identification of Type 2 Diabetes Subgroups Through Topological Analysis of Patient Similarity (2015)

Li Li, Wei-Yi Cheng, Benjamin S. Glicksberg, Omri Gottesman, Ronald Tamler, Rong Chen, Erwin P. Bottinger, Joel T. Dudley

Export citation

Topological Data Analysis With Metric Learning and an Application to High-Dimensional Football Data (2015)

David Alejandro Perdomo Meza

Export citation

Topological Data Analysis Quantifies Biological Nano-Structure From Single Molecule Localization Microscopy (2020)

Jeremy A. Pike, Abdullah O. Khan, Chiara Pallini, Steven G. Thomas, Markus Mund, Jonas Ries, Natalie S. Poulter, Iain B. Styles

Abstract

AbstractMotivation. Localization microscopy data is represented by a set of spatial coordinates, each corresponding to a single detection, that form a point cl

Resting-State fMRI Functional Connectivity: Big Data Preprocessing Pipelines and Topological Data Analysis (2017)

Angkoon Phinyomark, Esther Ibáñez-Marcelo, Giovanni Petri

Export citation

Specific Mutations in H5N1 Mainly Impact the Magnitude and Velocity of the Host Response in Mice (2013)

Nicolas Tchitchek, Amie J. Eisfeld, Jennifer Tisoncik-Go, Laurence Josset, Lisa E. Gralinski, Christophe Bécavin, Susan C. Tilton, Bobbie-Jo Webb-Robertson, Martin T. Ferris, Allison L. Totura

Export citation

A Survey of Topological Data Analysis Methods for Big Data in Healthcare Intelligence (2019)

Milan Joshi, Dhananjay Joshi

Export citation

A New Approach to Investigate the Association Between Brain Functional Connectivity and Disease Characteristics of Attention-Deficit/Hyperactivity Disorder: Topological Neuroimaging Data Analysis (2015)

Sunghyon Kyeong, Seonjeong Park, Keun-Ah Cheon, Jae-Jin Kim, Dong-Ho Song, Eunjoo Kim

Export citation

A Transcriptome-Driven Analysis of Epithelial Brushings and Bronchial Biopsies to Define Asthma Phenotypes in U-Biopred (2017)

Chih-Hsi Scott Kuo, Stelios Pavlidis, Matthew Loza, Fred Baribaud, Anthony Rowe, Ioannis Pandis, Uruj Hoda, Christos Rossios, Ana Sousa, Susan J. Wilson

Export citation

A Collaborative Visual Analytics Suite for Protein Folding Research (2014)

William Harvey, In-Hee Park, Oliver Rübel, Valerio Pascucci, Peer-Timo Bremer, Chenglong Li, Yusu Wang

Exploring Hyperspectral Imaging Data Sets With Topological Data Analysis (2017)

Ludovic Duponchel

Export citation

Construction of Personalized Health Curves in Disease Space for Human Malaria Infections (2015)

Else M. Bijker, Brenda Y. Torres, David S. Schneider, Robert W. Sauerwein

A Severe Asthma Disease Signature From Gene Expression Profiling of Peripheral Blood From U-Biopred Cohorts (2017)

Jeannette Bigler, Michael Boedigheimer, James PR Schofield, Paul J. Skipp, Julie Corfield, Anthony Rowe, Ana R. Sousa, Martin Timour, Lori Twehues, Xuguang Hu

Export citation

Microarray of 16S rRNA Gene Probes for Quantifying Population Differences Across Microbiome Samples (2014)

Alexander J. Probst, Pek Yee Lum, Bettina John, Eric A. Dubinsky, Yvette M. Piceno, Lauren M. Tom, Gary L. Andersen, Zhili He, Todd Z. DeSantis

Export citation

Multidimensional Endotyping in Patients With Severe Asthma Reveals Inflammatory Heterogeneity in Matrix Metalloproteinases and Chitinase 3–like Protein 1 (2016)

Timothy SC Hinks, Tom Brown, Laurie CK Lau, Hitasha Rupani, Clair Barber, Scott Elliott, Jon A. Ward, Junya Ono, Shoichiro Ohta, Kenji Izuhara

Export citation

Quantifying Similarity of Pore-Geometry in Nanoporous Materials (2017)

Yongjin Lee, Senja D. Barthel, Pawe\\textbackslash\l D\\textbackslash\lotko, S. Mohamad Moosavi, Kathryn Hess, Berend Smit

Export citation

Networked Data Analytics: Network Comparison and Applied Graph Signal Processing (2018)

Weiyu Huang

An Introduction to a New Text Classification and Visualization for Natural Language Processing Using Topological Data Analysis (2019)

Naiereh Elyasi, Mehdi Hosseini Moghadam

Export citation

Patient Similarity: Emerging Concepts in Systems and Precision Medicine (2016)

Sherry-Ann Brown

Export citation

Integrative Methods for Analyzing Big Data in Precision Medicine (2016)

Vladimir Gligorijević, Noël Malod-Dognin, Nataša Pržulj

Export citation

Novel Subgroups of Attention-Deficit/Hyperactivity Disorder Identified by Topological Data Analysis and Their Functional Network Modular Organizations (2017)

Sunghyon Kyeong, Jae-Jin Kim, Eunjoo Kim

Export citation

Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury (2015)

Jessica L. Nielson, Jesse Paquette, Aiwen W. Liu, Cristian F. Guandique, C. Amy Tovar, Tomoo Inoue, Karen-Amanda Irvine, John C. Gensel, Jennifer Kloke, Tanya C. Petrossian, Pek Y. Lum, Gunnar E. Carlsson, Geoffrey T. Manley, Wise Young, Michael S. Beattie, Jacqueline C. Bresnahan, Adam R. Ferguson

Abstract

Data-driven discovery in complex neurological disorders has potential to extract meaningful knowledge from large, heterogeneous datasets. Here the authors apply topological data analysis to assess therapeutic effects in preclinical traumatic brain injury and spinal cord injury research studies.

Topographical Transcriptome Mapping of the Mouse Medial Ganglionic Eminence by Spatially Resolved RNA-seq (2014)

Sabrina Zechel, Pawel Zajac, Peter Lönnerberg, Carlos F. Ibáñez, Sten Linnarsson

Abstract

Cortical interneurons originating from the medial ganglionic eminence, MGE, are among the most diverse cells within the CNS. Different pools of proliferating progenitor cells are thought to exist in the ventricular zone of the MGE, but whether the underlying subventricular and mantle regions of the MGE are spatially patterned has not yet been addressed. Here, we combined laser-capture microdissection and multiplex RNA-sequencing to map the transcriptome of MGE cells at a spatial resolution of 50 μm.

Structural Insight Into RNA Hairpin Folding Intermediates (2008)

Gregory R. Bowman, Xuhui Huang, Yuan Yao, Jian Sun, Gunnar Carlsson, Leonidas J. Guibas, Vijay S. Pande

Abstract

, Hairpins are a ubiquitous secondary structure motif in RNA molecules. Despite their simple structure, there is some debate over whether they fold in a two-state or multi-state manner. We have studied the folding of a small tetraloop hairpin using a serial version of replica exchange molecular dynamics on a distributed computing environment. On the basis of these simulations, we have identified a number of intermediates that are consistent with experimental results. We also find that folding is not simply the reverse of high-temperature unfolding and suggest that this may be a general feature of biomolecular folding.

Classification of Skin Lesions by Topological Data Analysis Alongside With Neural Network (2020)

Naiereh Elyasi, Mehdi Hosseini Moghadam

Abstract

In this paper we use TDA mapper alongside with deep convolutional neural networks in the classification of 7 major skin diseases. First we apply kepler mapper with neural network as one of its filter steps to classify the dataset HAM10000. Mapper visualizes the classification result by a simplicial complex, where neural network can not do this alone, but as a filter step neural network helps to classify data better. Furthermore we apply TDA mapper and persistent homology to understand the weights of layers of mobilenet network in different training epochs of HAM10000. Also we use persistent diagrams to visualize the results of analysis of layers of mobilenet network.

Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition (2007)

Gurjeet Singh, Facundo Mémoli, Gunnar Carlsson

Abstract

We present a computational method for extracting simple descriptions of high dimensional data sets in the form of simplicial complexes. Our method, called Mapper, is based on the idea of partial clustering of the data guided by a set of functions deﬁned on the data. The proposed method is not dependent on any particular clustering algorithm, i.e. any clustering algorithm may be used with Mapper. We implement this method and present a few sample applications in which simple descriptions of the data present important information about its structure.

Export citation

Topological Pattern Recognition for Point Cloud Data* (2014)

Gunnar Carlsson

Abstract

In this paper we discuss the adaptation of the methods of homology from algebraic topology to the problem of pattern recognition in point cloud data sets. The method is referred to as persistent homology, and has numerous applications to scientific problems. We discuss the definition and computation of homology in the standard setting of simplicial complexes and topological spaces, then show how one can obtain useful signatures, called barcodes, from finite metric spaces, thought of as sampled from a continuous object. We present several different cases where persistent homology is used, to illustrate the different ways in which the method can be applied.

Mapping Firms' Locations in Technological Space: A Topological Analysis of Patent Statistics (2020)

Emerson G. Escolar, Yasuaki Hiraoka, Mitsuru Igami, Yasin Ozcan

Abstract

Where do ﬁrms innovate? Mapping their locations in technological space is diﬃcult, because it is high dimensional and unstructured. We address this issue by using a method in computational topology called the Mapper algorithm, which combines local clustering with global reconstruction. We apply this method to a panel of 333 major ﬁrms’ patent portfolios in 1976–2005 across 430 technological areas. Results suggest the Mapper graph captures salient patterns in ﬁrms’ patenting histories, and our measures of their uniqueness (the length of “ﬂares”) are correlated with ﬁrms’ ﬁnancial performances in a statistically and economically signiﬁcant manner. We then compare this approach with a widely used clustering method by Jaﬀe (1989) to highlight additional ﬁndings.

Conserved Abundance and Topological Features in Chromatin-Remodeling Protein Interaction Networks (2015)

Mihaela E Sardiu, Joshua M Gilmore, Brad D Groppe, Damir Herman, Sreenivasa R Ramisetty, Yong Cai, Jingji Jin, Ronald C Conaway, Joan W Conaway, Laurence Florens, Michael P Washburn

Abstract

Abstract The study of conserved protein interaction networks seeks to better understand the evolution and regulation of protein interactions. Here, we present a quantitative proteomic analysis of 18 orthologous baits from three distinct chromatin-remodeling complexes in Saccharomyces cerevisiae and Homo sapiens. We demonstrate that abundance levels of orthologous proteins correlate strongly between the two organisms and both networks have highly similar topologies. We therefore used the protein abundances in one species to cross-predict missing protein abundance levels in the other species. Lastly, we identified a novel conserved low-abundance subnetwork further demonstrating the value of quantitative analysis of networks.

Topological Data Analysis for Genomics and Evolution: Topology in Biology (2019)

Raul Rabadan, Andrew J. Blumberg

Abstract

Biology has entered the age of Big Data. A technical revolution has transformed the field, and extracting meaningful information from large biological data sets is now a central methodological challenge. Algebraic topology is a well-established branch of pure mathematics that studies qualitative descriptors of the shape of geometric objects. It aims to reduce comparisons of shape to a comparison of algebraic invariants, such as numbers, which are typically easier to work with. Topological data analysis is a rapidly developing subfield that leverages the tools of algebraic topology to provide robust multiscale analysis of data sets. This book introduces the central ideas and techniques of topological data analysis and its specific applications to biology, including the evolution of viruses, bacteria and humans, genomics of cancer, and single cell characterization of developmental processes. Bridging two disciplines, the book is for researchers and graduate students in genomics and evolutionary biology as well as mathematicians interested in applied topology.

Topological Data Analysis: Concepts, Computation, and Applications in Chemical Engineering (2021)

Alexander D. Smith, Paweł Dłotko, Victor M. Zavala

Abstract

A primary hypothesis that drives scientific and engineering studies is that data has structure. The dominant paradigms for describing such structure are statistics (e.g., moments, correlation functions) and signal processing (e.g., convolutional neural nets, Fourier series). Topological Data Analysis (TDA) is a field of mathematics that analyzes data from a fundamentally different perspective. TDA represents datasets as geometric objects and provides dimensionality reduction techniques that project such objects onto low-dimensional descriptors. The key properties of these descriptors (also known as topological features) are that they provide multiscale information and that they are stable under perturbations (e.g., noise, translation, and rotation). In this work, we review the key mathematical concepts and methods of TDA and present different applications in chemical engineering.

Identification of Topological Network Modules in Perturbed Protein Interaction Networks (2017)

Mihaela E. Sardiu, Joshua M. Gilmore, Brad Groppe, Laurence Florens, Michael P. Washburn

Abstract

Biological networks consist of functional modules, however detecting and characterizing such modules in networks remains challenging. Perturbing networks is one strategy for identifying modules. Here we used an advanced mathematical approach named topological data analysis (TDA) to interrogate two perturbed networks. In one, we disrupted the S. cerevisiae INO80 protein interaction network by isolating complexes after protein complex components were deleted from the genome. In the second, we reanalyzed previously published data demonstrating the disruption of the human Sin3 network with a histone deacetylase inhibitor. Here we show that disrupted networks contained topological network modules (TNMs) with shared properties that mapped onto distinct locations in networks. We define TMNs as proteins that occupy close network positions depending on their coordinates in a topological space. TNMs provide new insight into networks by capturing proteins from different categories including proteins within a complex, proteins with shared biological functions, and proteins disrupted across networks.

Topological Analysis Reveals State Transitions in Human Gut and Marine Bacterial Communities (2020)

William K. Chang, David VanInsberghe, Libusha Kelly

Abstract

Microbiome dynamics influence the health and functioning of human physiology and the environment and are driven in part by interactions between large numbers of microbial taxa, making large-scale prediction and modeling a challenge. Here, using topological data analysis, we identify states and dynamical features relevant to macroscopic processes. We show that gut disease processes and marine geochemical events are associated with transitions between community states, defined as topological features of the data density. We find a reproducible two-state succession during recovery from cholera in the gut microbiomes of multiple patients, evidence of dynamic stability in the gut microbiome of a healthy human after experiencing diarrhea during travel, and periodic state transitions in a marine Prochlorococcus community driven by water column cycling. Our approach bridges small-scale fluctuations in microbiome composition and large-scale changes in phenotype without details of underlying mechanisms, and provides an assessment of microbiome stability and its relation to human and environmental health.

Reconceiving the Hippocampal Map as a Topological Template (2014)

Yuri Dabaghian, Vicky L. Brandt, Loren M. Frank

Abstract

The role of the hippocampus in spatial cognition is incontrovertible yet controversial. Place cells, initially thought to be location-specifiers, turn out to respond promiscuously to a wide range of stimuli. Here we test the idea, which we have recently demonstrated in a computational model, that the hippocampal place cells may ultimately be interested in a space's topological qualities (its connectivity) more than its geometry (distances and angles); such higher-order functioning would be more consistent with other known hippocampal functions. We recorded place cell activity in rats exploring morphing linear tracks that allowed us to dissociate the geometry of the track from its topology. The resulting place fields preserved the relative sequence of places visited along the track but did not vary with the metrical features of the track or the direction of the rat's movement. These results suggest a reinterpretation of previous studies and new directions for future experiments.

Topic Detection in Twitter Using Topology Data Analysis (2015)

Pablo Torres-Tramón, Hugo Hromic, Bahareh Rahmanzadeh Heravi

Abstract

The massive volume of content generated by social media greatly exceeds human capacity to manually process this data in order to identify topics of interest. As a solution, various automated topic detection approaches have been proposed, most of which are based on document clustering and burst detection. These approaches normally represent textual features in standard n-dimensional Euclidean metric spaces. However, in these cases, directly filtering noisy documents is challenging for topic detection. Instead we propose Topol, a topic detection method based on Topology Data Analysis (TDA) that transforms the Euclidean feature space into a topological space where the shapes of noisy irrelevant documents are much easier to distinguish from topically-relevant documents. This topological space is organised in a network according to the connectivity of the points, i.e. the documents, and by only filtering based on the size of the connected components we obtain competitive results compared to other state of the art topic detection methods.

Extracting Insights From the Shape of Complex Data Using Topology (2013)

P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. Carlsson

Abstract

This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find. Our method also permits the analysis of individual data sets as well as the analysis of relationships between related data sets. We illustrate the use of our method by applying it to three very different kinds of data, namely gene expression from breast tumors, voting data from the United States House of Representatives and player performance data from the NBA, in each case finding stratifications of the data which are more refined than those produced by standard methods.

Single-Cell Topological RNA-Seq Analysis Reveals Insights Into Cellular Differentiation and Development (2017)

Abbas H. Rizvi, Pablo G. Camara, Elena K. Kandror, Thomas J. Roberts, Ira Schieren, Tom Maniatis, Raul Rabadan

Abstract

Transcriptional programs control cellular lineage commitment and differentiation during development. Understanding cell fate has been advanced by studying single-cell RNA-seq, but is limited by the assumptions of current analytic methods regarding the structure of data. We present single-cell topological data analysis (scTDA), an algorithm for topology-based computational analyses to study temporal, unbiased transcriptional regulation. Compared to other methods, scTDA is a non-linear, model-independent, unsupervised statistical framework that can characterize transient cellular states. We applied scTDA to the analysis of murine embryonic stem cell (mESC) differentiation in vitro in response to inducers of motor neuron differentiation. scTDA resolved asynchrony and continuity in cellular identity over time, and identified four transient states (pluripotent, precursor, progenitor, and fully differentiated cells) based on changes in stage-dependent combinations of transcription factors, RNA-binding proteins and long non-coding RNAs. scTDA can be applied to study asynchronous cellular responses to either developmental cues or environmental perturbations.

Topological Data Analysis of Single-Cell Hi-C Contact Maps (2020)

Mathieu Carrière, Raúl Rabadán

Abstract

Due to recent breakthroughs in high-throughput sequencing, it is now possible to use chromosome conformation capture (CCC) to understand the three dimensional conformation of DNA at the whole genome level, and to characterize it with the so-called contact maps. This is very useful since many biological processes are correlated with DNA folding, such as DNA transcription. However, the methods for the analysis of such conformations are still lacking mathematical guarantees and statistical power. To handle this issue, we propose to use the Mapper, which is a standard tool of Topological Data Analysis (TDA) that allows one to efficiently encode the inherent continuity and topology of underlying biological processes in data, in the form of a graph with various features such as branches and loops. In this article, we show how recent statistical techniques developed in TDA for the Mapper algorithm can be extended and leveraged to formally define and statistically quantify the presence of topological structures coming from biological phenomena, such as the cell cyle, in datasets of CCC contact maps.

Identification of Key Features Using Topological Data Analysis for Accurate Prediction of Manufacturing System Outputs (2017)

Wei Guo, Ashis G. Banerjee

Abstract

Topological data analysis (TDA) has emerged as one of the most promising approaches to extract insights from high-dimensional data of varying types such as images, point clouds, and meshes, in an unsupervised manner. To the best of our knowledge, here, we provide the first successful application of TDA in the manufacturing systems domain. We apply a widely used TDA method, known as the Mapper algorithm, on two benchmark data sets for chemical process yield prediction and semiconductor wafer fault detection, respectively. The algorithm yields topological networks that capture the intrinsic clusters and connections among the clusters present in the data sets, which are difficult to detect using traditional methods. We select key process variables or features that impact the system outcomes by analyzing the network shapes. We then use predictive models to evaluate the impact of the selected features. Results show that the models achieve at least the same level of high prediction accuracy as with all the process variables, thereby, providing a way to carry out process monitoring and control in a more cost-effective manner.

Improved Understanding of Aqueous Solubility Modeling Through Topological Data Analysis (2018)

Mariam Pirashvili, Lee Steinberg, Francisco Belchi Guillamon, Mahesan Niranjan, Jeremy G. Frey, Jacek Brodzki

Abstract

Topological data analysis is a family of recent mathematical techniques seeking to understand the ‘shape’ of data, and has been used to understand the structure of the descriptor space produced from a standard chemical informatics software from the point of view of solubility. We have used the mapper algorithm, a TDA method that creates low-dimensional representations of data, to create a network visualization of the solubility space. While descriptors with clear chemical implications are prominent features in this space, reflecting their importance to the chemical properties, an unexpected and interesting correlation between chlorine content and rings and their implication for solubility prediction is revealed. A parallel representation of the chemical space was generated using persistent homology applied to molecular graphs. Links between this chemical space and the descriptor space were shown to be in agreement with chemical heuristics. The use of persistent homology on molecular graphs, extended by the use of norms on the associated persistence landscapes allow the conversion of discrete shape descriptors to continuous ones, and a perspective of the application of these descriptors to quantitative structure property relations is presented.

Molecular Phenotyping Using Networks, Diffusion, and Topology: Soft Tissue Sarcoma (2019)

James C. Mathews, Maryam Pouryahya, Caroline Moosmüller, Yannis G. Kevrekidis, Joseph O. Deasy, Allen Tannenbaum

Abstract

Many biological datasets are high-dimensional yet manifest an underlying order. In this paper, we describe an unsupervised data analysis methodology that operates in the setting of a multivariate dataset and a network which expresses influence between the variables of the given set. The technique involves network geometry employing the Wasserstein distance, global spectral analysis in the form of diffusion maps, and topological data analysis using the Mapper algorithm. The prototypical application is to gene expression profiles obtained from RNA-Seq experiments on a collection of tissue samples, considering only genes whose protein products participate in a known pathway or network of interest. Employing the technique, we discern several coherent states or signatures displayed by the gene expression profiles of the sarcomas in the Cancer Genome Atlas along the TP53 (p53) signaling network. The signatures substantially recover the leiomyosarcoma, dedifferentiated liposarcoma (DDLPS), and synovial sarcoma histological subtype diagnoses, and they also include a new signature defined by activation and inactivation of about a dozen genes, including activation of serine endopeptidase inhibitor SERPINE1 and inactivation of TP53-family tumor suppressor gene TP73.

Genomics Data Analysis via Spectral Shape and Topology (2022)

Erik J. Amézquita, Farzana Nasrin, Kathleen M. Storey, Masato Yoshizawa

Abstract

Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can help to gain a better understanding of the intrinsic shape of high-dimensional genomic data and to retain information that may be lost using standard dimension-reduction algorithms. We propose a novel workflow to process and analyze RNA-seq data from tumor and healthy subjects integrating Mapper and differential gene expression. Precisely, we show that a Gaussian mixture approximation method can be used to produce graphical structures that successfully separate tumor and healthy subjects, and produce two subgroups of tumor subjects. A further analysis using DESeq2, a popular tool for the detection of differentially expressed genes, shows that these two subgroups of tumor cells bear two distinct gene regulations, suggesting two discrete paths for forming lung cancer, which could not be highlighted by other popular clustering methods, including t-SNE. Although Mapper shows promise in analyzing high-dimensional data, building tools to statistically analyze Mapper graphical structures is limited in the existing literature. In this paper, we develop a scoring method using heat kernel signatures that provides an empirical setting for statistical inferences such as hypothesis testing, sensitivity analysis, and correlation analysis.

Topological Gene Expression Networks Recapitulate Brain Anatomy and Function (2019)

Alice Patania, Pierluigi Selvaggi, Mattia Veronese, Ottavia Dipasquale, Paul Expert, Giovanni Petri

Abstract

Understanding how gene expression translates to and affects human behavior is one of the ultimate goals of neuroscience. In this paper, we present a pipeline based on Mapper, a topological simplification tool, to analyze gene co-expression data. We first validate the method by reproducing key results from the literature on the Allen Human Brain Atlas and the correlations between resting-state fMRI and gene co-expression maps. We then analyze a dopamine-related gene set and find that co-expression networks produced by Mapper return a structure that matches the well-known anatomy of the dopaminergic pathway. Our results suggest that network based descriptions can be a powerful tool to explore the relationships between genetic pathways and their association with brain function and its perturbation due to illness and/or pharmacological challenges., In this paper, we described a gene co-expression analysis pipeline that produces networks that we show to be closely related to either brain function and to neurotransmitter pathways. Our results suggest that this pipeline could be developed into a platform enabling the exploration of the effects of physiological and pathological alterations to specific gene sets, including profiling drugs effects.

CD8 T-Cell Reactivity to Islet Antigens Is Unique to Type 1 While CD4 T-Cell Reactivity Exists in Both Type 1 and Type 2 Diabetes (2014)

Ghanashyam Sarikonda, Jeremy Pettus, Sonal Phatak, Sowbarnika Sachithanantham, Jacqueline F. Miller, Johnna D. Wesley, Eithon Cadag, Ji Chae, Lakshmi Ganesan, Ronna Mallios, Steve Edelman, Bjoern Peters, Matthias von Herrath

Abstract

Previous cross-sectional analyses demonstrated that CD8+ and CD4+ T-cell reactivity to islet-specific antigens was more prevalent in T1D subjects than in healthy donors (HD). Here, we examined T1D-associated epitope-specific CD4+ T-cell cytokine production and autoreactive CD8+ T-cell frequency on a monthly basis for one year in 10 HD, 33 subjects with T1D, and 15 subjects with T2D. Autoreactive CD4+ T-cells from both T1D and T2D subjects produced more IFN-γ when stimulated than cells from HD. In contrast, higher frequencies of islet antigen-specific CD8+ T-cells were detected only in T1D. These observations support the hypothesis that general beta-cell stress drives autoreactive CD4+ T-cell activity while islet over-expression of MHC class I commonly seen in T1D mediates amplification of CD8+ T-cells and more rapid beta-cell loss. In conclusion, CD4+ T-cell autoreactivity appears to be present in both T1D and T2D while autoreactive CD8+ T-cells are unique to T1D. Thus, autoreactive CD8+ cells may serve as a more T1D-specific biomarker.

A Novel Quality Clustering Methodology on Fab-Wide Wafer Map Images in Semiconductor Manufacturing (2022)

Yuan-Ming Hsu, Xiaodong Jia, Wenzhe Li, Jay Lee

Abstract

Abstract. In semiconductor manufacturing, clustering the fab-wide wafer map images is of critical importance for practitioners to understand the subclusters of wafer defects, recognize novel clusters or anomalies, and develop fast reactions to quality issues. However, due to the high-mix manufacturing of diversified wafer products of different sizes and technologies, it is difficult to cluster the wafer map images across the fab. This paper addresses this challenge by proposing a novel methodology for fab-wide wafer map data clustering. In the proposed methodology, a well-known deep learning technique, vision transformer with multi-head attention is first trained to convert binary wafer images of different sizes into condensed feature vectors for efficient clustering. Then, the Topological Data Analysis (TDA), which is widely used in biomedical applications, is employed to visualize the data clusters and identify the anomalies. The TDA yields a topological representation of high-dimensional big data as well as its local clusters by creating a graph that shows nodes corresponding to the clusters within the data. The effectiveness of the proposed methodology is demonstrated by clustering the public wafer map dataset WM-811k from the real application which has a total of 811,457 wafer map images. We further demonstrate the potential applicability of topology data analytics in the semiconductor area by visualization.

Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome (2014)

David Romano, Monica Nicolau, Eve-Marie Quintin, Paul K. Mazaika, Amy A. Lightbody, Heather Cody Hazlett, Joseph Piven, Gunnar Carlsson, Allan L. Reiss

Abstract

Fragile X syndrome (FXS), due to mutations of the FMR1 gene, is the most common known inherited cause of developmental disability as well as the most common single-gene risk factor for autism. Our goal was to examine variation in brain structure in FXS with topological data analysis (TDA), and to assess how such variation is associated with measures of IQ and autism-related behaviors. To this end, we analyzed imaging and behavioral data from young boys (n = 52; aged 1.57–4.15 years) diagnosed with FXS. Application of topological methods to structural MRI data revealed two large subgroups within the study population. Comparison of these subgroups showed significant between-subgroup neuroanatomical differences similar to those previously reported to distinguish children with FXS from typically developing controls (e.g., enlarged caudate). In addition to neuroanatomy, the groups showed significant differences in IQ and autism severity scores. These results suggest that despite arising from a single gene mutation, FXS may encompass two biologically, and clinically separable phenotypes. In addition, these findings underscore the potential of TDA as a powerful tool in the search for biological phenotypes of neuropsychiatric disorders. Hum Brain Mapp 35:4904–4915, 2014. © 2014 Wiley Periodicals, Inc.

Identification of Relevant Genetic Alterations in Cancer Using Topological Data Analysis (2020)

Raúl Rabadán, Yamina Mohamedi, Udi Rubin, Tim Chu, Adam N. Alghalith, Oliver Elliott, Luis Arnés, Santiago Cal, Álvaro J. Obaya, Arnold J. Levine, Pablo G. Cámara

Abstract

Large-scale cancer genomic studies enable the systematic identification of mutations that lead to the genesis and progression of tumors, uncovering the underlying molecular mechanisms and potential therapies. While some such mutations are recurrently found in many tumors, many others exist solely within a few samples, precluding detection by conventional recurrence-based statistical approaches. Integrated analysis of somatic mutations and RNA expression data across 12 tumor types reveals that mutations of cancer genes are usually accompanied by substantial changes in expression. We use topological data analysis to leverage this observation and uncover 38 elusive candidate cancer-associated genes, including inactivating mutations of the metalloproteinase ADAMTS12 in lung adenocarcinoma. We show that ADAMTS12−/− mice have a five-fold increase in the susceptibility to develop lung tumors, confirming the role of ADAMTS12 as a tumor suppressor gene. Our results demonstrate that data integration through topological techniques can increase our ability to identify previously unreported cancer-related alterations., Rare cancer mutations are often missed using recurrence-based statistical approaches, but are usually accompanied by changes in expression. Here the authors leverage this information to uncover several elusive candidate cancer-associated genes using topological data analysis.

Tracking Resilience to Infections by Mapping Disease Space (2016)

Brenda Y. Torres, Jose Henrique M. Oliveira, Ann Thomas Tate, Poonam Rath, Katherine Cumnock, David S. Schneider

Abstract

Infected hosts differ in their responses to pathogens; some hosts are resilient and recover their original health, whereas others follow a divergent path and die. To quantitate these differences, we propose mapping the routes infected individuals take through “disease space.” We find that when plotting physiological parameters against each other, many pairs have hysteretic relationships that identify the current location of the host and predict the future route of the infection. These maps can readily be constructed from experimental longitudinal data, and we provide two methods to generate the maps from the cross-sectional data that is commonly gathered in field trials. We hypothesize that resilient hosts tend to take small loops through disease space, whereas nonresilient individuals take large loops. We support this hypothesis with experimental data in mice infected with Plasmodium chabaudi, finding that dying mice trace a large arc in red blood cells (RBCs) by reticulocyte space as compared to surviving mice. We find that human malaria patients who are heterozygous for sickle cell hemoglobin occupy a small area of RBCs by reticulocyte space, suggesting this approach can be used to distinguish resilience in human populations. This technique should be broadly useful in describing the in-host dynamics of infections in both model hosts and patients at both population and individual levels.

Topological Data Analysis: A Promising Big Data Exploration Tool in Biology, Analytical Chemistry and Physical Chemistry (2016)

Marc Offroy, Ludovic Duponchel

Abstract

An important feature of experimental science is that data of various kinds is being produced at an unprecedented rate. This is mainly due to the development of new instrumental concepts and experimental methodologies. It is also clear that the nature of acquired data is significantly different. Indeed in every areas of science, data take the form of always bigger tables, where all but a few of the columns (i.e. variables) turn out to be irrelevant to the questions of interest, and further that we do not necessary know which coordinates are the interesting ones. Big data in our lab of biology, analytical chemistry or physical chemistry is a future that might be closer than any of us suppose. It is in this sense that new tools have to be developed in order to explore and valorize such data sets. Topological data analysis (TDA) is one of these. It was developed recently by topologists who discovered that topological concept could be useful for data analysis. The main objective of this paper is to answer the question why topology is well suited for the analysis of big data set in many areas and even more efficient than conventional data analysis methods. Raman analysis of single bacteria should be providing a good opportunity to demonstrate the potential of TDA for the exploration of various spectroscopic data sets considering different experimental conditions (with high noise level, with/without spectral preprocessing, with wavelength shift, with different spectral resolution, with missing data).

Topological Data Analysis of Escherichia Coli O157:H7 and Non-O157 Survival in Soils (2014)

Abasiofiok M. Ibekwe, Jincai Ma, David E. Crowley, Ching-Hong Yang, Alexis M. Johnson, Tanya C. Petrossian, Pek Y. Lum

Abstract

Shiga toxin-producing E. coli O157:H7 and non-O157 have been implicated in many foodborne illnesses caused by the consumption of contaminated fresh produce. However, data on their persistence in soils are limited due to the complexity in datasets generated from different environmental variables and bacterial taxa. There is a continuing need to distinguish the various environmental variables and different bacterial groups to understand the relationships among these factors and the pathogen survival. Using an approach called Topological Data Analysis (TDA); we reconstructed the relationship structure of E. coli O157 and non-O157 survival in 32 soils (16 organic and 16 conventionally managed soils) from California (CA) and Arizona (AZ) with a multi-resolution output. In our study, we took a community approach based on total soil microbiome to study community level survival and examining the network of the community as a whole and the relationship between its topology and biological processes. TDA produces a geometric representation of complex data sets. Network analysis showed that Shiga toxin negative strain E. coli O157:H7 4554 survived significantly longer in comparison to E. coli O157:H7 EDL933, while the survival time of E. coli O157:NM was comparable to that of E. coli O157:H7 strain 933 in all of the tested soils. Two non-O157 strains, E. coli O26:H11 and E. coli O103:H2 survived much longer than E. coli O91:H21 and the three strains of E. coli O157. We show that there are complex interactions between E. coli strain survival, microbial community structures, and soil parameters.

MRI and Biomechanics Multidimensional Data Analysis Reveals R2 -R1ρ as an Early Predictor of Cartilage Lesion Progression in Knee Osteoarthritis (2017)

Valentina Pedoia, Jenny Haefeli, Kazuhito Morioka, Hsiang-Ling Teng, Lorenzo Nardo, Richard B. Souza, Adam R. Ferguson, Sharmila Majumdar

Abstract

PURPOSE: To couple quantitative compositional MRI, gait analysis, and machine learning multidimensional data analysis to study osteoarthritis (OA). OA is a multifactorial disorder accompanied by biochemical and morphological changes in the articular cartilage, modulated by skeletal biomechanics and gait. While we can now acquire detailed information about the knee joint structure and function, we are not yet able to leverage the multifactorial factors for diagnosis and disease management of knee OA. MATERIALS AND METHODS: We mapped 178 subjects in a multidimensional space integrating: demographic, clinical information, gait kinematics and kinetics, cartilage compositional T1ρ and T2 and R2 -R1ρ (1/T2 -1/T1ρ ) acquired at 3T and whole-organ magnetic resonance imaging score morphological grading. Topological data analysis (TDA) and Kolmogorov-Smirnov test were adopted for data integration, analysis, and hypothesis generation. Regression models were used for hypothesis testing. RESULTS: The results of the TDA showed a network composed of three main patient subpopulations, thus potentially identifying new phenotypes. T2 and T1ρ values (T2 lateral femur P = 1.45*10-8 , T1ρ medial tibia P = 1.05*10-5 ), the presence of femoral cartilage defects (P = 0.0013), lesions in the meniscus body (P = 0.0035), and race (P = 2.44*10-4 ) were key markers in the subpopulation classification. Within one of the subpopulations we observed an association between the composite metric R2 -R1ρ and the longitudinal progression of cartilage lesions. CONCLUSION: The analysis presented demonstrates some of the complex multitissue biochemical and biomechanical interactions that define joint degeneration and OA using a multidimensional approach, and potentially indicates that R2 -R1ρ may be an imaging biomarker for early OA. LEVEL OF EVIDENCE: 3 Technical Efficacy: Stage 2 J. Magn. Reson. Imaging 2018;47:78-90.

Topological Data Analysis Reveals a Core Gene Expression Backbone That Defines Form and Function Across Flowering Plants (2023)

Sourabh Palande, Joshua A. M. Kaste, Miles D. Roberts, Kenia Segura Abá, Carly Claucherty, Jamell Dacon, Rei Doko, Thilani B. Jayakody, Hannah R. Jeffery, Nathan Kelly, Andriana Manousidaki, Hannah M. Parks, Emily M. Roggenkamp, Ally M. Schumacher, Jiaxin Yang, Sarah Percival, Jeremy Pardo, Aman Y. Husbands, Arjun Krishnan, Beronda L. Montgomery, Elizabeth Munch, Addie M. Thompson, Alejandra Rougon-Cardoso, Daniel H. Chitwood, Robert VanBuren

Abstract

Since they emerged approximately 125 million years ago, flowering plants have evolved to dominate the terrestrial landscape and survive in the most inhospitable environments on earth. At their core, these adaptations have been shaped by changes in numerous, interconnected pathways and genes that collectively give rise to emergent biological phenomena. Linking gene expression to morphological outcomes remains a grand challenge in biology, and new approaches are needed to begin to address this gap. Here, we implemented topological data analysis (TDA) to summarize the high dimensionality and noisiness of gene expression data using lens functions that delineate plant tissue and stress responses. Using this framework, we created a topological representation of the shape of gene expression across plant evolution, development, and environment for the phylogenetically diverse flowering plants. The TDA-based Mapper graphs form a well-defined gradient of tissues from leaves to seeds, or from healthy to stressed samples, depending on the lens function. This suggests that there are distinct and conserved expression patterns across angiosperms that delineate different tissue types or responses to biotic and abiotic stresses. Genes that correlate with the tissue lens function are enriched in central processes such as photosynthetic, growth and development, housekeeping, or stress responses. Together, our results highlight the power of TDA for analyzing complex biological data and reveal a core expression backbone that defines plant form and function.

Community Resources

Code

Using Multidimensional Topological Data Analysis to Identify Traits of Hip Osteoarthritis (2018)

Jasmine Rossi‐deVries, Valentina Pedoia, Michael A. Samaan, Adam R. Ferguson, Richard B. Souza, Sharmila Majumdar

Abstract

Background Osteoarthritis (OA) is a multifaceted disease with many variables affecting diagnosis and progression. Topological data analysis (TDA) is a state-of-the-art big data analytics tool that can combine all variables into multidimensional space. TDA is used to simultaneously analyze imaging and gait analysis techniques. Purpose To identify biochemical and biomechanical biomarkers able to classify different disease progression phenotypes in subjects with and without radiographic signs of hip OA. Study Type Longitudinal study for comparison of progressive and nonprogressive subjects. Population In all, 102 subjects with and without radiographic signs of hip osteoarthritis. Field Strength/Sequence 3T, SPGR 3D MAPSS T1ρ/T2, intermediate-weighted fat-suppressed fast spin-echo (FSE). Assessment Multidimensional data analysis including cartilage composition, bone shape, Kellgren–Lawrence (KL) classification of osteoarthritis, scoring hip osteoarthritis with MRI (SHOMRI), hip disability and osteoarthritis outcome score (HOOS). Statistical Tests Analysis done using TDA, Kolmogorov–Smirnov (KS) testing, and Benjamini-Hochberg to rank P-value results to correct for multiple comparisons. Results Subjects in the later stages of the disease had an increased SHOMRI score (P \textless 0.0001), increased KL (P = 0.0012), and older age (P \textless 0.0001). Subjects in the healthier group showed intact cartilage and less pain. Subjects found between these two groups had a range of symptoms. Analysis of this subgroup identified knee biomechanics (P \textless 0.0001) as an initial marker of the disease that is noticeable before the morphological progression and degeneration. Further analysis of an OA subgroup with femoroacetabular impingement (FAI) showed anterior labral tears to be the most significant marker (P = 0.0017) between those FAI subjects with and without OA symptoms. Data Conclusion The data-driven analysis obtained with TDA proposes new phenotypes of these subjects that partially overlap with the radiographic-based classical disease status classification and also shows the potential for further examination of an early onset biomechanical intervention. Level of Evidence: 2 Technical Efficacy: Stage 2 J. Magn. Reson. Imaging 2018;48:1046–1058.

Community Resources

Code (Software)

Uncovering Precision Phenotype-Biomarker Associations in Traumatic Brain Injury Using Topological Data Analysis (2017)

Jessica L. Nielson, Shelly R. Cooper, John K. Yue, Marco D. Sorani, Tomoo Inoue, Esther L. Yuh, Pratik Mukherjee, Tanya C. Petrossian, Jesse Paquette, Pek Y. Lum, Gunnar E. Carlsson, Mary J. Vassar, Hester F. Lingsma, Wayne A. Gordon, Alex B. Valadka, David O. Okonkwo, Geoffrey T. Manley, Adam R. Ferguson, Track-Tbi Investigators

Abstract

Background Traumatic brain injury (TBI) is a complex disorder that is traditionally stratified based on clinical signs and symptoms. Recent imaging and molecular biomarker innovations provide unprecedented opportunities for improved TBI precision medicine, incorporating patho-anatomical and molecular mechanisms. Complete integration of these diverse data for TBI diagnosis and patient stratification remains an unmet challenge. Methods and findings The Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) Pilot multicenter study enrolled 586 acute TBI patients and collected diverse common data elements (TBI-CDEs) across the study population, including imaging, genetics, and clinical outcomes. We then applied topology-based data-driven discovery to identify natural subgroups of patients, based on the TBI-CDEs collected. Our hypothesis was two-fold: 1) A machine learning tool known as topological data analysis (TDA) would reveal data-driven patterns in patient outcomes to identify candidate biomarkers of recovery, and 2) TDA-identified biomarkers would significantly predict patient outcome recovery after TBI using more traditional methods of univariate statistical tests. TDA algorithms organized and mapped the data of TBI patients in multidimensional space, identifying a subset of mild TBI patients with a specific multivariate phenotype associated with unfavorable outcome at 3 and 6 months after injury. Further analyses revealed that this patient subset had high rates of post-traumatic stress disorder (PTSD), and enrichment in several distinct genetic polymorphisms associated with cellular responses to stress and DNA damage (PARP1), and in striatal dopamine processing (ANKK1, COMT, DRD2). Conclusions TDA identified a unique diagnostic subgroup of patients with unfavorable outcome after mild TBI that were significantly predicted by the presence of specific genetic polymorphisms. Machine learning methods such as TDA may provide a robust method for patient stratification and treatment planning targeting identified biomarkers in future clinical trials in TBI patients. Trial Registration ClinicalTrials.gov Identifier NCT01565551

Community Resources

Code (Software)

🍩 Database of Original & Non-Theoretical Uses of Topology

Toward Automated Prediction of Manufacturing Productivity Based on Feature Selection Using Topological Data Analysis (2016)

An Industry Case of Large-Scale Demand Forecasting of Hierarchical Components (2019)

Using Topological Data Analysis for Text Classification (2018)

Community Resources

A Topological Data Analysis Based Classification Method for Multiple Measurements (2019)

Fibers of Failure: Classifying Errors in Predictive Processes (2020)

A Topological Framework for Deep Learning (2020)

Text Classification via Network Topology: A Case Study on the Holy Quran (2019)

Steinhaus Filtration and Stable Paths in the Mapper (2020)

Topological Machine Learning With Persistence Indicator Functions (2019)

Topological Graph Neural Networks (2021)

Topologically Densified Distributions (2020)

Unsupervised Topological Learning for Identification of Atomic Structures (2022)

CCF-GNN: A Unified Model Aggregating Appearance, Microenvironment, and Topology for Pathology Image Classification (2023)

A Primer on Topological Data Analysis to Support Image Analysis Tasks in Environmental Science (2023)

Acute Lymphoblastic Leukemia Classification Using Persistent Homology (2024)

Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology (2021)

Differentiable Euler Characteristic Transforms for Shape Classification (2023)

Protein Classification With Improved Topological Data Analysis (2018)

Deep Learning With Topological Signatures (2017)

Curvature Filtrations for Graph Generative Model Evaluation (2023)

Topology-Informed Machine Learning for Efficient Prediction of Solid Oxide Fuel Cell Electrode Polarization (2025)

Community Resources

Simplicial Representation Learning With Neural \$K\$-Forms (2023)

A Novel Multi-Task Machine Learning Classifier for Rare Disease Patterning Using Cardiac Strain Imaging Data (2024)

Community Resources

Representability of Algebraic Topology for Biomolecules in Machine Learning Based Scoring and Virtual Screening (2018)

Unveiling Patterns of International Communities in a Global City Using Mobile Phone Data (2015)

Community Resources

Filtration Surfaces for Dynamic Graph Classification (2023)

Topological Singularity Detection at Multiple Scales (2023)

Mapping the Multiverse of Latent Representations (2024)

Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

Position: Topological Deep Learning Is the New Frontier for Relational Learning (2024)

Topological Data Analysis for Electric Motor Eccentricity Fault Detection (2022)

Topological Feature Vectors for Chatter Detection in Turning Processes (2019)

Identification of Coenzyme-Binding Proteins With Machine Learning Algorithms (2019)

Persistent Homology for the Automatic Classification of Prostate Cancer Aggressiveness in Histopathology Images (2019)

Cell Complex Neural Networks (2020)

Filtration Curves for Graph Representation (2021)

The Classification of Endoscopy Images With Persistent Homology (2016)

Interpretable Phase Detection and Classification With Persistent Homology (2020)

Graph Filtration Learning (2020)

Simplicial Neural Networks (2020)

Topological Autoencoders (2020)

Data-Driven and Automatic Surface Texture Analysis Using Persistent Homology (2021)

Can Neural Networks Learn Persistent Homology Features? (2020)

Topological Data Analysis of C. Elegans Locomotion and Behavior (2021)

A Visual Analytics Approach for the Diagnosis of Heterogeneous and Multidimensional Machine Maintenance Data (2021)

A Stable Multi-Scale Kernel for Topological Machine Learning (2015)

Topological Detection of Trojaned Neural Networks (2021)

Topological Detection of Trojaned Neural Networks (2021)

A Novel Method of Extracting Topological Features From Word Embeddings (2020)

Generalized Penalty for Circular Coordinate Representation (2020)

Fast and Accurate Tumor Segmentation of Histology Images Using Persistent Homology and Deep Convolutional Features (2019)

Persistent Homology Machine Learning for Fingerprint Classification (2019)

Hyperparameter Optimization of Topological Features for Machine Learning Applications (2019)

Contagion Dynamics for Manifold Learning (2020)

Persistent Homology Advances Interpretable Machine Learning for Nanoporous Materials (2020)

Topological Machine Learning for Mixed Numeric and Categorical Data (2020)

Explainable Machine Learning Approach to Yield and Quality Improvements Using Deep Topological Data Analytics (2023)

Quantitative Analysis of Phase Transitions in Two-Dimensional XY Models Using Persistent Homology (2022)

Geometric Feature Performance Under Downsampling for EEG Classification Tasks (2021)

Topological Regularization for Dense Prediction (2021)

TopoGAN: A Topology-Aware Generative Adversarial Network (2020)

Community Resources

Severe Slugging Flow Identification From Topological Indicators (2022)

The Extended Persistent Homology Transform of Manifolds With Boundary (2022)

Towards a New Approach to Reveal Dynamical Organization of the Brain Using Topological Data Analysis (2018)

Analyzing Collective Motion With Machine Learning and Topology (2019)

Identifying Repeating Patterns in IEC 61499 Systems Using Feature-Based Embeddings (2022)

A Topological Data Analysis Approach On Predicting Phenotypes From Gene Expression Data (2020)

Alzheimer Disease Detection From Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning (2023)

Quantitative and Interpretable Order Parameters for Phase Transitions From Persistent Homology (2020)

TDAExplore: Quantitative Analysis of Fluorescence Microscopy Images Through Topology-Based Machine Learning (2021)

Topological Analysis of Low Dimensional Phase Space Trajectories of High Dimensional EEG Signals for Classification of Interictal Epileptiform Discharges (2023)

Mapping Geometric and Electromagnetic Feature Spaces With Machine Learning for Additively Manufactured RF Devices (2022)

A Topological Machine Learning Pipeline for Classification (2022)

Rapid and Precise Topological Comparison With Merge Tree Neural Networks (2024)