A Topological Framework for Deep Learning (2020)

Abstract

We utilize classical facts from topology to show that the classification problem in machine learning is always solvable under very mild conditions. Furthermore, we show that a softmax classification network acts on an input topological space by a finite sequence of topological moves to achieve the classification task. Moreover, given a training dataset, we show how topological formalism can be used to suggest the appropriate architectural choices for neural networks designed to be trained as classifiers on the data. Finally, we show how the architecture of a neural network cannot be chosen independently from the shape of the underlying data. To demonstrate these results, we provide example datasets and show how they are acted upon by neural nets from this topological perspective.

Text Classification via Network Topology: A Case Study on the Holy Quran (2019)

Mehmet Emin Aktas, Esra Akbas

Abstract

Due to the growth in the number of texts and documents available online, machine learning based text classification systems have recently become more popular. Feature extraction, converting unstructured text into a structured feature space, is one of the essential tasks for text classification. In this paper, we propose a novel feature extraction approach for text classification using the network representation of text, network topology, and machine learning techniques. We present experimental results on classifying the Holy Quran chapters based on the place each chapter was revealed to illustrate the effectiveness of the approach.

Topological Graph Neural Networks (2021)

Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten Borgwardt

Abstract

Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures, such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive in terms of the Weisfeiler--Lehman test of isomorphism. Augmenting GNNs with our layer leads to beneficial predictive performance, both on synthetic data sets, which can be trivially classified by humans but not by ordinary GNNs, and on real-world data.

Toward Automated Prediction of Manufacturing Productivity Based on Feature Selection Using Topological Data Analysis (2016)

Wei Guo, Ashis G. Banerjee

Abstract

In this paper, we extend the application of topological data analysis (TDA) to the field of manufacturing, for the first time to the best of our knowledge. We apply a particular TDA method, known as the Mapper algorithm, to a benchmark chemical processing data set. The algorithm yields a topological network that captures the intrinsic clusters and connections among the clusters present in the high-dimensional data set, which are difficult to detect using traditional methods. We select key process variables or features that impact the final product yield by analyzing the shape of this network. We then use three prediction models to evaluate the impact of the selected features. Results show that the models achieve the same level of high prediction accuracy as with all the process variables, thereby providing a way to carry out process monitoring and control in a more cost-effective manner.

Differentiable Euler Characteristic Transforms for Shape Classification (2023)

Ernst Röell, Bastian Rieck

Abstract

The _Euler Characteristic Transform_ (ECT) is a powerful invariant, combining geometrical and topological characteristics of shapes and graphs. However, the ECT was hitherto unable to learn task-specific representations. We overcome this issue and develop a novel computational layer that enables learning the ECT in an end-to-end fashion. Our method, the _Differentiable Euler Characteristic Transform_ (DECT), is fast and computationally efficient, while exhibiting performance on a par with more complex models in both graph and point cloud classification tasks. Moreover, we show that this seemingly simple statistic provides the same topological expressivity as more complex topological deep learning layers.
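To make the underlying statistic concrete, here is a minimal sketch of a plain (non-differentiable) Euler characteristic curve of an embedded graph along one direction. It is not the DECT layer from the paper; the function name `euler_characteristic_curve` and its arguments are illustrative assumptions. For a graph, the Euler characteristic of a sublevel set is the number of vertices minus the number of edges at or below the threshold, where an edge enters at the larger height of its two endpoints.

```python
def euler_characteristic_curve(coords, edges, direction, thresholds):
    """Euler characteristic (V - E) of sublevel sets of a height function.

    coords: list of vertex coordinates (tuples), edges: list of (u, v) pairs,
    direction: vector defining the height function, thresholds: heights to probe.
    All names here are hypothetical, chosen for illustration only.
    """
    # Height of a vertex is its inner product with the direction.
    heights = [sum(c * d for c, d in zip(x, direction)) for x in coords]
    # An edge appears once both of its endpoints are present.
    edge_heights = [max(heights[u], heights[v]) for u, v in edges]
    curve = []
    for t in thresholds:
        n_vertices = sum(1 for h in heights if h <= t)
        n_edges = sum(1 for h in edge_heights if h <= t)
        curve.append(n_vertices - n_edges)
    return curve


# A triangle embedded in the plane, scanned along the x-axis.
triangle_coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
triangle_edges = [(0, 1), (1, 2), (0, 2)]
ecc = euler_characteristic_curve(
    triangle_coords, triangle_edges, (1.0, 0.0), [-0.5, 0.0, 0.5, 1.0]
)
print(ecc)
```

Repeating this over several directions and stacking the resulting curves gives a discretized transform; a differentiable variant would have to replace the hard threshold comparisons with a smooth approximation.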

Simplicial Representation Learning With Neural $K$-Forms (2023)

Kelly Maggs, Celia Hacker, Bastian Rieck

Abstract

Geometric deep learning extends deep learning to incorporate information about the geometry and topology of data, especially in complex domains like graphs. Despite the popularity of message passing in this field, it has limitations such as the need for graph rewiring, ambiguity in interpreting data, and over-smoothing. In this paper, we take a different approach, focusing on leveraging geometric information from simplicial complexes embedded in $\mathbb{R}^n$ using node coordinates. We use differential $k$-forms in $\mathbb{R}^n$ to create representations of simplices, offering interpretability and geometric consistency without message passing. This approach also enables us to apply differential geometry tools and achieve universal approximation. Our method is efficient, versatile, and applicable to various input complexes, including graphs, simplicial complexes, and cell complexes. It outperforms existing message passing neural networks in harnessing information from geometrical graphs with node features serving as coordinates.

Filtration Surfaces for Dynamic Graph Classification (2023)

Franz Srambical, Bastian Rieck

Abstract

Existing approaches for classifying dynamic graphs either lift graph kernels to the temporal domain, or use graph neural networks (GNNs). However, current baselines have scalability issues, cannot handle a changing node set, or do not take edge weight information into account. We propose filtration surfaces, a novel method that is scalable and flexible, to alleviate said restrictions. We experimentally validate the efficacy of our model and show that filtration surfaces outperform previous state-of-the-art baselines on datasets that rely on edge weight information. Our method does so while being either completely parameter-free or having at most one parameter, and yielding the lowest overall standard deviation among similarly scalable methods.

Position: Topological Deep Learning Is the New Frontier for Relational Learning (2024)

Theodore Papamarkou, Tolga Birdal, Michael M. Bronstein, Gunnar E. Carlsson, Justin Curry, Yue Gao, Mustafa Hajij, Roland Kwitt, Pietro Lio, Paolo Di Lorenzo, Vasileios Maroulas, Nina Miolane, Farzana Nasrin, Karthikeyan Natesan Ramamurthy, Bastian Rieck, Simone Scardapane, Michael T. Schaub, Petar Veličković, Bei Wang, Yusu Wang, Guowei Wei, Ghada Zamzmi

Abstract

Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. At the same time, this paper serves as an invitation to the scientific community to actively participate in TDL research to unlock the potential of this emerging field.

Cell Complex Neural Networks (2020)

Mustafa Hajij, Kyle Istvan, Ghada Zamzmi

Abstract

Cell complexes are topological spaces constructed from simple blocks called cells. They generalize graphs, simplicial complexes, and polyhedral complexes that form important domains for practical applications. We propose a general, combinatorial, and unifying construction for performing neural network-type computations on cell complexes. Furthermore, we introduce inter-cellular message passing schemes, message passing schemes on cell complexes that take the topology of the underlying space into account. In particular, our method generalizes many of the most popular types of graph neural networks.

Filtration Curves for Graph Representation (2021)

Leslie O'Bray, Bastian Rieck, Karsten Borgwardt

Abstract

The two predominant approaches to graph comparison in recent years are based on (i) enumerating matching subgraphs or (ii) comparing neighborhoods of nodes. In this work, we complement these two perspectives with a third way of representing graphs: using filtration curves from topological data analysis that capture both edge weight information and global graph structure. Filtration curves are highly efficient to compute and lead to expressive representations of graphs, which we demonstrate on graph classification benchmark datasets. Our work opens the door to a new form of graph representation in data mining.
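As a rough illustration of the idea, the sketch below adds edges in order of increasing weight and records one graph descriptor after each threshold. The descriptor used here (number of connected components, tracked with a union-find structure) and the function name `filtration_curve` are assumptions made for this example; the paper may use other descriptors.

```python
def filtration_curve(num_nodes, weighted_edges):
    """Return [(threshold, n_components)] as edges are added by increasing weight.

    weighted_edges: list of (u, v, weight) triples. Hypothetical helper,
    written only to illustrate an edge-weight filtration.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_nodes
    curve = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
        if curve and curve[-1][0] == w:
            # Several edges share this weight: keep one entry per threshold.
            curve[-1] = (w, components)
        else:
            curve.append((w, components))
    return curve


example_edges = [(0, 1, 0.2), (1, 2, 0.5), (0, 2, 0.7), (2, 3, 0.9)]
example_curve = filtration_curve(4, example_edges)
print(example_curve)
```

The resulting sequence is a step function of the edge-weight threshold, so curves from different graphs can be compared once they are evaluated on a shared set of thresholds.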

Simplicial Neural Networks (2020)

Stefania Ebli, Michaël Defferrard, Gard Spreemann

Abstract

We present simplicial neural networks (SNNs), a generalization of graph neural networks to data that live on a class of topological spaces called simplicial complexes. These are natural multi-dimensional extensions of graphs that encode not only pairwise relationships but also higher-order interactions between vertices - allowing us to consider richer data, including vector fields and $n$-fold collaboration networks. We define an appropriate notion of convolution that we leverage to construct the desired convolutional neural networks. We test the SNNs on the task of imputing missing data on coauthorship complexes.

A Visual Analytics Approach for the Diagnosis of Heterogeneous and Multidimensional Machine Maintenance Data (2021)

Xiaoyu Zhang, Takanori Fujiwara, Senthil Chandrasegaran, Michael P. Brundage, Thurston Sexton, Alden Dima, Kwan-Liu Ma

Abstract

Analysis of large, high-dimensional, and heterogeneous datasets is challenging as no one technique is suitable for visualizing and clustering such data in order to make sense of the underlying information. For instance, heterogeneous logs detailing machine repair and maintenance in an organization often need to be analyzed to diagnose errors and identify abnormal patterns, formalize root-cause analyses, and plan preventive maintenance. Such real-world datasets are also beset by issues such as inconsistent and/or missing entries. To conduct an effective diagnosis, it is important to extract and understand patterns from the data with support from analytic algorithms (e.g., finding that certain kinds of machine complaints occur more in the summer) while involving the human-in-the-loop. To address these challenges, we adopt existing techniques for dimensionality reduction (DR) and clustering of numerical, categorical, and text data dimensions, and introduce a visual analytics approach that uses multiple coordinated views to connect DR + clustering results across each kind of the data dimension stated. To help analysts label the clusters, each clustering view is supplemented with techniques and visualizations that contrast a cluster of interest with the rest of the dataset. Our approach assists analysts to make sense of machine maintenance logs and their errors. Then the gained insights help them carry out preventive maintenance. We illustrate and evaluate our approach through use cases and expert studies respectively, and discuss generalization of the approach to other heterogeneous data.

Generalized Penalty for Circular Coordinate Representation (2020)

Hengrui Luo, Alice Patania, Jisu Kim, Mikael Vejdemo-Johansson

Abstract

Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization for high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method to adapt the circular coordinate framework to take into account sparsity in high-dimensional applications. We use a generalized penalty function instead of an $L_2$ penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analysis to support our claim that circular coordinates with generalized penalty will accommodate the sparsity in high-dimensional datasets under different sampling schemes while preserving the topological structures.

Hyperparameter Optimization of Topological Features for Machine Learning Applications (2019)

Francis Motta, Christopher Tralie, Rossella Bedini, Fabiano Bini, Gilberto Bini, Hamed Eramian, Marcio Gameiro, Steve Haase, Hugh Haddox, John Harer, Nick Leiby, Franco Marinozzi, Scott Novotney, Gabe Rocklin, Jed Singer, Devin Strickland, Matt Vaughn

Abstract

This paper describes a general pipeline for generating optimal vector representations of topological features of data for use with machine learning algorithms. This pipeline can be viewed as a costly black-box function defined over a complex configuration space, each point of which specifies both how features are generated and how predictive models are trained on those features. We propose using state-of-the-art Bayesian optimization algorithms to inform the choice of topological vectorization hyperparameters while simultaneously choosing learning model parameters. We demonstrate the need for and effectiveness of this pipeline using two difficult biological learning problems, and illustrate the nontrivial interactions between topological feature generation and learning model hyperparameters.

An Industry Case of Large-Scale Demand Forecasting of Hierarchical Components (2019)

Rodrigo Rivera-Castro, Ivan Nazarov, Yuke Xiang, Ivan Maksimov, Aleksandr Pletnev, Evgeny Burnaev

Abstract

Demand forecasting of hierarchical components is essential in manufacturing. However, its discussion in the machine-learning literature has been limited, and judgemental forecasts remain pervasive in the industry. Demand planners require easy-to-understand tools capable of delivering state-of-the-art results. This work presents an industry case of demand forecasting at one of the largest manufacturers of electronics in the world. It seeks to support practitioners with five contributions: (1) A benchmark of fourteen demand forecast methods applied to a relevant data set, (2) A data transformation technique yielding comparable results with state of the art, (3) An alternative to ARIMA based on matrix factorization, (4) A model selection technique based on topological data analysis for time series and (5) A novel data set. Organizations seeking to up-skill existing personnel and increase forecast accuracy will find value in this work.

Geometric Feature Performance Under Downsampling for EEG Classification Tasks (2021)

Bryan Bischof, Eric Bunch

Abstract

We experimentally investigate a collection of feature engineering pipelines for use with a CNN for classifying eyes-open or eyes-closed from electroencephalogram (EEG) time-series from the Bonn dataset. Using the Takens' embedding--a geometric representation of time-series--we construct simplicial complexes from EEG data. We then compare $\epsilon$-series of Betti-numbers and $\epsilon$-series of graph spectra (a novel construction)--two topological invariants of the latent geometry from these complexes--to raw time series of the EEG to fill in a gap in the literature for benchmarking. These methods, inspired by Topological Data Analysis, are used for feature engineering to capture local geometry of the time-series. Additionally, we test these feature pipelines' robustness to downsampling and data reduction. This paper seeks to establish clearer expectations for both time-series classification via geometric features, and how CNNs for time-series respond to data of degraded resolution.
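The Takens' (time-delay) embedding mentioned above can be sketched in a few lines: each point of the embedded cloud collects `dimension` samples of the series spaced `delay` steps apart. The function name and parameter names are illustrative assumptions, and the toy signal stands in for real EEG data.

```python
def takens_embedding(series, dimension, delay):
    """Map a scalar time series to points (x[i], x[i+delay], ..., x[i+(dimension-1)*delay])."""
    n_points = len(series) - (dimension - 1) * delay
    if n_points <= 0:
        raise ValueError("series too short for this dimension and delay")
    return [
        tuple(series[i + j * delay] for j in range(dimension))
        for i in range(n_points)
    ]


toy_signal = [0, 1, 2, 3, 4, 5]
embedded_points = takens_embedding(toy_signal, dimension=3, delay=2)
print(embedded_points)
```

A simplicial complex (for example a Vietoris-Rips complex at scale $\epsilon$) would then be built on the resulting point cloud; that step is omitted here.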

Identifying Repeating Patterns in IEC 61499 Systems Using Feature-Based Embeddings (2022)

Markus Unterdechler, Antonio M. Gutiérrez, Lisa Sonnleithner, Rick Rabiser, Alois Zoitl

Abstract

Cyber-Physical Production Systems (CPPSs) are highly variable systems of systems comprised of software and hardware interacting with each other and the environment. The increasing integration of technologies and devices has brought an unprecedented level of automation and customization. At the same time, it has also increased the efforts to maintain highly complex and heterogeneous systems. Although engineering practices support the reuse of common components to ease the development and maintenance of the systems in different projects, the identification of common components is still manually performed, which is a time-consuming, error-prone task. In this paper, a novel approach identifying repeating patterns in CPPSs based on artificial intelligence techniques is presented. This approach allows finding exact and similar components to support the CPPS design. Furthermore, it enables the maintenance of common components by reusing predefined types thereby reducing development effort. We implemented and evaluated our approach in an industry case study on developing CPPS control software with IEC 61499.

A Topological Data Analysis Based Classification Method for Multiple Measurements (2019)

Henri Riihimäki, Wojciech Chachólski, Jakob Theorell, Jan Hillert, Ryan Ramanujam

Abstract

Background: Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements which samples from the data space and builds a network graph based on the data topology. When applying this to two case studies, accuracy exceeds alternative models with additional benefits such as reporting data subsets with high purity along with feature values.

Results: For 300 examples of 3 tree species, the accuracy reached 80% after 30 datapoints, which was improved to 90% after increased sampling to 400 datapoints. Using data from 100 examples of each of 6 point processes, the classifier achieved 96.8% accuracy. In both datasets, the TDA classifier outperformed an alternative model.

Conclusions: This algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool.

Mapping Geometric and Electromagnetic Feature Spaces With Machine Learning for Additively Manufactured RF Devices (2022)

Deanna Sessions, Venkatesh Meenakshisundaram, Andrew Gillman, Alexander Cook, Kazuko Fuchi, Philip R. Buskohl, Gregory H. Huff

Abstract

Multi-material additive manufacturing enables transformative capabilities in customized, low-cost, and multi-functional electromagnetic devices. However, process-specific fabrication anomalies can result in non-intuitive effects on performance; we propose a framework for identifying defect mechanisms and their performance impact by mapping geometric variances to electromagnetic performance metrics. This method can accelerate additive fabrication feedback while avoiding the high computational cost of in-line electromagnetic simulation. We first used dimension reduction to explore the population of geometric manufacturing anomalies and electromagnetic performance. Convolutional neural networks are then trained to predict the electromagnetic performance of the printed geometries. In generating the networks, we explored two inputs: one image-derived geometric description and one using the same description with additional simulated electromagnetic information. Network latent space analysis shows the networks learned both geometric and electromagnetic values even without electromagnetic input. This result demonstrates it is possible to create accelerated additive feedback systems predicting electromagnetic performance without in-line simulation.

Rapid and Precise Topological Comparison With Merge Tree Neural Networks (2024)

Yu Qin, Brittany Terese Fasy, Carola Wenk, Brian Summa

Abstract

Merge trees are a valuable tool in the scientific visualization of scalar fields; however, current methods for merge tree comparisons are computationally expensive, primarily due to the exhaustive matching between tree nodes. To address this challenge, we introduce the Merge Tree Neural Network (MTNN), a learned neural network model designed for merge tree comparison. The MTNN enables rapid and high-quality similarity computation. We first demonstrate how to train graph neural networks, which emerged as effective encoders for graphs, in order to produce embeddings of merge trees in vector spaces for efficient similarity comparison. Next, we formulate the novel MTNN model that further improves the similarity comparisons by integrating the tree and node embeddings with a new topological attention mechanism. We demonstrate the effectiveness of our model on real-world data in different domains and examine our model's generalizability across various datasets. Our experimental analysis demonstrates our approach's superiority in accuracy and efficiency. In particular, we speed up the prior state-of-the-art by more than $100\times$ on the benchmark datasets while maintaining an error rate below $0.1\%$.

Fibers of Failure: Classifying Errors in Predictive Processes (2020)

Leo S. Carlsson, Mikael Vejdemo-Johansson, Gunnar Carlsson, Pär G. Jönsson

Abstract

Predictive models are used in many different fields of science and engineering and are always prone to make faulty predictions. These faulty predictions can be more or less malignant depending on the model application. We describe fibers of failure (FiFa), a method to classify failure modes of predictive processes. Our method uses Mapper, an algorithm from topological data analysis (TDA), to build a graphical model of input data stratified by prediction errors. We demonstrate two ways to use the failure mode groupings: either to produce a correction layer that adjusts predictions by similarity to the failure modes; or to inspect members of the failure modes to illustrate and investigate what characterizes each failure mode. We demonstrate FiFa on two scenarios: a convolutional neural network (CNN) predicting MNIST images with added noise, and an artificial neural network (ANN) predicting the electrical energy consumption of an electric arc furnace (EAF). The correction layer on the CNN model improved its prediction accuracy significantly while the inspection of failure modes for the EAF model provided guiding insights into the domain-specific reasons behind several high-error regions.

Capturing Dynamics of Time-Varying Data via Topology (2020)

Lu Xian, Henry Adams, Chad M. Topaz, Lori Ziegelmeier

Abstract

One approach to understanding complex data is to study its shape through the lens of algebraic topology. While the early development of topological data analysis focused primarily on static data, in recent years, theoretical and applied studies have turned to data that varies in time. A time-varying collection of metric spaces as formed, for example, by a moving school of fish or flock of birds, can contain a vast amount of information. There is often a need to simplify or summarize the dynamic behavior. We provide an introduction to topological summaries of time-varying metric spaces including vineyards [17], crocker plots [52], and multiparameter rank functions [34]. We then introduce a new tool to summarize time-varying metric spaces: a crocker stack. Crocker stacks are convenient for visualization, amenable to machine learning, and satisfy a desirable stability property which we prove. We demonstrate the utility of crocker stacks for a parameter identification task involving an influential model of biological aggregations [54]. Altogether, we aim to bring the broader applied mathematics community up-to-date on topological summaries of time-varying metric spaces.

Persistence Images: A Stable Vector Representation of Persistent Homology (2017)

Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, Lori Ziegelmeier

Abstract

Many data sets can be viewed as a noisy sampling of an underlying space, and tools from topological data analysis can characterize this structure for the purpose of knowledge discovery. One such tool is persistent homology, which provides a multiscale description of the homological features within a data set. A useful representation of this homological information is a persistence diagram (PD). Efforts have been made to map PDs into spaces with additional structure valuable to machine learning tasks. We convert a PD to a finite-dimensional vector representation which we call a persistence image (PI), and prove the stability of this transformation with respect to small perturbations in the inputs. The discriminatory power of PIs is compared against existing methods, showing significant performance gains. We explore the use of PIs with vector-based machine learning tools, such as linear sparse support vector machines, which identify features containing discriminating topological information. Finally, high accuracy inference of parameter values from the dynamic output of a discrete dynamical system (the linked twist map) and a partial differential equation (the anisotropic Kuramoto-Sivashinsky equation) provide a novel application of the discriminatory power of PIs.
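The conversion from a persistence diagram to a persistence image can be sketched as follows: each (birth, death) pair is moved to (birth, persistence) coordinates, weighted so that points near the diagonal contribute little, and smoothed by a Gaussian onto a pixel grid. This is a simplified sketch under stated assumptions: it uses a linear weight equal to the persistence, evaluates the Gaussian at pixel centers rather than integrating over each pixel, and the function name, default `sigma`, `resolution`, and `extent` are hypothetical choices rather than values from the paper.

```python
import math


def persistence_image(diagram, resolution=4, sigma=0.1, extent=1.0):
    """Approximate persistence image on a resolution x resolution grid over [0, extent]^2.

    Rows index persistence, columns index birth. Illustrative sketch only.
    """
    cell = extent / resolution
    image = [[0.0] * resolution for _ in range(resolution)]
    norm = 2.0 * math.pi * sigma ** 2
    for birth, death in diagram:
        persistence = death - birth
        weight = persistence  # linear weight, vanishing on the diagonal
        for row in range(resolution):
            pixel_pers = (row + 0.5) * cell  # persistence at the pixel center
            for col in range(resolution):
                pixel_birth = (col + 0.5) * cell  # birth at the pixel center
                sq_dist = (pixel_birth - birth) ** 2 + (pixel_pers - persistence) ** 2
                image[row][col] += weight * math.exp(-sq_dist / (2.0 * sigma ** 2)) / norm
    return image


# One feature born at 0.125 and dying at 0.5 (persistence 0.375).
pi_example = persistence_image([(0.125, 0.5)])
# A point on the diagonal has zero persistence and contributes nothing.
pi_diagonal = persistence_image([(0.3, 0.3)])
```

Flattening the grid row by row gives a fixed-length vector that can be passed to standard vector-based learners such as a support vector machine.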

Omics-Based Strategies in Precision Medicine: Toward a Paradigm Shift in Inborn Errors of Metabolism Investigations (2016)

Abdellah Tebani, Carlos Afonso, Stéphane Marret, Soumeya Bekri

Abstract

The rise of technologies that simultaneously measure thousands of data points represents the heart of systems biology. These technologies have had a huge impact on the discovery of next-generation diagnostics, biomarkers, and drugs in the precision medicine era. Systems biology aims to achieve systemic exploration of complex interactions in biological systems. Driven by high-throughput omics technologies and the computational surge, it enables multi-scale and insightful overviews of cells, organisms, and populations. Precision medicine capitalizes on these conceptual and technological advancements and stands on two main pillars: data generation and data modeling. High-throughput omics technologies allow the retrieval of comprehensive and holistic biological information, whereas computational capabilities enable high-dimensional data modeling and, therefore, accessible and user-friendly visualization. Furthermore, bioinformatics has enabled comprehensive multi-omics and clinical data integration for insightful interpretation. Despite their promise, the translation of these technologies into clinically actionable tools has been slow. In this review, we present state-of-the-art multi-omics data analysis strategies in a clinical context. The challenges of omics-based biomarker translation are discussed. Perspectives regarding the use of multi-omics approaches for inborn errors of metabolism (IEM) are presented by introducing a new paradigm shift in addressing IEM investigations in the post-genomic era.

PI-Net: A Deep Learning Approach to Extract Topological Persistence Images (2020)

Anirudh Som, Hongjun Choi, Karthikeyan Natesan Ramamurthy, Matthew Buman, Pavan Turaga

Abstract

Topological features such as persistence diagrams and their functional approximations like persistence images (PIs) have been showing substantial promise for machine learning and computer vision applications. This is greatly attributed to the robustness topological representations provide against different types of physical nuisance variables seen in real-world data, such as view-point, illumination, and more. However, key bottlenecks to their large scale adoption are computational expenditure and difficulty incorporating them in a differentiable architecture. We take an important step in this paper to mitigate these bottlenecks by proposing a novel one-step approach to generate PIs directly from the input data. We design two separate convolutional neural network architectures, one designed to take in multi-variate time series signals as input and another that accepts multi-channel images as input. We call these networks Signal PI-Net and Image PI-Net respectively. To the best of our knowledge, we are the first to propose the use of deep learning for computing topological features directly from data. We explore the use of the proposed PI-Net architectures on two applications: human activity recognition using tri-axial accelerometer sensor data and image classification. We demonstrate the ease of fusion of PIs in supervised deep learning architectures and speed up of several orders of magnitude for extracting PIs from data. Our code is available at https://github.com/anirudhsom/PI-Net.

Prediction in Cancer Genomics Using Topological Signatures and Machine Learning (2020)

Georgina Gonzalez, Arina Ushakova, Radmila Sazdanovic, Javier Arsuaga

Abstract

Copy Number Aberrations, gains and losses of genomic regions, are a hallmark of cancer and can be experimentally detected using microarray comparative genomic hybridization (aCGH). In previous works, we developed a topology based method to analyze aCGH data whose output are regions of the genome where copy number is altered in patients with a predetermined cancer phenotype. We call this method Topological Analysis of array CGH (TAaCGH). Here we combine TAaCGH with machine learning techniques to build classifiers using copy number aberrations. We chose logistic regression on two different binary phenotypes related to breast cancer to illustrate this approach. The first case consists of patients with over-expression of the ERBB2 gene. Over-expression of ERBB2 is commonly regulated by a copy number gain in chromosome arm 17q. TAaCGH found the region 17q11-q22 associated with the phenotype and using logistic regression we reduced this region to 17q12-q21.31 correctly classifying 78% of the ERBB2 positive individuals (sensitivity) in a validation data set. We also analyzed over-expression in Estrogen Receptor (ER), a second phenotype commonly observed in breast cancer patients and found that the region 5p14.3-12 together with six full arms were associated with the phenotype. Our method identified 4p, 6p and 16q as the strongest predictors correctly classifying 76% of ER positives in our validation data set. However, for this set there was a significant increase in the false positive rate (specificity). We suggest that topological and machine learning methods can be combined for prediction of phenotypes using genetic data.

Semantic Segmentation of Microscopic Neuroanatomical Data by Combining Topological Priors With Encoder–decoder Deep Networks (2020)

Samik Banerjee, Lucas Magee, Dingkang Wang, Xu Li, Bing-Xing Huo, Jaikishan Jayakumar, Katherine Matho, Meng-Kuan Lin, Keerthi Ram, Mohanasankar Sivaprakasam, Josh Huang, Yusu Wang, Partha P. Mitra

Abstract

Understanding of neuronal circuitry at cellular resolution within the brain has relied on neuron tracing methods that involve careful observation and interpretation by experienced neuroscientists. With recent developments in imaging and digitization, this approach is no longer feasible with the large-scale (terabyte to petabyte range) images. Machine-learning-based techniques, using deep networks, provide an efficient alternative to the problem. However, these methods rely on very large volumes of annotated images for training and have error rates that are too high for scientific data analysis, and thus require a substantial volume of human-in-the-loop proofreading. Here we introduce a hybrid architecture combining prior structure in the form of topological data analysis methods, based on discrete Morse theory, with the best-in-class deep-net architectures for the neuronal connectivity analysis. We show significant performance gains using our hybrid architecture on detection of topological structure (for example, connectivity of neuronal processes and local intensity maxima on axons corresponding to synaptic swellings) with precision and recall close to 90% compared with human observers. We have adapted our architecture to a high-performance pipeline capable of semantic segmentation of light-microscopic whole-brain image data into a hierarchy of neuronal compartments. We expect that the hybrid architecture incorporating discrete Morse techniques into deep nets will generalize to other data domains.

Steinhaus Filtration and Stable Paths in the Mapper (2020)

Dustin L. Arendt, Matthew Broussard, Bala Krishnamoorthy, Nathaniel Saul

Abstract

Two central concepts from topological data analysis are persistence and the Mapper construction. Persistence employs a sequence of objects built on data called a filtration. A Mapper produces insightful summaries of data, and has found widespread applications in diverse areas. We define a new filtration called the cover filtration built from a single cover based on a generalized Steinhaus distance, which is a generalization of Jaccard distance. We prove a stability result: the cover filtrations of two covers are $\alpha/m$ interleaved, where $\alpha$ is a bound on the bottleneck distance between covers and $m$ is the size of the smallest set in either cover. We also show that our construction is equivalent to the Čech filtration under certain settings, and that the Vietoris-Rips filtration completely determines the cover filtration in all cases. We then develop a theory for stable paths within this filtration. Unlike standard results on stability in topological persistence, our definition of path stability aligns exactly with the above result on stability of the cover filtration. We demonstrate how our framework can be employed in a variety of applications where a metric is not obvious but a cover is readily available. First, we present a new model for recommendation systems using the cover filtration. For an explicit example, stable paths identified on a movies data set represent sequences of movies constituting gentle transitions from one genre to another. As a second application in explainable machine learning, we apply the Mapper for model induction, providing explanations in the form of paths between subpopulations. Stable paths in the Mapper from a supervised machine learning model trained on the FashionMNIST data set provide improved explanations of relationships between subpopulations of images.
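
To make the distance mentioned in the abstract above concrete, here is a minimal Python sketch. For two finite sets, the Steinhaus distance with respect to a measure is 1 − μ(A ∩ B) / μ(A ∪ B), and it reduces to the Jaccard distance when μ counts elements. Extending the same ratio to more than two sets is an assumption about what "generalized" means here, made for illustration; the function name `steinhaus_distance` and the `weight` parameter are hypothetical and not taken from the paper.

```python
def steinhaus_distance(sets, weight=None):
    """Return 1 - mu(intersection) / mu(union) for a list of finite sets.

    `weight` optionally maps each element to a non-negative weight; when it
    is omitted, mu is the counting measure and, for two sets, the result is
    the Jaccard distance. The multi-set form is an illustrative assumption.
    """
    sets = [set(s) for s in sets]
    if not sets:
        raise ValueError("need at least one set")

    union = set().union(*sets)
    intersection = set.intersection(*sets)

    def mu(s):
        # Counting measure by default, weighted sum otherwise.
        if weight is None:
            return float(len(s))
        return float(sum(weight[x] for x in s))

    total = mu(union)
    if total == 0:
        # All sets are empty (or have zero measure): treat them as identical.
        return 0.0
    return 1.0 - mu(intersection) / total


# Toy cover elements, invented for illustration only.
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
d_pair = steinhaus_distance([A, B])        # shares 2 of 4 elements
d_triple = steinhaus_distance([A, B, C])   # shares 1 of 5 elements
d_weighted = steinhaus_distance([A, B], weight={1: 1, 2: 1, 3: 2, 4: 1})
```

In a cover filtration of this kind, a group of cover elements would enter the filtration at a threshold given by such a distance, so heavily overlapping elements are joined early and barely overlapping ones late.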

🍩 Database of Original & Non-Theoretical Uses of Topology

A Topological Framework for Deep Learning (2020)

Text Classification via Network Topology: A Case Study on the Holy Quran (2019)

Topological Graph Neural Networks (2021)

Toward Automated Prediction of Manufacturing Productivity Based on Feature Selection Using Topological Data Analysis (2016)

Differentiable Euler Characteristic Transforms for Shape Classification (2023)

Simplicial Representation Learning With Neural $K$-Forms (2023)

Filtration Surfaces for Dynamic Graph Classification (2023)

Position: Topological Deep Learning Is the New Frontier for Relational Learning (2024)

Cell Complex Neural Networks (2020)

Filtration Curves for Graph Representation (2021)

Simplicial Neural Networks (2020)

A Visual Analytics Approach for the Diagnosis of Heterogeneous and Multidimensional Machine Maintenance Data (2021)

Generalized Penalty for Circular Coordinate Representation (2020)

Hyperparameter Optimization of Topological Features for Machine Learning Applications (2019)

An Industry Case of Large-Scale Demand Forecasting of Hierarchical Components (2019)

Geometric Feature Performance Under Downsampling for EEG Classification Tasks (2021)

Identifying Repeating Patterns in IEC 61499 Systems Using Feature-Based Embeddings (2022)

A Topological Data Analysis Based Classification Method for Multiple Measurements (2019)

Mapping Geometric and Electromagnetic Feature Spaces With Machine Learning for Additively Manufactured RF Devices (2022)

Rapid and Precise Topological Comparison With Merge Tree Neural Networks (2024)

Fibers of Failure: Classifying Errors in Predictive Processes (2020)

Capturing Dynamics of Time-Varying Data via Topology (2020)

Persistence Images: A Stable Vector Representation of Persistent Homology (2017)

Omics-Based Strategies in Precision Medicine: Toward a Paradigm Shift in Inborn Errors of Metabolism Investigations (2016)

PI-Net: A Deep Learning Approach to Extract Topological Persistence Images (2020)

Prediction in Cancer Genomics Using Topological Signatures and Machine Learning (2020)

Semantic Segmentation of Microscopic Neuroanatomical Data by Combining Topological Priors With Encoder–decoder Deep Networks (2020)

Steinhaus Filtration and Stable Paths in the Mapper (2020)