🍩 Database of Original & Non-Theoretical Uses of Topology

Topological Singularity Detection at Multiple Scales (2023)
Julius von Rohrscheidt, Bastian Rieck
Abstract
The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness' of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.
Efficient Planning of Multi-Robot Collective Transport Using Graph Reinforcement Learning With Higher Order Topological Abstraction (2023)
Steve Paul, Wenyuan Li, Brian Smyth, Yuzhou Chen, Yulia Gel, Souma Chowdhury
Abstract
Efficient multi-robot task allocation (MRTA) is fundamental to various time-sensitive applications such as disaster response, warehouse operations, and construction. This paper tackles a particular class of these problems that we call MRTA-collective transport or MRTA-CT: here tasks present varying workloads and deadlines, and robots are subject to flight range, communication range, and payload constraints. For large instances of these problems involving 100s-1000s of tasks and 10s-100s of robots, traditional non-learning solvers are often time-inefficient, and emerging learning-based policies do not scale well to larger-sized problems without costly retraining. To address this gap, we use a recently proposed encoder-decoder graph neural network involving Capsule networks and a multi-head attention mechanism, and innovatively add topological descriptors (TD) as new features to improve transferability to unseen problems of similar and larger size. Persistent homology is used to derive the TD, and proximal policy optimization is used to train our TD-augmented graph neural network. The resulting policy model compares favorably to state-of-the-art non-learning baselines while being much faster. The benefit of using TD is readily evident when scaling to test problems of size larger than those used in training.
Optimizing Porosity Detection in Wire Laser Metal Deposition Processes Through Data-Driven AI Classification Techniques (2023)
Meritxell Gomez-Omella, Jon Flores, Basilio Sierra, Susana Ferreiro, Nicolas Hascoët, Francisco Chinesta
Abstract
Additive manufacturing (AM) is an attractive solution for many companies that produce geometrically complex parts. This process consists of depositing material layer by layer following a sliced CAD geometry. It brings several benefits to manufacturing capabilities, such as design freedom, reduced material waste, and short-run customization. However, one of the current challenges faced by users of the process, mainly in wire laser metal deposition (wLMD), is to avoid defects in the manufactured part, especially porosity. This defect is caused by extreme conditions and metallurgical transformations of the process. Not only does it directly affect the mechanical performance of the parts, especially the fatigue properties, but it also increases costs due to the inspection tasks to which the manufactured parts must be subjected. This work compares three operational solution approaches, product-centric, based on signal-based feature extraction and Topological Data Analysis together with statistical and Machine Learning (ML) techniques, for the early detection and prediction of porosity failure in a wLMD process. The different forecasting and validation strategies demonstrate the variety of conclusions that can be drawn with different objectives in the analysis of the monitored data in AM problems.
Manifold Learning for Coherent Design Interpolation Based on Geometrical and Topological Descriptors (2023)
D. Muñoz, O. Allix, F. Chinesta, J. J. Ródenas, E. Nadal
Abstract
In the context of intellectual property in the manufacturing industry, know-how refers to practical knowledge of how to accomplish a specific task. This know-how is often difficult to synthesise into a set of rules or steps, as it resides in the intuition and expertise of engineers, designers, and other professionals. Today, a new line of research on this question has sprung up thanks to the explosion of Artificial Intelligence and Machine Learning algorithms and their alliance with Computational Mechanics and Optimisation tools. However, a key aspect of industrial design is the scarcity of available data, making it problematic to rely on deep-learning approaches. Assuming that the existing designs live in a manifold, in this paper we propose a synergistic use of existing Machine Learning tools to infer a reduced manifold from the existing limited set of designs and then to use it to interpolate between the individuals, working as a generator basis, to create new and coherent designs. For this, a key aspect is to be able to properly interpolate in the reduced manifold, which requires a proper clustering of the individuals. From our experience, due to the scarcity of data, adding topological descriptors to geometrical ones considerably improves the quality of the clustering. Thus, a distance mixing topology and geometry is proposed. This distance is used both for the clustering and for the interpolation. For the interpolation, relying on optimal transport appears to be mandatory. Examples of growing complexity are proposed to illustrate the quality of the method.
Topological Data Analysis for Electric Motor Eccentricity Fault Detection (2022)
Bingnan Wang, Chungwei Lin, Hiroshi Inoue, Makoto Kanemaru
Abstract
In this paper, we develop a topological data analysis (TDA) method for motor current signature analysis (MCSA), and apply it to induction motor eccentricity fault detection. We introduce TDA and present the procedure of extracting topological features from time-domain data that will be represented using persistence diagrams and vectorized Betti sequences. The procedure is applied to induction machine phase current signal analysis, and shown to be highly effective in differentiating signals from different eccentricity levels. With TDA, we are able to use a simple regression model that can predict the fault levels with reasonable accuracy, even for the data of eccentricity levels that are not seen in the training data. The proposed method is model-free, and only requires a small segment of time-domain data to make a prediction. These advantages make it attractive for a wide range of fault detection applications.
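As a rough illustration of what a "vectorized Betti sequence" can look like, the sketch below counts, for each threshold on a grid, how many intervals of a persistence diagram are alive at that threshold. The function name `betti_sequence`, the half-open convention `birth <= t < death`, and the toy diagram are assumptions made for this example; the abstract does not specify the authors' exact vectorization.

```python
from typing import List, Sequence, Tuple


def betti_sequence(
    diagram: Sequence[Tuple[float, float]],
    thresholds: Sequence[float],
) -> List[int]:
    """Count the diagram intervals alive at each threshold.

    An interval (birth, death) is treated as alive at t when
    birth <= t < death (a half-open convention assumed here).
    """
    return [
        sum(1 for birth, death in diagram if birth <= t < death)
        for t in thresholds
    ]


# Toy diagram (hypothetical values, not taken from the paper).
toy_diagram = [(0.0, 1.0), (0.5, 2.0), (1.5, 1.75)]
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
print(betti_sequence(toy_diagram, grid))
```

The resulting fixed-length integer vector is the kind of feature that could be passed to a simple regression model.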
Quantitative Analysis of Phase Transitions in Two-Dimensional XY Models Using Persistent Homology (2022)
Nicholas Sale, Jeffrey Giansiracusa, Biagio Lucini
Abstract
We use persistent homology and persistence images as an observable of three different variants of the two-dimensional XY model in order to identify and study their phase transitions. We examine models with the classical XY action, a topological lattice action, and an action with an additional nematic term. In particular, we introduce a new way of computing the persistent homology of lattice spin model configurations and, by considering the fluctuations in the output of logistic regression and k-nearest neighbours models trained on persistence images, we develop a methodology to extract estimates of the critical temperature and the critical exponent of the correlation length. We put particular emphasis on finite-size scaling behaviour and producing estimates with quantifiable error. For each model we successfully identify its phase transition(s) and are able to get an accurate determination of the critical temperatures and critical exponents of the correlation length.
Severe Slugging Flow Identification From Topological Indicators (2022)
Simone Casolo
Abstract
In this work, topological data analysis is used to identify the onset of severe slug flow in offshore petroleum production systems. Severe slugging is a multiphase flow regime known to be very inefficient and potentially harmful to process equipment, and it is characterized by large oscillations in the production fluid pressure. Time series from pressure sensors in subsea oil wells are processed by means of Takens embedding to produce point clouds of data. Embedded sensor data is then analyzed using persistent homology to obtain topological indicators capable of revealing the occurrence of severe slugging in a condition-based monitoring approach. A large dataset of well events consisting of both real and simulated data is used to demonstrate the possibility of automating severe slugging detection from live data via topological data analysis. Methods based on persistence diagrams are shown to accurately identify severe slugging and to classify different flow regimes from pressure signals of producing wells with supervised machine learning.
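The Takens embedding step mentioned above can be sketched in a few lines: each point of the cloud collects `dimension` samples of the series spaced `delay` steps apart. The function name `takens_embedding` and the parameter values are assumptions for illustration only; the abstract does not state which embedding dimension or delay were used.

```python
from typing import List, Sequence, Tuple


def takens_embedding(
    series: Sequence[float], dimension: int, delay: int
) -> List[Tuple[float, ...]]:
    """Turn a scalar time series into a point cloud of delay vectors.

    Point i is (x[i], x[i + delay], ..., x[i + (dimension - 1) * delay]).
    """
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    span = (dimension - 1) * delay       # samples covered beyond the first
    n_points = len(series) - span        # number of complete delay vectors
    if n_points <= 0:
        return []
    return [
        tuple(series[i + k * delay] for k in range(dimension))
        for i in range(n_points)
    ]


# Hypothetical pressure readings, embedded in 3 dimensions with delay 2.
print(takens_embedding([0, 1, 2, 3, 4, 5], dimension=3, delay=2))
```

The resulting point cloud is what a persistent homology library would then take as input; that step is omitted here.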
Identifying Repeating Patterns in IEC 61499 Systems Using Feature-Based Embeddings (2022)
Markus Unterdechler, Antonio M. Gutiérrez, Lisa Sonnleithner, Rick Rabiser, Alois Zoitl
Abstract
Cyber-Physical Production Systems (CPPSs) are highly variable systems of systems comprised of software and hardware interacting with each other and the environment. The increasing integration of technologies and devices has brought an unprecedented level of automation and customization. At the same time, it has also increased the efforts to maintain highly complex and heterogeneous systems. Although engineering practices support the reuse of common components to ease the development and maintenance of the systems in different projects, the identification of common components is still manually performed, which is a time-consuming, error-prone task. In this paper, a novel approach identifying repeating patterns in CPPSs based on artificial intelligence techniques is presented. This approach allows finding exact and similar components to support the CPPS design. Furthermore, it enables the maintenance of common components by reusing predefined types, thereby reducing development effort. We implemented and evaluated our approach in an industry case study on developing CPPS control software with IEC 61499.
Mapping Geometric and Electromagnetic Feature Spaces With Machine Learning for Additively Manufactured RF Devices (2022)
Deanna Sessions, Venkatesh Meenakshisundaram, Andrew Gillman, Alexander Cook, Kazuko Fuchi, Philip R. Buskohl, Gregory H. Huff
Abstract
Multimaterial additive manufacturing enables transformative capabilities in customized, low-cost, and multifunctional electromagnetic devices. However, process-specific fabrication anomalies can result in non-intuitive effects on performance; we propose a framework for identifying defect mechanisms and their performance impact by mapping geometric variances to electromagnetic performance metrics. This method can accelerate additive fabrication feedback while avoiding the high computational cost of in-line electromagnetic simulation. We first used dimension reduction to explore the population of geometric manufacturing anomalies and electromagnetic performance. Convolutional neural networks are then trained to predict the electromagnetic performance of the printed geometries. In generating the networks, we explored two inputs: one image-derived geometric description and one using the same description with additional simulated electromagnetic information. Network latent space analysis shows the networks learned both geometric and electromagnetic values even without electromagnetic input. This result demonstrates it is possible to create accelerated additive feedback systems predicting electromagnetic performance without in-line simulation.
A Topological Machine Learning Pipeline for Classification (2022)
Francesco Conti, Davide Moroni, Maria Antonietta Pascali
Abstract
In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered. Using a grid search approach, this pipeline determines optimal representation methods and parameters. The development of such a topological pipeline for Machine Learning involves two crucial steps that strongly affect its performance: firstly, digital data must be represented as an algebraic object with a proper associated filtration in order to compute its topological summary, the Persistence Diagram. Secondly, the persistence diagram must be transformed with suitable representation methods in order to be introduced in a Machine Learning algorithm. We assess the performance of our pipeline, and in parallel, we compare the different representation methods on popular benchmark datasets. This work is a first step toward both an easy and ready-to-use pipeline for data classification using persistent homology and Machine Learning, and to understand the theoretical reasons why, given a dataset and a task to be performed, a pair (filtration, topological representation) is better than another.
Exploring the Geometry and Topology of Neural Network Loss Landscapes (2022)
Stefan Horoi, Jessie Huang, Bastian Rieck, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy
Abstract
Recent work has established clear links between the generalization performance of trained neural networks and the geometry of their loss landscape near the local minima to which they converge. This suggests that qualitative and quantitative examination of the loss landscape geometry could yield insights about neural network generalization performance during training. To this end, researchers have proposed visualizing the loss landscape through the use of simple dimensionality reduction techniques. However, such visualization methods have been limited by their linear nature and only capture features in one or two dimensions, thus restricting sampling of the loss landscape to lines or planes. Here, we expand and improve upon these in three ways. First, we present a novel "jump and retrain" procedure for sampling relevant portions of the loss landscape. We show that the resulting sampled data holds more meaningful information about the network's ability to generalize. Next, we show that non-linear dimensionality reduction of the jump and retrain trajectories via PHATE, a trajectory- and manifold-preserving method, allows us to visualize differences between networks that are generalizing well vs poorly. Finally, we combine PHATE trajectories with a computational homology characterization to quantify trajectory differences.
Time-Inhomogeneous Diffusion Geometry and Topology (2022)
Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy
Abstract
Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic diffusion homology. We use this intrinsic topology as well as an ambient topology to study how the data changes over diffusion time. We demonstrate both homologies in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
Continuous Indexing of Fibrosis (CIF): Improving the Assessment and Classification of MPN Patients (2022)
Hosuk Ryou, Korsuk Sirinukunwattana, Alan Aberdeen, Gillian Grindstaff, Bernadette Stolz, Helen Byrne, Heather A. Harrington, Nikolaos Sousos, Anna L. Godfrey, Claire N. Harrison, Bethan Psaila, Adam J. Mead, Gabrielle Rees, Gareth D. H. Turner, Jens Rittscher, Daniel Royston
Abstract
The detection and grading of fibrosis in myeloproliferative neoplasms (MPN) is an important component of disease classification, prognostication and disease monitoring. However, current fibrosis grading systems are only semi-quantitative and fail to capture sample heterogeneity. To improve the detection, quantitation and representation of reticulin fibrosis, we developed a machine learning (ML) approach using bone marrow trephine (BMT) samples (n = 107) from patients diagnosed with MPN or a reactive / non-neoplastic marrow. The resulting Continuous Indexing of Fibrosis (CIF) enhances the detection and monitoring of fibrosis within BMTs, and aids the discrimination of MPN subtypes. When combined with megakaryocyte feature analysis, CIF discriminates between the frequently challenging differential diagnosis of essential thrombocythemia (ET) and pre-fibrotic myelofibrosis (pre-PMF) with high predictive accuracy [area under the curve = 0.94]. CIF also shows significant promise in the identification of MPN patients at risk of disease progression; analysis of samples from 35 patients diagnosed with ET and enrolled in the Primary Thrombocythemia-1 (PT-1) trial identified features predictive of post-ET myelofibrosis (area under the curve = 0.77). In addition to these clinical applications, automated analysis of fibrosis has clear potential to further refine disease classification boundaries and inform future studies of the micro-environmental factors driving disease initiation and progression in MPN and other stem cell disorders. The image analysis methods used to generate CIF can be readily integrated with those of other key morphological features in MPNs, including megakaryocyte morphology, that lie beyond the scope of conventional histological assessment.
Key Points
Machine learning enables an objective and quantitative description of reticulin fibrosis within the bone marrow of patients with myeloproliferative neoplasms (MPN).
Automated analysis and Continuous Indexing of Fibrosis (CIF) captures heterogeneity within MPN samples and has utility in refined classification and disease monitoring.
Quantitative fibrosis assessment combined with topological data analysis may help to predict patients at increased risk of progression to post-ET myelofibrosis, and assist in the discrimination of ET and pre-fibrotic PMF (pre-PMF).
Filtration Curves for Graph Representation (2021)
Leslie O'Bray, Bastian Rieck, Karsten Borgwardt
Abstract
The two predominant approaches to graph comparison in recent years are based on (i) enumerating matching subgraphs or (ii) comparing neighborhoods of nodes. In this work, we complement these two perspectives with a third way of representing graphs: using filtration curves from topological data analysis that capture both edge weight information and global graph structure. Filtration curves are highly efficient to compute and lead to expressive representations of graphs, which we demonstrate on graph classification benchmark datasets. Our work opens the door to a new form of graph representation in data mining. 
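To make the idea of a filtration curve concrete, here is a minimal sketch under stated assumptions: edges are added in order of increasing weight, and after each distinct weight the number of connected components of the partial graph is recorded. The choice of "number of connected components" as the tracked descriptor, the function name `component_count_curve`, and the toy graph are illustrative assumptions, not the authors' implementation, which may track other graph descriptors.

```python
from typing import List, Sequence, Tuple


def component_count_curve(
    n_nodes: int, weighted_edges: Sequence[Tuple[int, int, float]]
) -> List[Tuple[float, int]]:
    """Record (weight, #connected components) as edges enter by weight."""
    parent = list(range(n_nodes))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n_nodes
    curve: List[Tuple[float, int]] = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v      # the edge merges two components
            components -= 1
        if curve and curve[-1][0] == w:
            curve[-1] = (w, components)  # same threshold: update in place
        else:
            curve.append((w, components))
    return curve


# Hypothetical 4-node graph given as (node, node, weight) triples.
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 2.0), (2, 3, 5.0)]
print(component_count_curve(4, edges))
```

Because only a sort and a union-find pass are needed, a curve like this is cheap to compute, and curves from different graphs can be compared once they are evaluated on a shared set of thresholds.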
Data-Driven and Automatic Surface Texture Analysis Using Persistent Homology (2021)
Melih C. Yesilli, Firas A. Khasawneh
Abstract
Surface roughness plays an important role in analyzing engineering surfaces. It quantifies the surface topography and can be used to determine whether the resulting surface finish is acceptable or not. Nevertheless, while several existing tools and standards are available for computing surface roughness, these methods rely heavily on user input, thus slowing down the analysis and increasing manufacturing costs. Therefore, fast and automatic determination of the roughness level is essential to avoid costs resulting from surfaces with unacceptable finish, and user-intensive analysis. In this study, we propose a Topological Data Analysis (TDA) based approach to classify the roughness level of synthetic surfaces using both their areal images and profiles. We utilize persistent homology from TDA to generate persistence diagrams that encapsulate information on the shape of the surface. We then obtain feature matrices for each surface or profile using Carlsson coordinates, persistence images, and template functions. We compare our results to two widely used methods in the literature: Fast Fourier Transform (FFT) and Gaussian filtering. The results show that our approach yields mean accuracies as high as 97%. We also show that, in contrast to existing surface analysis tools, our TDA-based approach is fully automatable and provides adaptive feature extraction.
Topological Graph Neural Networks (2021)
Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten Borgwardt
Abstract
Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures, such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive in terms of the Weisfeiler-Lehman test of isomorphism. Augmenting GNNs with our layer leads to beneficial predictive performance, both on synthetic data sets, which can be trivially classified by humans but not by ordinary GNNs, and on real-world data.
Topological Data Analysis of C. Elegans Locomotion and Behavior (2021)
Ashleigh Thomas, Kathleen Bates, Alex Elchesen, Iryna Hartsock, Hang Lu, Peter Bubenik
Abstract
Video of nematodes/roundworms was analyzed using persistent homology to study locomotion and behavior. In each frame, an organism's body posture was represented by a high-dimensional vector. By concatenating points in fixed-duration segments of this time series, we created a sliding window embedding (sometimes called a time delay embedding) where each point corresponds to a sequence of postures of an organism. Persistent homology on the points in this time series detected behaviors and comparisons of these persistent homology computations detected variation in their corresponding behaviors. We used average persistence landscapes and machine learning techniques to study changes in locomotion and behavior in varying environments.
A Visual Analytics Approach for the Diagnosis of Heterogeneous and Multidimensional Machine Maintenance Data (2021)
Xiaoyu Zhang, Takanori Fujiwara, Senthil Chandrasegaran, Michael P. Brundage, Thurston Sexton, Alden Dima, Kwan-Liu Ma
Abstract
Analysis of large, high-dimensional, and heterogeneous datasets is challenging as no one technique is suitable for visualizing and clustering such data in order to make sense of the underlying information. For instance, heterogeneous logs detailing machine repair and maintenance in an organization often need to be analyzed to diagnose errors and identify abnormal patterns, formalize root-cause analyses, and plan preventive maintenance. Such real-world datasets are also beset by issues such as inconsistent and/or missing entries. To conduct an effective diagnosis, it is important to extract and understand patterns from the data with support from analytic algorithms (e.g., finding that certain kinds of machine complaints occur more in the summer) while involving the human-in-the-loop. To address these challenges, we adopt existing techniques for dimensionality reduction (DR) and clustering of numerical, categorical, and text data dimensions, and introduce a visual analytics approach that uses multiple coordinated views to connect DR + clustering results across each kind of the data dimension stated. To help analysts label the clusters, each clustering view is supplemented with techniques and visualizations that contrast a cluster of interest with the rest of the dataset. Our approach assists analysts to make sense of machine maintenance logs and their errors. Then the gained insights help them carry out preventive maintenance. We illustrate and evaluate our approach through use cases and expert studies respectively, and discuss generalization of the approach to other heterogeneous data.
Topological Regularization for Dense Prediction (2021)
Deqing Fu, Bradley J. Nelson
Abstract
Dense prediction tasks such as depth perception and semantic segmentation are important applications in computer vision that have a concrete topological description in terms of partitioning an image into connected components or estimating a function with a small number of local extrema corresponding to objects in the image. We develop a form of topological regularization based on persistent homology that can be used in dense prediction tasks with these topological descriptions. Experimental results show that the output topology can also appear in the internal activations of trained neural networks which allows for a novel use of topological regularization to the internal states of neural networks during training, reducing the computational cost of the regularization. We demonstrate that this topological regularization of internal activations leads to improved convergence and test benchmarks on several problems and architectures. 
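The link between "a small number of local extrema" and persistent homology can be illustrated on a one-dimensional signal. The sketch below computes the finite 0-dimensional sublevel-set persistence pairs of a sequence (each local minimum is born at its value and dies when its basin merges into an older one) and sums the lifetimes of all but the `keep` most persistent pairs. The names `sublevel_persistence_1d` and `topological_penalty`, and the penalty itself, are assumptions made for illustration; this plain-Python version is not differentiable and is not the regularizer used in the paper.

```python
from typing import List, Sequence, Tuple


def sublevel_persistence_1d(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Finite (birth, death) pairs of 0-dim sublevel-set persistence.

    The component of the global minimum never dies and is not returned.
    """
    n = len(values)
    parent = [-1] * n          # -1 marks samples not yet added
    birth = [0.0] * n          # birth value, read at component roots

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs: List[Tuple[float, float]] = []
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                young, old = find(i), find(j)
                if young == old:
                    continue
                if birth[young] < birth[old]:
                    young, old = old, young   # elder rule: younger one dies
                if birth[young] < values[i]:  # skip zero-lifetime pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    return pairs


def topological_penalty(values: Sequence[float], keep: int = 1) -> float:
    """Sum of lifetimes of all finite pairs except the `keep` longest."""
    lifetimes = sorted((d - b for b, d in sublevel_persistence_1d(values)),
                       reverse=True)
    return sum(lifetimes[keep:])


signal = [0.0, 2.0, 1.0, 3.0, 0.5]   # hypothetical 1D activation profile
print(sublevel_persistence_1d(signal), topological_penalty(signal, keep=1))
```

A penalty of this shape is small when the signal has few prominent minima beyond the ones being kept, which is the sort of prior the abstract describes.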
Geometric Feature Performance Under Downsampling for EEG Classification Tasks (2021)
Bryan Bischof, Eric Bunch
Abstract
We experimentally investigate a collection of feature engineering pipelines for use with a CNN for classifying eyes-open or eyes-closed from electroencephalogram (EEG) time series from the Bonn dataset. Using the Takens' embedding, a geometric representation of time series, we construct simplicial complexes from EEG data. We then compare $\epsilon$-series of Betti numbers and $\epsilon$-series of graph spectra (a novel construction), two topological invariants of the latent geometry from these complexes, to raw time series of the EEG to fill in a gap in the literature for benchmarking. These methods, inspired by Topological Data Analysis, are used for feature engineering to capture local geometry of the time series. Additionally, we test these feature pipelines' robustness to downsampling and data reduction. This paper seeks to establish clearer expectations for both time-series classification via geometric features, and how CNNs for time series respond to data of degraded resolution.
TDAExplore: Quantitative Analysis of Fluorescence Microscopy Images Through Topology-Based Machine Learning (2021)
Parker Edwards, Kristen Skruber, Nikola Milićević, James B. Heidings, Tracy-Ann Read, Peter Bubenik, Eric A. Vitriol
Abstract
Recent advances in machine learning have greatly enhanced automatic methods to extract information from fluorescence microscopy data. However, current machine-learning-based models can require hundreds to thousands of images to train, and the most readily accessible models classify images without describing which parts of an image contributed to classification. Here, we introduce TDAExplore, a machine learning image analysis pipeline based on topological data analysis. It can classify different types of cellular perturbations after training with only 20–30 high-resolution images and performs robustly on images from multiple subjects and microscopy modes. Using only images and whole-image labels for training, TDAExplore provides quantitative, spatial information, characterizing which image regions contribute to classification. Computational requirements to train TDAExplore models are modest and a standard PC can perform training with minimal user input. TDAExplore is therefore an accessible, powerful option for obtaining quantitative information about imaging data in a wide variety of applications.
Determining Structural Properties of Artificial Neural Networks Using Algebraic Topology (2021)
David Pérez Fernández, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marta Villegas
Abstract
Artificial Neural Networks (ANNs) are widely used for approximating complex functions. The process that is usually followed to define the most appropriate architecture for an ANN given a specific function is mostly empirical. Once this architecture has been defined, weights are usually optimized according to the error function. On the other hand, we observe that ANNs can be represented as graphs and their topological 'fingerprints' can be obtained using Persistent Homology (PH). In this paper, we describe a proposal focused on designing more principled architecture search procedures. To do this, different architectures for solving problems related to a heterogeneous set of datasets have been analyzed. The results of the evaluation corroborate that PH effectively characterizes the ANN invariants: when ANN density (layers and neurons) or sample feeding order is the only difference, PH topological invariants appear; in the opposite direction in different subproblems (i.e. different labels), PH varies. This approach based on topological analysis helps towards the goal of designing more principled architecture search procedures and having a better understanding of ANNs. 
Persistent Homology Based Graph Convolution Network for Fine-Grained 3D Shape Segmentation (2021)
Chi-Chong Wong, Chi-Man Vong
Abstract
Fine-grained 3D segmentation is an important task in 3D object understanding, especially in applications such as intelligent manufacturing or parts analysis for 3D objects. However, many challenges involved in such problems are yet to be solved, such as i) interpreting the complex structures located in different regions for 3D objects; ii) capturing fine-grained structures with sufficient topology correctness. Current deep learning and graph machine learning methods fail to tackle such challenges and thus provide inferior performance in fine-grained 3D analysis. In this work, methods in topological data analysis are incorporated with a geometric deep learning model for the task of fine-grained segmentation for 3D objects. We propose a novel neural network model called Persistent Homology based Graph Convolution Network (PHGCN), which i) integrates persistent homology into a graph convolution network to capture multi-scale structural information that can accurately represent complex structures for 3D objects; ii) applies a novel Persistence Diagram Loss (ℒ_PD) that provides sufficient topology correctness for segmentation over the fine-grained structures. Extensive experiments on fine-grained 3D segmentation validate the effectiveness of the proposed PHGCN model and show significant improvements over current state-of-the-art methods.
The Shape of Cancer Relapse: Topological Data Analysis Predicts Recurrence in Paediatric Acute Lymphoblastic Leukaemia (2021)
Salvador Chulián, Bernadette J. Stolz, Álvaro Martínez-Rubio, Cristina Blázquez Goñi, Juan F. Rodríguez Gutiérrez, Teresa Caballero Velázquez, Águeda Molinos Quintana, Manuel Ramírez Orellana, Ana Castillo Robleda, José Luis Fuster Soler, Alfredo Minguela Puras, María Victoria Martínez Sánchez, María Rosa, Víctor M. Pérez-García, Helen Byrne
Abstract
Acute Lymphoblastic Leukaemia (ALL) is the most frequent paediatric cancer. Modern therapies have improved survival rates, but approximately 15-20% of patients relapse. At present, patients' risk of relapse is assessed by projecting high-dimensional flow cytometry data onto a subset of biomarkers and manually estimating the shape of this reduced data. Here, we apply methods from topological data analysis (TDA), which quantify shape in data via features such as connected components and loops, to pre-treatment ALL datasets with known outcomes. We combine these fully unsupervised analyses with machine learning to identify features in the pre-treatment data that are prognostic for risk of relapse. We find significant topological differences between relapsing and non-relapsing patients and confirm the predictive power of CD10, CD20, CD38, and CD45. Further, we are able to use the TDA descriptors to predict patients who relapsed. We propose three prognostic pipelines that readily extend to other haematological malignancies.
Teaser: Topology reveals features in flow cytometry data which predict relapse of patients with acute lymphoblastic leukemia.
Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology (2021)
Nicole Bussola, Bruno Papa, Ombretta Melaiu, Aurora Castellano, Doriana Fruci, Giuseppe Jurman
We introduce here a novel machine learning (ML) framework to address the issue of the quantitative assessment of the immune content in neuroblastoma (NB) specimens. First, the EUNet, a U-Net with an EfficientNet encoder, is trained to detect lymphocytes on tissue digital slides stained with the CD3 T-cell marker. The training set consists of 3782 images extracted from an original collection of 54 whole slide images (WSIs), manually annotated for a total of 73,751 lymphocytes. Resampling strategies, data augmentation, and transfer learning approaches are adopted to warrant reproducibility and to reduce the risk of overfitting and selection bias. Topological data analysis (TDA) is then used to define activation maps from different layers of the neural network at different stages of the training process, described by persistence diagrams (PD) and Betti curves. TDA is further integrated with the uniform manifold approximation and projection (UMAP) dimensionality reduction and the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm for clustering, by the deep features, the relevant subgroups and structures, across different levels of the neural network. Finally, the recent TwoNN approach is leveraged to study the variation of the intrinsic dimensionality of the U-Net model. As the main task, the proposed pipeline is employed to evaluate the density of lymphocytes over the whole tissue area of the WSIs. The model achieves good results with mean absolute error 3.1 on the test set, showing significant agreement between densities estimated by our EUNet model and by trained pathologists, thus indicating the potentialities of a promising new strategy in the quantification of the immune content in NB specimens.
Moreover, the UMAP algorithm unveiled interesting patterns compatible with pathological characteristics, also highlighting novel insights into the dynamics of the intrinsic dataset dimensionality at different stages of the training process. All the experiments were run on the Microsoft Azure cloud platform. 
A Topological Framework for Deep Learning (2020)
Mustafa Hajij, Kyle Istvan
We utilize classical facts from topology to show that the classification problem in machine learning is always solvable under very mild conditions. Furthermore, we show that a softmax classification network acts on an input topological space by a finite sequence of topological moves to achieve the classification task. Moreover, given a training dataset, we show how topological formalism can be used to suggest the appropriate architectural choices for neural networks designed to be trained as classifiers on the data. Finally, we show how the architecture of a neural network cannot be chosen independently from the shape of the underlying data. To demonstrate these results, we provide example datasets and show how they are acted upon by neural nets from this topological perspective. 
Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)
Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, Wlodek Zadrozny
While the strength of Topological Data Analysis has been explored in many studies on high-dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in the text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extraction of topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as a high-dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe that adding topological features to the conventional features in ensemble models improves the classification results (up to 5%). On the other hand, as expected, topological features by themselves may not be sufficient for effective classification. It is an open problem to see whether TDA features from word embeddings might be sufficient, as they seem to perform within a range of a few points from top results obtained with a linear support vector classifier.
Cell Complex Neural Networks (2020)
Mustafa Hajij, Kyle Istvan, Ghada Zamzami
Cell complexes are topological spaces constructed from simple blocks called cells. They generalize graphs, simplicial complexes, and polyhedral complexes that form important domains for practical applications. We propose a general, combinatorial, and unifying construction for performing neural network-type computations on cell complexes. Furthermore, we introduce inter-cellular message passing schemes, message passing schemes on cell complexes that take the topology of the underlying space into account. In particular, our method generalizes many of the most popular types of graph neural networks.
Interpretable Phase Detection and Classification With Persistent Homology (2020)
Alex Cole, Gregory J. Loges, Gary Shiu
We apply persistent homology to the task of discovering and characterizing phase transitions, using lattice spin models from statistical physics for working examples. Persistence images provide a useful representation of the homological data for conducting statistical tasks. To identify the phase transitions, a simple logistic regression on these images is sufficient for the models we consider, and interpretable order parameters are then read from the weights of the regression. Magnetization, frustration and vortex-antivortex structure are identified as relevant features for characterizing phase transitions.
Graph Filtration Learning (2020)
Christoph Hofer, Florian Graf, Bastian Rieck, Marc Niethammer, Roland Kwitt
We propose an approach to learning with graph-structured data in the problem domain of graph classification. In particular, we present a novel type of readout operation to aggregate node features into a graph-level representation. To this end, we leverage persistent homology computed via a real-valued, learnable, filter function. We establish the theoretical foundation for differentiating through the persistent homology computation. Empirically, we show that this type of readout operation compares favorably to previous techniques, especially when the graph connectivity structure is informative for the learning problem.
Simplicial Neural Networks (2020)
Stefania Ebli, Michaël Defferrard, Gard Spreemann
We present simplicial neural networks (SNNs), a generalization of graph neural networks to data that live on a class of topological spaces called simplicial complexes. These are natural multi-dimensional extensions of graphs that encode not only pairwise relationships but also higher-order interactions between vertices, allowing us to consider richer data, including vector fields and $n$-fold collaboration networks. We define an appropriate notion of convolution that we leverage to construct the desired convolutional neural networks. We test the SNNs on the task of imputing missing data on coauthorship complexes.
Can Neural Networks Learn Persistent Homology Features? (2020)
Guido Montúfar, Nina Otter, Yuguang Wang
Topological data analysis uses tools from topology, the mathematical area that studies shapes, to create representations of data. In particular, in persistent homology, one studies one-parameter families of spaces associated with data, and persistence diagrams describe the lifetime of topological invariants, such as connected components or holes, across the one-parameter family. In many applications, one is interested in working with features associated with persistence diagrams rather than the diagrams themselves. In our work, we explore the possibility of learning several types of features extracted from persistence diagrams using neural networks.
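Illustrative sketch (not taken from the paper): the "lifetime of connected components" mentioned above can be computed for a small point cloud with a Kruskal-style union-find pass. The function name `h0_persistence` and the convention that an edge enters the filtration at the distance between its endpoints are assumptions made for this example.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the death scales of connected components (all born at 0).

    Assumption: an edge enters the filtration at the Euclidean distance
    between its endpoints. The one component that never dies is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Path-halving find for the union-find structure.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one dies here
            deaths.append(length)
    return deaths


cloud = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 2.0)]
print(h0_persistence(cloud))
```

Each returned value is the scale at which one component merges into another, so the list is the set of finite bars of the 0-dimensional persistence diagram under the stated convention.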
Topological Autoencoders (2020)
Michael Moor, Max Horn, Bastian Rieck, Karsten Borgwardt
We propose a novel approach for preserving topological structures of the input space in latent representations of autoencoders. Using persistent homology, a technique from topological data analysis, we calculate topological signatures of both the input and latent space to derive a topological loss term. Under weak theoretical assumptions, we construct this loss in a differentiable manner, such that the encoding learns to retain multi-scale connectivity information. We show that our approach is theoretically well-founded and that it exhibits favourable latent representations on a synthetic manifold as well as on real-world image data sets, while preserving low reconstruction errors.
Topologically Densified Distributions (2020)
Christoph Hofer, Florian Graf, Marc Niethammer, Roland Kwitt
We study regularization in the context of small sample-size learning with over-parametrized neural networks. Specifically, we shift focus from architectural properties, such as norms on the network weights, to properties of the internal representations before a linear classifier. In particular, we impose a topological constraint on samples drawn from the probability measure induced in that space. This provably leads to mass concentration effects around the representations of training instances, i.e., a property beneficial for generalization. By leveraging previous work to impose topological constraints in a neural network setting, we provide empirical evidence (across various vision benchmarks) to support our claim for better generalization.
Topological Machine Learning for Multivariate Time Series (2020)
Chengyuan Wu, Carol Anne Hargreaves
We develop a framework for analyzing multivariate time series using topological data analysis (TDA) methods. The proposed methodology involves converting the multivariate time series to point cloud data, calculating Wasserstein distances between the persistence diagrams and using the $k$-nearest neighbors algorithm ($k$-NN) for supervised machine learning. Two methods (symmetry-breaking and anchor points) are also introduced to enable TDA to better analyze data with heterogeneous features that are sensitive to translation, rotation, or choice of coordinates. We apply our methods to room occupancy detection based on 5 time-dependent variables (temperature, humidity, light, CO2 and humidity ratio). Experimental results show that topological methods are effective in predicting room occupancy during a time window. We also apply our methods to an Activity Recognition dataset and obtain good results.
A Novel Method of Extracting Topological Features From Word Embeddings (2020)
Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny
In recent years, topological data analysis has been utilized for a wide range of problems to deal with high-dimensional noisy data. While text representations are often high-dimensional and noisy, there are only a few works on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representations of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the high-dimensional embedding space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly used tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show that our defined topological features may outperform conventional text mining features.
Generalized Penalty for Circular Coordinate Representation (2020)
Hengrui Luo, Alice Patania, Jisu Kim, Mikael Vejdemo-Johansson
Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization for high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method to adapt the circular coordinate framework to take into account sparsity in high-dimensional applications. We use a generalized penalty function instead of an $L_2$ penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analysis to support our claim that circular coordinates with generalized penalty will accommodate the sparsity in high-dimensional datasets under different sampling schemes while preserving the topological structures.
Contagion Dynamics for Manifold Learning (2020)
Barbara I. Mahler
Contagion maps exploit activation times in threshold contagions to assign vectors in high-dimensional Euclidean space to the nodes of a network. A point cloud that is the image of a contagion map reflects both the structure underlying the network and the spreading behaviour of the contagion on it. Intuitively, such a point cloud exhibits features of the network's underlying structure if the contagion spreads along that structure, an observation which suggests contagion maps as a viable manifold-learning technique. We test contagion maps as a manifold-learning tool on a number of different real-world and synthetic data sets, and we compare their performance to that of Isomap, one of the most well-known manifold-learning algorithms. We find that, under certain conditions, contagion maps are able to reliably detect underlying manifold structure in noisy data, while Isomap fails due to noise-induced error. This consolidates contagion maps as a technique for manifold learning.
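Illustrative sketch (not taken from the paper): one way to read "activation times in threshold contagions" is to run a synchronous threshold contagion once per seed and collect, for every node, the step at which it became active. The seeding rule (a node plus its neighbours), the strict inequality in the activation rule, and the names `activation_times` and `contagion_map` are assumptions made for this example.

```python
def activation_times(adjacency, seed, threshold):
    """Activation step of every node reached from one seed cluster.

    Assumptions of this sketch: the seed and its neighbours start active,
    and an inactive node activates when the fraction of its neighbours
    that are active exceeds `threshold`. Unreached nodes are omitted.
    """
    active = {seed, *adjacency[seed]}
    times = {node: 0 for node in active}
    step = 0
    while True:
        step += 1
        newly_active = {
            node
            for node, neighbours in adjacency.items()
            if node not in active
            and neighbours
            and sum(nb in active for nb in neighbours) / len(neighbours) > threshold
        }
        if not newly_active:
            return times
        for node in newly_active:
            times[node] = step
        active |= newly_active


def contagion_map(adjacency, threshold):
    """Map node j to its vector of activation times, one entry per seed."""
    nodes = sorted(adjacency)
    runs = [activation_times(adjacency, seed, threshold) for seed in nodes]
    # None marks a node the contagion never reached from that seed.
    return {j: [run.get(j) for run in runs] for j in nodes}


# A ring of six nodes: the contagion spreads along the ring in both directions.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
embedding = contagion_map(ring, threshold=0.3)
print(embedding[3])
```

The resulting vectors form the point cloud whose shape is then compared with the network's underlying structure; how unreached nodes are encoded is a design choice left open here.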
Persistent Homology Advances Interpretable Machine Learning for Nanoporous Materials (2020)
Aditi S. Krishnapriyan, Joseph Montoya, Jens Hummelshøj, Dmitriy Morozov
Machine learning for nanoporous materials design and discovery has emerged as a promising alternative to more time-consuming experiments and simulations. The challenge with this approach is the selection of features that enable universal and interpretable materials representations across multiple prediction tasks. We use persistent homology to construct holistic representations of the materials structure. We show that these representations can also be augmented with other generic features such as word embeddings from natural language processing to capture chemical information. We demonstrate our approach on multiple metal-organic framework datasets by predicting a variety of gas adsorption targets. Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from commonly used manually curated features. Persistent homology features allow us to locate the pores that correlate best to adsorption at different pressures, contributing to understanding atomic level structure-property relationships for materials design.
Quantitative and Interpretable Order Parameters for Phase Transitions From Persistent Homology (2020)
Alex Cole, Gregory J. Loges, Gary Shiu
We apply modern methods in computational topology to the task of discovering and characterizing phase transitions. As illustrations, we apply our method to four two-dimensional lattice spin models: the Ising, square ice, XY, and fully-frustrated XY models. In particular, we use persistent homology, which computes the births and deaths of individual topological features as a coarse-graining scale or sublevel threshold is increased, to summarize multiscale and high-point correlations in a spin configuration. We employ vector representations of this information called persistence images to formulate and perform the statistical task of distinguishing phases. For the models we consider, a simple logistic regression on these images is sufficient to identify the phase transition. Interpretable order parameters are then read from the weights of the regression. This method suffices to identify magnetization, frustration, and vortex-antivortex structure as relevant features for phase transitions in our models. We also define "persistence" critical exponents and study how they are related to those critical exponents usually considered.
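Illustrative sketch (not taken from the paper): a persistence image turns a persistence diagram into a fixed-length vector that a logistic regression can consume. The version below makes simplifying assumptions that are labelled in the code: points are weighted linearly by persistence, the Gaussian surface is sampled at pixel centres rather than integrated over pixels, and the names `persistence_image` and `flatten` are hypothetical. The regression step itself is not shown.

```python
import math


def persistence_image(diagram, resolution=4, extent=1.0, sigma=0.1):
    """Rasterise a persistence diagram into a resolution x resolution grid.

    Assumptions for this sketch: points are (birth, death) pairs, each is
    weighted linearly by its persistence, and the surface is sampled at
    pixel centres instead of being integrated over each pixel.
    """
    step = extent / resolution
    centres = [(k + 0.5) * step for k in range(resolution)]
    image = [[0.0] * resolution for _ in range(resolution)]
    norm = 2 * math.pi * sigma**2
    for birth, death in diagram:
        pers = death - birth
        for row, y in enumerate(centres):      # y axis: persistence
            for col, x in enumerate(centres):  # x axis: birth
                sq = (x - birth) ** 2 + (y - pers) ** 2
                image[row][col] += pers * math.exp(-sq / (2 * sigma**2)) / norm
    return image


def flatten(image):
    """Turn the grid into the feature vector a classifier would consume."""
    return [value for row in image for value in row]


diagram = [(0.1, 0.9), (0.2, 0.3)]
features = flatten(persistence_image(diagram))
print(len(features))
```

Because every pixel corresponds to a fixed (birth, persistence) region, the weight a linear classifier assigns to a pixel can be read back as the importance of features in that region, which is the sense in which the regression weights are interpretable.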
Topological Echoes of Primordial Physics in the Universe at Large Scales (2020)
Alex Cole, Matteo Biagetti, Gary Shiu
We present a pipeline for characterizing and constraining initial conditions in cosmology via persistent homology. The cosmological observable of interest is the cosmic web of large scale structure, and the initial conditions in question are non-Gaussianities (NG) of primordial density perturbations. We compute persistence diagrams and derived statistics for simulations of dark matter halos with Gaussian and non-Gaussian initial conditions. For computational reasons and to make contact with experimental observations, our pipeline computes persistence in sub-boxes of full simulations and simulations are subsampled to uniform halo number. We use simulations with large NG ($f_{\rm NL}^{\rm loc}=250$) as templates for identifying data with mild NG ($f_{\rm NL}^{\rm loc}=10$), and running the pipeline on several cubic volumes of size $40~(\textrm{Gpc}/h)^3$, we detect $f_{\rm NL}^{\rm loc}=10$ at $97.5\%$ confidence on $\sim 85\%$ of the volumes for our best single statistic. Throughout we benefit from the interpretability of topological features as input for statistical inference, which allows us to make contact with previous first-principles calculations and make new predictions.
Fibers of Failure: Classifying Errors in Predictive Processes (2020)
Leo S. Carlsson, Mikael Vejdemo-Johansson, Gunnar Carlsson, Pär G. Jönsson
Predictive models are used in many different fields of science and engineering and are always prone to make faulty predictions. These faulty predictions can be more or less malignant depending on the model application. We describe fibers of failure (FiFa), a method to classify failure modes of predictive processes. Our method uses Mapper, an algorithm from topological data analysis (TDA), to build a graphical model of input data stratified by prediction errors. We demonstrate two ways to use the failure mode groupings: either to produce a correction layer that adjusts predictions by similarity to the failure modes; or to inspect members of the failure modes to illustrate and investigate what characterizes each failure mode. We demonstrate FiFa on two scenarios: a convolutional neural network (CNN) predicting MNIST images with added noise, and an artificial neural network (ANN) predicting the electrical energy consumption of an electric arc furnace (EAF). The correction layer on the CNN model improved its prediction accuracy significantly while the inspection of failure modes for the EAF model provided guiding insights into the domain-specific reasons behind several high-error regions.
PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction (2020)
Nicolas Swenson, Aditi S. Krishnapriyan, Aydin Buluc, Dmitriy Morozov, Katherine Yelick
Understanding protein structure-function relationships is a key challenge in computational biology, with applications across the biotechnology and pharmaceutical industries. While it is known that protein structure directly impacts protein function, many functional prediction tasks use only protein sequence. In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank in order to study the expressiveness of different structure-based prediction schemes. We present PersGNN, an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis to capture a complex set of both local and global structural features. While variations of these techniques have been successfully applied to proteins before, we demonstrate that our hybridized approach, PersGNN, outperforms either method on its own as well as a baseline neural network that learns from the same information. PersGNN achieves a 9.3% boost in area under the precision-recall curve (AUPR) compared to the best individual model, as well as high F1 scores across different gene ontology categories, indicating the transferability of this approach.
Capturing Dynamics of Time-Varying Data via Topology (2020)
Lu Xian, Henry Adams, Chad M. Topaz, Lori Ziegelmeier
One approach to understanding complex data is to study its shape through the lens of algebraic topology. While the early development of topological data analysis focused primarily on static data, in recent years, theoretical and applied studies have turned to data that varies in time. A time-varying collection of metric spaces as formed, for example, by a moving school of fish or flock of birds, can contain a vast amount of information. There is often a need to simplify or summarize the dynamic behavior. We provide an introduction to topological summaries of time-varying metric spaces including vineyards [17], crocker plots [52], and multiparameter rank functions [34]. We then introduce a new tool to summarize time-varying metric spaces: a crocker stack. Crocker stacks are convenient for visualization, amenable to machine learning, and satisfy a desirable stability property which we prove. We demonstrate the utility of crocker stacks for a parameter identification task involving an influential model of biological aggregations [54]. Altogether, we aim to bring the broader applied mathematics community up to date on topological summaries of time-varying metric spaces.
Uncovering the Topology of Time-Varying fMRI Data Using Cubical Persistence (2020)
Bastian Rieck, Tristan Yates, Christian Bock, Karsten Borgwardt, Guy Wolf, Nicholas Turk-Browne, Smita Krishnaswamy
Functional magnetic resonance imaging (fMRI) is a crucial technology for gaining insights into cognitive processes in humans. Data amassed from fMRI measurements result in volumetric data sets that vary over time. However, analysing such data presents a challenge due to the large degree of noise and person-to-person variation in how information is represented in the brain. To address this challenge, we present a novel topological approach that encodes each time point in an fMRI data set as a persistence diagram of topological features, i.e. high-dimensional voids present in the data. This representation naturally does not rely on voxel-by-voxel correspondence and is robust to noise. We show that these time-varying persistence diagrams can be clustered to find meaningful groupings between participants, and that they are also useful in studying within-subject brain state trajectories of subjects performing a particular task. Here, we apply both clustering and trajectory analysis techniques to a group of participants watching the movie 'Partly Cloudy'. We observe significant differences in both brain state trajectories and overall topological activity between adults and children watching the same movie.
PINet: A Deep Learning Approach to Extract Topological Persistence Images (2020)
Anirudh Som, Hongjun Choi, Karthikeyan Natesan Ramamurthy, Matthew Buman, Pavan Turaga
Topological features such as persistence diagrams and their functional approximations like persistence images (PIs) have been showing substantial promise for machine learning and computer vision applications. This is greatly attributed to the robustness topological representations provide against different types of physical nuisance variables seen in real-world data, such as viewpoint, illumination, and more. However, key bottlenecks to their large scale adoption are computational expenditure and difficulty incorporating them in a differentiable architecture. We take an important step in this paper to mitigate these bottlenecks by proposing a novel one-step approach to generate PIs directly from the input data. We design two separate convolutional neural network architectures, one designed to take in multivariate time series signals as input and another that accepts multi-channel images as input. We call these networks Signal PINet and Image PINet respectively. To the best of our knowledge, we are the first to propose the use of deep learning for computing topological features directly from data. We explore the use of the proposed PINet architectures on two applications: human activity recognition using tri-axial accelerometer sensor data and image classification. We demonstrate the ease of fusion of PIs in supervised deep learning architectures and a speedup of several orders of magnitude for extracting PIs from data. Our code is available at https://github.com/anirudhsom/PINet.
Prediction in Cancer Genomics Using Topological Signatures and Machine Learning (2020)
Georgina Gonzalez, Arina Ushakova, Radmila Sazdanovic, Javier Arsuaga
Copy Number Aberrations, gains and losses of genomic regions, are a hallmark of cancer and can be experimentally detected using microarray comparative genomic hybridization (aCGH). In previous works, we developed a topology based method to analyze aCGH data whose output are regions of the genome where copy number is altered in patients with a predetermined cancer phenotype. We call this method Topological Analysis of array CGH (TAaCGH). Here we combine TAaCGH with machine learning techniques to build classifiers using copy number aberrations. We chose logistic regression on two different binary phenotypes related to breast cancer to illustrate this approach. The first case consists of patients with over-expression of the ERBB2 gene. Over-expression of ERBB2 is commonly regulated by a copy number gain in chromosome arm 17q. TAaCGH found the region 17q11-q22 associated with the phenotype and using logistic regression we reduced this region to 17q12-q21.31, correctly classifying 78% of the ERBB2 positive individuals (sensitivity) in a validation data set. We also analyzed over-expression in Estrogen Receptor (ER), a second phenotype commonly observed in breast cancer patients, and found that the region 5p14.3-12 together with six full arms were associated with the phenotype. Our method identified 4p, 6p and 16q as the strongest predictors, correctly classifying 76% of ER positives in our validation data set. However, for this set there was a significant increase in the false positive rate (specificity). We suggest that topological and machine learning methods can be combined for prediction of phenotypes using genetic data.
Topological Descriptors Help Predict Guest Adsorption in Nanoporous Materials (2020)
Aditi S. Krishnapriyan, Maciej Haranczyk, Dmitriy Morozov
Machine learning has emerged as an attractive alternative to experiments and simulations for predicting material properties. Usually, such an approach relies on specific domain knowledge for feature design: each learning target requires careful selection of features that an expert recognizes as important for the specific task. The major drawback of this approach is that computation of only a few structural features has been implemented so far, and it is difficult to tell a priori which features are important for a particular application. The latter problem has been empirically observed for predictors of guest uptake in nanoporous materials: local and global porosity features become dominant descriptors at low and high pressures, respectively. We investigate a feature representation of materials using tools from topological data analysis. Specifically, we use persistent homology to describe the geometry of nanoporous materials at various scales. We combine our topological descriptor with traditional structural features and investigate the relative importance of each to the prediction tasks. We demonstrate an application of this feature representation by predicting methane adsorption in zeolites, for pressures in the range of 1-200 bar. Our results not only show a considerable improvement compared to the baseline, but they also highlight that topological features capture information complementary to the structural features: this is especially important for the adsorption at low pressure, a task particularly difficult for the traditional features. Furthermore, by investigation of the importance of individual topological features in the adsorption model, we are able to pinpoint the location of the pores that correlate best to adsorption at different pressure, contributing to our atom-level understanding of structure-property relationships.
Steinhaus Filtration and Stable Paths in the Mapper (2020)
Dustin L. Arendt, Matthew Broussard, Bala Krishnamoorthy, Nathaniel Saul
Two central concepts from topological data analysis are persistence and the Mapper construction. Persistence employs a sequence of objects built on data called a filtration. A Mapper produces insightful summaries of data, and has found widespread applications in diverse areas. We define a new filtration called the cover filtration built from a single cover based on a generalized Steinhaus distance, which is a generalization of Jaccard distance. We prove a stability result: the cover filtrations of two covers are $\alpha/m$ interleaved, where $\alpha$ is a bound on bottleneck distance between covers and $m$ is the size of smallest set in either cover. We also show our construction is equivalent to the Čech filtration under certain settings, and the Vietoris-Rips filtration completely determines the cover filtration in all cases. We then develop a theory for stable paths within this filtration. Unlike standard results on stability in topological persistence, our definition of path stability aligns exactly with the above result on stability of cover filtration. We demonstrate how our framework can be employed in a variety of applications where a metric is not obvious but a cover is readily available. First we present a new model for recommendation systems using cover filtration. For an explicit example, stable paths identified on a movies data set represent sequences of movies constituting gentle transitions from one genre to another. As a second application in explainable machine learning, we apply the Mapper for model induction, providing explanations in the form of paths between subpopulations. Stable paths in the Mapper from a supervised machine learning model trained on the Fashion-MNIST data set provide improved explanations of relationships between subpopulations of images.
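Illustrative sketch (not taken from the paper): the abstract describes a filtration built from a single cover through a Steinhaus distance that generalizes Jaccard distance. The code below assumes the form 1 - measure(intersection) / measure(union) for a collection of finite sets, with the counting measure as default; the function name `steinhaus_distance` and the toy cover are hypothetical.

```python
from itertools import combinations


def steinhaus_distance(sets, measure=len):
    """Generalized Steinhaus distance of a collection of finite sets.

    Assumed form: 1 - measure(intersection) / measure(union). With the
    counting measure and two sets this is the Jaccard distance.
    """
    sets = [frozenset(s) for s in sets]
    union = frozenset().union(*sets)
    if measure(union) == 0:
        return 0.0  # convention chosen here for empty input
    common = frozenset.intersection(*sets)
    return 1.0 - measure(common) / measure(union)


# A toy cover: each element is the set of data-point ids it contains.
cover = {"U": {1, 2, 3}, "V": {2, 3, 4}, "W": {3, 4, 5, 6}}

# Under this reading, a simplex on cover elements enters the filtration
# at the Steinhaus distance of those elements.
for k in (2, 3):
    for names in combinations(sorted(cover), k):
        value = steinhaus_distance([cover[n] for n in names])
        print(names, round(value, 3))
```

Only set membership is needed, which matches the abstract's point that the construction applies when a cover is available but no metric is obvious.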
Text Classification via Network Topology: A Case Study on the Holy Quran (2019)
Mehmet Emin Aktas, Esra Akbas
Due to the growth in the number of texts and documents available online, machine learning based text classification systems have recently become more popular. Feature extraction, converting unstructured text into a structured feature space, is one of the essential tasks for text classification. In this paper, we propose a novel feature extraction approach for text classification using the network representation of text, network topology, and machine learning techniques. We present experimental results on classifying the Holy Quran chapters based on the place each chapter was revealed to illustrate the effectiveness of the approach.
Topological Feature Vectors for Chatter Detection in Turning Processes (2019)
Melih C. Yesilli, Firas A. Khasawneh, Andreas Otto 
Identification of Coenzyme-Binding Proteins With Machine Learning Algorithms (2019)
Yong Liu, Cristian R. Munteanu, Zhiwei Kong, Tao Ran, Alfredo Sahagún-Ruiz, Zhixiong He, Chuanshe Zhou, Zhiliang Tan 
Persistent Homology for the Automatic Classification of Prostate Cancer Aggressiveness in Histopathology Images (2019)
Peter Lawson, Jordan Schupbach, Brittany Terese Fasy, John W. Sheppard 
Topological Machine Learning With Persistence Indicator Functions (2019)
Bastian Rieck, Filip Sadlo, Heike Leitte
Techniques from computational topology, in particular persistent homology, are becoming increasingly relevant for data analysis. Their stable metrics permit the use of many distance-based data analysis methods, such as multidimensional scaling, while providing a firm theoretical ground. Many modern machine learning algorithms, however, are based on kernels. This paper presents persistence indicator functions (PIFs), which summarize persistence diagrams, i.e., feature descriptors in topological data analysis. PIFs can be calculated and compared in linear time and have many beneficial properties, such as the availability of a kernel-based similarity measure. We demonstrate their usage in common data analysis scenarios, such as confidence set estimation and classification of complex structured data. 
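To make the idea of summarizing a diagram by a function concrete, here is a minimal sketch of a persistence indicator function read as "how many intervals of the diagram are alive at scale eps". Treating the intervals as closed, the function name, and the toy diagram are assumptions made for illustration, not details taken from the abstract.

```python
def persistence_indicator(diagram, eps):
    """Count the (birth, death) intervals of a diagram that contain eps."""
    return sum(1 for birth, death in diagram if birth <= eps <= death)


# Hypothetical persistence diagram given as (birth, death) pairs.
diagram = [(0.0, 1.0), (0.2, 0.5), (0.4, 2.0)]

# Evaluate the step function at a few scales.
values = [persistence_indicator(diagram, eps) for eps in (0.1, 0.45, 1.5, 3.0)]
```

The result is a piecewise-constant function of eps, so two diagrams can be compared through their step functions instead of through a matching between points.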
Persistent Homology Machine Learning for Fingerprint Classification (2019)
N. Giansiracusa, R. Giansiracusa, C. Moon
The fingerprint classification problem is to sort fingerprints into predetermined groups, such as arch, loop, and whorl. It was asserted in the literature that minutiae points, which are commonly used for fingerprint matching, are not useful for classification. We show that, to the contrary, near state-of-the-art classification accuracy rates can be achieved when applying topological data analysis (TDA) to 3-dimensional point clouds of oriented minutiae points. We also apply TDA to fingerprint ink-roll images, which yields a lower accuracy rate but still shows promise; moreover, combining the two approaches outperforms each one individually. These methods use supervised learning applied to persistent homology and allow us to explore feature selection on barcodes, an important topic at the interface between TDA and machine learning. We test our classification algorithms on the NIST fingerprint database SD27. 
Hyperparameter Optimization of Topological Features for Machine Learning Applications (2019)
Francis Motta, Christopher Tralie, Rossella Bedini, Fabiano Bini, Gilberto Bini, Hamed Eramian, Marcio Gameiro, Steve Haase, Hugh Haddox, John Harer, Nick Leiby, Franco Marinozzi, Scott Novotney, Gabe Rocklin, Jed Singer, Devin Strickland, Matt Vaughn
This paper describes a general pipeline for generating optimal vector representations of topological features of data for use with machine learning algorithms. This pipeline can be viewed as a costly black-box function defined over a complex configuration space, each point of which specifies both how features are generated and how predictive models are trained on those features. We propose using state-of-the-art Bayesian optimization algorithms to inform the choice of topological vectorization hyperparameters while simultaneously choosing learning model parameters. We demonstrate the need for and effectiveness of this pipeline using two difficult biological learning problems, and illustrate the nontrivial interactions between topological feature generation and learning model hyperparameters. 
Fast and Accurate Tumor Segmentation of Histology Images Using Persistent Homology and Deep Convolutional Features (2019)
Talha Qaiser, Yee-Wah Tsang, Daiki Taniyama, Naoya Sakamoto, Kazuaki Nakane, David Epstein, Nasir Rajpoot
Tumor segmentation in whole-slide images of histology slides is an important step towards computer-assisted diagnosis. In this work, we propose a tumor segmentation framework based on the novel concept of persistent homology profiles (PHPs). For a given image patch, the homology profiles are derived by efficient computation of persistent homology, which is an algebraic tool from homology theory. We propose an efficient way of computing topological persistence of an image, alternative to simplicial homology. The PHPs are devised to distinguish tumor regions from their normal counterparts by modeling the atypical characteristics of tumor nuclei. We propose two variants of our method for tumor segmentation: one that targets speed without compromising accuracy and the other that targets higher accuracy. The fast version is based on a selection of exemplar image patches from a convolutional neural network (CNN) and patch classification by quantifying the divergence between the PHPs of exemplars and the input image patch. Detailed comparative evaluation shows that the proposed algorithm is significantly faster than competing algorithms while achieving comparable results. The accurate version combines the PHPs and high-level CNN features and employs a multi-stage ensemble strategy for image patch labeling. Experimental results demonstrate that the combination of PHPs and CNN features outperforms competing algorithms. This study is performed on two independently collected colorectal datasets containing adenoma, adenocarcinoma, signet, and healthy cases. Collectively, the accurate tumor segmentation produces the highest average patch-level F1-score, as compared with competing algorithms, on malignant and healthy cases from both the datasets. Overall the proposed framework highlights the utility of persistent homology for histopathology image analysis. 
An Industry Case of Large-Scale Demand Forecasting of Hierarchical Components (2019)
Rodrigo Rivera-Castro, Ivan Nazarov, Yuke Xiang, Ivan Maksimov, Aleksandr Pletnev, Evgeny Burnaev
Demand forecasting of hierarchical components is essential in manufacturing. However, its discussion in the machine-learning literature has been limited, and judgemental forecasts remain pervasive in the industry. Demand planners require easy-to-understand tools capable of delivering state-of-the-art results. This work presents an industry case of demand forecasting at one of the largest manufacturers of electronics in the world. It seeks to support practitioners with five contributions: (1) A benchmark of fourteen demand forecast methods applied to a relevant data set, (2) A data transformation technique yielding results comparable with the state of the art, (3) An alternative to ARIMA based on matrix factorization, (4) A model selection technique based on topological data analysis for time series and (5) A novel data set. Organizations seeking to upskill existing personnel and increase forecast accuracy will find value in this work. 
Analyzing Collective Motion With Machine Learning and Topology (2019)
Dhananjay Bhaskar, Angelika Manhart, Jesse Milzman, John T. Nardini, Kathleen M. Storey, Chad M. Topaz, Lori Ziegelmeier
We use topological data analysis and machine learning to study a seminal model of collective motion in biology [M. R. D’Orsogna et al., Phys. Rev. Lett. 96, 104302 (2006)]. This model describes agents interacting nonlinearly via attractive-repulsive social forces and gives rise to collective behaviors such as flocking and milling. To classify the emergent collective motion in a large library of numerical simulations and to recover model parameters from the simulation data, we apply machine learning techniques to two different types of input. First, we input time series of order parameters traditionally used in studies of collective motion. Second, we input measures based on topology that summarize the time-varying persistent homology of simulation data over multiple scales. This topological approach does not require prior knowledge of the expected patterns. For both unsupervised and supervised machine learning methods, the topological approach outperforms the one that is based on traditional order parameters. 
A Topological Data Analysis Based Classification Method for Multiple Measurements (2019)
Henri Riihimäki, Wojciech Chachólski, Jakob Theorell, Jan Hillert, Ryan Ramanujam
Background: Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements which samples from the data space and builds a network graph based on the data topology. When applying this to two case studies, accuracy exceeds alternative models with additional benefits such as reporting data subsets with high purity along with feature values. Results: For 300 examples of 3 tree species, the accuracy reached 80% after 30 datapoints, which was improved to 90% after increased sampling to 400 datapoints. Using data from 100 examples of each of 6 point processes, the classifier achieved 96.8% accuracy. In both datasets, the TDA classifier outperformed an alternative model. Conclusions: This algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool. 
Hepatic Tumor Classification Using Texture and Topology Analysis of Non-Contrast-Enhanced Three-Dimensional T1-Weighted MR Images With a Radiomics Approach (2019)
Asuka Oyama, Yasuaki Hiraoka, Ippei Obayashi, Yusuke Saikawa, Shigeru Furui, Kenshiro Shiraishi, Shinobu Kumagai, Tatsuya Hayashi, Jun’ichi Kotoku
The purpose of this study is to evaluate the accuracy for classification of hepatic tumors by characterization of T1-weighted magnetic resonance (MR) images using two radiomics approaches with machine learning models: texture analysis and topological data analysis using persistent homology. This study assessed non-contrast-enhanced fat-suppressed three-dimensional (3D) T1-weighted images of 150 hepatic tumors. The lesions included 50 hepatocellular carcinomas (HCCs), 50 metastatic tumors (MTs), and 50 hepatic hemangiomas (HHs) found respectively in 37, 23, and 33 patients. For classification, texture features were calculated, and also persistence images of three types (degree 0, degree 1 and degree 2) were obtained for each lesion from the 3D MR imaging data. We used three classification models. In the classification of HCC and MT (resp. HCC and HH, HH and MT), we obtained accuracy of 92% (resp. 90%, 73%) by texture analysis, and the highest accuracy of 85% (resp. 84%, 74%) when degree 1 (resp. degree 1, degree 2) persistence images were used. Our methods using texture analysis or topological data analysis allow for classification of the three hepatic tumors with considerable accuracy, and thus might be useful when applied for computer-aided diagnosis with MR images. 
Protein Classification With Improved Topological Data Analysis (2018)
Tamal K. Dey, Sayan Mandal 
Representability of Algebraic Topology for Biomolecules in Machine Learning Based Scoring and Virtual Screening (2018)
Zixuan Cang, Lin Mu, Guo-Wei Wei
This work introduces a number of algebraic topology approaches, including multicomponent persistent homology, multilevel persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multilevel persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination. 
Chatter Classification in Turning Using Machine Learning and Topological Data Analysis (2018)
Firas A. Khasawneh, Elizabeth Munch, Jose A. Perea
Chatter identification and detection in machining processes has been an active area of research in the past two decades. Part of the challenge in studying chatter is that machining equations that describe its occurrence are often nonlinear delay differential equations. The majority of the available tools for chatter identification rely on defining a metric that captures the characteristics of chatter, and a threshold that signals its occurrence. The difficulty in choosing these parameters can be somewhat alleviated by utilizing machine learning techniques. However, even with a successful classification algorithm, the transferability of typical machine learning methods from one data set to another remains very limited. In this paper we combine supervised machine learning with Topological Data Analysis (TDA) to obtain a descriptor of the process which can detect chatter. The features we use are derived from the persistence diagram of an attractor reconstructed from the time series via Takens embedding. We test the approach using deterministic and stochastic turning models, where the stochasticity is introduced via the cutting coefficient term. Our results show a 97% successful classification rate on the deterministic model labeled by the stability diagram obtained using the spectral element method. The features gleaned from the deterministic model are then utilized for characterization of chatter in a stochastic turning model where there are very limited analysis methods. 
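The reconstruction step mentioned in this abstract, Takens (delay) embedding, can be sketched in a few lines. This is only a generic delay-coordinate embedding under assumed parameters; the function name, the sine-wave signal, and the choice of dimension and delay are illustrative and are not taken from the paper.

```python
import math


def takens_embedding(series, dimension, delay):
    """Map a scalar time series to delay vectors
    (x[i], x[i + delay], ..., x[i + (dimension - 1) * delay])."""
    n_vectors = len(series) - (dimension - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this dimension and delay")
    return [
        tuple(series[i + j * delay] for j in range(dimension))
        for i in range(n_vectors)
    ]


# Hypothetical periodic signal standing in for a measured time series.
signal = [math.sin(0.1 * t) for t in range(200)]

# With a delay near a quarter period, a sine wave traces out a closed loop
# in the plane, the kind of shape persistent homology can then detect.
cloud = takens_embedding(signal, dimension=2, delay=16)
```

The resulting point cloud, not the raw series, is what a persistence diagram would be computed from; that computation is left out here because it needs a dedicated library.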
Deep Learning With Topological Signatures (2017)
Christoph Hofer, Roland Kwitt, Marc Niethammer, Andreas Uhl 
Persistence Images: A Stable Vector Representation of Persistent Homology (2017)
Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, Lori Ziegelmeier
Many data sets can be viewed as a noisy sampling of an underlying space, and tools from topological data analysis can characterize this structure for the purpose of knowledge discovery. One such tool is persistent homology, which provides a multiscale description of the homological features within a data set. A useful representation of this homological information is a persistence diagram (PD). Efforts have been made to map PDs into spaces with additional structure valuable to machine learning tasks. We convert a PD to a finite-dimensional vector representation which we call a persistence image (PI), and prove the stability of this transformation with respect to small perturbations in the inputs. The discriminatory power of PIs is compared against existing methods, showing significant performance gains. We explore the use of PIs with vector-based machine learning tools, such as linear sparse support vector machines, which identify features containing discriminating topological information. Finally, high accuracy inference of parameter values from the dynamic output of a discrete dynamical system (the linked twist map) and a partial differential equation (the anisotropic Kuramoto-Sivashinsky equation) provides a novel application of the discriminatory power of PIs.
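A simplified sketch of how a persistence diagram can be turned into a fixed-length vector in the spirit of a persistence image is shown below. It makes several assumptions that are not stated in the abstract: Gaussians are evaluated at pixel centres instead of being integrated over each pixel, the weight is simply the persistence of each point, and the function name, grid ranges, and toy diagram are hypothetical.

```python
import math


def persistence_image(diagram, resolution, sigma, birth_range, pers_range):
    """Rasterize a diagram of (birth, death) pairs into a flat vector of
    resolution * resolution pixel values (row-major, rows = persistence)."""
    b_lo, b_hi = birth_range
    p_lo, p_hi = pers_range
    norm = 2.0 * math.pi * sigma ** 2
    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in diagram:
        pers = death - birth   # move to (birth, persistence) coordinates
        weight = pers          # assumed weighting: vanishes on the diagonal
        for r in range(resolution):
            y = p_lo + (r + 0.5) * (p_hi - p_lo) / resolution
            for c in range(resolution):
                x = b_lo + (c + 0.5) * (b_hi - b_lo) / resolution
                sq_dist = (x - birth) ** 2 + (y - pers) ** 2
                image[r][c] += weight * math.exp(-sq_dist / (2 * sigma ** 2)) / norm
    return [value for row in image for value in row]


# Hypothetical diagram: one prominent feature and one short-lived one.
diagram = [(0.2, 0.9), (0.5, 0.6)]
vector = persistence_image(diagram, resolution=4, sigma=0.1,
                           birth_range=(0.0, 1.0), pers_range=(0.0, 1.0))
```

Every diagram maps to a vector of the same length, so the output can be passed to standard vector-based learners; the long-lived feature dominates the image because of the persistence weighting.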

The Classification of Endoscopy Images With Persistent Homology (2016)
Olga Dunaeva, Herbert Edelsbrunner, Anton Lukyanov, Michael Machin, Daria Malkova, Roman Kuvaev, Sergey Kashin
Aiming at the automatic diagnosis of tumors using narrow band imaging (NBI) magnifying endoscopic (ME) images of the stomach, we combine methods from image processing, topology, geometry, and machine learning to classify patterns into three classes: oval, tubular and irregular. Training the algorithm on a small number of images of each type, we achieve a high rate of correct classifications. The analysis of the learning algorithm reveals that a handful of geometric and topological features are responsible for the overwhelming majority of decisions. 
Omics-Based Strategies in Precision Medicine: Toward a Paradigm Shift in Inborn Errors of Metabolism Investigations (2016)
Abdellah Tebani, Carlos Afonso, Stéphane Marret, Soumeya Bekri
The rise of technologies that simultaneously measure thousands of data points represents the heart of systems biology. These technologies have had a huge impact on the discovery of next-generation diagnostics, biomarkers, and drugs in the precision medicine era. Systems biology aims to achieve systemic exploration of complex interactions in biological systems. Driven by high-throughput omics technologies and the computational surge, it enables multiscale and insightful overviews of cells, organisms, and populations. Precision medicine capitalizes on these conceptual and technological advancements and stands on two main pillars: data generation and data modeling. High-throughput omics technologies allow the retrieval of comprehensive and holistic biological information, whereas computational capabilities enable high-dimensional data modeling and, therefore, accessible and user-friendly visualization. Furthermore, bioinformatics has enabled comprehensive multiomics and clinical data integration for insightful interpretation. Despite their promise, the translation of these technologies into clinically actionable tools has been slow. In this review, we present state-of-the-art multiomics data analysis strategies in a clinical context. The challenges of omics-based biomarker translation are discussed. Perspectives regarding the use of multiomics approaches for inborn errors of metabolism (IEM) are presented by introducing a new paradigm shift in addressing IEM investigations in the post-genomic era.