Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining (2018)

Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny

Abstract

Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although recently, TDA methods have been used in many areas of data mining, it has not been widely applied to text mining tasks. In most text processing algorithms, the order in which different entities appear or co-appear is being lost. Assuming these lost orders are informative features of the data, TDA may play a significant role in the resulted gap on text processing state of the art. Once provided, the topology of different entities through a textual document may reveal some additive information regarding the document that is not reflected in any other features from conventional text processing methods. In this paper, we introduce a novel approach that hires TDA in text processing in order to capture and use the topology of different same-type entities in textual documents. First, we will show how to extract some topological signatures in the text using persistent homology-i.e., a TDA tool that captures topological signature of data cloud. Then we will show how to utilize these signatures for text classification.

Using Topological Data Analysis for Text Classification (2018)

Pratik Doshi

Community Resources

Code

Export citation

Stochastic Multiresolution Persistent Homology Kernel. (2016)

Xiaojin Zhu, Ara Vartanian, Manish Bansal, Duy Nguyen, Luke Brandl

Export citation

Differentiable Euler Characteristic Transforms for Shape Classification (2023)

Ernst Röell, Bastian Rieck

Abstract

The _Euler Characteristic Transform_ (ECT) is a powerful invariant, combining geometrical and topological characteristics of shapes and graphs. However, the ECT was hitherto unable to learn task-specific representations. We overcome this issue and develop a novel computational layer that enables learning the ECT in an end-to-end fashion. Our method, the _Differentiable Euler Characteristic Transform_ (DECT) is fast and computationally efficient, while exhibiting performance on a par with more complex models in both graph and point cloud classification tasks. Moreover, we show that this seemingly simple statistic provides the same topological expressivity as more complex topological deep learning layers.

A Proof-of-Concept Investigation Into Predicting Follicular Carcinoma on Ultrasound Using Topological Data Analysis and Radiomics (2025)

Andrew M. Thomas, Ann C. Lin, Grace Deng, Yuchen Xu, Gustavo Fernandez Ranvier, Aida Taye, David S. Matteson, Denise Lee

Abstract

Background Sonographic risk patterns identified in established risk stratification systems (RSS) may not accurately stratify follicular carcinoma from adenoma, which share many similar US characteristics. The purpose of this study is to investigate the performance of a multimodal machine learning model utilizing radiomics and topological data analysis (TDA) to predict malignancy in follicular thyroid neoplasms on ultrasound. Patients & Methods This is a retrospective study of patients who underwent thyroidectomy with pathology confirmed follicular adenoma or carcinoma at a single academic medical center between 2010 and 2022. Features derived from radiomics and TDA were calculated from processed ultrasound images and high-dimensional features in each modality were projected onto their first two principal components. Logistic regression with L2 penalty was used to predict malignancy and performance was evaluated using leave-one-out cross-validation and area under the curve (AUC). Results Patients with follicular adenomas (n = 7) and follicular carcinomas (n = 11) with available imaging were included. The best multimodal model achieved an AUC of 0.88 (95% CI: [0.85, 1]), whereas the best radiomics model achieved an AUC of 0.68 (95% CI: [0.61, 0.84]). Conclusions We demonstrate that inclusion of topological features yields strong improvement over radiomics-based features alone in the prediction of follicular carcinoma on ultrasound. Despite low volume data, the TDA features explicitly capture shape information that likely augments performance of the multimodal machine learning model. This approach suggests that a quantitative based US RSS may contribute to the preoperative prediction of follicular carcinoma.

Community Resources

Code

Persistent Homology Machine Learning for Fingerprint Classification (2019)

N. Giansiracusa, R. Giansiracusa, C. Moon

Abstract

The fingerprint classification problem is to sort fingerprints into predetermined groups, such as arch, loop, and whorl. It was asserted in the literature that minutiae points, which are commonly used for fingerprint matching, are not useful for classification. We show that, to the contrary, near state-of-the-art classification accuracy rates can be achieved when applying topological data analysis (TDA) to 3-dimensional point clouds of oriented minutiae points. We also apply TDA to fingerprint ink-roll images, which yields a lower accuracy rate but still shows promise; moreover, combining the two approaches outperforms each one individually. These methods use supervised learning applied to persistent homology and allow us to explore feature selection on barcodes, an important topic at the interface between TDA and machine learning. We test our classification algorithms on the NIST fingerprint database SD-27.

Multiresolution Persistent Homology for Excessively Large Biomolecular Datasets (2015)

Kelin Xia, Zhixiong Zhao, Guo-Wei Wei

Abstract

Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.

Machine Learning and Topological Data Analysis Identify Unique Features of Human Papillae in 3D Scans (2023)

Rayna Andreeva, Anwesha Sarkar, Rik Sarkar

Abstract

The tongue surface houses a range of papillae that are integral to the mechanics and chemistry of taste and textural sensation. Although gustatory function of papillae is well investigated, the uniqueness of papillae within and across individuals remains elusive. Here, we present the first machine learning framework on 3D microscopic scans of human papillae (n = 2092), uncovering the uniqueness of geometric and topological features of papillae. The finer differences in shapes of papillae are investigated computationally based on a number of features derived from discrete differential geometry and computational topology. Interpretable machine learning techniques show that persistent homology features of the papillae shape are the most effective in predicting the biological variables. Models trained on these features with small volumes of data samples predict the type of papillae with an accuracy of 85%. The papillae type classification models can map the spatial arrangement of filiform and fungiform papillae on a surface. Remarkably, the papillae are found to be distinctive across individuals and an individual can be identified with an accuracy of 48% among the 15 participants from a single papillae. Collectively, this is the first unprecedented evidence demonstrating that tongue papillae can serve as a unique identifier inspiring new research direction for food preferences and oral diagnostics.

Community Resources

Code

Protein Classification With Improved Topological Data Analysis (2018)

Tamal K. Dey, Sayan Mandal

Identification of Coenzyme-Binding Proteins With Machine Learning Algorithms (2019)

Yong Liu, Cristian R. Munteanu, Zhiwei Kong, Tao Ran, Alfredo Sahagún-Ruiz, Zhixiong He, Chuanshe Zhou, Zhiliang Tan

Export citation

Reconstructing Linearly Embedded Graphs: A First Step to Stratified Space Learning (2021)

Yossi Bokor, Christopher Williams, Katharine Turner

Community Resources

Code (GitHub)

Deep Learning With Topological Signatures (2017)

Christoph Hofer, Roland Kwitt, Marc Niethammer, Andreas Uhl

TopoResNet: A Hybrid Deep Learning Architecture and Its Application to Skin Lesion Classification (2019)

Yu-Min Chung, Chuan-Shen Hu, Austin Lawson, Clifford Smyth

Community Resources

Data

Export citation

Automatic Detection of Image Morphing by Topology-Based Analysis (2018)

Sabah Jassim, Aras Asaad

Export citation

Topological Approaches to Skin Disease Image Analysis (2018)

Yu-Min Chung, Chuan-Shen Hu, Austin Lawson, Clifford Smyth

Export citation

An Introduction to a New Text Classification and Visualization for Natural Language Processing Using Topological Data Analysis (2019)

Naiereh Elyasi, Mehdi Hosseini Moghadam

Export citation

Classifying RGB Images With Multi-Colour Persistent Homology (2019)

Wolf Byttner

Export citation

A Study on Topological Descriptors for the Analysis of 3D Surface Texture (2018)

Matthias Zeppelzauer, Bartosz Zieliński, Mateusz Juda, Markus Seidl

Export citation

Curvature Filtrations for Graph Generative Model Evaluation (2023)

Joshua Southern, Jeremy Wayland, Michael Bronstein, Bastian Rieck

Filtration Curves for Graph Representation (2021)

Leslie O'Bray, Bastian Rieck, Karsten Borgwardt

Abstract

The two predominant approaches to graph comparison in recent years are based on (i) enumerating matching subgraphs or (ii) comparing neighborhoods of nodes. In this work, we complement these two perspectives with a third way of representing graphs: using filtration curves from topological data analysis that capture both edge weight information and global graph structure. Filtration curves are highly efficient to compute and lead to expressive representations of graphs, which we demonstrate on graph classification benchmark datasets. Our work opens the door to a new form of graph representation in data mining.

Graph Filtration Learning (2020)

Christoph Hofer, Florian Graf, Bastian Rieck, Marc Niethammer, Roland Kwitt

Abstract

We propose an approach to learning with graph-structured data in the problem domain of graph classification. In particular, we present a novel type of readout operation to aggregate node features into a graph-level representation. To this end, we leverage persistent homology computed via a real-valued, learnable, filter function. We establish the theoretical foundation for differentiating through the persistent homology computation. Empirically, we show that this type of readout operation compares favorably to previous techniques, especially when the graph connectivity structure is informative for the learning problem.

Text Classification via Network Topology: A Case Study on the Holy Quran (2019)

Mehmet Emin Aktas, Esra Akbas

Abstract

Due to the growth in the number of texts and documents available online, machine learning based text classification systems are getting more popular recently. Feature extraction, converting unstructured text into a structured feature space, is one of the essential tasks for text classification. In this paper, we propose a novel feature extraction approach for text classification using the network representation of text, network topology, and machine learning techniques. We present experimental results on classifying the Holy Quran chapters based on the place each chapter was revealed to illustrate the effectiveness of the approach.

A Persistent Weisfeiler-Lehman Procedure for Graph Classification (2019)

Bastian Rieck, Christian Bock, Karsten Borgwardt

Abstract

The Weisfeiler–Lehman graph kernel exhibits competitive performance in many graph classification tasks. However, its subtree features are not able to capture connected components and cycles, topological features known for characterising graphs. To extract such features, we leverage propagated node label information and transform unweighted graphs into metric ones. This permits us to augment the subtree features with topological information obtained using persistent homology, a concept from topological data analysis. Our method, which we formalise as a generalisation of Weisfeiler–Lehman subtree features, exhibits favourable classification accuracy and its improvements in predictive performance are mainly driven by including cycle information.

Filtration Surfaces for Dynamic Graph Classification (2023)

Franz Srambical, Bastian Rieck

Abstract

Existing approaches for classifying dynamic graphs either lift graph kernels to the temporal domain, or use graph neural networks (GNNs). However, current baselines have scalability issues, cannot handle a changing node set, or do not take edge weight information into account. We propose filtration surfaces, a novel method that is scalable and flexible, to alleviate said restrictions. We experimentally validate the efficacy of our model and show that filtration surfaces outperform previous state-of-the-art baselines on datasets that rely on edge weight information. Our method does so while being either completely parameter-free or having at most one parameter, and yielding the lowest overall standard deviation among similarly scalable methods.

Topological Machine Learning With Persistence Indicator Functions (2019)

Bastian Rieck, Filip Sadlo, Heike Leitte

Abstract

Techniques from computational topology, in particular persistent homology, are becoming increasingly relevant for data analysis. Their stable metrics permit the use of many distance-based data analysis methods, such as multidimensional scaling, while providing a firm theoretical ground. Many modern machine learning algorithms, however, are based on kernels. This paper presents persistence indicator functions (PIFs), which summarize persistence diagrams, i.e., feature descriptors in topological data analysis. PIFs can be calculated and compared in linear time and have many beneficial properties, such as the availability of a kernel-based similarity measure. We demonstrate their usage in common data analysis scenarios, such as confidence set estimation and classification of complex structured data.

Topological Graph Neural Networks (2021)

Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten Borgwardt

Abstract

Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures, such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive in terms of the Weisfeiler--Lehman test of isomorphism. Augmenting GNNs with our layer leads to beneficial predictive performance, both on synthetic data sets, which can be trivially classified by humans but not by ordinary GNNs, and on real-world data.

A Stable Multi-Scale Kernel for Topological Machine Learning (2015)

Jan Reininghaus, Stefan Huber, Ulrich Bauer, Roland Kwitt

Abstract

Topological data analysis offers a rich source of valuable information to study vision problems. Yet, so far we lack a theoretically sound connection to popular kernel-based learning techniques, such as kernel SVMs or kernel PCA. In this work, we establish such a connection by designing a multi-scale kernel for persistence diagrams, a stable summary representation of topological features in data. We show that this kernel is positive definite and prove its stability with respect to the 1-Wasserstein distance. Experiments on two benchmark datasets for 3D shape classification/retrieval and texture recognition show considerable performance gains of the proposed method compared to an alternative approach that is based on the recently introduced persistence landscapes.

Euler Characteristic Surfaces (2021)

Gabriele Beltramo, Rayna Andreeva, Ylenia Giarratano, Miguel O. Bernabeu, Rik Sarkar, Primoz Skraba

Abstract

We study the use of the Euler characteristic for multiparameter topological data analysis. Euler characteristic is a classical, well-understood topological invariant that has appeared in numerous applications, including in the context of random fields. The goal of this paper is to present the extension of using the Euler characteristic in higher-dimensional parameter spaces. While topological data analysis of higher-dimensional parameter spaces using stronger invariants such as homology continues to be the subject of intense research, Euler characteristic is more manageable theoretically and computationally, and this analysis can be seen as an important intermediary step in multi-parameter topological data analysis. We show the usefulness of the techniques using artificially generated examples, and a real-world application of detecting diabetic retinopathy in retinal images.

A Topological Framework for Deep Learning (2020)

Mustafa Hajij, Kyle Istvan

Abstract

We utilize classical facts from topology to show that the classification problem in machine learning is always solvable under very mild conditions. Furthermore, we show that a softmax classification network acts on an input topological space by a finite sequence of topological moves to achieve the classification task. Moreover, given a training dataset, we show how topological formalism can be used to suggest the appropriate architectural choices for neural networks designed to be trained as classifiers on the data. Finally, we show how the architecture of a neural network cannot be chosen independently from the shape of the underlying data. To demonstrate these results, we provide example datasets and show how they are acted upon by neural nets from this topological perspective.

A Novel Method of Extracting Topological Features From Word Embeddings (2020)

Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny

Abstract

In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there are only a few work on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representation of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the embedding high-dimensional space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show our defined topological features may outperform conventional text mining features.

Unsupervised Topological Learning for Identification of Atomic Structures (2022)

Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse

Abstract

We propose an unsupervised learning methodology with descriptors based on topological data analysis (TDA) concepts to describe the local structural properties of materials at the atomic scale. Based only on atomic positions and without a priori knowledge, our method allows for an autonomous identification of clusters of atomic structures through a Gaussian mixture model. We apply successfully this approach to the analysis of elemental Zr in the crystalline and liquid states as well as homogeneous nucleation events under deep undercooling conditions. This opens the way to deeper and autonomous study of complex phenomena in materials at the atomic scale.

Simplicial Representation Learning With Neural \$K\$-Forms (2023)

Kelly Maggs, Celia Hacker, Bastian Rieck

Abstract

Geometric deep learning extends deep learning to incorporate information about the geometry and topology data, especially in complex domains like graphs. Despite the popularity of message passing in this field, it has limitations such as the need for graph rewiring, ambiguity in interpreting data, and over-smoothing. In this paper, we take a different approach, focusing on leveraging geometric information from simplicial complexes embedded in \$\mathbb\R\\textasciicircumn\$ using node coordinates. We use differential \$k\$-forms in \$\mathbb\R\\textasciicircumn\$ to create representations of simplices, offering interpretability and geometric consistency without message passing. This approach also enables us to apply differential geometry tools and achieve universal approximation. Our method is efficient, versatile, and applicable to various input complexes, including graphs, simplicial complexes, and cell complexes. It outperforms existing message passing neural networks in harnessing information from geometrical graphs with node features serving as coordinates.

Graph Classification via Heat Diffusion on Simplicial Complexes (2020)

Mehmet Emin Aktas, Esra Akbas

Abstract

In this paper, we study the graph classification problem in vertex-labeled graphs. Our main goal is to classify the graphs comparing their higher-order structures thanks to heat diffusion on their simplices. We first represent vertex-labeled graphs as simplex-weighted super-graphs. We then define the diffusion Frechet function over their simplices to encode the higher-order network topology and finally reach our goal by combining the function values with machine learning algorithms. Our experiments on real-world bioinformatics networks show that using diffusion Fr\éḩet function on simplices is promising in graph classification and more effective than the baseline methods. To the best of our knowledge, this paper is the first paper in the literature using heat diffusion on higher-dimensional simplices in a graph mining problem. We believe that our method can be extended to different graph mining domains, not only the graph classification problem.

Topological Attention for Time Series Forecasting (2021)

Sebastian Zeng, Florian Graf, Christoph Hofer, Roland Kwitt

Abstract

The problem of (point) forecasting univariate time series is considered. Most approaches, ranging from traditional statistical methods to recent learning-based techniques with neural networks, directly operate on raw time series observations. As an extension, we study whether local topological properties, as captured via persistent homology, can serve as a reliable signal that provides complementary information for learning to forecast. To this end, we propose topological attention, which allows attending to local topological features within a time horizon of historical data. Our approach easily integrates into existing end-to-end trainable forecasting models, such as N-BEATS, and, in combination with the latter exhibits state-of-the-art performance on the large-scale M4 benchmark dataset of 100,000 diverse time series from different domains. Ablation experiments, as well as a comparison to recent techniques in a setting where only a single time series is available for training, corroborate the beneficial nature of including local topological information through an attention mechanism.

Geometric Feature Performance Under Downsampling for EEG Classification Tasks (2021)

Bryan Bischof, Eric Bunch

Abstract

We experimentally investigate a collection of feature engineering pipelines for use with a CNN for classifying eyes-open or eyes-closed from electroencephalogram (EEG) time-series from the Bonn dataset. Using the Takens' embedding--a geometric representation of time-series--we construct simplicial complexes from EEG data. We then compare \$\epsilon\$-series of Betti-numbers and \$\epsilon\$-series of graph spectra (a novel construction)--two topological invariants of the latent geometry from these complexes--to raw time series of the EEG to fill in a gap in the literature for benchmarking. These methods, inspired by Topological Data Analysis, are used for feature engineering to capture local geometry of the time-series. Additionally, we test these feature pipelines' robustness to downsampling and data reduction. This paper seeks to establish clearer expectations for both time-series classification via geometric features, and how CNNs for time-series respond to data of degraded resolution.

Acute Lymphoblastic Leukemia Classification Using Persistent Homology (2024)

Waqar Hussain Shah, Abdullah Baloch, Rider Jaimes-Reátegui, Sohail Iqbal, Syeda Rafia Fatima, Alexander N. Pisarchik

Abstract

Acute Lymphoblastic Leukemia (ALL) is a prevalent form of childhood blood cancer characterized by the proliferation of immature white blood cells that rapidly replace normal cells in the bone marrow. The exponential growth of these leukemic cells can be fatal if not treated promptly. Classifying lymphoblasts and healthy cells poses a significant challenge, even for domain experts, due to their morphological similarities. Automated computer analysis of ALL can provide substantial support in this domain and potentially save numerous lives. In this paper, we propose a novel classification approach that involves analyzing shapes and extracting topological features of ALL cells. We employ persistent homology to capture these topological features. Our technique accurately and efficiently detects and classifies leukemia blast cells, achieving a recall of 98.2% and an F1-score of 94.6%. This approach has the potential to significantly enhance leukemia diagnosis and therapy.

Constructing Shape Spaces From a Topological Perspective (2017)

Christoph Hofer, Roland Kwitt, Marc Niethammer, Yvonne Höller, Eugen Trinka, Andreas Uhl

Abstract

We consider the task of constructing (metric) shape space(s) from a topological perspective. In particular, we present a generic construction scheme and demonstrate how to apply this scheme when shape is interpreted as the differences that remain after factoring out translation, scaling and rotation. This is achieved by leveraging a recently proposed injective functional transform of 2D/3D (binary) objects, based on persistent homology. The resulting shape space is then equipped with a similarity measure that is (1) by design robust to noise and (2) fulfills all metric axioms. From a practical point of view, analyses of object shape can then be carried out directly on segmented objects obtained from some imaging modality without any preprocessing, such as alignment, smoothing, or landmark selection. We demonstrate the utility of the approach on the problem of distinguishing segmented hippocampi from normal controls vs. patients with Alzheimer’s disease in a challenging setup where volume changes are no longer discriminative.

Tenfold Topology of Crystals (2020)

Eyal Cornfeld, Shachar Carmeli

Abstract

The celebrated tenfold-way of Altland-Zirnbauer symmetry classes discern any quantum system by its pattern of non-spatial symmetries. It lays at the core of the periodic table of topological insulators and superconductors which provided a complete classification of weakly-interacting electrons' non-crystalline topological phases for all symmetry classes. Over recent years, a plethora of topological phenomena with diverse surface states has been discovered in crystalline materials. In this paper, we obtain an exhaustive classification of topologically distinct groundstates as well as topological phases with anomalous surface states of crystalline topological insulators and superconductors for key space-groups, layer-groups, and rod-groups. This is done in a unified manner for the full tenfold-way of Altland-Zirnbauer non-spatial symmetry classes. We establish a comprehensive paradigm that harnesses the modern mathematical framework of equivariant spectra; it allows us to obtain results applicable to generic topological classification problems. In particular, this paradigm provides efficient computational tools that enable an inherently unified treatment of the full tenfold-way.

Topological Data Analysis on Simple English Wikipedia Articles (2020)

Matthew Wright, Xiaojun Zheng

Abstract

Single-parameter persistent homology, a key tool in topological data analysis, has been widely applied to data problems, with statistical techniques that quantify the significance of the results. In contrast, statistical techniques for two-parameter persistence, while highly desirable for real-world applications, have scarcely been considered. We present three statistical approaches for comparing geometric data using two-parameter persistent homology, and we demonstrate the applicability of these approaches on high-dimensional point-cloud data obtained from Simple English Wikipedia articles. These approaches rely on the Hilbert function, matching distance, and barcodes obtained from two-parameter persistence modules computed from the point-cloud data. We demonstrate the applicability of our methods by distinguishing certain subsets of the Wikipedia data, and by comparison with random data. Results include insights into the construction of null distributions and stability of our methods with respect to noisy data. Our statistical methods are broadly applicable for analysis of geometric data indexed by a real-valued parameter.

TDAExplore: Quantitative Analysis of Fluorescence Microscopy Images Through Topology-Based Machine Learning (2021)

Parker Edwards, Kristen Skruber, Nikola Milićević, James B. Heidings, Tracy-Ann Read, Peter Bubenik, Eric A. Vitriol

Abstract

Recent advances in machine learning have greatly enhanced automatic methods to extract information from fluorescence microscopy data. However, current machine-learning-based models can require hundreds to thousands of images to train, and the most readily accessible models classify images without describing which parts of an image contributed to classification. Here, we introduce TDAExplore, a machine learning image analysis pipeline based on topological data analysis. It can classify different types of cellular perturbations after training with only 20–30 high-resolution images and performs robustly on images from multiple subjects and microscopy modes. Using only images and whole-image labels for training, TDAExplore provides quantitative, spatial information, characterizing which image regions contribute to classification. Computational requirements to train TDAExplore models are modest and a standard PC can perform training with minimal user input. TDAExplore is therefore an accessible, powerful option for obtaining quantitative information about imaging data in a wide variety of applications.

A Topological Machine Learning Pipeline for Classification (2022)

Francesco Conti, Davide Moroni, Maria Antonietta Pascali

Abstract

In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered. Using a grid search approach, this pipeline determines optimal representation methods and parameters. The development of such a topological pipeline for Machine Learning involves two crucial steps that strongly affect its performance: firstly, digital data must be represented as an algebraic object with a proper associated filtration in order to compute its topological summary, the Persistence Diagram. Secondly, the persistence diagram must be transformed with suitable representation methods in order to be introduced in a Machine Learning algorithm. We assess the performance of our pipeline, and in parallel, we compare the different representation methods on popular benchmark datasets. This work is a first step toward both an easy and ready-to-use pipeline for data classification using persistent homology and Machine Learning, and to understand the theoretical reasons why, given a dataset and a task to be performed, a pair (filtration, topological representation) is better than another.

Topological Analysis of Gene Expression Arrays Identifies High Risk Molecular Subtypes in Breast Cancer (2012)

Javier Arsuaga, Nils A. Baas, Daniel DeWoskin, Hideaki Mizuno, Aleksandr Pankov, Catherine Park

Abstract

Genomic technologies measure thousands of molecular signals with the goal of understanding complex biological processes. In cancer these molecular signals have been used to characterize disease subtypes, signaling pathways and to identify subsets of patients with specific prognosis. However molecular signals for any disease type are so vast and complex that novel mathematical approaches are required for further analyses. Persistent and computational homology provide a new method for these analyses. In our previous work we presented a new homology-based supervised classification method to identify copy number aberrations from comparative genomic hybridization arrays. In this work we first propose a theoretical framework for our classification method and second we extend our analysis to gene expression data. We analyze a published breast cancer data set and find that that our method can distinguish most, but not all, different breast cancer subtypes. This result suggests that specific relationships between genes, captured by our algorithm, help distinguish between breast cancer subtypes. We propose that topological methods can be used for the classification and clustering of gene expression profiles.

Ghrist Barcoded Video Frames. Application in Detecting Persistent Visual Scene Surface Shapes Captured in Videos (2019)

Arjuna P. H. Don, James F. Peters

Abstract

This article introduces an application of Ghrist barcodes in the study of persistent Betti numbers derived from vortex nerve complexes found in triangulations of video frames. A Ghrist barcode (also called a persistence barcode) is a topology of data pic- tograph useful in representing the persistence of the features of changing shapes. The basic approach is to introduce a free Abelian group representation of intersecting filled polygons on the barycenters of the triangles of Alexandroff nerves. An Alexandroff nerve is a maximal collection of triangles of a common vertex in the triangulation of a finite, bounded planar region. In our case, the planar region is a video frame. A Betti number is a count of the number of generators is a finite Abelian group. The focus here is on the persistent Betti numbers across sequences of triangulated video frames. Each Betti number is mapped to an entry in a Ghrist barcode. Two main results are given, namely, vortex nerves are Edelsbrunner-Harer nerve complexes and the Betti number of a vortex nerve equals k + 2 for a vortex nerve containing k edges attached between a pair of vortex cycles in the nerve.

Hepatic Tumor Classification Using Texture and Topology Analysis of Non-Contrast-Enhanced Three-Dimensional T1-Weighted MR Images With a Radiomics Approach (2019)

Asuka Oyama, Yasuaki Hiraoka, Ippei Obayashi, Yusuke Saikawa, Shigeru Furui, Kenshiro Shiraishi, Shinobu Kumagai, Tatsuya Hayashi, Jun’ichi Kotoku

Abstract

The purpose of this study is to evaluate the accuracy for classification of hepatic tumors by characterization of T1-weighted magnetic resonance (MR) images using two radiomics approaches with machine learning models: texture analysis and topological data analysis using persistent homology. This study assessed non-contrast-enhanced fat-suppressed three-dimensional (3D) T1-weighted images of 150 hepatic tumors. The lesions included 50 hepatocellular carcinomas (HCCs), 50 metastatic tumors (MTs), and 50 hepatic hemangiomas (HHs) found respectively in 37, 23, and 33 patients. For classification, texture features were calculated, and also persistence images of three types (degree 0, degree 1 and degree 2) were obtained for each lesion from the 3D MR imaging data. We used three classification models. In the classification of HCC and MT (resp. HCC and HH, HH and MT), we obtained accuracy of 92% (resp. 90%, 73%) by texture analysis, and the highest accuracy of 85% (resp. 84%, 74%) when degree 1 (resp. degree 1, degree 2) persistence images were used. Our methods using texture analysis or topological data analysis allow for classification of the three hepatic tumors with considerable accuracy, and thus might be useful when applied for computer-aided diagnosis with MR images.

Representability of Algebraic Topology for Biomolecules in Machine Learning Based Scoring and Virtual Screening (2018)

Zixuan Cang, Lin Mu, Guo-Wei Wei

Abstract

This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.

Unveiling Patterns of International Communities in a Global City Using Mobile Phone Data (2015)

Paolo Bajardi, Matteo Delfino, André Panisson, Giovanni Petri, Michele Tizzoni

Abstract

We analyse a large mobile phone activity dataset provided by Telecom Italia for the Telecom Big Data Challenge contest. The dataset reports the international country codes of every call/SMS made and received by mobile phone users in Milan, Italy, between November and December 2013, with a spatial resolution of about 200 meters. We first show that the observed spatial distribution of international codes well matches the distribution of international communities reported by official statistics, confirming the value of mobile phone data for demographic research. Next, we define an entropy function to measure the heterogeneity of the international phone activity in space and time. By comparing the entropy function to empirical data, we show that it can be used to identify the city’s hotspots, defined by the presence of points of interests. Eventually, we use the entropy function to characterize the spatial distribution of international communities in the city. Adopting a topological data analysis approach, we find that international mobile phone users exhibit some robust clustering patterns that correlate with basic socio-economic variables. Our results suggest that mobile phone records can be used in conjunction with topological data analysis tools to study the geography of migrant communities in a global city.

Community Resources

Code

PI-Net: A Deep Learning Approach to Extract Topological Persistence Images (2020)

Anirudh Som, Hongjun Choi, Karthikeyan Natesan Ramamurthy, Matthew Buman, Pavan Turaga

Abstract

Topological features such as persistence diagrams and their functional approximations like persistence images (PIs) have been showing substantial promise for machine learning and computer vision applications. This is greatly attributed to the robustness topological representations provide against different types of physical nuisance variables seen in real-world data, such as view-point, illumination, and more. However, key bottlenecks to their large scale adoption are computational expenditure and difﬁculty incorporating them in a differentiable architecture. We take an important step in this paper to mitigate these bottlenecks by proposing a novel one-step approach to generate PIs directly from the input data. We design two separate convolutional neural network architectures, one designed to take in multi-variate time series signals as input and another that accepts multi-channel images as input. We call these networks Signal PI-Net and Image PINet respectively. To the best of our knowledge, we are the ﬁrst to propose the use of deep learning for computing topological features directly from data. We explore the use of the proposed PI-Net architectures on two applications: human activity recognition using tri-axial accelerometer sensor data and image classiﬁcation. We demonstrate the ease of fusion of PIs in supervised deep learning architectures and speed up of several orders of magnitude for extracting PIs from data. Our code is available at https://github.com/anirudhsom/PI-Net.

Learning Representations of Persistence Barcodes (2019)

Christoph D. Hofer, Roland Kwitt, Marc Niethammer

Abstract

We consider the problem of supervised learning with summary representations of topological features in data. In particular, we focus on persistent homology, the prevalent tool used in topological data analysis. As the summary representations, referred to as barcodes or persistence diagrams, come in the unusual format of multi sets, equipped with computationally expensive metrics, they can not readily be processed with conventional learning techniques. While different approaches to address this problem have been proposed, either in the context of kernel-based learning, or via carefully designed vectorization techniques, it remains an open problem how to leverage advances in representation learning via deep neural networks. Appropriately handling topological summaries as input to neural networks would address the disadvantage of previous strategies which handle this type of data in a task-agnostic manner. In particular, we propose an approach that is designed to learn a task-specific representation of barcodes. In other words, we aim to learn a representation that adapts to the learning problem while, at the same time, preserving theoretical properties (such as stability). This is done by projecting barcodes into a finite dimensional vector space using a collection of parametrized functionals, so called structure elements, for which we provide a generic construction scheme. A theoretical analysis of this approach reveals sufficient conditions to preserve stability, and also shows that different choices of structure elements lead to great differences with respect to their suitability for numerical optimization. When implemented as a neural network input layer, our approach demonstrates compelling performance on various types of problems, including graph classification and eigenvalue prediction, the classification of 2D/3D object shapes and recognizing activities from EEG signals.

Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, Wlodek Zadrozny

Abstract

While the strength of Topological Data Analysis has been explored in many studies on high dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in the text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extraction of topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as high dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe adding topological features to the conventional features in ensemble models improves the classification results (up to 5\%). On the other hand, as expected, topological features by themselves may be not sufficient for effective classification. It is an open problem to see whether TDA features from word embeddings might be sufficient, as they seem to perform within a range of few points from top results obtained with a linear support vector classifier.

CCF-GNN: A Unified Model Aggregating Appearance, Microenvironment, and Topology for Pathology Image Classification (2023)

Hongxiao Wang, Gang Huang, Zhuo Zhao, Liang Cheng, Anna Juncker-Jensen, Máté Levente Nagy, Xin Lu, Xiangliang Zhang, Danny Z. Chen

Abstract

Pathology images contain rich information of cell appearance, microenvironment, and topology features for cancer analysis and diagnosis. Among such features, topology becomes increasingly important in analysis for cancer immunotherapy. By analyzing geometric and hierarchically structured cell distribution topology, oncologists can identify densely-packed and cancer-relevant cell communities (CCs) for making decisions. Compared to commonly-used pixel-level Convolution Neural Network (CNN) features and cell-instance-level Graph Neural Network (GNN) features, CC topology features are at a higher level of granularity and geometry. However, topological features have not been well exploited by recent deep learning (DL) methods for pathology image classification due to lack of effective topological descriptors for cell distribution and gathering patterns. In this paper, inspired by clinical practice, we analyze and classify pathology images by comprehensively learning cell appearance, microenvironment, and topology in a fine-to-coarse manner. To describe and exploit topology, we design Cell Community Forest (CCF), a novel graph that represents the hierarchical formulation process of big-sparse CCs from small-dense CCs. Using CCF as a new geometric topological descriptor of tumor cells in pathology images, we propose CCF-GNN, a GNN model that successively aggregates heterogeneous features (e.g., appearance, microenvironment) from cell-instance-level, cell-community-level, into image-level for pathology image classification. Extensive cross-validation experiments show that our method significantly outperforms alternative methods on H&E-stained; immunofluorescence images for disease grading tasks with multiple cancer types. Our proposed CCF-GNN establishes a new topological data analysis (TDA) based method, which facilitates integrating multi-level heterogeneous features of point clouds (e.g., for cells) into a unified DL framework.

🍩 Database of Original & Non-Theoretical Uses of Topology

Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining (2018)

Using Topological Data Analysis for Text Classification (2018)

Community Resources

Stochastic Multiresolution Persistent Homology Kernel. (2016)

Differentiable Euler Characteristic Transforms for Shape Classification (2023)

A Proof-of-Concept Investigation Into Predicting Follicular Carcinoma on Ultrasound Using Topological Data Analysis and Radiomics (2025)

Community Resources

Persistent Homology Machine Learning for Fingerprint Classification (2019)

Multiresolution Persistent Homology for Excessively Large Biomolecular Datasets (2015)

Machine Learning and Topological Data Analysis Identify Unique Features of Human Papillae in 3D Scans (2023)

Community Resources

Protein Classification With Improved Topological Data Analysis (2018)

Identification of Coenzyme-Binding Proteins With Machine Learning Algorithms (2019)

Reconstructing Linearly Embedded Graphs: A First Step to Stratified Space Learning (2021)

Community Resources

Deep Learning With Topological Signatures (2017)

TopoResNet: A Hybrid Deep Learning Architecture and Its Application to Skin Lesion Classification (2019)

Community Resources

Automatic Detection of Image Morphing by Topology-Based Analysis (2018)

Topological Approaches to Skin Disease Image Analysis (2018)

An Introduction to a New Text Classification and Visualization for Natural Language Processing Using Topological Data Analysis (2019)

Classifying RGB Images With Multi-Colour Persistent Homology (2019)

A Study on Topological Descriptors for the Analysis of 3D Surface Texture (2018)

Curvature Filtrations for Graph Generative Model Evaluation (2023)

Filtration Curves for Graph Representation (2021)

Graph Filtration Learning (2020)

Text Classification via Network Topology: A Case Study on the Holy Quran (2019)

A Persistent Weisfeiler-Lehman Procedure for Graph Classification (2019)

Filtration Surfaces for Dynamic Graph Classification (2023)

Topological Machine Learning With Persistence Indicator Functions (2019)

Topological Graph Neural Networks (2021)

A Stable Multi-Scale Kernel for Topological Machine Learning (2015)

Euler Characteristic Surfaces (2021)

A Topological Framework for Deep Learning (2020)

A Novel Method of Extracting Topological Features From Word Embeddings (2020)

Unsupervised Topological Learning for Identification of Atomic Structures (2022)

Simplicial Representation Learning With Neural \$K\$-Forms (2023)

Graph Classification via Heat Diffusion on Simplicial Complexes (2020)

Topological Attention for Time Series Forecasting (2021)

Geometric Feature Performance Under Downsampling for EEG Classification Tasks (2021)

Acute Lymphoblastic Leukemia Classification Using Persistent Homology (2024)

Constructing Shape Spaces From a Topological Perspective (2017)

Tenfold Topology of Crystals (2020)

Topological Data Analysis on Simple English Wikipedia Articles (2020)

TDAExplore: Quantitative Analysis of Fluorescence Microscopy Images Through Topology-Based Machine Learning (2021)

A Topological Machine Learning Pipeline for Classification (2022)

Topological Analysis of Gene Expression Arrays Identifies High Risk Molecular Subtypes in Breast Cancer (2012)

Ghrist Barcoded Video Frames. Application in Detecting Persistent Visual Scene Surface Shapes Captured in Videos (2019)

Hepatic Tumor Classification Using Texture and Topology Analysis of Non-Contrast-Enhanced Three-Dimensional T1-Weighted MR Images With a Radiomics Approach (2019)

Representability of Algebraic Topology for Biomolecules in Machine Learning Based Scoring and Virtual Screening (2018)

Unveiling Patterns of International Communities in a Global City Using Mobile Phone Data (2015)

Community Resources

PI-Net: A Deep Learning Approach to Extract Topological Persistence Images (2020)

Learning Representations of Persistence Barcodes (2019)

Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

CCF-GNN: A Unified Model Aggregating Appearance, Microenvironment, and Topology for Pathology Image Classification (2023)