🍩 Database of Original & Non-Theoretical Uses of Topology
(found 41 matches in 0.008784s)
-
-
Persistent Homology Machine Learning for Fingerprint Classification (2019)
N. Giansiracusa, R. Giansiracusa, C. MoonAbstract
The fingerprint classification problem is to sort fingerprints into predetermined groups, such as arch, loop, and whorl. It was asserted in the literature that minutiae points, which are commonly used for fingerprint matching, are not useful for classification. We show that, to the contrary, near state-of-the-art classification accuracy rates can be achieved when applying topological data analysis (TDA) to 3-dimensional point clouds of oriented minutiae points. We also apply TDA to fingerprint ink-roll images, which yields a lower accuracy rate but still shows promise; moreover, combining the two approaches outperforms each one individually. These methods use supervised learning applied to persistent homology and allow us to explore feature selection on barcodes, an important topic at the interface between TDA and machine learning. We test our classification algorithms on the NIST fingerprint database SD-27. -
Multiresolution Persistent Homology for Excessively Large Biomolecular Datasets (2015)
Kelin Xia, Zhixiong Zhao, Guo-Wei WeiAbstract
Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs. -
Machine Learning and Topological Data Analysis Identify Unique Features of Human Papillae in 3D Scans (2023)
Rayna Andreeva, Anwesha Sarkar, Rik SarkarAbstract
The tongue surface houses a range of papillae that are integral to the mechanics and chemistry of taste and textural sensation. Although gustatory function of papillae is well investigated, the uniqueness of papillae within and across individuals remains elusive. Here, we present the first machine learning framework on 3D microscopic scans of human papillae (n = 2092), uncovering the uniqueness of geometric and topological features of papillae. The finer differences in shapes of papillae are investigated computationally based on a number of features derived from discrete differential geometry and computational topology. Interpretable machine learning techniques show that persistent homology features of the papillae shape are the most effective in predicting the biological variables. Models trained on these features with small volumes of data samples predict the type of papillae with an accuracy of 85%. The papillae type classification models can map the spatial arrangement of filiform and fungiform papillae on a surface. Remarkably, the papillae are found to be distinctive across individuals and an individual can be identified with an accuracy of 48% among the 15 participants from a single papillae. Collectively, this is the first unprecedented evidence demonstrating that tongue papillae can serve as a unique identifier inspiring new research direction for food preferences and oral diagnostics. -
Protein Classification With Improved Topological Data Analysis (2018)
Tamal K. Dey, Sayan Mandal -
Identification of Coenzyme-Binding Proteins With Machine Learning Algorithms (2019)
Yong Liu, Cristian R. Munteanu, Zhiwei Kong, Tao Ran, Alfredo Sahagún-Ruiz, Zhixiong He, Chuanshe Zhou, Zhiliang Tan -
Reconstructing Linearly Embedded Graphs: A First Step to Stratified Space Learning (2021)
Yossi Bokor, Christopher Williams, Katharine TurnerCommunity Resources
-
Using Topological Data Analysis for Text Classification (2018)
Pratik Doshi -
Deep Learning With Topological Signatures (2017)
Christoph Hofer, Roland Kwitt, Marc Niethammer, Andreas Uhl -
Automatic Detection of Image Morphing by Topology-Based Analysis (2018)
Sabah Jassim, Aras Asaad -
Topological Approaches to Skin Disease Image Analysis (2018)
Yu-Min Chung, Chuan-Shen Hu, Austin Lawson, Clifford Smyth -
An Introduction to a New Text Classification and Visualization for Natural Language Processing Using Topological Data Analysis (2019)
Naiereh Elyasi, Mehdi Hosseini Moghadam -
Classifying RGB Images With Multi-Colour Persistent Homology (2019)
Wolf Byttner -
A Study on Topological Descriptors for the Analysis of 3D Surface Texture (2018)
Matthias Zeppelzauer, Bartosz Zieliński, Mateusz Juda, Markus Seidl -
Filtration Curves for Graph Representation (2021)
Leslie O'Bray, Bastian Rieck, Karsten BorgwardtAbstract
The two predominant approaches to graph comparison in recent years are based on (i) enumerating matching subgraphs or (ii) comparing neighborhoods of nodes. In this work, we complement these two perspectives with a third way of representing graphs: using filtration curves from topological data analysis that capture both edge weight information and global graph structure. Filtration curves are highly efficient to compute and lead to expressive representations of graphs, which we demonstrate on graph classification benchmark datasets. Our work opens the door to a new form of graph representation in data mining. -
Graph Filtration Learning (2020)
Christoph Hofer, Florian Graf, Bastian Rieck, Marc Niethammer, Roland KwittAbstract
We propose an approach to learning with graph-structured data in the problem domain of graph classification. In particular, we present a novel type of readout operation to aggregate node features into a graph-level representation. To this end, we leverage persistent homology computed via a real-valued, learnable, filter function. We establish the theoretical foundation for differentiating through the persistent homology computation. Empirically, we show that this type of readout operation compares favorably to previous techniques, especially when the graph connectivity structure is informative for the learning problem. -
Text Classification via Network Topology: A Case Study on the Holy Quran (2019)
Mehmet Emin Aktas, Esra AkbasAbstract
Due to the growth in the number of texts and documents available online, machine learning based text classification systems are getting more popular recently. Feature extraction, converting unstructured text into a structured feature space, is one of the essential tasks for text classification. In this paper, we propose a novel feature extraction approach for text classification using the network representation of text, network topology, and machine learning techniques. We present experimental results on classifying the Holy Quran chapters based on the place each chapter was revealed to illustrate the effectiveness of the approach. -
A Persistent Weisfeiler-Lehman Procedure for Graph Classification (2019)
Bastian Rieck, Christian Bock, Karsten BorgwardtAbstract
The Weisfeiler–Lehman graph kernel exhibits competitive performance in many graph classification tasks. However, its subtree features are not able to capture connected components and cycles, topological features known for characterising graphs. To extract such features, we leverage propagated node label information and transform unweighted graphs into metric ones. This permits us to augment the subtree features with topological information obtained using persistent homology, a concept from topological data analysis. Our method, which we formalise as a generalisation of Weisfeiler–Lehman subtree features, exhibits favourable classification accuracy and its improvements in predictive performance are mainly driven by including cycle information. -
Topological Graph Neural Networks (2021)
Max Horn, Edward De Brouwer, Michael Moor, Yves Moreau, Bastian Rieck, Karsten BorgwardtAbstract
Graph neural networks (GNNs) are a powerful architecture for tackling graph learning tasks, yet have been shown to be oblivious to eminent substructures, such as cycles. We present TOGL, a novel layer that incorporates global topological information of a graph using persistent homology. TOGL can be easily integrated into any type of GNN and is strictly more expressive in terms of the Weisfeiler--Lehman test of isomorphism. Augmenting GNNs with our layer leads to beneficial predictive performance, both on synthetic data sets, which can be trivially classified by humans but not by ordinary GNNs, and on real-world data. -
Topological Machine Learning With Persistence Indicator Functions (2019)
Bastian Rieck, Filip Sadlo, Heike LeitteAbstract
Techniques from computational topology, in particular persistent homology, are becoming increasingly relevant for data analysis. Their stable metrics permit the use of many distance-based data analysis methods, such as multidimensional scaling, while providing a firm theoretical ground. Many modern machine learning algorithms, however, are based on kernels. This paper presents persistence indicator functions (PIFs), which summarize persistence diagrams, i.e., feature descriptors in topological data analysis. PIFs can be calculated and compared in linear time and have many beneficial properties, such as the availability of a kernel-based similarity measure. We demonstrate their usage in common data analysis scenarios, such as confidence set estimation and classification of complex structured data. -
A Stable Multi-Scale Kernel for Topological Machine Learning (2015)
Jan Reininghaus, Stefan Huber, Ulrich Bauer, Roland KwittAbstract
Topological data analysis offers a rich source of valuable information to study vision problems. Yet, so far we lack a theoretically sound connection to popular kernel-based learning techniques, such as kernel SVMs or kernel PCA. In this work, we establish such a connection by designing a multi-scale kernel for persistence diagrams, a stable summary representation of topological features in data. We show that this kernel is positive definite and prove its stability with respect to the 1-Wasserstein distance. Experiments on two benchmark datasets for 3D shape classification/retrieval and texture recognition show considerable performance gains of the proposed method compared to an alternative approach that is based on the recently introduced persistence landscapes. -
Euler Characteristic Surfaces (2021)
Gabriele Beltramo, Rayna Andreeva, Ylenia Giarratano, Miguel O. Bernabeu, Rik Sarkar, Primoz SkrabaAbstract
We study the use of the Euler characteristic for multiparameter topological data analysis. Euler characteristic is a classical, well-understood topological invariant that has appeared in numerous applications, including in the context of random fields. The goal of this paper is to present the extension of using the Euler characteristic in higher-dimensional parameter spaces. While topological data analysis of higher-dimensional parameter spaces using stronger invariants such as homology continues to be the subject of intense research, Euler characteristic is more manageable theoretically and computationally, and this analysis can be seen as an important intermediary step in multi-parameter topological data analysis. We show the usefulness of the techniques using artificially generated examples, and a real-world application of detecting diabetic retinopathy in retinal images. -
A Topological Framework for Deep Learning (2020)
Mustafa Hajij, Kyle IstvanAbstract
We utilize classical facts from topology to show that the classification problem in machine learning is always solvable under very mild conditions. Furthermore, we show that a softmax classification network acts on an input topological space by a finite sequence of topological moves to achieve the classification task. Moreover, given a training dataset, we show how topological formalism can be used to suggest the appropriate architectural choices for neural networks designed to be trained as classifiers on the data. Finally, we show how the architecture of a neural network cannot be chosen independently from the shape of the underlying data. To demonstrate these results, we provide example datasets and show how they are acted upon by neural nets from this topological perspective. -
A Novel Method of Extracting Topological Features From Word Embeddings (2020)
Shafie Gholizadeh, Armin Seyeditabari, Wlodek ZadroznyAbstract
In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there are only a few work on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representation of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the embedding high-dimensional space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show our defined topological features may outperform conventional text mining features. -
Graph Classification via Heat Diffusion on Simplicial Complexes (2020)
Mehmet Emin Aktas, Esra AkbasAbstract
In this paper, we study the graph classification problem in vertex-labeled graphs. Our main goal is to classify the graphs comparing their higher-order structures thanks to heat diffusion on their simplices. We first represent vertex-labeled graphs as simplex-weighted super-graphs. We then define the diffusion Frechet function over their simplices to encode the higher-order network topology and finally reach our goal by combining the function values with machine learning algorithms. Our experiments on real-world bioinformatics networks show that using diffusion Fr\éḩet function on simplices is promising in graph classification and more effective than the baseline methods. To the best of our knowledge, this paper is the first paper in the literature using heat diffusion on higher-dimensional simplices in a graph mining problem. We believe that our method can be extended to different graph mining domains, not only the graph classification problem. -
Topological Attention for Time Series Forecasting (2021)
Sebastian Zeng, Florian Graf, Christoph Hofer, Roland KwittAbstract
The problem of (point) forecasting univariate time series is considered. Most approaches, ranging from traditional statistical methods to recent learning-based techniques with neural networks, directly operate on raw time series observations. As an extension, we study whether local topological properties, as captured via persistent homology, can serve as a reliable signal that provides complementary information for learning to forecast. To this end, we propose topological attention, which allows attending to local topological features within a time horizon of historical data. Our approach easily integrates into existing end-to-end trainable forecasting models, such as N-BEATS, and, in combination with the latter exhibits state-of-the-art performance on the large-scale M4 benchmark dataset of 100,000 diverse time series from different domains. Ablation experiments, as well as a comparison to recent techniques in a setting where only a single time series is available for training, corroborate the beneficial nature of including local topological information through an attention mechanism. -
Geometric Feature Performance Under Downsampling for EEG Classification Tasks (2021)
Bryan Bischof, Eric BunchAbstract
We experimentally investigate a collection of feature engineering pipelines for use with a CNN for classifying eyes-open or eyes-closed from electroencephalogram (EEG) time-series from the Bonn dataset. Using the Takens' embedding--a geometric representation of time-series--we construct simplicial complexes from EEG data. We then compare \$\epsilon\$-series of Betti-numbers and \$\epsilon\$-series of graph spectra (a novel construction)--two topological invariants of the latent geometry from these complexes--to raw time series of the EEG to fill in a gap in the literature for benchmarking. These methods, inspired by Topological Data Analysis, are used for feature engineering to capture local geometry of the time-series. Additionally, we test these feature pipelines' robustness to downsampling and data reduction. This paper seeks to establish clearer expectations for both time-series classification via geometric features, and how CNNs for time-series respond to data of degraded resolution. -
Acute Lymphoblastic Leukemia Classification Using Persistent Homology (2024)
Waqar Hussain Shah, Abdullah Baloch, Rider Jaimes-Reátegui, Sohail Iqbal, Syeda Rafia Fatima, Alexander N. PisarchikAbstract
Acute Lymphoblastic Leukemia (ALL) is a prevalent form of childhood blood cancer characterized by the proliferation of immature white blood cells that rapidly replace normal cells in the bone marrow. The exponential growth of these leukemic cells can be fatal if not treated promptly. Classifying lymphoblasts and healthy cells poses a significant challenge, even for domain experts, due to their morphological similarities. Automated computer analysis of ALL can provide substantial support in this domain and potentially save numerous lives. In this paper, we propose a novel classification approach that involves analyzing shapes and extracting topological features of ALL cells. We employ persistent homology to capture these topological features. Our technique accurately and efficiently detects and classifies leukemia blast cells, achieving a recall of 98.2% and an F1-score of 94.6%. This approach has the potential to significantly enhance leukemia diagnosis and therapy. -
Constructing Shape Spaces From a Topological Perspective (2017)
Christoph Hofer, Roland Kwitt, Marc Niethammer, Yvonne Höller, Eugen Trinka, Andreas UhlAbstract
We consider the task of constructing (metric) shape space(s) from a topological perspective. In particular, we present a generic construction scheme and demonstrate how to apply this scheme when shape is interpreted as the differences that remain after factoring out translation, scaling and rotation. This is achieved by leveraging a recently proposed injective functional transform of 2D/3D (binary) objects, based on persistent homology. The resulting shape space is then equipped with a similarity measure that is (1) by design robust to noise and (2) fulfills all metric axioms. From a practical point of view, analyses of object shape can then be carried out directly on segmented objects obtained from some imaging modality without any preprocessing, such as alignment, smoothing, or landmark selection. We demonstrate the utility of the approach on the problem of distinguishing segmented hippocampi from normal controls vs. patients with Alzheimer’s disease in a challenging setup where volume changes are no longer discriminative. -
Tenfold Topology of Crystals (2020)
Eyal Cornfeld, Shachar CarmeliAbstract
The celebrated tenfold-way of Altland-Zirnbauer symmetry classes discern any quantum system by its pattern of non-spatial symmetries. It lays at the core of the periodic table of topological insulators and superconductors which provided a complete classification of weakly-interacting electrons' non-crystalline topological phases for all symmetry classes. Over recent years, a plethora of topological phenomena with diverse surface states has been discovered in crystalline materials. In this paper, we obtain an exhaustive classification of topologically distinct groundstates as well as topological phases with anomalous surface states of crystalline topological insulators and superconductors for key space-groups, layer-groups, and rod-groups. This is done in a unified manner for the full tenfold-way of Altland-Zirnbauer non-spatial symmetry classes. We establish a comprehensive paradigm that harnesses the modern mathematical framework of equivariant spectra; it allows us to obtain results applicable to generic topological classification problems. In particular, this paradigm provides efficient computational tools that enable an inherently unified treatment of the full tenfold-way. -
Topological Data Analysis on Simple English Wikipedia Articles (2020)
Matthew Wright, Xiaojun ZhengAbstract
Single-parameter persistent homology, a key tool in topological data analysis, has been widely applied to data problems, with statistical techniques that quantify the significance of the results. In contrast, statistical techniques for two-parameter persistence, while highly desirable for real-world applications, have scarcely been considered. We present three statistical approaches for comparing geometric data using two-parameter persistent homology, and we demonstrate the applicability of these approaches on high-dimensional point-cloud data obtained from Simple English Wikipedia articles. These approaches rely on the Hilbert function, matching distance, and barcodes obtained from two-parameter persistence modules computed from the point-cloud data. We demonstrate the applicability of our methods by distinguishing certain subsets of the Wikipedia data, and by comparison with random data. Results include insights into the construction of null distributions and stability of our methods with respect to noisy data. Our statistical methods are broadly applicable for analysis of geometric data indexed by a real-valued parameter. -
TDAExplore: Quantitative Analysis of Fluorescence Microscopy Images Through Topology-Based Machine Learning (2021)
Parker Edwards, Kristen Skruber, Nikola Milićević, James B. Heidings, Tracy-Ann Read, Peter Bubenik, Eric A. VitriolAbstract
Recent advances in machine learning have greatly enhanced automatic methods to extract information from fluorescence microscopy data. However, current machine-learning-based models can require hundreds to thousands of images to train, and the most readily accessible models classify images without describing which parts of an image contributed to classification. Here, we introduce TDAExplore, a machine learning image analysis pipeline based on topological data analysis. It can classify different types of cellular perturbations after training with only 20–30 high-resolution images and performs robustly on images from multiple subjects and microscopy modes. Using only images and whole-image labels for training, TDAExplore provides quantitative, spatial information, characterizing which image regions contribute to classification. Computational requirements to train TDAExplore models are modest and a standard PC can perform training with minimal user input. TDAExplore is therefore an accessible, powerful option for obtaining quantitative information about imaging data in a wide variety of applications. -
A Topological Machine Learning Pipeline for Classification (2022)
Francesco Conti, Davide Moroni, Maria Antonietta PascaliAbstract
In this work, we develop a pipeline that associates Persistence Diagrams to digital data via the most appropriate filtration for the type of data considered. Using a grid search approach, this pipeline determines optimal representation methods and parameters. The development of such a topological pipeline for Machine Learning involves two crucial steps that strongly affect its performance: firstly, digital data must be represented as an algebraic object with a proper associated filtration in order to compute its topological summary, the Persistence Diagram. Secondly, the persistence diagram must be transformed with suitable representation methods in order to be introduced in a Machine Learning algorithm. We assess the performance of our pipeline, and in parallel, we compare the different representation methods on popular benchmark datasets. This work is a first step toward both an easy and ready-to-use pipeline for data classification using persistent homology and Machine Learning, and to understand the theoretical reasons why, given a dataset and a task to be performed, a pair (filtration, topological representation) is better than another. -
Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining (2018)
Shafie Gholizadeh, Armin Seyeditabari, Wlodek ZadroznyAbstract
Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although recently, TDA methods have been used in many areas of data mining, it has not been widely applied to text mining tasks. In most text processing algorithms, the order in which different entities appear or co-appear is being lost. Assuming these lost orders are informative features of the data, TDA may play a significant role in the resulted gap on text processing state of the art. Once provided, the topology of different entities through a textual document may reveal some additive information regarding the document that is not reflected in any other features from conventional text processing methods. In this paper, we introduce a novel approach that hires TDA in text processing in order to capture and use the topology of different same-type entities in textual documents. First, we will show how to extract some topological signatures in the text using persistent homology-i.e., a TDA tool that captures topological signature of data cloud. Then we will show how to utilize these signatures for text classification. -
Topological Analysis of Gene Expression Arrays Identifies High Risk Molecular Subtypes in Breast Cancer (2012)
Javier Arsuaga, Nils A. Baas, Daniel DeWoskin, Hideaki Mizuno, Aleksandr Pankov, Catherine ParkAbstract
Genomic technologies measure thousands of molecular signals with the goal of understanding complex biological processes. In cancer these molecular signals have been used to characterize disease subtypes, signaling pathways and to identify subsets of patients with specific prognosis. However molecular signals for any disease type are so vast and complex that novel mathematical approaches are required for further analyses. Persistent and computational homology provide a new method for these analyses. In our previous work we presented a new homology-based supervised classification method to identify copy number aberrations from comparative genomic hybridization arrays. In this work we first propose a theoretical framework for our classification method and second we extend our analysis to gene expression data. We analyze a published breast cancer data set and find that that our method can distinguish most, but not all, different breast cancer subtypes. This result suggests that specific relationships between genes, captured by our algorithm, help distinguish between breast cancer subtypes. We propose that topological methods can be used for the classification and clustering of gene expression profiles. -
Ghrist Barcoded Video Frames. Application in Detecting Persistent Visual Scene Surface Shapes Captured in Videos (2019)
Arjuna P. H. Don, James F. PetersAbstract
This article introduces an application of Ghrist barcodes in the study of persistent Betti numbers derived from vortex nerve complexes found in triangulations of video frames. A Ghrist barcode (also called a persistence barcode) is a topology of data pic- tograph useful in representing the persistence of the features of changing shapes. The basic approach is to introduce a free Abelian group representation of intersecting filled polygons on the barycenters of the triangles of Alexandroff nerves. An Alexandroff nerve is a maximal collection of triangles of a common vertex in the triangulation of a finite, bounded planar region. In our case, the planar region is a video frame. A Betti number is a count of the number of generators is a finite Abelian group. The focus here is on the persistent Betti numbers across sequences of triangulated video frames. Each Betti number is mapped to an entry in a Ghrist barcode. Two main results are given, namely, vortex nerves are Edelsbrunner-Harer nerve complexes and the Betti number of a vortex nerve equals k + 2 for a vortex nerve containing k edges attached between a pair of vortex cycles in the nerve. -
Representability of Algebraic Topology for Biomolecules in Machine Learning Based Scoring and Virtual Screening (2018)
Zixuan Cang, Lin Mu, Guo-Wei WeiAbstract
This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination. -
PI-Net: A Deep Learning Approach to Extract Topological Persistence Images (2020)
Anirudh Som, Hongjun Choi, Karthikeyan Natesan Ramamurthy, Matthew Buman, Pavan TuragaAbstract
Topological features such as persistence diagrams and their functional approximations like persistence images (PIs) have been showing substantial promise for machine learning and computer vision applications. This is greatly attributed to the robustness topological representations provide against different types of physical nuisance variables seen in real-world data, such as view-point, illumination, and more. However, key bottlenecks to their large scale adoption are computational expenditure and difficulty incorporating them in a differentiable architecture. We take an important step in this paper to mitigate these bottlenecks by proposing a novel one-step approach to generate PIs directly from the input data. We design two separate convolutional neural network architectures, one designed to take in multi-variate time series signals as input and another that accepts multi-channel images as input. We call these networks Signal PI-Net and Image PINet respectively. To the best of our knowledge, we are the first to propose the use of deep learning for computing topological features directly from data. We explore the use of the proposed PI-Net architectures on two applications: human activity recognition using tri-axial accelerometer sensor data and image classification. We demonstrate the ease of fusion of PIs in supervised deep learning architectures and speed up of several orders of magnitude for extracting PIs from data. Our code is available at https://github.com/anirudhsom/PI-Net. -
Learning Representations of Persistence Barcodes (2019)
Christoph D. Hofer, Roland Kwitt, Marc NiethammerAbstract
We consider the problem of supervised learning with summary representations of topological features in data. In particular, we focus on persistent homology, the prevalent tool used in topological data analysis. As the summary representations, referred to as barcodes or persistence diagrams, come in the unusual format of multi sets, equipped with computationally expensive metrics, they can not readily be processed with conventional learning techniques. While different approaches to address this problem have been proposed, either in the context of kernel-based learning, or via carefully designed vectorization techniques, it remains an open problem how to leverage advances in representation learning via deep neural networks. Appropriately handling topological summaries as input to neural networks would address the disadvantage of previous strategies which handle this type of data in a task-agnostic manner. In particular, we propose an approach that is designed to learn a task-specific representation of barcodes. In other words, we aim to learn a representation that adapts to the learning problem while, at the same time, preserving theoretical properties (such as stability). This is done by projecting barcodes into a finite dimensional vector space using a collection of parametrized functionals, so called structure elements, for which we provide a generic construction scheme. A theoretical analysis of this approach reveals sufficient conditions to preserve stability, and also shows that different choices of structure elements lead to great differences with respect to their suitability for numerical optimization. When implemented as a neural network input layer, our approach demonstrates compelling performance on various types of problems, including graph classification and eigenvalue prediction, the classification of 2D/3D object shapes and recognizing activities from EEG signals. -
Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)
Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, Wlodek ZadroznyAbstract
While the strength of Topological Data Analysis has been explored in many studies on high dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in the text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extraction of topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as high dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe adding topological features to the conventional features in ensemble models improves the classification results (up to 5\%). On the other hand, as expected, topological features by themselves may be not sufficient for effective classification. It is an open problem to see whether TDA features from word embeddings might be sufficient, as they seem to perform within a range of few points from top results obtained with a linear support vector classifier. -
CCF-GNN: A Unified Model Aggregating Appearance, Microenvironment, and Topology for Pathology Image Classification (2023)
Hongxiao Wang, Gang Huang, Zhuo Zhao, Liang Cheng, Anna Juncker-Jensen, Máté Levente Nagy, Xin Lu, Xiangliang Zhang, Danny Z. ChenAbstract
Pathology images contain rich information of cell appearance, microenvironment, and topology features for cancer analysis and diagnosis. Among such features, topology becomes increasingly important in analysis for cancer immunotherapy. By analyzing geometric and hierarchically structured cell distribution topology, oncologists can identify densely-packed and cancer-relevant cell communities (CCs) for making decisions. Compared to commonly-used pixel-level Convolution Neural Network (CNN) features and cell-instance-level Graph Neural Network (GNN) features, CC topology features are at a higher level of granularity and geometry. However, topological features have not been well exploited by recent deep learning (DL) methods for pathology image classification due to lack of effective topological descriptors for cell distribution and gathering patterns. In this paper, inspired by clinical practice, we analyze and classify pathology images by comprehensively learning cell appearance, microenvironment, and topology in a fine-to-coarse manner. To describe and exploit topology, we design Cell Community Forest (CCF), a novel graph that represents the hierarchical formulation process of big-sparse CCs from small-dense CCs. Using CCF as a new geometric topological descriptor of tumor cells in pathology images, we propose CCF-GNN, a GNN model that successively aggregates heterogeneous features (e.g., appearance, microenvironment) from cell-instance-level, cell-community-level, into image-level for pathology image classification. Extensive cross-validation experiments show that our method significantly outperforms alternative methods on H&E-stained; immunofluorescence images for disease grading tasks with multiple cancer types. Our proposed CCF-GNN establishes a new topological data analysis (TDA) based method, which facilitates integrating multi-level heterogeneous features of point clouds (e.g., for cells) into a unified DL framework.