🍩 Database of Original & Non-Theoretical Uses of Topology
(found 6 matches in 0.001444s)
Topological Data Analysis on Simple English Wikipedia Articles (2020)Matthew Wright, Xiaojun Zheng
AbstractSingle-parameter persistent homology, a key tool in topological data analysis, has been widely applied to data problems, with statistical techniques that quantify the significance of the results. In contrast, statistical techniques for two-parameter persistence, while highly desirable for real-world applications, have scarcely been considered. We present three statistical approaches for comparing geometric data using two-parameter persistent homology, and we demonstrate the applicability of these approaches on high-dimensional point-cloud data obtained from Simple English Wikipedia articles. These approaches rely on the Hilbert function, matching distance, and barcodes obtained from two-parameter persistence modules computed from the point-cloud data. We demonstrate the applicability of our methods by distinguishing certain subsets of the Wikipedia data, and by comparison with random data. Results include insights into the construction of null distributions and stability of our methods with respect to noisy data. Our statistical methods are broadly applicable for analysis of geometric data indexed by a real-valued parameter.
Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, Wlodek Zadrozny
AbstractWhile the strength of Topological Data Analysis has been explored in many studies on high dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in the text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extraction of topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as high dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe adding topological features to the conventional features in ensemble models improves the classification results (up to 5\%). On the other hand, as expected, topological features by themselves may be not sufficient for effective classification. It is an open problem to see whether TDA features from word embeddings might be sufficient, as they seem to perform within a range of few points from top results obtained with a linear support vector classifier.
Text Classification via Network Topology: A Case Study on the Holy Quran (2019)Mehmet Emin Aktas, Esra Akbas
AbstractDue to the growth in the number of texts and documents available online, machine learning based text classification systems are getting more popular recently. Feature extraction, converting unstructured text into a structured feature space, is one of the essential tasks for text classification. In this paper, we propose a novel feature extraction approach for text classification using the network representation of text, network topology, and machine learning techniques. We present experimental results on classifying the Holy Quran chapters based on the place each chapter was revealed to illustrate the effectiveness of the approach.
An Introduction to a New Text Classification and Visualization for Natural Language Processing Using Topological Data Analysis (2019)Naiereh Elyasi, Mehdi Hosseini Moghadam
Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining (2018)Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny
AbstractTopological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although recently, TDA methods have been used in many areas of data mining, it has not been widely applied to text mining tasks. In most text processing algorithms, the order in which different entities appear or co-appear is being lost. Assuming these lost orders are informative features of the data, TDA may play a significant role in the resulted gap on text processing state of the art. Once provided, the topology of different entities through a textual document may reveal some additive information regarding the document that is not reflected in any other features from conventional text processing methods. In this paper, we introduce a novel approach that hires TDA in text processing in order to capture and use the topology of different same-type entities in textual documents. First, we will show how to extract some topological signatures in the text using persistent homology-i.e., a TDA tool that captures topological signature of data cloud. Then we will show how to utilize these signatures for text classification.