🍩 Database of Original & Non-Theoretical Uses of Topology

  1. Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining (2018)

    Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny
    Abstract Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although TDA methods have recently been used in many areas of data mining, they have not been widely applied to text mining tasks. In most text processing algorithms, the order in which different entities appear or co-appear is lost. Assuming this lost ordering is an informative feature of the data, TDA may play a significant role in addressing the resulting gap in the text processing state of the art. Once extracted, the topology of different entities through a textual document may reveal additional information about the document that is not reflected in any features from conventional text processing methods. In this paper, we introduce a novel approach that employs TDA in text processing in order to capture and use the topology of different same-type entities in textual documents. First, we show how to extract topological signatures from the text using persistent homology, i.e., a TDA tool that captures the topological signature of a data cloud. Then we show how to utilize these signatures for text classification.
  2. The Shape of Word Embeddings: Quantifying Non-Isometry With Topological Data Analysis (2024)

    Ondřej Draganov, Steven Skiena
    Abstract Word embeddings represent language vocabularies as clouds of d-dimensional points. We investigate how information is conveyed by the general shape of these clouds, instead of representing the semantic meaning of each token. Specifically, we use the notion of persistent homology from topological data analysis (TDA) to measure the distances between language pairs from the shape of their unlabeled embeddings. These distances quantify the degree of non-isometry of the embeddings. To distinguish whether these differences are random training errors or capture real information about the languages, we use the computed distance matrices to construct language phylogenetic trees over 81 Indo-European languages. Careful evaluation shows that our reconstructed trees exhibit strong and statistically-significant similarities to the reference.

  3. Knowledge Gaps in the Early Growth of Semantic Feature Networks (2018)

    Ann E. Sizemore, Elisabeth A. Karuza, Chad Giusti, Danielle S. Bassett
    Abstract Understanding language learning and more general knowledge acquisition requires the characterization of inherently qualitative structures. Recent work has applied network science to this task by creating semantic feature networks, in which words correspond to nodes and connections correspond to shared features, and then by characterizing the structure of strongly interrelated groups of words. However, the importance of sparse portions of the semantic network—knowledge gaps—remains unexplored. Using applied topology, we query the prevalence of knowledge gaps, which we propose manifest as cavities in the growing semantic feature network of toddlers. We detect topological cavities of multiple dimensions and find that, despite word order variation, the global organization remains similar. We also show that nodal network measures correlate with filling cavities better than basic lexical properties. Finally, we discuss the importance of semantic feature network topology in language learning and speculate that the progression through knowledge gaps may be a robust feature of knowledge acquisition.
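
Entries 1 and 2 above both rely on persistent homology of a point cloud. As a rough illustration of the idea only, the sketch below computes its simplest case, 0-dimensional persistence: connected components that merge as a distance threshold grows. It is not the pipeline of any listed paper; the function name `h0_persistence` and the toy point cloud are assumptions made for this example, and only the Python standard library is used.

```python
from itertools import combinations
from math import dist


def h0_persistence(points):
    """Return the death scales of 0-dimensional classes, in increasing order.

    Every point starts as its own component (born at scale 0). Edges are
    added from shortest to longest; each edge that joins two different
    components kills one of them at that edge length. The single component
    that survives forever is not reported.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)
    return deaths


# Hypothetical toy data: two tight clusters that are far apart.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
deaths = h0_persistence(cloud)
print(deaths)
```

In this toy cloud, three components die at a small scale and one survives until a much larger scale. That long-lived gap is the kind of "topological signature" the abstracts refer to. Higher-dimensional features such as loops and cavities (as in entry 3) need boundary-matrix reduction, which dedicated TDA libraries provide.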