A Topological Analysis of the Space of Recipes (2025)

Emerson G. Escolar, Yuta Shimada, Masahiro Yuasa

Abstract

In recent years, the use of data-driven methods has provided insights into underlying patterns and principles behind culinary recipes. In this exploratory work, we introduce the use of topological data analysis, especially persistent homology, in order to study the space of culinary recipes. In particular, persistent homology analysis provides a set of recipes surrounding the multiscale “holes” in the space of existing recipes. We then propose a method to generate novel ingredient combinations using combinatorial optimization on this topological information. We made biscuits using the novel ingredient combinations, which were confirmed to be acceptable enough by a sensory evaluation study. Our findings indicate that topological data analysis has the potential for providing new tools and insights in the study of culinary recipes.

Community Resources

Code

A Novel Method of Extracting Topological Features From Word Embeddings (2020)

Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny

Abstract

In recent years, topological data analysis has been utilized for a wide range of problems to deal with high dimensional noisy data. While text representations are often high dimensional and noisy, there is only limited work on the application of topological data analysis in natural language processing. In this paper, we introduce a novel algorithm to extract topological features from word embedding representations of text that can be used for text classification. Working on word embeddings, topological data analysis can interpret the high-dimensional embedding space and discover the relations among different embedding dimensions. We will use persistent homology, the most commonly used tool from topological data analysis, for our experiment. Examining our topological algorithm on long textual documents, we will show our defined topological features may outperform conventional text mining features.

The Shape of Word Embeddings: Quantifying Non-Isometry With Topological Data Analysis (2024)

Ondřej Draganov, Steven Skiena

Abstract

Word embeddings represent language vocabularies as clouds of d-dimensional points. We investigate how information is conveyed by the general shape of these clouds, instead of representing the semantic meaning of each token. Specifically, we use the notion of persistent homology from topological data analysis (TDA) to measure the distances between language pairs from the shape of their unlabeled embeddings. These distances quantify the degree of non-isometry of the embeddings. To distinguish whether these differences are random training errors or capture real information about the languages, we use the computed distance matrices to construct language phylogenetic trees over 81 Indo-European languages. Careful evaluation shows that our reconstructed trees exhibit strong and statistically-significant similarities to the reference.
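To make "persistent homology of a point cloud" concrete, here is a minimal sketch of its simplest case, dimension 0, which tracks when connected components merge as the distance scale grows. It is not the authors' pipeline: the function name `h0_death_times`, the toy `cloud`, and the convention that an edge enters the filtration at scale equal to its Euclidean length are all assumptions made for illustration. In this setting every component is born at scale 0, and the finite death times coincide with the edge lengths of a minimum spanning tree, so a union-find pass over sorted edges is enough.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Finite death times of 0-dimensional persistence classes.

    Assumes a Vietoris-Rips style filtration in which the edge (i, j)
    appears at scale equal to the Euclidean distance between the points.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge here, so one class dies at this scale.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Hypothetical toy cloud: three nearby points and one distant point.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
deaths = h0_death_times(cloud)
```

A cloud of n points yields n - 1 finite death times plus one class that never dies. Comparing two embeddings would additionally require a distance between such summaries, and higher-dimensional features such as loops need a dedicated persistent homology library; neither is covered by this sketch.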

Community Resources

Code
Data

Topic Detection in Twitter Using Topology Data Analysis (2015)

Pablo Torres-Tramón, Hugo Hromic, Bahareh Rahmanzadeh Heravi

Abstract

The massive volume of content generated by social media greatly exceeds human capacity to manually process this data in order to identify topics of interest. As a solution, various automated topic detection approaches have been proposed, most of which are based on document clustering and burst detection. These approaches normally represent textual features in standard n-dimensional Euclidean metric spaces. However, in these cases, directly filtering noisy documents is challenging for topic detection. Instead, we propose Topol, a topic detection method based on Topology Data Analysis (TDA) that transforms the Euclidean feature space into a topological space where the shapes of noisy irrelevant documents are much easier to distinguish from topically-relevant documents. This topological space is organised in a network according to the connectivity of the points, i.e. the documents, and by only filtering based on the size of the connected components we obtain competitive results compared to other state-of-the-art topic detection methods.
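The idea of filtering by connected-component size can be illustrated with a small sketch. This is not the Topol implementation: the abstract does not say how the network is built, so the fixed-radius neighbourhood graph used here, along with the names `filter_by_component_size`, `radius`, and `min_size`, are assumptions chosen only to show the filtering step.

```python
import math
from collections import defaultdict


def filter_by_component_size(points, radius, min_size):
    """Return indices of points lying in components with >= min_size members.

    Two points are linked when their Euclidean distance is at most `radius`
    (an assumed graph construction, used here for illustration only).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points that falls within the radius.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    # Group point indices by the root of their component.
    components = defaultdict(list)
    for i in range(n):
        components[find(i)].append(i)

    # Keep only the points whose component is large enough.
    return sorted(
        i
        for members in components.values()
        if len(members) >= min_size
        for i in members
    )


# Hypothetical document vectors: a tight group of three plus two isolated points.
docs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (-4.0, 2.0)]
kept = filter_by_component_size(docs, radius=0.5, min_size=3)
```

In this toy example the two isolated points form singleton components and are dropped, which mirrors the intuition that noisy documents end up poorly connected.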

Topological Data Analysis on Simple English Wikipedia Articles (2020)

Matthew Wright, Xiaojun Zheng

Abstract

Single-parameter persistent homology, a key tool in topological data analysis, has been widely applied to data problems, with statistical techniques that quantify the significance of the results. In contrast, statistical techniques for two-parameter persistence, while highly desirable for real-world applications, have scarcely been considered. We present three statistical approaches for comparing geometric data using two-parameter persistent homology, and we demonstrate the applicability of these approaches on high-dimensional point-cloud data obtained from Simple English Wikipedia articles. These approaches rely on the Hilbert function, matching distance, and barcodes obtained from two-parameter persistence modules computed from the point-cloud data. We demonstrate the applicability of our methods by distinguishing certain subsets of the Wikipedia data, and by comparison with random data. Results include insights into the construction of null distributions and stability of our methods with respect to noisy data. Our statistical methods are broadly applicable for analysis of geometric data indexed by a real-valued parameter.

Determining Structural Properties of Artificial Neural Networks Using Algebraic Topology (2021)

David Pérez Fernández, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marta Villegas

Abstract

Artificial Neural Networks (ANNs) are widely used for approximating complex functions. The process that is usually followed to define the most appropriate architecture for an ANN given a specific function is mostly empirical. Once this architecture has been defined, weights are usually optimized according to the error function. On the other hand, we observe that ANNs can be represented as graphs and their topological 'fingerprints' can be obtained using Persistent Homology (PH). In this paper, we describe a proposal focused on designing more principled architecture search procedures. To do this, different architectures for solving problems related to a heterogeneous set of datasets have been analyzed. The results of the evaluation corroborate that PH effectively characterizes the ANN invariants: when ANN density (layers and neurons) or sample feeding order is the only difference, PH topological invariants appear; conversely, across different sub-problems (i.e. different labels), PH varies. This approach based on topological analysis helps towards the goal of designing more principled architecture search procedures and having a better understanding of ANNs.

A Visual Analytics Approach for the Diagnosis of Heterogeneous and Multidimensional Machine Maintenance Data (2021)

Xiaoyu Zhang, Takanori Fujiwara, Senthil Chandrasegaran, Michael P. Brundage, Thurston Sexton, Alden Dima, Kwan-Liu Ma

Abstract

Analysis of large, high-dimensional, and heterogeneous datasets is challenging as no one technique is suitable for visualizing and clustering such data in order to make sense of the underlying information. For instance, heterogeneous logs detailing machine repair and maintenance in an organization often need to be analyzed to diagnose errors and identify abnormal patterns, formalize root-cause analyses, and plan preventive maintenance. Such real-world datasets are also beset by issues such as inconsistent and/or missing entries. To conduct an effective diagnosis, it is important to extract and understand patterns from the data with support from analytic algorithms (e.g., finding that certain kinds of machine complaints occur more in the summer) while involving the human-in-the-loop. To address these challenges, we adopt existing techniques for dimensionality reduction (DR) and clustering of numerical, categorical, and text data dimensions, and introduce a visual analytics approach that uses multiple coordinated views to connect DR + clustering results across each kind of data dimension. To help analysts label the clusters, each clustering view is supplemented with techniques and visualizations that contrast a cluster of interest with the rest of the dataset. Our approach helps analysts make sense of machine maintenance logs and their errors, and the insights gained help them carry out preventive maintenance. We illustrate and evaluate our approach through use cases and expert studies respectively, and discuss generalization of the approach to other heterogeneous data.

Topological Data Analysis in Text Classification: Extracting Features With Additive Information (2020)

Shafie Gholizadeh, Ketki Savle, Armin Seyeditabari, Wlodek Zadrozny

Abstract

While the strength of Topological Data Analysis has been explored in many studies on high dimensional numeric data, it is still a challenging task to apply it to text. As the primary goal in topological data analysis is to define and quantify the shapes in numeric data, defining shapes in the text is much more challenging, even though the geometries of vector spaces and conceptual spaces are clearly relevant for information retrieval and semantics. In this paper, we examine two different methods of extraction of topological features from text, using as the underlying representations of words the two most popular methods, namely word embeddings and TF-IDF vectors. To extract topological features from the word embedding space, we interpret the embedding of a text document as a high dimensional time series, and we analyze the topology of the underlying graph where the vertices correspond to different embedding dimensions. For topological data analysis with the TF-IDF representations, we analyze the topology of the graph whose vertices come from the TF-IDF vectors of different blocks in the textual document. In both cases, we apply homological persistence to reveal the geometric structures under different distance resolutions. Our results show that these topological features carry some exclusive information that is not captured by conventional text mining methods. In our experiments we observe that adding topological features to the conventional features in ensemble models improves the classification results (up to 5%). On the other hand, as expected, topological features by themselves may not be sufficient for effective classification. It is an open problem to see whether TDA features from word embeddings might be sufficient, as they seem to perform within a few points of the top results obtained with a linear support vector classifier.
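As a rough illustration of the TF-IDF variant described above, the sketch below splits a token sequence into fixed-size blocks, builds a TF-IDF vector per block, and computes pairwise cosine distances between blocks. Such a distance matrix is the kind of input a persistence computation could then consume. The abstract does not give these details, so the block size, the smoothed IDF formula, the cosine distance, and the name `tfidf_block_distances` are all assumptions rather than the authors' choices.

```python
import math
from collections import Counter


def tfidf_block_distances(tokens, block_size):
    """Pairwise cosine distances between TF-IDF vectors of consecutive blocks."""
    blocks = [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]
    n = len(blocks)

    # Number of blocks in which each term occurs.
    block_freq = Counter(term for block in blocks for term in set(block))

    vectors = []
    for block in blocks:
        counts = Counter(block)
        vectors.append({
            # Term frequency times a smoothed inverse block frequency
            # (the smoothing is an assumption that keeps weights positive).
            term: (count / len(block))
            * (math.log((1 + n) / (1 + block_freq[term])) + 1.0)
            for term, count in counts.items()
        })

    def cosine_distance(u, v):
        dot = sum(weight * v[term] for term, weight in u.items() if term in v)
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return 1.0 - dot / (norm_u * norm_v)

    return [
        [cosine_distance(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy document: two identical blocks and one unrelated block.
tokens = "cat sat mat cat sat mat dog ran far".split()
dist = tfidf_block_distances(tokens, block_size=3)
```

Blocks with the same vocabulary end up at distance close to 0 and blocks with no shared terms at distance 1, so the matrix encodes how the document's content drifts from block to block.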

🍩 Database of Original & Non-Theoretical Uses of Topology
