HiDeF: Identifying Persistent Structures in Multiscale ‘Omics Data (2021)

Fan Zheng, She Zhang, Christopher Churas, Dexter Pratt, Ivet Bahar, Trey Ideker

Abstract

In any ‘omics study, the scale of analysis can dramatically affect the outcome. For instance, when clustering single-cell transcriptomes, is the analysis tuned to discover broad or specific cell types? Likewise, protein communities revealed from protein networks can vary widely in size depending on the method. Here, we use the concept of persistent homology, drawn from mathematical topology, to identify robust structures in data at all scales simultaneously. Application to mouse single-cell transcriptomes significantly expands the catalog of identified cell types, while analysis of SARS-CoV-2 protein interactions suggests hijacking of WNT. The method, HiDeF, is available via Python and Cytoscape.

When Remote Sensing Meets Topological Data Analysis (2018)

Ludovic Duponchel

Abstract

Author Summary: Hyperspectral remote sensing plays an increasingly important role in many scientific domains and everyday problems. This imaging concept is used in applications as varied as catching tax evaders red-handed by locating new construction and building alterations, searching for aircraft and saving lives after fatal crashes, detecting oil spills for marine life and environmental preservation, spying on enemies with reconnaissance satellites, watching algae grow as an indicator of environmental health, forecasting weather to warn about natural disasters, and much more. From an instrumental point of view, current spectrometers have rather good characteristics, even if spatial resolution and spectral range can always be increased. To extract ever more information from such experiments and develop new applications, we must therefore propose multivariate data analysis tools able to capture the shape of data sets and their specific features. However, current methods often impose a data model that implicitly defines the geometry of the data set. The aim of this paper is thus to introduce the concept of topological data analysis in the framework of remote sensing, making no assumptions about the global shape of the data set while still capturing its local features.

A Multimodal Data Analysis Approach for Targeted Drug Discovery Involving Topological Data Analysis (TDA) (2016)

Muthuraman Alagappan, Dadi Jiang, Nicholas Denko, Albert C. Koong

Topological Data Analysis Quantifies Biological Nano-Structure From Single Molecule Localization Microscopy (2020)

Jeremy A. Pike, Abdullah O. Khan, Chiara Pallini, Steven G. Thomas, Markus Mund, Jonas Ries, Natalie S. Poulter, Iain B. Styles

Abstract

Motivation: Localization microscopy data is represented by a set of spatial coordinates, each corresponding to a single detection, that form a point cloud.

Specific Mutations in H5N1 Mainly Impact the Magnitude and Velocity of the Host Response in Mice (2013)

Nicolas Tchitchek, Amie J. Eisfeld, Jennifer Tisoncik-Go, Laurence Josset, Lisa E. Gralinski, Christophe Bécavin, Susan C. Tilton, Bobbie-Jo Webb-Robertson, Martin T. Ferris, Allison L. Totura

A Personality Trait Contributes to the Occurrence of Postoperative Delirium: A Prospective Study (2016)

Jung Eun Shin, Sunghyon Kyeong, Jong-Seok Lee, Jin Young Park, Woo Suk Lee, Jae-Jin Kim, Kyu Hyun Yang

Exploring Hyperspectral Imaging Data Sets With Topological Data Analysis (2017)

Ludovic Duponchel

A Severe Asthma Disease Signature From Gene Expression Profiling of Peripheral Blood From U-Biopred Cohorts (2017)

Jeannette Bigler, Michael Boedigheimer, James PR Schofield, Paul J. Skipp, Julie Corfield, Anthony Rowe, Ana R. Sousa, Martin Timour, Lori Twehues, Xuguang Hu

Networked Data Analytics: Network Comparison and Applied Graph Signal Processing (2018)

Weiyu Huang

Disrupted Resting State Network of Fibromyalgia in Theta Frequency (2018)

Mi Kyung Choe, Manyoel Lim, June Sic Kim, Dong Soo Lee, Chun Kee Chung

Interdisciplinary Approaches to Automated Obstructive Sleep Apnea Diagnosis Through High-Dimensional Multiple Scaled Data Analysis (2019)

Giseon Heo, Kathryn Leonard, Xu Wang, Yi Zhou

Hierarchical Clustering and Zeroth Persistent Homology (2020)

İsmail Güzel, Atabey Kaygun

Abstract

In this article, we show that hierarchical clustering and zeroth persistent homology deliver the same topological information about a given data set. We show this using cophenetic matrices constructed from the filtered Vietoris-Rips complex of the data set at hand. As with any cophenetic matrix, one can also display the inter-relations of zeroth homology classes via a rooted tree, also known as a dendrogram. Since homological cophenetic matrices can be calculated for higher homologies, one can also sketch similar dendrograms for higher persistent homology classes.
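The zeroth-homology half of this claim can be checked on a toy example. The following is a minimal sketch, not the authors' code: it assumes Euclidean distances and uses hypothetical function names. It computes the finite H0 death times of the Vietoris-Rips filtration with a union-find over edges sorted by length (every point is born at scale 0), and compares them with the merge heights of a naive single-linkage clustering.

```python
import itertools
import math


def h0_death_times(points):
    """Finite H0 death times of the Vietoris-Rips filtration (hypothetical helper)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Rips edges enter the filtration in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one H0 class dies at r
            parent[ri] = rj
            deaths.append(r)
    return deaths  # n - 1 finite deaths; one class persists forever


def single_linkage_heights(points):
    """Merge heights of naive agglomerative single-linkage clustering."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # Single linkage: the distance between clusters is their closest pair.
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            d = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        heights.append(d)
    return heights


pts = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.5), (10.0, 0.0)]
print(h0_death_times(pts))          # [1.0, 1.5, 3.0, 7.0]
print(single_linkage_heights(pts))  # [1.0, 1.5, 3.0, 7.0]
```

The two lists agree because Kruskal's minimum spanning tree and single linkage merge components in the same order. The sketch compares only merge heights, not the full cophenetic matrices used in the paper.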

Topographical Transcriptome Mapping of the Mouse Medial Ganglionic Eminence by Spatially Resolved RNA-seq (2014)

Sabrina Zechel, Pawel Zajac, Peter Lönnerberg, Carlos F. Ibáñez, Sten Linnarsson

Abstract

Cortical interneurons originating from the medial ganglionic eminence, MGE, are among the most diverse cells within the CNS. Different pools of proliferating progenitor cells are thought to exist in the ventricular zone of the MGE, but whether the underlying subventricular and mantle regions of the MGE are spatially patterned has not yet been addressed. Here, we combined laser-capture microdissection and multiplex RNA-sequencing to map the transcriptome of MGE cells at a spatial resolution of 50 μm.

Spatial Embedding Imposes Constraints on Neuronal Network Architectures (2018)

Jennifer Stiso, Danielle S. Bassett

Abstract

Recent progress towards understanding circuit function has capitalized on tools from network science to parsimoniously describe the spatiotemporal architecture of neural systems. Such tools often address systems topology divorced from its physical instantiation. Nevertheless, for embedded systems such as the brain, physical laws directly constrain the processes of network growth, development, and function. We review here the rules imposed by the space and volume of the brain on the development of neuronal networks, and show that these rules give rise to a specific set of complex topologies. These rules also affect the repertoire of neural dynamics that can emerge from the system, and thereby inform our understanding of network dysfunction in disease. We close by discussing new tools and models to delineate the effects of spatial embedding.

The Shape of Word Embeddings: Quantifying Non-Isometry With Topological Data Analysis (2024)

Ondřej Draganov, Steven Skiena

Abstract

Word embeddings represent language vocabularies as clouds of d-dimensional points. We investigate how information is conveyed by the general shape of these clouds, instead of representing the semantic meaning of each token. Specifically, we use the notion of persistent homology from topological data analysis (TDA) to measure the distances between language pairs from the shape of their unlabeled embeddings. These distances quantify the degree of non-isometry of the embeddings. To distinguish whether these differences are random training errors or capture real information about the languages, we use the computed distance matrices to construct language phylogenetic trees over 81 Indo-European languages. Careful evaluation shows that our reconstructed trees exhibit strong and statistically significant similarities to the reference.

Using Persistent Homology as a New Approach for Super-Resolution Localization Microscopy Data Analysis and Classification of γH2AX Foci/Clusters (2018)

Andreas Hofmann, Matthias Krufczik, Dieter W. Heermann, Michael Hausmann

Abstract

DNA double-strand breaks (DSBs) are the most severe type of chromatin damage induced by ionizing radiation. In response to such environmentally determined stress, cells have developed repair mechanisms. Although many investigations have contributed to a detailed understanding of repair processes, e.g., homologous recombination repair or non-homologous end-joining, the question of how a cell decides to apply a certain repair process at a certain damage site has not been sufficiently answered, since all the different repair pathways could occur simultaneously in the same cell nucleus. One of the first processes after DSB induction is phosphorylation of the histone variant H2AX to γH2AX in the surroundings of the damaged locus. Since the spatial organization of chromatin is not random, it is plausible that the spatial organization of γH2AX foci is also not random and instead contributes to the accessibility of the damaged site to specific repair proteins, and thus to the repair pathway that follows at that site. The aim of this article is to demonstrate a new approach to analyzing repair foci by their topology in order to obtain a cell-independent method of categorization. During the last decade, novel super-resolution fluorescence light microscopy techniques have enabled new insights into genome structure and spatial organization at the nanoscale, on the order of 10 nm. One of these techniques is single molecule localization microscopy (SMLM), with which the spatial coordinates of single fluorescent molecules can be precisely determined and density and distance distributions can be calculated. This method is an appropriate tool to quantify complex changes of chromatin and to describe repair foci at the single-molecule level.
Based on the pointillist information obtained by SMLM from specifically labeled heterochromatin and γH2AX foci, which reflects chromatin morphology and repair foci topology, we have developed a new analytical methodology for characterizing foci and foci clusters by means of persistent homology. This method allows, for the first time, a cell-independent comparison of two point distributions (here, the point distributions of two γH2AX clusters) within a selected ensemble, and gives a mathematical measure of their similarity. To demonstrate the feasibility of this approach, cells were irradiated with low-LET (linear energy transfer) radiation at different doses, and the heterochromatin and γH2AX foci were fluorescently labeled with antibodies for SMLM. Using our new analysis method, we were able to show that the topology of clusters of γH2AX foci can be categorized depending on their distance to heterochromatin. This method opens up new possibilities to categorize the spatial organization of point patterns by parameterizing topological similarity.

Topological Data Analysis of Contagion Maps for Examining Spreading Processes on Networks (2015)

Dane Taylor, Florian Klimm, Heather A. Harrington, Miroslav Kramár, Konstantin Mischaikow, Mason A. Porter, Peter J. Mucha

Abstract

Social and biological contagions are influenced by the spatial embeddedness of networks. Historically, many epidemics spread as a wave across part of the Earth’s surface; however, in modern contagions long-range edges—for example, due to airline transportation or communication media—allow clusters of a contagion to appear in distant locations. Here we study the spread of contagions on networks through a methodology grounded in topological data analysis and nonlinear dimension reduction. We construct ‘contagion maps’ that use multiple contagions on a network to map the nodes as a point cloud. By analysing the topology, geometry and dimensionality of manifold structure in such point clouds, we reveal insights to aid in the modelling, forecast and control of spreading processes. Our approach highlights contagion maps also as a viable tool for inferring low-dimensional structure in networks.

Identification of Topological Network Modules in Perturbed Protein Interaction Networks (2017)

Mihaela E. Sardiu, Joshua M. Gilmore, Brad Groppe, Laurence Florens, Michael P. Washburn

Abstract

Biological networks consist of functional modules; however, detecting and characterizing such modules in networks remains challenging. Perturbing networks is one strategy for identifying modules. Here we used an advanced mathematical approach named topological data analysis (TDA) to interrogate two perturbed networks. In one, we disrupted the S. cerevisiae INO80 protein interaction network by isolating complexes after protein complex components were deleted from the genome. In the second, we reanalyzed previously published data demonstrating the disruption of the human Sin3 network with a histone deacetylase inhibitor. Here we show that disrupted networks contained topological network modules (TNMs) with shared properties that mapped onto distinct locations in networks. We define TNMs as proteins that occupy close network positions depending on their coordinates in a topological space. TNMs provide new insight into networks by capturing proteins from different categories, including proteins within a complex, proteins with shared biological functions, and proteins disrupted across networks.

The Accumulated Persistence Function, a New Useful Functional Summary Statistic for Topological Data Analysis, With a View to Brain Artery Trees and Spatial Point Process Applications (2019)

C.A.N. Biscio, J. Møller

Abstract

We start with a simple introduction to topological data analysis, where the most popular tool is called a persistence diagram. Briefly, a persistence diagram is a multiset of points in the plane describing the persistence of topological features of a compact set when a scale parameter varies. Since statistical methods are difficult to apply directly to persistence diagrams, various alternative functional summary statistics have been suggested, but either they do not contain the full information of the persistence diagram or they are two-dimensional functions. We suggest a new functional summary statistic that is one-dimensional and hence easier to handle, and which under mild conditions contains the full information of the persistence diagram. Its usefulness is illustrated in statistical settings concerned with point clouds and brain artery trees. The supplementary materials include additional methods and examples, technical details, and the R code used for all examples. © 2019 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
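To make "one-dimensional functional summary" concrete, here is a minimal sketch of an accumulated persistence function, following our reading of the definition. Each diagram point (b, d) gets a meanage (b + d)/2 and a lifetime d - b. APF(m) sums the lifetimes of all points whose meanage is at most m. The function name is hypothetical, finite death times are assumed, and the paper's own R code remains the authoritative version.

```python
def accumulated_persistence(diagram, m):
    """APF(m) = total lifetime of diagram points whose meanage is <= m.

    Assumption: `diagram` is a list of (birth, death) pairs with finite deaths.
    """
    total = 0.0
    for birth, death in diagram:
        meanage = (birth + death) / 2.0  # where the point sits along the scale axis
        if meanage <= m:
            total += death - birth       # add this feature's lifetime (persistence)
    return total


diagram = [(0.0, 1.0), (0.5, 2.5), (1.0, 4.0)]
print([accumulated_persistence(diagram, m) for m in (0.0, 1.0, 2.0, 3.0)])
# [0.0, 1.0, 3.0, 6.0]
```

The result is a non-decreasing step function of one variable. Its jumps occur at the meanages, and each jump equals the summed lifetimes of the points located there. This makes it easy to average or compare across samples, unlike the diagrams themselves.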

Combining Geometric and Topological Information in Image Segmentation (2019)

Hengrui Luo, Justin Strait

Abstract

A fundamental problem in computer vision is image segmentation, where the goal is to delineate the boundary of an object in the image. The focus of this work is on the segmentation of grayscale images and its purpose is two-fold. First, we conduct an in-depth study comparing active contour and topology-based methods in a statistical framework, two popular approaches for boundary detection of 2-dimensional images. Certain properties of the image dataset may favor one method over the other, both from an interpretability perspective as well as through evaluation of performance measures. Second, we propose the use of topological knowledge to assist an active contour method, which can potentially incorporate prior shape information. The latter is known to be extremely sensitive to algorithm initialization, and thus, we use a topological model to provide an automatic initialization. In addition, our proposed model can handle objects in images with more complex topological structures, including objects with holes and multiple objects within one image. We demonstrate this on artificially-constructed image datasets from computer vision, as well as real medical image data.

Topological Data Analysis Distinguishes Parameter Regimes in the Anderson-Chaplain Model of Angiogenesis (2021)

John T. Nardini, Bernadette J. Stolz, Kevin B. Flores, Heather A. Harrington, Helen M. Byrne

Abstract

Angiogenesis is the process by which blood vessels form from pre-existing vessels. It plays a key role in many biological processes, including embryonic development and wound healing, and contributes to many diseases including cancer and rheumatoid arthritis. The structure of the resulting vessel networks determines their ability to deliver nutrients and remove waste products from biological tissues. Here we simulate the Anderson-Chaplain model of angiogenesis at different parameter values and quantify the vessel architectures of the resulting synthetic data. Specifically, we propose a topological data analysis (TDA) pipeline for systematic analysis of the model. TDA is a vibrant and relatively new field of computational mathematics for studying the shape of data. We compute topological and standard descriptors of model simulations generated by different parameter values. We show that TDA of model simulation data stratifies parameter space into regions with similar vessel morphology. The methodologies proposed here are widely applicable to other synthetic and experimental data including wound healing, development, and plant biology.

Connectivity in fMRI: Blind Spots and Breakthroughs (2018)

Victor Solo, Jean-Baptiste Poline, Martin A. Lindquist, Sean L. Simpson, F. DuBois Bowman, Moo K. Chung, Ben Cassidy

Abstract

In recent years, driven by scientific and clinical concerns, there has been an increased interest in the analysis of functional brain networks. The goal of these analyses is to better understand how brain regions interact, how this depends upon experimental conditions and behavioral measures, and how anomalies (disease) can be recognized. In this work we provide, firstly, a brief review of some of the main existing methods of functional brain network analysis. But rather than compare them, as a traditional review would do, we instead draw attention to their significant limitations and blind spots. Then, secondly, relevant experts sketch a number of emerging methods that can break through these limitations. In particular, we discuss five such methods. The first two, stochastic block models and exponential random graph models, provide an inferential basis for network analysis that is lacking in the exploratory graph analysis methods. The other three address network comparison via persistent homology, time-varying connectivity that distinguishes sample fluctuations from neural fluctuations, and network system identification that draws inferential strength from temporal autocorrelation.

Persistent Brain Network Homology From the Perspective of Dendrogram (2012)

Hyekyoung Lee, Hyejin Kang, Moo K. Chung, Bung-Nyun Kim, Dong Soo Lee

Abstract

The brain network is usually constructed by estimating the connectivity matrix and thresholding it at an arbitrary level. The problem with this standard method is that we do not have any generally accepted criteria for determining a proper threshold. Thus, we propose a novel multiscale framework that models all brain networks generated over every possible threshold. Our approach is based on persistent homology and its various representations such as the Rips filtration, barcodes, and dendrograms. This new persistent homological framework enables us to quantify various persistent topological features at different scales in a coherent manner. The barcode is used to quantify and visualize the evolutionary changes of topological features such as the Betti numbers over different scales. By incorporating additional geometric information into the barcode, we obtain a single linkage dendrogram that shows the overall evolution of the network. The difference between two networks is then measured by the Gromov-Hausdorff distance over the dendrograms. As an illustration, we modeled and differentiated the FDG-PET based functional brain networks of 24 children with attention-deficit hyperactivity disorder, 26 children with autism spectrum disorder, and 11 pediatric control subjects.
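The idea of modeling "all brain networks generated over every possible threshold" can be illustrated with the zeroth Betti number (the count of connected components) tracked across thresholds. This is a minimal sketch, not the authors' pipeline. It assumes that connectivity has already been converted into a distance such as 1 - correlation, and `count_components` and `betti0_curve` are hypothetical names. Two regions are linked once their distance is at most the threshold.

```python
def count_components(dist, eps):
    """Number of connected components when edges with dist <= eps are kept."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1  # each successful merge removes one component
    return components


def betti0_curve(dist, thresholds):
    """Betti-0 number of the thresholded network at every threshold."""
    return [count_components(dist, eps) for eps in thresholds]


# Toy 4-region correlation matrix; distance = 1 - correlation (an assumption).
corr = [
    [1.0, 0.8, 0.3, 0.1],
    [0.8, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
dist = [[1.0 - c for c in row] for row in corr]
print(betti0_curve(dist, [0.0, 0.25, 0.5, 0.65, 1.0]))  # [4, 3, 2, 1, 1]
```

The thresholds at which the count drops are the single-linkage merge heights. This is why the barcode of zeroth homology and the single linkage dendrogram carry related information. No single threshold has to be chosen.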

Unsupervised Topological Learning Approach of Crystal Nucleation (2022)

Sébastien Becker, Emilie Devijver, Rémi Molinier, Noël Jakse

Abstract

Nucleation phenomena, commonly observed in everyday life, are of fundamental, technological and societal importance in many areas, but some of their most intimate mechanisms remain to be unravelled. Crystal nucleation, the early stage in which the liquid-to-solid transition occurs upon undercooling, initiates at the atomic level on nanometre length and sub-picosecond time scales and involves complex multidimensional mechanisms with local symmetry breaking that can hardly be observed experimentally in full detail. To reveal their structural features in simulations without a priori assumptions, an unsupervised learning approach founded on topological descriptors borrowed from persistent homology is proposed. Applied here to monatomic metals, it shows that both translational and orientational ordering always come into play simultaneously, as a result of the strong bonding, when homogeneous nucleation starts in regions with low five-fold symmetry. It also reveals that nucleation pathways are specific to the element considered, with features beyond the hypothesis of Classical Nucleation Theory.

Imaging-Based Representation and Stratification of Intra-Tumor Heterogeneity via Tree-Edit Distance (2022)

Lara Cavinato, Matteo Pegoraro, Alessandra Ragni, Francesca Ieva

Abstract

Personalized medicine is the future of medical practice. In oncology, tumor heterogeneity assessment represents a pivotal step for effective treatment planning and prognosis prediction. Despite new procedures for DNA sequencing and analysis, non-invasive methods for tumor characterization are needed to have an impact on daily routine. To this end, imaging texture analysis is rapidly scaling, holding the promise of serving as a surrogate for histopathological assessment of tumor lesions. In this work, we propose a tree-based representation strategy for describing the intra-tumor heterogeneity of patients affected by metastatic cancer. We leverage radiomics information extracted from PET/CT imaging and provide an exhaustive and easily readable summary of the disease spread. We exploit this novel patient representation to perform cancer subtyping with hierarchical clustering. For this purpose, a new heterogeneity-based distance between trees is defined and applied to a case study of prostate cancer. Cluster interpretation is explored in terms of concordance with severity status, tumor burden and biological characteristics. The results are promising, as the proposed method outperforms current approaches in the literature. Ultimately, the proposed method provides a general analysis framework that would allow knowledge to be extracted from routinely acquired imaging data of patients and provide insights for effective treatment planning.

CD8 T-Cell Reactivity to Islet Antigens Is Unique to Type 1 While CD4 T-Cell Reactivity Exists in Both Type 1 and Type 2 Diabetes (2014)

Ghanashyam Sarikonda, Jeremy Pettus, Sonal Phatak, Sowbarnika Sachithanantham, Jacqueline F. Miller, Johnna D. Wesley, Eithon Cadag, Ji Chae, Lakshmi Ganesan, Ronna Mallios, Steve Edelman, Bjoern Peters, Matthias von Herrath

Abstract

Previous cross-sectional analyses demonstrated that CD8+ and CD4+ T-cell reactivity to islet-specific antigens was more prevalent in T1D subjects than in healthy donors (HD). Here, we examined T1D-associated epitope-specific CD4+ T-cell cytokine production and autoreactive CD8+ T-cell frequency on a monthly basis for one year in 10 HD, 33 subjects with T1D, and 15 subjects with T2D. Autoreactive CD4+ T-cells from both T1D and T2D subjects produced more IFN-γ when stimulated than cells from HD. In contrast, higher frequencies of islet antigen-specific CD8+ T-cells were detected only in T1D. These observations support the hypothesis that general beta-cell stress drives autoreactive CD4+ T-cell activity while islet over-expression of MHC class I commonly seen in T1D mediates amplification of CD8+ T-cells and more rapid beta-cell loss. In conclusion, CD4+ T-cell autoreactivity appears to be present in both T1D and T2D while autoreactive CD8+ T-cells are unique to T1D. Thus, autoreactive CD8+ cells may serve as a more T1D-specific biomarker.

Relational Persistent Homology for Multispecies Data With Application to the Tumor Microenvironment (2023)

Bernadette J. Stolz, Jagdeep Dhesi, Joshua A. Bull, Heather A. Harrington, Helen M. Byrne, Iris H. R. Yoon

Abstract

Topological data analysis (TDA) is an active field of mathematics for quantifying shape in complex data. Standard methods in TDA such as persistent homology (PH) are typically focused on the analysis of data consisting of a single entity (e.g., cells or molecular species). However, state-of-the-art data collection techniques now generate exquisitely detailed multispecies data, prompting a need for methods that can examine and quantify the relations among them. Such heterogeneous data types arise in many contexts, ranging from biomedical imaging and geospatial analysis to species ecology. Here, we propose two methods for encoding spatial relations among different data types that are based on Dowker complexes and Witness complexes. We apply the methods to synthetic multispecies data of a tumor microenvironment and analyze topological features that capture relations between different cell types, e.g., blood vessels, macrophages, tumor cells, and necrotic cells. We demonstrate that relational topological features can extract biological insight, including the dominant immune cell phenotype (an important predictor of patient prognosis) and the parameter regimes of a data-generating model. The methods provide a quantitative perspective on the relational analysis of multispecies spatial data, overcome the limits of traditional PH, and are readily computable.

Omics-Based Strategies in Precision Medicine: Toward a Paradigm Shift in Inborn Errors of Metabolism Investigations (2016)

Abdellah Tebani, Carlos Afonso, Stéphane Marret, Soumeya Bekri

Abstract

The rise of technologies that simultaneously measure thousands of data points represents the heart of systems biology. These technologies have had a huge impact on the discovery of next-generation diagnostics, biomarkers, and drugs in the precision medicine era. Systems biology aims to achieve systemic exploration of complex interactions in biological systems. Driven by high-throughput omics technologies and the computational surge, it enables multi-scale and insightful overviews of cells, organisms, and populations. Precision medicine capitalizes on these conceptual and technological advancements and stands on two main pillars: data generation and data modeling. High-throughput omics technologies allow the retrieval of comprehensive and holistic biological information, whereas computational capabilities enable high-dimensional data modeling and, therefore, accessible and user-friendly visualization. Furthermore, bioinformatics has enabled comprehensive multi-omics and clinical data integration for insightful interpretation. Despite their promise, the translation of these technologies into clinically actionable tools has been slow. In this review, we present state-of-the-art multi-omics data analysis strategies in a clinical context. The challenges of omics-based biomarker translation are discussed. Perspectives regarding the use of multi-omics approaches for inborn errors of metabolism (IEM) are presented by introducing a new paradigm shift in addressing IEM investigations in the post-genomic era.

Unveiling Patterns of International Communities in a Global City Using Mobile Phone Data (2015)

Paolo Bajardi, Matteo Delfino, André Panisson, Giovanni Petri, Michele Tizzoni

Abstract

We analyse a large mobile phone activity dataset provided by Telecom Italia for the Telecom Big Data Challenge contest. The dataset reports the international country codes of every call/SMS made and received by mobile phone users in Milan, Italy, between November and December 2013, with a spatial resolution of about 200 meters. We first show that the observed spatial distribution of international codes closely matches the distribution of international communities reported by official statistics, confirming the value of mobile phone data for demographic research. Next, we define an entropy function to measure the heterogeneity of the international phone activity in space and time. By comparing the entropy function to empirical data, we show that it can be used to identify the city’s hotspots, defined by the presence of points of interest. Finally, we use the entropy function to characterize the spatial distribution of international communities in the city. Adopting a topological data analysis approach, we find that international mobile phone users exhibit some robust clustering patterns that correlate with basic socio-economic variables. Our results suggest that mobile phone records can be used in conjunction with topological data analysis tools to study the geography of migrant communities in a global city.

Topological Data Analysis of Zebrafish Patterns (2020)

Melissa R. McGuirl, Alexandria Volkening, Björn Sandstede

Abstract

Self-organized pattern behavior is ubiquitous throughout nature, from fish schooling to collective cell dynamics during organism development. Qualitatively these patterns display impressive consistency, yet variability inevitably exists within pattern-forming systems on both microscopic and macroscopic scales. Quantifying variability and measuring pattern features can inform the underlying agent interactions and allow for predictive analyses. Nevertheless, current methods for analyzing patterns that arise from collective behavior capture only macroscopic features or rely on either manual inspection or smoothing algorithms that lose the underlying agent-based nature of the data. Here we introduce methods based on topological data analysis and interpretable machine learning for quantifying both agent-level features and global pattern attributes on a large scale. Because the zebrafish is a model organism for skin pattern formation, we focus specifically on analyzing its skin patterns as a means of illustrating our approach. Using a recent agent-based model, we simulate thousands of wild-type and mutant zebrafish patterns and apply our methodology to better understand pattern variability in zebrafish. Our methodology is able to quantify the differential impact of stochasticity in cell interactions on wild-type and mutant patterns, and we use our methods to predict stripe and spot statistics as a function of varying cellular communication. Our work provides an approach to automatically quantifying biological patterns and analyzing agent-based dynamics so that we can now answer critical questions in pattern formation at a much larger scale.

Tracking Resilience to Infections by Mapping Disease Space (2016)

Brenda Y. Torres, Jose Henrique M. Oliveira, Ann Thomas Tate, Poonam Rath, Katherine Cumnock, David S. Schneider

Abstract

Infected hosts differ in their responses to pathogens; some hosts are resilient and recover their original health, whereas others follow a divergent path and die. To quantitate these differences, we propose mapping the routes infected individuals take through “disease space.” We find that when plotting physiological parameters against each other, many pairs have hysteretic relationships that identify the current location of the host and predict the future route of the infection. These maps can readily be constructed from experimental longitudinal data, and we provide two methods to generate the maps from the cross-sectional data that is commonly gathered in field trials. We hypothesize that resilient hosts tend to take small loops through disease space, whereas nonresilient individuals take large loops. We support this hypothesis with experimental data in mice infected with Plasmodium chabaudi, finding that dying mice trace a large arc in red blood cells (RBCs) by reticulocyte space as compared to surviving mice. We find that human malaria patients who are heterozygous for sickle cell hemoglobin occupy a small area of RBCs by reticulocyte space, suggesting this approach can be used to distinguish resilience in human populations. This technique should be broadly useful in describing the in-host dynamics of infections in both model hosts and patients at both population and individual levels.

Topological Data Analysis: A Promising Big Data Exploration Tool in Biology, Analytical Chemistry and Physical Chemistry (2016)

Marc Offroy, Ludovic Duponchel

Abstract

An important feature of experimental science is that data of various kinds are being produced at an unprecedented rate. This is mainly due to the development of new instrumental concepts and experimental methodologies. It is also clear that the nature of the acquired data has changed significantly. Indeed, in every area of science, data take the form of ever-larger tables in which all but a few of the columns (i.e., variables) turn out to be irrelevant to the questions of interest, and we do not necessarily know in advance which columns are the interesting ones. Big data in biology, analytical chemistry and physical chemistry laboratories is a future that may be closer than any of us suppose. New tools therefore have to be developed to explore and extract value from such data sets, and topological data analysis (TDA) is one of them. It was developed recently by topologists who discovered that topological concepts could be useful for data analysis. The main objective of this paper is to explain why topology is well suited to the analysis of big data sets in many areas, and even more efficient than conventional data analysis methods. Raman analysis of single bacteria provides a good opportunity to demonstrate the potential of TDA for exploring spectroscopic data sets under different experimental conditions (high noise level, with or without spectral preprocessing, wavelength shift, different spectral resolutions, and missing data).

Airway Pathological Heterogeneity in Asthma: Visualization of Disease Microclusters Using Topological Data Analysis (2018)

Salman Siddiqui, Aarti Shikotra, Matthew Richardson, Emma Doran, David Choy, Alex Bell, Cary D. Austin, Jeffrey Eastham-Anderson, Beverley Hargadon, Joseph R. Arron, Andrew Wardlaw, Christopher E. Brightling, Liam G. Heaney, Peter Bradding

Abstract

Background: Asthma is a complex chronic disease underpinned by pathological changes within the airway wall. How variations in structural airway pathology and cellular inflammation contribute to the expression and severity of asthma is poorly understood. Objectives: We therefore evaluated pathological heterogeneity using topological data analysis (TDA) with the aim of visualizing disease clusters and microclusters. Methods: A discovery population of 202 adult patients (142 asthmatic patients and 60 healthy subjects) and an external replication population (59 patients with severe asthma) were evaluated. Pathology and gene expression were examined in bronchial biopsy samples. TDA was applied using pathological variables alone to create pathology-driven visual networks. Results: In the discovery cohort, TDA identified 4 groups/networks with multiple microclusters/regions of interest that were masked by group-level statistics. Specifically, TDA group 1 consisted of a high proportion of healthy subjects, with a microcluster representing a topological continuum connecting healthy subjects to patients with mild-to-moderate asthma. Three additional TDA groups with moderate-to-severe asthma (Airway Smooth Muscle-high, Reticular Basement Membrane-high, and Remodeling-low groups) were identified and contained numerous microclusters with varying pathological and clinical features. Mutually exclusive TH2 and TH17 tissue gene expression signatures were identified in all pathological groups. When the discovery and external replication analyses were restricted to the severe asthma subgroup, persistent homology identified highly similar “pathological data shapes.” Conclusions: We have identified and replicated novel pathological phenotypes of asthma using TDA. Our methodology is applicable to other complex chronic diseases.

Multivariate Data Analysis Using Persistence-Based Filtering and Topological Signatures (2012)

B. Rieck, H. Mara, H. Leitte

Abstract

The extraction of significant structures from arbitrary high-dimensional data sets is a challenging task. Moreover, classifying data points as noise in order to reduce a data set is of special relevance for many application domains. Standard methods such as clustering reduce problem complexity by providing the user with classes of similar entities. However, they usually do not highlight relations between different entities, and they require a stopping criterion, e.g. the number of clusters to be detected. In this paper, we present a visualization pipeline based on recent advances in algebraic topology. More precisely, we employ methods from persistent homology that enable topological data analysis of high-dimensional data sets. Our pipeline inherently copes with noisy data and data sets of arbitrary dimension. It extracts the central structures of a data set in a hierarchical manner using a theoretically well-founded persistence-based filtering algorithm. We furthermore introduce persistence rings, a novel visualization technique for the persistence intervals (a class of topological features) of large data sets. Persistence rings provide a unique topological signature of a data set, which helps in recognizing similarities. In addition, we provide interactive visualization techniques that help the user evaluate the parameter space of our method in order to extract relevant structures. We describe and evaluate our analysis pipeline on two very distinct classes of data sets. First, a class of synthetic data sets containing topological objects is used to highlight the interaction capabilities of our method. Second, to affirm the utility of our technique, we analyse a class of high-dimensional real-world data sets arising from current research in cultural heritage.
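
As a rough illustration of how persistence can separate structure from noise without fixing a number of clusters in advance, the sketch below computes zero-dimensional persistence (connected components) of a point cloud by adding edges in order of increasing length, using a union-find structure, and then counts only components whose lifetime exceeds a chosen threshold. This is a generic single-linkage-style construction, not the authors' persistence-based filtering algorithm; the function names and the `min_persistence` parameter are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

def h0_persistence(points: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Zero-dimensional persistence of a point cloud under growing distance thresholds.

    Every point is born at threshold 0. Edges are added in order of increasing
    length; when an edge joins two components, one of them dies at that length.
    Returns (representative index of the dying component, death threshold)
    pairs; the one component that never dies is omitted.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths: List[Tuple[int, float]] = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            continue  # edge closes a loop; no component dies
        # All births are equal, so the elder rule reduces to a fixed tie-break:
        # the component with the larger root index dies.
        survivor, victim = min(root_i, root_j), max(root_i, root_j)
        parent[victim] = survivor
        deaths.append((victim, length))
    return deaths

def persistent_components(points, min_persistence: float) -> int:
    """Count components whose lifetime exceeds min_persistence, plus the immortal one."""
    return 1 + sum(1 for _, death in h0_persistence(points) if death > min_persistence)


cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
print(persistent_components(cloud, min_persistence=1.0))  # 2
```

Here the threshold acts as a noise filter on component lifetimes rather than as a target number of clusters: short-lived components are treated as noise, and the two well-separated groups survive.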

Two-Tier Mapper, an Unbiased Topology-Based Clustering Method for Enhanced Global Gene Expression Analysis (2019)

Rachel Jeitziner, Mathieu Carrière, Jacques Rougemont, Steve Oudot, Kathryn Hess, Cathrin Brisken

Abstract

MOTIVATION: Unbiased clustering methods are needed to analyze growing numbers of complex datasets. Currently available clustering methods often depend on user-set parameters, lack stability, and are not applicable to small datasets. To overcome these shortcomings, we used topological data analysis, an emerging field of mathematics that discerns additional features, uncovers hidden insights in datasets, and has a wide range of applications. RESULTS: We have developed a topology-based clustering method called Two-Tier Mapper (TTMap) for enhanced analysis of global gene expression datasets. First, TTMap discerns divergent features in the control group, adjusts for them, and identifies outliers. Second, the deviation of each test sample from the control group in a high-dimensional space is computed, and the test samples are clustered using a new Mapper-based topological algorithm at two levels: a global tier and local tiers. All parameters are either carefully chosen or data-driven, avoiding any user-induced bias. The method is stable, different datasets can be combined for analysis, and significant subgroups can be identified. It outperforms current clustering methods in sensitivity and stability on synthetic and biological datasets, in particular when sample sizes are small; the outcome is not affected by removal of control samples, by choice of normalization, or by subselection of data. TTMap is readily applicable to complex, highly variable biological samples and holds promise for personalized medicine. AVAILABILITY AND IMPLEMENTATION: TTMap is supplied as an R package in Bioconductor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
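
TTMap's clustering builds on the Mapper construction. The following is a minimal sketch of plain Mapper, not TTMap itself and not its two-tier scheme, assuming a one-dimensional filter (lens), an overlapping cover of the lens range by equal-width intervals, and single-linkage clustering with a fixed distance cutoff inside each interval. The names `mapper_graph`, `n_intervals`, `overlap` and `eps` are hypothetical choices for illustration. Nodes are clusters, and two nodes are joined when they share a sample.

```python
import itertools
import math
from typing import Callable, List, Sequence, Set, Tuple

Point = Sequence[float]

def _single_linkage(indices: List[int], points: Sequence[Point], eps: float) -> List[Set[int]]:
    """Split indices into groups linked by chains of pairwise distance <= eps."""
    remaining = set(indices)
    clusters: List[Set[int]] = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(cluster)
    return clusters

def mapper_graph(points: Sequence[Point], lens: Callable[[Point], float],
                 n_intervals: int = 4, overlap: float = 0.25, eps: float = 1.0
                 ) -> Tuple[List[Set[int]], Set[Tuple[int, int]]]:
    """Plain Mapper: cover the lens range, cluster each preimage, link overlapping clusters."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # guard against a constant lens
    nodes: List[Set[int]] = []
    for k in range(n_intervals):
        # Each interval is widened on both sides by `overlap` times its width
        start = lo + k * width - overlap * width
        end = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(_single_linkage(members, points, eps))
    # Two nodes are joined when their clusters share at least one sample
    edges = {(a, b) for a, b in itertools.combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges


line = [(float(x), 0.0) for x in range(10)]
nodes, edges = mapper_graph(line, lens=lambda p: p[0], n_intervals=3, overlap=0.25, eps=1.5)
print(len(nodes), sorted(edges))  # 3 [(0, 1), (1, 2)]
```

On points sampled along a line, with the lens set to the x coordinate, the result is a path of three nodes, which is the expected Mapper summary of a line. In gene expression work the lens would instead be something like a deviation score per sample, which is the role TTMap's deviation from the control group appears to play.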

Topology Based Data Analysis Identifies a Subgroup of Breast Cancers With a Unique Mutational Profile and Excellent Survival (2011)

Monica Nicolau, Arnold J. Levine, Gunnar Carlsson

Abstract

High-throughput biological data, whether generated by sequencing, transcriptional microarrays, proteomics, or other means, continues to require analytic methods that address its high-dimensional nature. Because the computational part of data analysis ultimately identifies shape characteristics in the organization of data sets, the mathematics of shape recognition in high dimensions remains a crucial part of data analysis. This article introduces a method that extracts information from high-throughput microarray data and, by using topology, provides greater depth of information than current analytic techniques. The method, termed Progression Analysis of Disease (PAD), first identifies robust aspects of cluster analysis, then goes deeper to find a multitude of biologically meaningful shape characteristics in these data. Additionally, because PAD incorporates a visualization tool, it provides a simple picture or graph that can be used to further explore these data. Although PAD can be applied to a wide range of high-throughput data types, it is used here as an example to analyze breast cancer transcriptional data. This analysis identified a unique subgroup of estrogen receptor-positive (ER+) breast cancers that express high levels of c-MYB and low levels of innate inflammatory genes. These patients exhibit 100% survival and no metastasis. No supervised step beyond the distinction between tumor and healthy samples was used to identify this subtype. The group has a clear, distinct, and statistically significant molecular signature; it highlights coherent biology but is invisible to cluster methods, and it does not fit into the accepted classification of ER+ breast cancers into Luminal A/B and Normal-like subtypes. We denote the group as c-MYB+ breast cancer.

Using Multidimensional Topological Data Analysis to Identify Traits of Hip Osteoarthritis (2018)

Jasmine Rossi‐deVries, Valentina Pedoia, Michael A. Samaan, Adam R. Ferguson, Richard B. Souza, Sharmila Majumdar

Abstract

Background: Osteoarthritis (OA) is a multifaceted disease with many variables affecting diagnosis and progression. Topological data analysis (TDA) is a state-of-the-art big data analytics tool that can combine all variables into a multidimensional space. Here, TDA is used to analyze imaging and gait analysis data simultaneously. Purpose: To identify biochemical and biomechanical biomarkers able to classify different disease progression phenotypes in subjects with and without radiographic signs of hip OA. Study Type: Longitudinal study comparing progressive and nonprogressive subjects. Population: In all, 102 subjects with and without radiographic signs of hip osteoarthritis. Field Strength/Sequence: 3T, SPGR 3D MAPSS T1ρ/T2, intermediate-weighted fat-suppressed fast spin-echo (FSE). Assessment: Multidimensional data analysis including cartilage composition, bone shape, Kellgren–Lawrence (KL) classification of osteoarthritis, scoring hip osteoarthritis with MRI (SHOMRI), and the hip disability and osteoarthritis outcome score (HOOS). Statistical Tests: Analysis was done using TDA, Kolmogorov–Smirnov (KS) testing, and Benjamini-Hochberg ranking of P values to correct for multiple comparisons. Results: Subjects in the later stages of the disease had increased SHOMRI scores (P < 0.0001), increased KL grades (P = 0.0012), and older age (P < 0.0001). Subjects in the healthier group showed intact cartilage and less pain. Subjects falling between these two groups had a range of symptoms. Analysis of this subgroup identified knee biomechanics (P < 0.0001) as an initial marker of the disease that is noticeable before morphological progression and degeneration. Further analysis of an OA subgroup with femoroacetabular impingement (FAI) showed anterior labral tears to be the most significant marker (P = 0.0017) distinguishing FAI subjects with and without OA symptoms.
Data Conclusion: The data-driven analysis obtained with TDA proposes new phenotypes of these subjects that partially overlap with the classical radiograph-based classification of disease status, and it also shows potential for further examination of a biomechanical intervention at early onset. Level of Evidence: 2. Technical Efficacy: Stage 2. J. Magn. Reson. Imaging 2018;48:1046–1058.
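
The Benjamini-Hochberg step named under Statistical Tests can be sketched as follows, assuming a plain list of P values from independent (or positively dependent) tests and a target false discovery rate `q`. The function name and the example P values are hypothetical and are not taken from the study.

```python
from typing import List, Sequence

def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Flag which hypotheses are rejected at false discovery rate q (BH step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ranks 1..m by ascending P
    # Find the largest rank k such that p_(k) <= k * q / m
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that k, reported in the original order
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))  # [True, True, False, False, False]
```

Because the procedure is step-up, a P value that misses the threshold for its own rank is still rejected if a larger P value further down the ranking clears its threshold.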

Persistent Homology Analysis of Ion Aggregations and Hydrogen-Bonding Networks (2018)

Kelin Xia

Abstract

Despite great advances in experimental tools and theoretical models, the quantitative characterization of the microscopic structures of ion aggregates and their associated water hydrogen-bonding networks remains a challenging problem. In this paper, a recently developed mathematical method called persistent homology is introduced, for the first time, to quantitatively analyze the intrinsic topological properties of ion aggregation systems and hydrogen-bonding networks. Persistent homology analysis of assembly systems has two distinguishing properties. First, it does not require a predefined bond length to construct the ion or hydrogen-bonding network; its results are determined by the morphological structure of the data alone. Second, it can directly measure the size of circles or holes in ion aggregates and hydrogen-bonding networks. To validate our model, we consider two well-studied systems, NaCl and KSCN solutions, generated from molecular dynamics simulations. They are believed to represent two morphological types of aggregation: local clusters and extended ion networks. We find that the two aggregation types have distinguishable topological features and are well characterized by our topological model. Further, we construct two types of networks, O-networks and H2O-networks, to analyze the topological properties of hydrogen-bonding networks. For both models, KSCN systems show much more dramatic variations in their local circle structures as concentration increases: large local circle structures consistently increase in number, and their sizes become more and more diverse. In contrast, NaCl systems show no obvious increase in large circles. Instead, the average size of the circle structures consistently declines, and their sizes become more and more uniform as concentration increases.
To our knowledge, these unique intrinsic topological features of ion aggregation systems have not been reported before. More importantly, our models can be used directly to quantitatively analyze the intrinsic topological invariants, including circles, loops, holes, and cavities, of any network-like structure, such as nanomaterials, colloidal systems, and biomolecular assemblies. These topological invariants cannot be described by traditional graph and network models.
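
Both properties claimed for persistent homology in this abstract can be illustrated with a small, generic computation, assuming points are given as coordinates and distances are Euclidean. Every pair of points is connected at a cutoff equal to its distance, so no bond length is fixed in advance, and a ring of points shows up as a one-dimensional (H1) bar whose birth and death values reflect the ring's size. The sketch builds a Vietoris-Rips filtration up to triangles and reduces its boundary matrix over Z/2. It is a textbook construction for small point sets, not the author's software, and the function name `rips_persistence` is hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Pair = Tuple[int, float, float]  # (homology dimension, birth, death)

def rips_persistence(points: Sequence[Sequence[float]], max_dim: int = 1) -> List[Pair]:
    """Persistence pairs of a Vietoris-Rips filtration, computed over Z/2.

    Simplices up to dimension max_dim + 1 are enumerated explicitly, so this is
    only practical for a few dozen points. Zero-length bars are dropped, and
    classes that never die get death = math.inf.
    """
    n = len(points)
    simplices = []
    for size in range(1, max_dim + 3):
        for s in itertools.combinations(range(n), size):
            # A simplex enters when its longest edge appears (0 for vertices)
            value = max((math.dist(points[i], points[j])
                         for i, j in itertools.combinations(s, 2)), default=0.0)
            simplices.append((value, size - 1, s))
    # Sorting by value, then dimension, guarantees faces come before cofaces
    simplices.sort(key=lambda t: (t[0], t[1]))
    position = {s: k for k, (_, _, s) in enumerate(simplices)}

    # Boundary columns as sets of row positions; addition over Z/2 is symmetric difference
    columns = [
        {position[f] for f in itertools.combinations(s, dim)} if dim > 0 else set()
        for _, dim, s in simplices
    ]
    owner = {}  # pivot (lowest nonzero row) -> column that owns it
    pairs: List[Pair] = []
    for j, col in enumerate(columns):
        while col and max(col) in owner:
            col ^= columns[owner[max(col)]]  # in-place reduction of column j
        if col:
            low = max(col)
            owner[low] = j
            birth, dim, _ = simplices[low]
            if simplices[j][0] > birth:
                pairs.append((dim, birth, simplices[j][0]))
    # Creators that are never killed are essential classes
    for k, (value, dim, _) in enumerate(simplices):
        if dim <= max_dim and not columns[k] and k not in owner:
            pairs.append((dim, value, math.inf))
    return pairs


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
pairs = rips_persistence(square)
print([(d, b, round(e, 3)) for d, b, e in pairs if d == 1])  # [(1, 1.0, 1.414)]
```

For four points on a unit square, the single H1 bar is born at 1 (when the fourth side appears) and dies at about 1.414 (when the diagonals and triangles fill the square), so the bar's extent tracks the size of the ring. Enumerating all triangles grows cubically with the number of points, so dedicated persistent homology libraries would be needed for real molecular dynamics frames.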

🍩 Database of Original & Non-Theoretical Uses of Topology

A Multimodal Data Analysis Approach for Targeted Drug Discovery Involving Topological Data Analysis (TDA) (2016)

Topological Data Analysis Quantifies Biological Nano-Structure From Single Molecule Localization Microscopy (2020)

Specific Mutations in H5N1 Mainly Impact the Magnitude and Velocity of the Host Response in Mice (2013)

A Personality Trait Contributes to the Occurrence of Postoperative Delirium: A Prospective Study (2016)

Exploring Hyperspectral Imaging Data Sets With Topological Data Analysis (2017)

A Severe Asthma Disease Signature From Gene Expression Profiling of Peripheral Blood From U-Biopred Cohorts (2017)

Networked Data Analytics: Network Comparison and Applied Graph Signal Processing (2018)

Disrupted Resting State Network of Fibromyalgia in Theta Frequency (2018)

Interdisciplinary Approaches to Automated Obstructive Sleep Apnea Diagnosis Through High-Dimensional Multiple Scaled Data Analysis (2019)

Hierarchical Clustering and Zeroth Persistent Homology (2020)

Topographical Transcriptome Mapping of the Mouse Medial Ganglionic Eminence by Spatially Resolved RNA-seq (2014)

Spatial Embedding Imposes Constraints on Neuronal Network Architectures (2018)

The Shape of Word Embeddings: Quantifying Non-Isometry With Topological Data Analysis (2024)

Using Persistent Homology as a New Approach for Super-Resolution Localization Microscopy Data Analysis and Classification of γH2AX Foci/Clusters (2018)

Topological Data Analysis of Contagion Maps for Examining Spreading Processes on Networks (2015)

Identification of Topological Network Modules in Perturbed Protein Interaction Networks (2017)

The Accumulated Persistence Function, a New Useful Functional Summary Statistic for Topological Data Analysis, With a View to Brain Artery Trees and Spatial Point Process Applications (2019)

Combining Geometric and Topological Information in Image Segmentation (2019)

Topological Data Analysis Distinguishes Parameter Regimes in the Anderson-Chaplain Model of Angiogenesis (2021)

Connectivity in fMRI: Blind Spots and Breakthroughs (2018)

Persistent Brain Network Homology From the Perspective of Dendrogram (2012)

Unsupervised Topological Learning Approach of Crystal Nucleation (2022)

Imaging-Based Representation and Stratification of Intra-Tumor Heterogeneity via Tree-Edit Distance (2022)

CD8 T-Cell Reactivity to Islet Antigens Is Unique to Type 1 While CD4 T-Cell Reactivity Exists in Both Type 1 and Type 2 Diabetes (2014)

Relational Persistent Homology for Multispecies Data With Application to the Tumor Microenvironment (2023)

Omics-Based Strategies in Precision Medicine: Toward a Paradigm Shift in Inborn Errors of Metabolism Investigations (2016)

Unveiling Patterns of International Communities in a Global City Using Mobile Phone Data (2015)
