Protein Classification With Improved Topological Data Analysis (2018)

Persistence Weighted Gaussian Kernel for Topological Data Analysis (2016)

Genki Kusano, Yasuaki Hiraoka, Kenji Fukumizu

Abstract

Topological data analysis (TDA) is an emerging mathematical concept for characterizing shapes in complex data. In TDA, persistence diagrams are widely recognized as a useful descriptor of data, and...

A Collaborative Visual Analytics Suite for Protein Folding Research (2014)

William Harvey, In-Hee Park, Oliver Rübel, Valerio Pascucci, Peer-Timo Bremer, Chenglong Li, Yusu Wang

Conserved Abundance and Topological Features in Chromatin-Remodeling Protein Interaction Networks (2015)

Mihaela E Sardiu, Joshua M Gilmore, Brad D Groppe, Damir Herman, Sreenivasa R Ramisetty, Yong Cai, Jingji Jin, Ronald C Conaway, Joan W Conaway, Laurence Florens, Michael P Washburn

Abstract

Abstract The study of conserved protein interaction networks seeks to better understand the evolution and regulation of protein interactions. Here, we present a quantitative proteomic analysis of 18 orthologous baits from three distinct chromatin-remodeling complexes in Saccharomyces cerevisiae and Homo sapiens. We demonstrate that abundance levels of orthologous proteins correlate strongly between the two organisms and both networks have highly similar topologies. We therefore used the protein abundances in one species to cross-predict missing protein abundance levels in the other species. Lastly, we identified a novel conserved low-abundance subnetwork further demonstrating the value of quantitative analysis of networks.

Protein-Folding Analysis Using Features Obtained by Persistent Homology (2020)

Takashi Ichinomiya, Ippei Obayashi, Yasuaki Hiraoka

Abstract

Understanding the protein-folding process is an outstanding issue in biophysics; recent developments in molecular dynamics simulation have provided insights into this phenomenon. However, the large freedom of atomic motion hinders the understanding of this process. In this study, we applied persistent homology, an emerging method to analyze topological features in a data set, to reveal protein-folding dynamics. We developed a new, to our knowledge, method to characterize the protein structure based on persistent homology and applied this method to molecular dynamics simulations of chignolin. Using principle component analysis or nonnegative matrix factorization, our analysis method revealed two stable states and one saddle state, corresponding to the native, misfolded, and transition states, respectively. We also identified an unfolded state with slow dynamics in the reduced space. Our method serves as a promising tool to understand the protein-folding process.

Identification of Topological Network Modules in Perturbed Protein Interaction Networks (2017)

Mihaela E. Sardiu, Joshua M. Gilmore, Brad Groppe, Laurence Florens, Michael P. Washburn

Abstract

Biological networks consist of functional modules, however detecting and characterizing such modules in networks remains challenging. Perturbing networks is one strategy for identifying modules. Here we used an advanced mathematical approach named topological data analysis (TDA) to interrogate two perturbed networks. In one, we disrupted the S. cerevisiae INO80 protein interaction network by isolating complexes after protein complex components were deleted from the genome. In the second, we reanalyzed previously published data demonstrating the disruption of the human Sin3 network with a histone deacetylase inhibitor. Here we show that disrupted networks contained topological network modules (TNMs) with shared properties that mapped onto distinct locations in networks. We define TMNs as proteins that occupy close network positions depending on their coordinates in a topological space. TNMs provide new insight into networks by capturing proteins from different categories including proteins within a complex, proteins with shared biological functions, and proteins disrupted across networks.

PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction (2020)

Nicolas Swenson, Aditi S. Krishnapriyan, Aydin Buluc, Dmitriy Morozov, Katherine Yelick

Abstract

Understanding protein structure-function relationships is a key challenge in computational biology, with applications across the biotechnology and pharmaceutical industries. While it is known that protein structure directly impacts protein function, many functional prediction tasks use only protein sequence. In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank in order to study the expressiveness of different structure-based prediction schemes. We present PersGNN - an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis to capture a complex set of both local and global structural features. While variations of these techniques have been successfully applied to proteins before, we demonstrate that our hybridized approach, PersGNN, outperforms either method on its own as well as a baseline neural network that learns from the same information. PersGNN achieves a 9.3% boost in area under the precision recall curve (AUPR) compared to the best individual model, as well as high F1 scores across different gene ontology categories, indicating the transferability of this approach.

Multidimensional Persistence in Biomolecular Data (2015)

Kelin Xia, Guo-Wei Wei

Abstract

Persistent homology has emerged as a popular technique for the topological simplification of big data, including biomolecular data. Multidimensional persistence bears considerable promise to bridge the gap between geometry and topology. However, its practical and robust construction has been a challenge. We introduce two families of multidimensional persistence, namely pseudo-multidimensional persistence and multiscale multidimensional persistence. The former is generated via the repeated applications of persistent homology filtration to high dimensional data, such as results from molecular dynamics or partial differential equations. The latter is constructed via isotropic and anisotropic scales that create new simiplicial complexes and associated topological spaces. The utility, robustness and efficiency of the proposed topological methods are demonstrated via protein folding, protein flexibility analysis, the topological denoising of cryo-electron microscopy data, and the scale dependence of nano particles. Topological transition between partial folded and unfolded proteins has been observed in multidimensional persistence. The separation between noise topological signatures and molecular topological fingerprints is achieved by the Laplace-Beltrami flow. The multiscale multidimensional persistent homology reveals relative local features in Betti-0 invariants and the relatively global characteristics of Betti-1 and Betti-2 invariants.

A Topological Measurement of Protein Compressibility (2015)

Marcio Gameiro, Yasuaki Hiraoka, Shunsuke Izumi, Miroslav Kramar, Konstantin Mischaikow, Vidit Nanda

Abstract

In this paper we partially clarify the relation between the compressibility of a protein and its molecular geometric structure. To identify and understand the relevant topological features within a given protein, we model its molecule as an alpha filtration and hence obtain multi-scale insight into the structure of its tunnels and cavities. The persistence diagrams of this alpha filtration capture the sizes and robustness of such tunnels and cavities in a compact and meaningful manner. From these persistence diagrams, we extract a measure of compressibility derived from those topological features whose relevance is suggested by physical and chemical properties. Due to recent advances in combinatorial topology, this measure is efficiently and directly computable from information found in the Protein Data Bank (PDB). Our main result establishes a clear linear correlation between the topological measure and the experimentally-determined compressibility of most proteins for which both PDB information and experimental compressibility data are available. Finally, we establish that both the topological measurement and the linear correlation are stable with respect to small perturbations in the input data, such as those arising from experimental errors in compressibility and X-ray crystallography experiments.

Atom-Specific Persistent Homology and Its Application to Protein Flexibility Analysis (2020)

David Bramer, Guo-Wei Wei

Abstract

Recently, persistent homology has had tremendous success in biomolecular data analysis. It works by examining the topological relationship or connectivity of a group of atoms in a molecule at a variety of scales, then rendering a family of topological representations of the molecule. However, persistent homology is rarely employed for the analysis of atomic properties, such as biomolecular flexibility analysis or B-factor prediction. This work introduces atom-specific persistent homology to provide a local atomic level representation of a molecule via a global topological tool. This is achieved through the construction of a pair of conjugated sets of atoms and corresponding conjugated simplicial complexes, as well as conjugated topological spaces. The difference between the topological invariants of the pair of conjugated sets is measured by Bottleneck and Wasserstein metrics and leads to an atom-specific topological representation of individual atomic properties in a molecule. Atom-specific topological features are integrated with various machine learning algorithms, including gradient boosting trees and convolutional neural network for protein thermal fluctuation analysis and B-factor prediction. Extensive numerical results indicate the proposed method provides a powerful topological tool for analyzing and predicting localized information in complex macromolecules.

Persistent Homology Analysis of Protein Structure, Flexibility, and Folding (2014)

Kelin Xia, Guo-Wei Wei

Abstract

SUMMARYProteins are the most important biomolecules for living organisms. The understanding of protein structure, function, dynamics, and transport is one of the most challenging tasks in biological science. In the present work, persistent homology is, for the first time, introduced for extracting molecular topological fingerprints (MTFs) based on the persistence of molecular topological invariants. MTFs are utilized for protein characterization, identification, and classification. The method of slicing is proposed to track the geometric origin of protein topological invariants. Both all-atom and coarse-grained representations of MTFs are constructed. A new cutoff-like filtration is proposed to shed light on the optimal cutoff distance in elastic network models. On the basis of the correlation between protein compactness, rigidity, and connectivity, we propose an accumulated bar length generated from persistent topological invariants for the quantitative modeling of protein flexibility. To this end, a correlation matrix-based filtration is developed. This approach gives rise to an accurate prediction of the optimal characteristic distance used in protein B-factor analysis. Finally, MTFs are employed to characterize protein topological evolution during protein folding and quantitatively predict the protein folding stability. An excellent consistence between our persistent homology prediction and molecular dynamics simulation is found. This work reveals the topology–function relationship of proteins. Copyright © 2014 John Wiley & Sons, Ltd.

Persistent Homology Analysis of Osmolyte Molecular Aggregation and Their Hydrogen-Bonding Networks (2019)

Kelin Xia, D. Vijay Anand, Saxena Shikhar, Yuguang Mu

Abstract

Dramatically different properties have been observed for two types of osmolytes, i.e., trimethylamine N-oxide (TMAO) and urea, in a protein folding process. Great progress has been made in revealing the potential underlying mechanism of these two osmolyte systems. However, many problems still remain unsolved. In this paper, we propose to use the persistent homology to systematically study the osmolytes’ molecular aggregation and their hydrogen-bonding network from a global topological perspective. It has been found that, for the first time, TMAO and urea show two extremely different topological behaviors, i.e., an extensive network and local clusters, respectively. In general, TMAO forms highly consistent large loop or circle structures in high concentrations. In contrast, urea is more tightly aggregated locally. Moreover, the resulting hydrogen-bonding networks also demonstrate distinguishable features. With a concentration increase, TMAO hydrogen-bonding networks vary greatly in their total number of loop structures and large-sized loop structures consistently increase. In contrast, urea hydrogen-bonding networks remain relatively stable with slight reduction of the total loop number. Moreover, the persistent entropy (PE) is, for the first time, used in characterization of the topological information of the aggregation and hydrogen-bonding networks. The average PE systematically increases with the concentration for both TMAO and urea, and decreases in their hydrogen-bonding networks. But their PE variances have totally different behaviors. Finally, topological features of the hydrogen-bonding networks are found to be highly consistent with those from the ion aggregation systems, indicating that our topological invariants can characterize intrinsic features of the “structure making” and “structure breaking” systems.

The Architecture of the Endoplasmic Reticulum Is Regulated by the Reversible Lipid Modification of the Shaping Protein CLIMP-63 (2018)

Patrick A. Sandoz, Robin A. Denhardt-Eriksson, Laurence Abrami, Luciano Abriata, Gard Spreemann, Catherine Maclachlan, Sylvia Ho, Béatrice Kunz, Kathryn Hess, Graham Knott, Vassily Hatzimanikatis, F. Gisou van der Goot

Abstract

\textlessh3\textgreaterAbstract\textless/h3\textgreater \textlessp\textgreaterThe endoplasmic reticulum (ER) has a complex morphology generated and maintained by membrane-shaping proteins and membrane energy minimization, though not much is known about how it is regulated. The architecture of this intracellular organelle is balanced between large, thin sheets that are densely packed in the perinuclear region and a connected network of branched, elongated tubules that extend throughout the cytoplasm. Sheet formation is known to involve the cytoskeleton-linking membrane protein 63 (CLIMP-63), though its regulation and the depth of its involvement remain unknown. Here we show that the post-translational modification of CLIMP-63 by the palmitoyltransferase ZDHHC6 controls the relative distribution of CLIMP-63 between the ER and the plasma membrane. By combining data-driven mathematical modeling, predictions, and experimental validation, we found that the attachment of a medium chain fatty acid, so-called S-palmitoylation, to the unique CLIMP-63 cytoplasmic cysteine residue drastically reduces its turnover rate, and thereby controls its abundance. Light microscopy and focused ion beam electron microcopy further revealed that enhanced CLIMP-63 palmitoylation leads to strong ER-sheet proliferation. Altogether, we show that ZDHHC6-mediated S-palmitoylation regulates the cellular localization of CLIMP-63, the morphology of the ER, and the interconversion of ER structural elements in mammalian cells through its action on the CLIMP-63 protein.\textless/p\textgreater\textlessh3\textgreaterSignificance Statement\textless/h3\textgreater \textlessp\textgreaterEukaryotic cells subcompartmentalize their various functions into organelles, the shape of each being specific and necessary for its proper role. However, how these shapes are generated and controlled is poorly understood. The endoplasmic reticulum is the largest membrane-bound intracellular compartment, accounting for more than 50% of all cellular membranes. We found that the shape and quantity of its sheet-like structures are controlled by a specific protein, cytoskeleton-linking membrane protein 63, through the acquisition of a lipid chain attached by an enzyme called ZDHHC6. Thus, by modifying the ZDHHC6 amounts, a cell can control the shape of its ER. The modeling and prediction technique used herein also provides a method for studying the interconnected function of other post-translational modifications in organelles.\textless/p\textgreater

Object-Oriented Persistent Homology (2016)

Bao Wang, Guo-Wei Wei

Abstract

Persistent homology provides a new approach for the topological simplification of big data via measuring the life time of intrinsic topological features in a filtration process and has found its success in scientific and engineering applications. However, such a success is essentially limited to qualitative data classification and analysis. Indeed, persistent homology has rarely been employed for quantitative modeling and prediction. Additionally, the present persistent homology is a passive tool, rather than a proactive technique, for classification and analysis. In this work, we outline a general protocol to construct object-oriented persistent homology methods. By means of differential geometry theory of surfaces, we construct an objective functional, namely, a surface free energy defined on the data of interest. The minimization of the objective functional leads to a Laplace-Beltrami operator which generates a multiscale representation of the initial data and offers an objective oriented filtration process. The resulting differential geometry based object-oriented persistent homology is able to preserve desirable geometric features in the evolutionary filtration and enhances the corresponding topological persistence. The cubical complex based homology algorithm is employed in the present work to be compatible with the Cartesian representation of the Laplace-Beltrami flow. The proposed Laplace-Beltrami flow based persistent homology method is extensively validated. The consistence between Laplace-Beltrami flow based filtration and Euclidean distance based filtration is confirmed on the Vietoris-Rips complex for a large amount of numerical tests. The convergence and reliability of the present Laplace-Beltrami flow based cubical complex filtration approach are analyzed over various spatial and temporal mesh sizes. The Laplace-Beltrami flow based persistent homology approach is utilized to study the intrinsic topology of proteins and fullerene molecules. Based on a quantitative model which correlates the topological persistence of fullerene central cavity with the total curvature energy of the fullerene structure, the proposed method is used for the prediction of fullerene isomer stability. The efficiency and robustness of the present method are verified by more than 500 fullerene molecules. It is shown that the proposed persistent homology based quantitative model offers good predictions of total curvature energies for ten types of fullerene isomers. The present work offers the first example to design object-oriented persistent homology to enhance or preserve desirable features in the original data during the filtration process and then automatically detect or extract the corresponding topological traits from the data.

🍩 Database of Original & Non-Theoretical Uses of Topology

Protein Classification With Improved Topological Data Analysis (2018)

Persistence Weighted Gaussian Kernel for Topological Data Analysis (2016)

A Collaborative Visual Analytics Suite for Protein Folding Research (2014)

Conserved Abundance and Topological Features in Chromatin-Remodeling Protein Interaction Networks (2015)

Protein-Folding Analysis Using Features Obtained by Persistent Homology (2020)

Identification of Topological Network Modules in Perturbed Protein Interaction Networks (2017)

PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction (2020)

Multidimensional Persistence in Biomolecular Data (2015)

A Topological Measurement of Protein Compressibility (2015)

Atom-Specific Persistent Homology and Its Application to Protein Flexibility Analysis (2020)

Persistent Homology Analysis of Protein Structure, Flexibility, and Folding (2014)

Persistent Homology Analysis of Osmolyte Molecular Aggregation and Their Hydrogen-Bonding Networks (2019)

The Architecture of the Endoplasmic Reticulum Is Regulated by the Reversible Lipid Modification of the Shaping Protein CLIMP-63 (2018)

Object-Oriented Persistent Homology (2016)