🍩 Database of Original & Non-Theoretical Uses of Topology

(found 91 matches in 0.010586s)
  1. Histopathological Cancer Detection With Topological Signatures (2023)

    Ankur Yadav, Faisal Ahmed, Ovidiu Daescu, Reyhan Gedik, Baris Coskunuzer
    Abstract We present a transformative approach to histopathological cancer detection and grading by introducing a very powerful feature extraction method based on the latest topological data analysis tools. By analyzing the evolution of topological patterns in different color channels, we discovered that every tumor class leaves its own topological footprint in histopathological images, allowing to extract feature vectors that can be used to reliably identify tumor classes.Our topological signatures, even when combined with traditional machine learning methods, provide very fast and highly accurate results in various settings. While most DL models work well for one type of cancer, our model easily adapts to different scenarios, and consistently gives highly competitive results with the state-of-the-art models on benchmark datasets across multiple cancer types including bone, colon, breast, cervical (cytopathology), and prostate cancer. Unlike most DL models, our proposed Topo-ML model does not need any data augmentation or pre-processing steps and works perfectly on small datasets. The model is computationally very efficient, with end-to-end processing taking only a few hours for datasets consisting of thousands of images.
  2. Machine Learning and Topological Data Analysis Identify Unique Features of Human Papillae in 3D Scans (2023)

    Rayna Andreeva, Anwesha Sarkar, Rik Sarkar
    Abstract The tongue surface houses a range of papillae that are integral to the mechanics and chemistry of taste and textural sensation. Although gustatory function of papillae is well investigated, the uniqueness of papillae within and across individuals remains elusive. Here, we present the first machine learning framework on 3D microscopic scans of human papillae (n = 2092), uncovering the uniqueness of geometric and topological features of papillae. The finer differences in shapes of papillae are investigated computationally based on a number of features derived from discrete differential geometry and computational topology. Interpretable machine learning techniques show that persistent homology features of the papillae shape are the most effective in predicting the biological variables. Models trained on these features with small volumes of data samples predict the type of papillae with an accuracy of 85%. The papillae type classification models can map the spatial arrangement of filiform and fungiform papillae on a surface. Remarkably, the papillae are found to be distinctive across individuals and an individual can be identified with an accuracy of 48% among the 15 participants from a single papillae. Collectively, this is the first unprecedented evidence demonstrating that tongue papillae can serve as a unique identifier inspiring new research direction for food preferences and oral diagnostics.
  3. Determining Clinically Relevant Features in Cytometry Data Using Persistent Homology (2022)

    Soham Mukherjee, Darren Wethington, Tamal K. Dey, Jayajit Das
    Abstract Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets is the current standard for cytometry data analysis. However, this approach is unable to capture more subtle topological features hidden in data, especially if those features are further masked by data transforms or significant batch effects or donor-to-donor variations in clinical data. We present that persistent homology, a mathematical structure that summarizes the topological features, can distinguish different sources of data, such as from groups of healthy donors or patients, effectively. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between single cell protein expressions in COVID-19 patients and healthy controls. We identify proteins of interest by a decision-tree based classifier, sample points randomly and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions in cytometry datasets of varying density and identify protruded structures such as ‘elbows’. We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find that systematic structural differences exist between COVID-19 patients and healthy controls in the expression data for T-bet, Eomes, and Ki-67. Further analysis shows that expression of T-bet and Eomes are significantly downregulated in COVID-19 patient non-naïve CD8+ T cells compared to healthy controls. This counter-intuitive finding may indicate that canonical effector CD8+ T cells are less prevalent in COVID-19 patients than healthy controls. This method is applicable to any cytometry dataset for discovering novel insights through topological data analysis which may be difficult to ascertain otherwise with a standard gating strategy or existing bioinformatic tools.

    Community Resources

  4. Raman Spectroscopy and Topological Machine Learning for Cancer Grading (2023)

    Francesco Conti, Mario D’Acunto, Claudia Caudai, Sara Colantonio, Raffaele Gaeta, Davide Moroni, Maria Antonietta Pascali
    Abstract In the last decade, Raman Spectroscopy is establishing itself as a highly promising technique for the classification of tumour tissues as it allows to obtain the biochemical maps of the tissues under investigation, making it possible to observe changes among different tissues in terms of biochemical constituents (proteins, lipid structures, DNA, vitamins, and so on). In this paper, we aim to show that techniques emerging from the cross-fertilization of persistent homology and machine learning can support the classification of Raman spectra extracted from cancerous tissues for tumour grading. In more detail, topological features of Raman spectra and machine learning classifiers are trained in combination as an automatic classification pipeline in order to select the best-performing pair. The case study is the grading of chondrosarcoma in four classes: cross and leave-one-patient-out validations have been used to assess the classification accuracy of the method. The binary classification achieves a validation accuracy of 81% and a test accuracy of 90%. Moreover, the test dataset has been collected at a different time and with different equipment. Such results are achieved by a support vector classifier trained with the Betti Curve representation of the topological features extracted from the Raman spectra, and are excellent compared with the existing literature. The added value of such results is that the model for the prediction of the chondrosarcoma grading could easily be implemented in clinical practice, possibly integrated into the acquisition system.
  5. Acute Lymphoblastic Leukemia Classification Using Persistent Homology (2024)

    Waqar Hussain Shah, Abdullah Baloch, Rider Jaimes-Reátegui, Sohail Iqbal, Syeda Rafia Fatima, Alexander N. Pisarchik
    Abstract Acute Lymphoblastic Leukemia (ALL) is a prevalent form of childhood blood cancer characterized by the proliferation of immature white blood cells that rapidly replace normal cells in the bone marrow. The exponential growth of these leukemic cells can be fatal if not treated promptly. Classifying lymphoblasts and healthy cells poses a significant challenge, even for domain experts, due to their morphological similarities. Automated computer analysis of ALL can provide substantial support in this domain and potentially save numerous lives. In this paper, we propose a novel classification approach that involves analyzing shapes and extracting topological features of ALL cells. We employ persistent homology to capture these topological features. Our technique accurately and efficiently detects and classifies leukemia blast cells, achieving a recall of 98.2% and an F1-score of 94.6%. This approach has the potential to significantly enhance leukemia diagnosis and therapy.
  6. The Shape of Cancer Relapse: Topological Data Analysis Predicts Recurrence in Paediatric Acute Lymphoblastic Leukaemia (2021)

    Salvador Chulián, Bernadette J. Stolz, Álvaro Martínez-Rubio, Cristina Blázquez Goñi, Juan F. Rodríguez Gutiérrez, Teresa Caballero Velázquez, Águeda Molinos Quintana, Manuel Ramírez Orellana, Ana Castillo Robleda, José Luis Fuster Soler, Alfredo Minguela Puras, María Victoria Martínez Sánchez, María Rosa, Víctor M. Pérez-García, Helen Byrne
    Abstract Acute Lymphoblastic Leukaemia (ALL) is the most frequent paediatric cancer. Modern therapies have improved survival rates, but approximately 15-20 % of patients relapse. At present, patients’ risk of relapse are assessed by projecting high-dimensional flow cytometry data onto a subset of biomarkers and manually estimating the shape of this reduced data. Here, we apply methods from topological data analysis (TDA), which quantify shape in data via features such as connected components and loops, to pre-treatment ALL datasets with known outcomes. We combine these fully unsupervised analyses with machine learning to identify features in the pre-treatment data that are prognostic for risk of relapse. We find significant topological differences between relapsing and non-relapsing patients and confirm the predictive power of CD10, CD20, CD38, and CD45. Further, we are able to use the TDA descriptors to predict patients who relapsed. We propose three prognostic pipelines that readily extend to other haematological malignancies. Teaser Topology reveals features in flow cytometry data which predict relapse of patients with acute lymphoblastic leukemia
  7. A Data-Driven Workflow for Evaporation Performance Degradation Analysis: A Full-Scale Case Study in the Herbal Medicine Manufacturing Industry (2023)

    Sheng Zhang, Xinyuan Xie, Haibin Qu
    Abstract The evaporation process is a common step in herbal medicine manufacturing and often lasts for a long time. The degradation of evaporation performance is inevitable, leading to more consumption of steam and electricity, and it may also have an impact on the content of thermosensitive components. Recently, a vast amount of evaporation process data is collected with the aid of industrial information systems, and process knowledge is hidden behind the data. But currently, these data are seldom deeply analyzed. In this work, an exploratory data analysis workflow is proposed to evaluate the evaporation performance and to identify the root causes of the performance degradation. The workflow consists of 6 steps: data collecting, preprocessing, characteristic stage identification, feature extraction, model development and interpretation, and decision making. In the model development and interpretation step, the workflow employs the HDBSCAN clustering algorithm for data annotation and then uses the ccPCA method to compare the differences between clusters for root cause analysis. A full-scale case is presented to verify the effectiveness of the workflow. The evaporation process data of 192 batches in 2018 were collected in the case. Through the steps of the workflow, the features of each batch were extracted, and the batches were clustered into 6 groups. The root causes of the performance degradation were determined as the high Pv,II and high LI by ccPCA. Recommended suggestions for future manufacturing were given according to the results. The proposed workflow can determine the root causes of the evaporation performance degradation.
  8. CCF-GNN: A Unified Model Aggregating Appearance, Microenvironment, and Topology for Pathology Image Classification (2023)

    Hongxiao Wang, Gang Huang, Zhuo Zhao, Liang Cheng, Anna Juncker-Jensen, Máté Levente Nagy, Xin Lu, Xiangliang Zhang, Danny Z. Chen
    Abstract Pathology images contain rich information of cell appearance, microenvironment, and topology features for cancer analysis and diagnosis. Among such features, topology becomes increasingly important in analysis for cancer immunotherapy. By analyzing geometric and hierarchically structured cell distribution topology, oncologists can identify densely-packed and cancer-relevant cell communities (CCs) for making decisions. Compared to commonly-used pixel-level Convolution Neural Network (CNN) features and cell-instance-level Graph Neural Network (GNN) features, CC topology features are at a higher level of granularity and geometry. However, topological features have not been well exploited by recent deep learning (DL) methods for pathology image classification due to lack of effective topological descriptors for cell distribution and gathering patterns. In this paper, inspired by clinical practice, we analyze and classify pathology images by comprehensively learning cell appearance, microenvironment, and topology in a fine-to-coarse manner. To describe and exploit topology, we design Cell Community Forest (CCF), a novel graph that represents the hierarchical formulation process of big-sparse CCs from small-dense CCs. Using CCF as a new geometric topological descriptor of tumor cells in pathology images, we propose CCF-GNN, a GNN model that successively aggregates heterogeneous features (e.g., appearance, microenvironment) from cell-instance-level, cell-community-level, into image-level for pathology image classification. Extensive cross-validation experiments show that our method significantly outperforms alternative methods on H&E-stained; immunofluorescence images for disease grading tasks with multiple cancer types. Our proposed CCF-GNN establishes a new topological data analysis (TDA) based method, which facilitates integrating multi-level heterogeneous features of point clouds (e.g., for cells) into a unified DL framework.
  9. Alzheimer Disease Detection From Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning (2023)

    Francesco Conti, Martina Banchelli, Valentina Bessi, Cristina Cecchi, Fabrizio Chiti, Sara Colantonio, Cristiano D’Andrea, Marella de Angelis, Davide Moroni, Benedetta Nacmias, Maria Antonietta Pascali, Sandro Sorbi, Paolo Matteini
    Abstract The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer’s disease (AD) as well as of 5 pathological controls was collected and analyzed by Raman spectroscopy (RS). We investigated whether the raw and preprocessed Raman spectra could be used to distinguish AD from controls. First, we applied standard Machine Learning (ML) methods obtaining unsatisfactory results. Then, we applied ML to a set of topological descriptors extracted from raw spectra, achieving a very good classification accuracy (\textgreater87%). Although our results are preliminary, they indicate that RS and topological analysis may provide an effective combination to confirm or disprove a clinical diagnosis of AD. The next steps include enlarging the dataset of CSF samples to validate the proposed method better and, possibly, to investigate whether topological data analysis could support the characterization of AD subtypes.
  10. Statistical Inference for Persistent Homology Applied to Simulated fMRI Time Series Data (2023)

    Hassan Abdallah, Adam Regalski, Mohammad Behzad Kang, Maria Berishaj, Nkechi Nnadi, Asadur Chowdury, Vaibhav A. Diwadkar, Andrew Salch
    Abstract Time-series data are amongst the most widely-used in biomedical sciences, including domains such as functional Magnetic Resonance Imaging (fMRI). Structure within time series data can be captured by the tools of topological data analysis (TDA). Persistent homology is the mostly commonly used data-analytic tool in TDA, and can effectively summarize complex high-dimensional data into an interpretable 2-dimensional representation called a persistence diagram. Existing methods for statistical inference for persistent homology of data depend on an independence assumption being satisfied. While persistent homology can be computed for each time index in a time-series, time-series data often fail to satisfy the independence assumption. This paper develops a statistical test that obviates the independence assumption by implementing a multi-level block sampled Monte Carlo test with sets of persistence diagrams. Its efficacy for detecting task-dependent topological organization is then demonstrated on simulated fMRI data. This new statistical test is therefore suitable for analyzing persistent homology of fMRI data, and of non-independent data in general.
  11. Feasibility of Topological Data Analysis for Event-Related fMRI (2019)

    Cameron T. Ellis, Michael Lesnick, Gregory Henselman-Petrusek, Bryn Keller, Jonathan D. Cohen
    Abstract Recent fMRI research shows that perceptual and cognitive representations are instantiated in high-dimensional multivoxel patterns in the brain. However, the methods for detecting these representations are limited. Topological data analysis (TDA) is a new approach, based on the mathematical field of topology, that can detect unique types of geometric features in patterns of data. Several recent studies have successfully applied TDA to study various forms of neural data; however, to our knowledge, TDA has not been successfully applied to data from event-related fMRI designs. Event-related fMRI is very common but limited in terms of the number of events that can be run within a practical time frame and the effect size that can be expected. Here, we investigate whether persistent homology—a popular TDA tool that identifies topological features in data and quantifies their robustness—can identify known signals given these constraints. We use fmrisim, a Python-based simulator of realistic fMRI data, to assess the plausibility of recovering a simple topological representation under a variety of conditions. Our results suggest that persistent homology can be used under certain circumstances to recover topological structure embedded in realistic fMRI data simulations.How do we represent the world? In cognitive neuroscience it is typical to think representations are points in high-dimensional space. In order to study these kinds of spaces it is necessary to have tools that capture the organization of high-dimensional data. Topological data analysis (TDA) holds promise for detecting unique types of geometric features in patterns of data. Although potentially useful, TDA has not been applied to event-related fMRI data. Here we utilized a popular tool from TDA, persistent homology, to recover topological signals from event-related fMRI data. We simulated realistic fMRI data and explored the parameters under which persistent homology can successfully extract signal. We also provided extensive code and recommendations for how to make the most out of TDA for fMRI analysis.
  12. Characterising Epithelial Tissues Using Persistent Entropy (2019)

    N. Atienza, L. M. Escudero, M. J. Jimenez, M. Soriano-Trigueros
    Abstract In this paper, we apply persistent entropy, a novel topological statistic, for characterization of images of epithelial tissues. We have found out that persistent entropy is able to summarize topological and geometric information encoded by \$\$\alpha \$\$α-complexes and persistent homology. After using some statistical tests, we can guarantee the existence of significant differences in the studied tissues.
  13. Towards a New Approach to Reveal Dynamical Organization of the Brain Using Topological Data Analysis (2018)

    Manish Saggar, Olaf Sporns, Javier Gonzalez-Castillo, Peter A. Bandettini, Gunnar Carlsson, Gary Glover, Allan L. Reiss
    Abstract Approaches describing how the brain changes to accomplish cognitive tasks tend to rely on collapsed data. Here, authors present a new approach that maintains high dimensionality and use it to describe individual differences in how brain activity is represented and organized across different cognitive tasks.
  14. Possible Clinical Use of Big Data: Personal Brain Connectomics (2018)

    Dong Soo Lee
    Abstract The biggest data is brain imaging data, which waited for clinical use during the last three decades. Topographic data interpretation prevailed for the first two decades, and only during the last decade, connectivity or connectomics data began to be analyzed properly. Owing to topological data interpretation and timely introduction of likelihood method based on hierarchical generalized linear model, we now foresee the clinical use of personal connectomics for classification and prediction of disease prognosis for brain diseases without any clue by currently available diagnostic methods.
  15. Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury (2015)

    Jessica L. Nielson, Jesse Paquette, Aiwen W. Liu, Cristian F. Guandique, C. Amy Tovar, Tomoo Inoue, Karen-Amanda Irvine, John C. Gensel, Jennifer Kloke, Tanya C. Petrossian, Pek Y. Lum, Gunnar E. Carlsson, Geoffrey T. Manley, Wise Young, Michael S. Beattie, Jacqueline C. Bresnahan, Adam R. Ferguson
    Abstract Data-driven discovery in complex neurological disorders has potential to extract meaningful knowledge from large, heterogeneous datasets. Here the authors apply topological data analysis to assess therapeutic effects in preclinical traumatic brain injury and spinal cord injury research studies.
  16. Persistent Homology Analysis of Brain Artery Trees (2016)

    Paul Bendich, J. S. Marron, Ezra Miller, Alex Pieloch, Sean Skwerer
    Abstract New representations of tree-structured data objects, using ideas from topological data analysis, enable improved statistical analyses of a population of brain artery trees. A number of representations of each data tree arise from persistence diagrams that quantify branching and looping of vessels at multiple scales. Novel approaches to the statistical analysis, through various summaries of the persistence diagrams, lead to heightened correlations with covariates such as age and sex, relative to earlier analyses of this data set. The correlation with age continues to be significant even after controlling for correlations from earlier significant summaries.
  17. Topological Data Analysis for Arrhythmia Detection Through Modular Neural Networks (2020)

    Meryll Dindin, Yuhei Umeda, Frederic Chazal
    Abstract This paper presents an innovative and generic deep learning approach to monitor heart conditions from ECG signals. We focus our attention on both the detection and classification of abnormal heartbeats, known as arrhythmia. We strongly insist on generalization throughout the construction of a shallow deep-learning model that turns out to be effective for new unseen patient. The novelty of our approach relies on the use of topological data analysis to deal with individual differences. We show that our structure reaches the performances of the state-of-the-art methods for both arrhythmia detection and classification.
  18. Characterizing Scales of Genetic Recombination and Antibiotic Resistance in Pathogenic Bacteria Using Topological Data Analysis (2014)

    Kevin J. Emmett, Raul Rabadan
    Abstract Pathogenic bacteria present a large disease burden on human health. Control of these pathogens is hampered by rampant lateral gene transfer, whereby pathogenic strains may acquire genes conferring resistance to common antibiotics. Here we introduce tools from topological data analysis to characterize the frequency and scale of lateral gene transfer in bacteria, focusing on a set of pathogens of significant public health relevance. As a case study, we examine the spread of antibiotic resistance in Staphylococcus aureus. Finally, we consider the possible role of the human microbiome as a reservoir for antibiotic resistance genes.
  19. Classification of Skin Lesions by Topological Data Analysis Alongside With Neural Network (2020)

    Naiereh Elyasi, Mehdi Hosseini Moghadam
    Abstract In this paper we use TDA mapper alongside with deep convolutional neural networks in the classification of 7 major skin diseases. First we apply kepler mapper with neural network as one of its filter steps to classify the dataset HAM10000. Mapper visualizes the classification result by a simplicial complex, where neural network can not do this alone, but as a filter step neural network helps to classify data better. Furthermore we apply TDA mapper and persistent homology to understand the weights of layers of mobilenet network in different training epochs of HAM10000. Also we use persistent diagrams to visualize the results of analysis of layers of mobilenet network.
  20. Optimal Topological Cycles and Their Application in Cardiac Trabeculae Restoration (2017)

    Pengxiang Wu, Chao Chen, Yusu Wang, Shaoting Zhang, Changhe Yuan, Zhen Qian, Dimitris Metaxas, Leon Axel
    Abstract In cardiac image analysis, it is important yet challenging to reconstruct the trabeculae, namely, fine muscle columns whose ends are attached to the ventricular walls. To extract these fine structures, traditional image segmentation methods are insufficient. In this paper, we propose a novel method to jointly detect salient topological handles and compute the optimal representations of them. The detected handles are considered hypothetical trabeculae structures. They are further screened using a classifier and are then included in the final segmentation. We show in experiments the significance of our contribution compared with previous standard segmentation methods without topological priors, as well as with previous topological method in which non-optimal representations of topological handles are used.
  21. Topological Biomarkers for Real-Time Detection of Epileptic Seizures (2022)

    Ximena Fernández, Diego Mateos
    Abstract Automated seizure detection is a fundamental problem in computational neuroscience towards diagnosis and treatment's improvement of epileptic disease. We propose a real-time computational method for automated tracking and detection of epileptic seizures from raw neurophysiological recordings. Our mechanism is based on the topological analysis of the sliding-window embedding of the time series derived from simultaneously recorded channels. We extract topological biomarkers from the signals via the computation of the persistent homology of time-evolving topological spaces. Remarkably, the proposed biomarkers robustly captures the change in the brain dynamics during the ictal state. We apply our methods in different types of signals including scalp and intracranial EEG and MEG, in patients during interictal and ictal states, showing high accuracy in a range of clinical situations.
  22. Topological Descriptors of Histology Images (2014)

    Nikhil Singh, Heather D. Couture, J. S. Marron, Charles Perou, Marc Niethammer
    Abstract The purpose of this study is to investigate architectural characteristics of cell arrangements in breast cancer histology images. We propose the use of topological data analysis to summarize the geometric information inherent in tumor cell arrangements. Our goal is to use this information as signatures that encode robust summaries of cell arrangements in tumor tissue as captured through histology images. In particular, using ideas from algebraic topology we construct topological descriptors based on cell nucleus segmentations such as persistency charts and Betti sequences. We assess their performance on the task of discriminating the breast cancer subtypes Basal, Luminal A, Luminal B and HER2. We demonstrate that the topological features contain useful complementary information to image-appearance based features that can improve discriminatory performance of classifiers.
  23. Unifying Immunology With Informatics and Multiscale Biology (2014)

    Brian A Kidd, Lauren A Peters, Eric E Schadt, Joel T Dudley
    Abstract The immune system is a highly complex and dynamic system. Historically, the most common scientific and clinical practice has been to evaluate its individual components. This kind of approach cannot always expose the interconnecting pathways that control immune-system responses and does not reveal how the immune system works across multiple biological systems and scales. High-throughput technologies can be used to measure thousands of parameters of the immune system at a genome-wide scale. These system-wide surveys yield massive amounts of quantitative data that provide a means to monitor and probe immune-system function. New integrative analyses can help synthesize and transform these data into valuable biological insight. Here we review some of the computational analysis tools for high-dimensional data and how they can be applied to immunology.
  24. Lung Topology Characteristics in Patients With Chronic Obstructive Pulmonary Disease (2018)

    Francisco Belchi, Mariam Pirashvili, Joy Conway, Michael Bennett, Ratko Djukanovic, Jacek Brodzki
    Abstract Quantitative features that can currently be obtained from medical imaging do not provide a complete picture of Chronic Obstructive Pulmonary Disease (COPD). In this paper, we introduce a novel analytical tool based on persistent homology that extracts quantitative features from chest CT scans to describe the geometric structure of the airways inside the lungs. We show that these new radiomic features stratify COPD patients in agreement with the GOLD guidelines for COPD and can distinguish between inspiratory and expiratory scans. These CT measurements are very different to those currently in use and we demonstrate that they convey significant medical information. The results of this study are a proof of concept that topological methods can enhance the standard methodology to create a finer classification of COPD and increase the possibilities of more personalized treatment.
  25. Topological Detection of Alzheimer’s Disease Using Betti Curves (2021)

    Ameer Saadat-Yazdi, Rayna Andreeva, Rik Sarkar
    Abstract Alzheimer’s disease is a debilitating disease in the elderly, and is an increasing burden to the society due to an aging population. In this paper, we apply topological data analysis to structural MRI scans of the brain, and show that topological invariants make accurate predictors for Alzheimer’s. Using the construct of Betti Curves, we first show that topology is a good predictor of Age. Then we develop an approach to factor out the topological signature of age from Betti curves, and thus obtain accurate detection of Alzheimer’s disease. Experimental results show that topological features used with standard classifiers perform comparably to recently developed convolutional neural networks. These results imply that topology is a major aspect of structural changes due to aging and Alzheimer’s. We expect this relation will generate further insights for both early detection and better understanding of the disease.
  26. Persistent Homology for Breast Tumor Classification Using Mammogram Scans (2022)

    Aras Asaad, Dashti Ali, Taban Majeed, Rasber Rashid
    Abstract An Important tool in the field topological data analysis is known as persistent Homology (PH) which is used to encode abstract representation of the homology of data at different resolutions in the form of persistence diagram (PD). In this work we build more than one PD representation of a single image based on a landmark selection method, known as local binary patterns, that encode different types of local textures from images. We employed different PD vectorizations using persistence landscapes, persistence images, persistence binning (Betti Curve) and statistics. We tested the effectiveness of proposed landmark based PH on two publicly available breast abnormality detection datasets using mammogram scans. Sensitivity of landmark based PH obtained is over 90% in both datasets for the detection of abnormal breast scans. Finally, experimental results give new insights on using different types of PD vectorizations which help in utilising PH in conjunction with machine learning classifiers.
  27. Extremal Event Graphs: A (Stable) Tool for Analyzing Noisy Time Series Data (2022)

    Robin Belton, Bree Cummins, Brittany Terese Fasy, Tomáš Gedeon
    Abstract Local maxima and minima, or extremal events, in experimental time series can be used as a coarse summary to characterize data. However, the discrete sampling in recording experimental measurements suggests uncertainty on the true timing of extrema during the experiment. This in turn gives uncertainty in the timing order of extrema within the time series. Motivated by applications in genomic time series and biological network analysis, we construct a weighted directed acyclic graph (DAG) called an extremal event DAG using techniques from persistent homology that is robust to measurement noise. Furthermore, we define a distance between extremal event DAGs based on the edit distance between strings. We prove several properties including local stability for the extremal event DAG distance with respect to pairwise \$L_\\infty\\$ distances between functions in the time series data. Lastly, we provide algorithms, publicly free software, and implementations on extremal event DAG construction and comparison.
  28. Topological Analysis Reveals State Transitions in Human Gut and Marine Bacterial Communities (2020)

    William K. Chang, David VanInsberghe, Libusha Kelly
    Abstract Microbiome dynamics influence the health and functioning of human physiology and the environment and are driven in part by interactions between large numbers of microbial taxa, making large-scale prediction and modeling a challenge. Here, using topological data analysis, we identify states and dynamical features relevant to macroscopic processes. We show that gut disease processes and marine geochemical events are associated with transitions between community states, defined as topological features of the data density. We find a reproducible two-state succession during recovery from cholera in the gut microbiomes of multiple patients, evidence of dynamic stability in the gut microbiome of a healthy human after experiencing diarrhea during travel, and periodic state transitions in a marine Prochlorococcus community driven by water column cycling. Our approach bridges small-scale fluctuations in microbiome composition and large-scale changes in phenotype without details of underlying mechanisms, and provides an assessment of microbiome stability and its relation to human and environmental health.
  29. Extracting Insights From the Shape of Complex Data Using Topology (2013)

    P. Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. Carlsson
    Abstract This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find. Our method also permits the analysis of individual data sets as well as the analysis of relationships between related data sets. We illustrate the use of our method by applying it to three very different kinds of data, namely gene expression from breast tumors, voting data from the United States House of Representatives and player performance data from the NBA, in each case finding stratifications of the data which are more refined than those produced by standard methods.
  30. The Accumulated Persistence Function, a New Useful Functional Summary Statistic for Topological Data Analysis, With a View to Brain Artery Trees and Spatial Point Process Applications (2019)

    C.A.N. Biscio, J. Møller
    Abstract We start with a simple introduction to topological data analysis where the most popular tool is called a persistence diagram. Briefly, a persistence diagram is a multiset of points in the plane describing the persistence of topological features of a compact set when a scale parameter varies. Since statistical methods are difficult to apply directly on persistence diagrams, various alternative functional summary statistics have been suggested, but either they do not contain the full information of the persistence diagram or they are two-dimensional functions. We suggest a new functional summary statistic that is one-dimensional and hence easier to handle, and which under mild conditions contains the full information of the persistence diagram. Its usefulness is illustrated in statistical settings concerned with point clouds and brain artery trees. The supplementary materials include additional methods and examples, technical details, and the R code used for all examples. © 2019, © 2019 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
  31. Combining Geometric and Topological Information in Image Segmentation (2019)

    Hengrui Luo, Justin Strait
    Abstract A fundamental problem in computer vision is image segmentation, where the goal is to delineate the boundary of an object in the image. The focus of this work is on the segmentation of grayscale images and its purpose is two-fold. First, we conduct an in-depth study comparing active contour and topology-based methods in a statistical framework, two popular approaches for boundary detection of 2-dimensional images. Certain properties of the image dataset may favor one method over the other, both from an interpretability perspective as well as through evaluation of performance measures. Second, we propose the use of topological knowledge to assist an active contour method, which can potentially incorporate prior shape information. The latter is known to be extremely sensitive to algorithm initialization, and thus, we use a topological model to provide an automatic initialization. In addition, our proposed model can handle objects in images with more complex topological structures, including objects with holes and multiple objects within one image. We demonstrate this on artificially-constructed image datasets from computer vision, as well as real medical image data.
  32. Topological Data Analysis Distinguishes Parameter Regimes in the Anderson-Chaplain Model of Angiogenesis (2021)

    John T. Nardini, Bernadette J. Stolz, Kevin B. Flores, Heather A. Harrington, Helen M. Byrne
    Abstract Angiogenesis is the process by which blood vessels form from pre-existing vessels. It plays a key role in many biological processes, including embryonic development and wound healing, and contributes to many diseases including cancer and rheumatoid arthritis. The structure of the resulting vessel networks determines their ability to deliver nutrients and remove waste products from biological tissues. Here we simulate the Anderson-Chaplain model of angiogenesis at different parameter values and quantify the vessel architectures of the resulting synthetic data. Specifically, we propose a topological data analysis (TDA) pipeline for systematic analysis of the model. TDA is a vibrant and relatively new field of computational mathematics for studying the shape of data. We compute topological and standard descriptors of model simulations generated by different parameter values. We show that TDA of model simulation data stratifies parameter space into regions with similar vessel morphology. The methodologies proposed here are widely applicable to other synthetic and experimental data including wound healing, development, and plant biology.
  33. Inferring COVID-19 Biological Pathways From Clinical Phenotypes via Topological Analysis (2021)

    Negin Karisani, Daniel E. Platt, Saugata Basu, Laxmi Parida
    Abstract COVID-19 has caused thousands of deaths around the world and also resulted in a large international economic disruption. Identifying the pathways associated with this illness can help medical researchers to better understand the properties of the condition. This process can be carried out by analyzing the medical records. It is crucial to develop tools and models that can aid researchers with this process in a timely manner. However, medical records are often unstructured clinical notes, and this poses significant challenges to developing the automated systems. In this article, we propose a pipeline to aid practitioners in analyzing clinical notes and revealing the pathways associated with this disease. Our pipeline relies on topological properties and consists of three steps: 1) pre-processing the clinical notes to extract the salient concepts, 2) constructing a feature space of the patients to characterize the extracted concepts, and finally, 3) leveraging the topological properties to distill the available knowledge and visualize the result. Our experiments on a publicly available dataset of COVID-19 clinical notes testify that our pipeline can indeed extract meaningful pathways.
  34. Topological Analysis of Gene Expression Arrays Identifies High Risk Molecular Subtypes in Breast Cancer (2012)

    Javier Arsuaga, Nils A. Baas, Daniel DeWoskin, Hideaki Mizuno, Aleksandr Pankov, Catherine Park
    Abstract Genomic technologies measure thousands of molecular signals with the goal of understanding complex biological processes. In cancer these molecular signals have been used to characterize disease subtypes, signaling pathways and to identify subsets of patients with specific prognosis. However molecular signals for any disease type are so vast and complex that novel mathematical approaches are required for further analyses. Persistent and computational homology provide a new method for these analyses. In our previous work we presented a new homology-based supervised classification method to identify copy number aberrations from comparative genomic hybridization arrays. In this work we first propose a theoretical framework for our classification method and second we extend our analysis to gene expression data. We analyze a published breast cancer data set and find that that our method can distinguish most, but not all, different breast cancer subtypes. This result suggests that specific relationships between genes, captured by our algorithm, help distinguish between breast cancer subtypes. We propose that topological methods can be used for the classification and clustering of gene expression profiles.
  35. Capturing Shape Information With Multi-Scale Topological Loss Terms For 3D Reconstruction (2022)

    Dominik J. E. Waibel, Scott Atwell, Matthias Meier, Carsten Marr, Bastian Rieck
    Abstract Reconstructing 3D objects from 2D images is both challenging for our brains and machine learning algorithms. To support this spatial reasoning task, contextual information about the overall shape of an object is critical. However, such information is not captured by established loss terms (e.g. Dice loss). We propose to complement geometrical shape information by including multi-scale topological features, such as connected components, cycles, and voids, in the reconstruction loss. Our method uses cubical complexes to calculate topological features of 3D volume data and employs an optimal transport distance to guide the reconstruction process. This topology-aware loss is fully differentiable, computationally efficient, and can be added to any neural network. We demonstrate the utility of our loss by incorporating it into SHAPR, a model for predicting the 3D cell shape of individual cells based on 2D microscopy images. Using a hybrid loss that leverages both geometrical and topological information of single objects to assess their shape, we find that topological information substantially improves the quality of reconstructions, thus highlighting its ability to extract more relevant features from image datasets.
  36. Molecular Phenotyping Using Networks, Diffusion, and Topology: Soft Tissue Sarcoma (2019)

    James C. Mathews, Maryam Pouryahya, Caroline Moosmüller, Yannis G. Kevrekidis, Joseph O. Deasy, Allen Tannenbaum
    Abstract Many biological datasets are high-dimensional yet manifest an underlying order. In this paper, we describe an unsupervised data analysis methodology that operates in the setting of a multivariate dataset and a network which expresses influence between the variables of the given set. The technique involves network geometry employing the Wasserstein distance, global spectral analysis in the form of diffusion maps, and topological data analysis using the Mapper algorithm. The prototypical application is to gene expression profiles obtained from RNA-Seq experiments on a collection of tissue samples, considering only genes whose protein products participate in a known pathway or network of interest. Employing the technique, we discern several coherent states or signatures displayed by the gene expression profiles of the sarcomas in the Cancer Genome Atlas along the TP53 (p53) signaling network. The signatures substantially recover the leiomyosarcoma, dedifferentiated liposarcoma (DDLPS), and synovial sarcoma histological subtype diagnoses, and they also include a new signature defined by activation and inactivation of about a dozen genes, including activation of serine endopeptidase inhibitor SERPINE1 and inactivation of TP53-family tumor suppressor gene TP73.
  37. Genomics Data Analysis via Spectral Shape and Topology (2022)

    Erik J. Amézquita, Farzana Nasrin, Kathleen M. Storey, Masato Yoshizawa
    Abstract Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can help to gain a better understanding of the intrinsic shape of high-dimensional genomic data and to retain information that may be lost using standard dimension-reduction algorithms. We propose a novel workflow to process and analyze RNA-seq data from tumor and healthy subjects integrating Mapper and differential gene expression. Precisely, we show that a Gaussian mixture approximation method can be used to produce graphical structures that successfully separate tumor and healthy subjects, and produce two subgroups of tumor subjects. A further analysis using DESeq2, a popular tool for the detection of differentially expressed genes, shows that these two subgroups of tumor cells bear two distinct gene regulations, suggesting two discrete paths for forming lung cancer, which could not be highlighted by other popular clustering methods, including t-SNE. Although Mapper shows promise in analyzing high-dimensional data, building tools to statistically analyze Mapper graphical structures is limited in the existing literature. In this paper, we develop a scoring method using heat kernel signatures that provides an empirical setting for statistical inferences such as hypothesis testing, sensitivity analysis, and correlation analysis.
  38. Persistent Homology Index as a Robust Quantitative Measure of Immunohistochemical Scoring (2017)

    Akihiro Takiyama, Takashi Teramoto, Hiroaki Suzuki, Katsushige Yamashiro, Shinya Tanaka
    Abstract Immunohistochemical data (IHC) plays an important role in clinical practice, and is typically gathered in a semi-quantitative fashion that relies on some degree of visual scoring. However, visual scoring by a pathologist is inherently subjective and manifests both intra-observer and inter-observer variability. In this study, we introduce a novel computer-aided quantification methodology for immunohistochemical scoring that uses the algebraic concept of persistent homology. Using 8 bit grayscale image data derived from 90 specimens of invasive ductal carcinoma of the breast, stained for the replicative marker Ki-67, we computed homology classes. These were then compared to nuclear grades and the Ki-67 labeling indices obtained by visual scoring. Three metrics for IHC staining were newly defined: Persistent Homology Index (PHI), center coordinates of positive and negative groups, and the sum of squares within groups (WSS). This study demonstrates that PHI, a novel index for immunohistochemical labeling using persistent homology, can produce highly similar data to that generated by a pathologist using visual evaluation. The potential benefits associated with our novel technology include both improved quantification and reproducibility. Since our method reflects cellularity and nuclear atypia, it carries a greater quantity of biologic data compared to conventional evaluation using Ki-67.
  39. Imaging-Based Representation and Stratification of Intra-Tumor Heterogeneity via Tree-Edit Distance (2022)

    Lara Cavinato, Matteo Pegoraro, Alessandra Ragni, Francesca Ieva
    Abstract Personalized medicine is the future of medical practice. In oncology, tumor heterogeneity assessment represents a pivotal step for effective treatment planning and prognosis prediction. Despite new procedures for DNA sequencing and analysis, non-invasive methods for tumor characterization are needed to impact on daily routine. On purpose, imaging texture analysis is rapidly scaling, holding the promise to surrogate histopathological assessment of tumor lesions. In this work, we propose a tree-based representation strategy for describing intra-tumor heterogeneity of patients affected by metastatic cancer. We leverage radiomics information extracted from PET/CT imaging and we provide an exhaustive and easily readable summary of the disease spreading. We exploit this novel patient representation to perform cancer subtyping according to hierarchical clustering technique. To this purpose, a new heterogeneity-based distance between trees is defined and applied to a case study of prostate cancer. Clusters interpretation is explored in terms of concordance with severity status, tumor burden and biological characteristics. Results are promising, as the proposed method outperforms current literature approaches. Ultimately, the proposed method draws a general analysis framework that would allow to extract knowledge from daily acquired imaging data of patients and provide insights for effective treatment planning.
  40. CD8 T-Cell Reactivity to Islet Antigens Is Unique to Type 1 While CD4 T-Cell Reactivity Exists in Both Type 1 and Type 2 Diabetes (2014)

    Ghanashyam Sarikonda, Jeremy Pettus, Sonal Phatak, Sowbarnika Sachithanantham, Jacqueline F. Miller, Johnna D. Wesley, Eithon Cadag, Ji Chae, Lakshmi Ganesan, Ronna Mallios, Steve Edelman, Bjoern Peters, Matthias von Herrath
    Abstract Previous cross-sectional analyses demonstrated that CD8+ and CD4+ T-cell reactivity to islet-specific antigens was more prevalent in T1D subjects than in healthy donors (HD). Here, we examined T1D-associated epitope-specific CD4+ T-cell cytokine production and autoreactive CD8+ T-cell frequency on a monthly basis for one year in 10 HD, 33 subjects with T1D, and 15 subjects with T2D. Autoreactive CD4+ T-cells from both T1D and T2D subjects produced more IFN-γ when stimulated than cells from HD. In contrast, higher frequencies of islet antigen-specific CD8+ T-cells were detected only in T1D. These observations support the hypothesis that general beta-cell stress drives autoreactive CD4+ T-cell activity while islet over-expression of MHC class I commonly seen in T1D mediates amplification of CD8+ T-cells and more rapid beta-cell loss. In conclusion, CD4+ T-cell autoreactivity appears to be present in both T1D and T2D while autoreactive CD8+ T-cells are unique to T1D. Thus, autoreactive CD8+ cells may serve as a more T1D-specific biomarker.
  41. Hepatic Tumor Classification Using Texture and Topology Analysis of Non-Contrast-Enhanced Three-Dimensional T1-Weighted MR Images With a Radiomics Approach (2019)

    Asuka Oyama, Yasuaki Hiraoka, Ippei Obayashi, Yusuke Saikawa, Shigeru Furui, Kenshiro Shiraishi, Shinobu Kumagai, Tatsuya Hayashi, Jun’ichi Kotoku
    Abstract The purpose of this study is to evaluate the accuracy for classification of hepatic tumors by characterization of T1-weighted magnetic resonance (MR) images using two radiomics approaches with machine learning models: texture analysis and topological data analysis using persistent homology. This study assessed non-contrast-enhanced fat-suppressed three-dimensional (3D) T1-weighted images of 150 hepatic tumors. The lesions included 50 hepatocellular carcinomas (HCCs), 50 metastatic tumors (MTs), and 50 hepatic hemangiomas (HHs) found respectively in 37, 23, and 33 patients. For classification, texture features were calculated, and also persistence images of three types (degree 0, degree 1 and degree 2) were obtained for each lesion from the 3D MR imaging data. We used three classification models. In the classification of HCC and MT (resp. HCC and HH, HH and MT), we obtained accuracy of 92% (resp. 90%, 73%) by texture analysis, and the highest accuracy of 85% (resp. 84%, 74%) when degree 1 (resp. degree 1, degree 2) persistence images were used. Our methods using texture analysis or topological data analysis allow for classification of the three hepatic tumors with considerable accuracy, and thus might be useful when applied for computer-aided diagnosis with MR images.
  42. Omics-Based Strategies in Precision Medicine: Toward a Paradigm Shift in Inborn Errors of Metabolism Investigations (2016)

    Abdellah Tebani, Carlos Afonso, Stéphane Marret, Soumeya Bekri
    Abstract The rise of technologies that simultaneously measure thousands of data points represents the heart of systems biology. These technologies have had a huge impact on the discovery of next-generation diagnostics, biomarkers, and drugs in the precision medicine era. Systems biology aims to achieve systemic exploration of complex interactions in biological systems. Driven by high-throughput omics technologies and the computational surge, it enables multi-scale and insightful overviews of cells, organisms, and populations. Precision medicine capitalizes on these conceptual and technological advancements and stands on two main pillars: data generation and data modeling. High-throughput omics technologies allow the retrieval of comprehensive and holistic biological information, whereas computational capabilities enable high-dimensional data modeling and, therefore, accessible and user-friendly visualization. Furthermore, bioinformatics has enabled comprehensive multi-omics and clinical data integration for insightful interpretation. Despite their promise, the translation of these technologies into clinically actionable tools has been slow. In this review, we present state-of-the-art multi-omics data analysis strategies in a clinical context. The challenges of omics-based biomarker translation are discussed. Perspectives regarding the use of multi-omics approaches for inborn errors of metabolism (IEM) are presented by introducing a new paradigm shift in addressing IEM investigations in the post-genomic era.
  43. Identification of Relevant Genetic Alterations in Cancer Using Topological Data Analysis (2020)

    Raúl Rabadán, Yamina Mohamedi, Udi Rubin, Tim Chu, Adam N. Alghalith, Oliver Elliott, Luis Arnés, Santiago Cal, Álvaro J. Obaya, Arnold J. Levine, Pablo G. Cámara
    Abstract Large-scale cancer genomic studies enable the systematic identification of mutations that lead to the genesis and progression of tumors, uncovering the underlying molecular mechanisms and potential therapies. While some such mutations are recurrently found in many tumors, many others exist solely within a few samples, precluding detection by conventional recurrence-based statistical approaches. Integrated analysis of somatic mutations and RNA expression data across 12 tumor types reveals that mutations of cancer genes are usually accompanied by substantial changes in expression. We use topological data analysis to leverage this observation and uncover 38 elusive candidate cancer-associated genes, including inactivating mutations of the metalloproteinase ADAMTS12 in lung adenocarcinoma. We show that ADAMTS12−/− mice have a five-fold increase in the susceptibility to develop lung tumors, confirming the role of ADAMTS12 as a tumor suppressor gene. Our results demonstrate that data integration through topological techniques can increase our ability to identify previously unreported cancer-related alterations., Rare cancer mutations are often missed using recurrence-based statistical approaches, but are usually accompanied by changes in expression. Here the authors leverage this information to uncover several elusive candidate cancer-associated genes using topological data analysis.
  44. Topology Highlights Mesoscopic Functional Equivalence Between Imagery and Perception: The Case of Hypnotizability (2019)

    Esther Ibáñez-Marcelo, Lisa Campioni, Angkoon Phinyomark, Giovanni Petri, Enrica L. Santarcangelo
    Abstract The functional equivalence (FE) between imagery and perception or motion has been proposed on the basis of neuroimaging evidence of large spatially overlapping activations between real and imagined sensori-motor conditions. However, similar local activation patterns do not imply the same mesoscopic integration of brain regions, which can be described by tools from Topological Data Analysis (TDA). On the basis of behavioral findings, stronger FE has been hypothesized in the individuals with high scores of hypnotizability scores (highs) with respect to low hypnotizable participants (lows) who differ between each other in the proneness to modify memory, perception and behavior according to specific imaginative suggestions. Here we present the first EEG evidence of stronger FE in highs. In fact, persistent homology shows that the highs EEG topological asset during real and imagined sensory conditions is significantly more similar than the lows. As a corollary finding, persistent homology shows lower restructuring of the EEG asset in highs than in lows during both sensory and imagery tasks with respect to basal conditions. Present findings support the view that greater embodiment of mental images may be responsible for the highs greater proneness to respond to sensori-motor suggestions and to report involuntariness in action. In addition, findings indicate hypnotizability-related sensory and cognitive information processing and suggest that the psycho-physiological trait of hypnotizability may modulate more than one aspect of the everyday life.
  45. Nonlinear Dynamic Approaches to Identify Atrial Fibrillation Progression Based on Topological Methods (2019)

    Bahareh Safarbali, Seyed Mohammad Reza Hashemi Golpayegani
    Abstract In recent years, atrial fibrillation (AF) development from paroxysmal to persistent or permanent forms has become an important issue in cardiovascular disorders. Information about AF pattern of presentation (paroxysmal, persistent, or permanent) was useful in the management of algorithms in each category. This management is aimed at reducing symptoms and stopping severe problems associated with AF. AF classification has been based on time duration and episodes until now. In particular, complexity changes in Heart Rate Variation (HRV) may contain clinically relevant signals of imminent systemic dysregulation. A number of nonlinear methods based on phase space and topological properties can give more insight into HRV abnormalities such as fibrillation. Aiming to provide a nonlinear tool to qualitatively classify AF stages, we proposed two geometrical indices (fractal dimension and persistent homology) based on HRV phase space, which can successfully replicate the changes in AF progression. The study population includes 38 lone AF patients and 20 normal subjects, which are collected from the Physio-Bank database. “Time of Life (TOL)” is proposed as a new feature based on the initial and final Čech radius in the persistent homology diagram. A neural network was implemented to prove the effectiveness of both TOL and fractal dimension as classification features. The accuracy of classification performance was 93%. The proposed indices provide a signal representation framework useful to understand the dynamic changes in AF cardiac patterns and to classify normal and pathological rhythms.
  46. Prediction in Cancer Genomics Using Topological Signatures and Machine Learning (2020)

    Georgina Gonzalez, Arina Ushakova, Radmila Sazdanovic, Javier Arsuaga
    Abstract Copy Number Aberrations, gains and losses of genomic regions, are a hallmark of cancer and can be experimentally detected using microarray comparative genomic hybridization (aCGH). In previous works, we developed a topology based method to analyze aCGH data whose output are regions of the genome where copy number is altered in patients with a predetermined cancer phenotype. We call this method Topological Analysis of array CGH (TAaCGH). Here we combine TAaCGH with machine learning techniques to build classifiers using copy number aberrations. We chose logistic regression on two different binary phenotypes related to breast cancer to illustrate this approach. The first case consists of patients with over-expression of the ERBB2 gene. Over-expression of ERBB2 is commonly regulated by a copy number gain in chromosome arm 17q. TAaCGH found the region 17q11-q22 associated with the phenotype and using logistic regression we reduced this region to 17q12-q21.31 correctly classifying 78% of the ERBB2 positive individuals (sensitivity) in a validation data set. We also analyzed over-expression in Estrogen Receptor (ER), a second phenotype commonly observed in breast cancer patients and found that the region 5p14.3-12 together with six full arms were associated with the phenotype. Our method identified 4p, 6p and 16q as the strongest predictors correctly classifying 76% of ER positives in our validation data set. However, for this set there was a significant increase in the false positive rate (specificity). We suggest that topological and machine learning methods can be combined for prediction of phenotypes using genetic data.
  47. Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis (2019)

    Lorin Crawford, Anthea Monod, Andrew X. Chen, Sayan Mukherjee, Raúl Rabadán
    Abstract Glioblastoma multiforme (GBM) is an aggressive form of human brain cancer that is under active study in the field of cancer biology. Its rapid progression and the relative time cost of obtaining molecular data make other readily available forms of data, such as images, an important resource for actionable measures in patients. Our goal is to use information given by medical images taken from GBM patients in statistical settings. To do this, we design a novel statistic—the smooth Euler characteristic transform (SECT)—that quantifies magnetic resonance images of tumors. Due to its well-defined inner product structure, the SECT can be used in a wider range of functional and nonparametric modeling approaches than other previously proposed topological summary statistics. When applied to a cohort of GBM patients, we find that the SECT is a better predictor of clinical outcomes than both existing tumor shape quantifications and common molecular assays. Specifically, we demonstrate that SECT features alone explain more of the variance in GBM patient survival than gene expression, volumetric features, and morphometric features. The main takeaways from our findings are thus 2-fold. First, they suggest that images contain valuable information that can play an important role in clinical prognosis and other medical decisions. Second, they show that the SECT is a viable tool for the broader study of medical imaging informatics. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
  48. Two-Tier Mapper, an Unbiased Topology-Based Clustering Method for Enhanced Global Gene Expression Analysis (2019)

    Rachel Jeitziner, Mathieu Carrière, Jacques Rougemont, Steve Oudot, Kathryn Hess, Cathrin Brisken
    Abstract MOTIVATION: Unbiased clustering methods are needed to analyze growing numbers of complex datasets. Currently available clustering methods often depend on parameters that are set by the user, they lack stability, and are not applicable to small datasets. To overcome these shortcomings we used topological data analysis, an emerging field of mathematics that discerns additional feature and discovers hidden insights on datasets and has a wide application range. RESULTS: We have developed a topology-based clustering method called Two-Tier Mapper (TTMap) for enhanced analysis of global gene expression datasets. First, TTMap discerns divergent features in the control group, adjusts for them, and identifies outliers. Second, the deviation of each test sample from the control group in a high-dimensional space is computed, and the test samples are clustered using a new Mapper-based topological algorithm at two levels: a global tier and local tiers. All parameters are either carefully chosen or data-driven, avoiding any user-induced bias. The method is stable, different datasets can be combined for analysis, and significant subgroups can be identified. It outperforms current clustering methods in sensitivity and stability on synthetic and biological datasets, in particular when sample sizes are small; outcome is not affected by removal of control samples, by choice of normalization, or by subselection of data. TTMap is readily applicable to complex, highly variable biological samples and holds promise for personalized medicine. AVAILABILITY AND IMPLEMENTATION: TTMap is supplied as an R package in Bioconductor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  49. Topology-Based Kernels With Application to Inference Problems in Alzheimer’s Disease (2011)

    Deepti Pachauri, Chris Hinrichs, Moo K. Chung, Sterling C. Johnson, Vikas Singh
    Abstract Alzheimer’s disease (AD) research has recently witnessed a great deal of activity focused on developing new statistical learning tools for automated inference using imaging data. The workhorse for many of these techniques is the Support Vector Machine (SVM) framework (or more generally kernel based methods). Most of these require, as a first step, specification of a kernel matrix between input examples (i.e., images). The inner product between images Ii and Ij in a feature space can generally be written in closed form, and so it is convenient to treat as “given”. However, in certain neuroimaging applications such an assumption becomes problematic. As an example, it is rather challenging to provide a scalar measure of similarity between two instances of highly attributed data such as cortical thickness measures on cortical surfaces. Note that cortical thickness is known to be discriminative for neurological disorders, so leveraging such information in an inference framework, especially within a multi-modal method, is potentially advantageous. But despite being clinically meaningful, relatively few works have successfully exploited this measure for classification or regression. Motivated by these applications, our paper presents novel techniques to compute similarity matrices for such topologically-based attributed data. Our ideas leverage recent developments to characterize signals (e.g., cortical thickness) motivated by the persistence of their topological features, leading to a scheme for simple constructions of kernel matrices. As a proof of principle, on a dataset of 356 subjects from the ADNI study, we report good performance on several statistical inference tasks without any feature selection, dimensionality reduction, or parameter tuning.
  50. Classification of COVID-19 via Homology of CT-SCAN (2021)

    Sohail Iqbal, H. Fareed Ahmed, Talha Qaiser, Muhammad Imran Qureshi, Nasir Rajpoot
    Abstract In this worldwide spread of SARS-CoV-2 (COVID-19) infection, it is of utmost importance to detect the disease at an early stage especially in the hot spots of this epidemic. There are more than 110 Million infected cases on the globe, sofar. Due to its promptness and effective results computed tomography (CT)-scan image is preferred to the reverse-transcription polymerase chain reaction (RT-PCR). Early detection and isolation of the patient is the only possible way of controlling the spread of the disease. Automated analysis of CT-Scans can provide enormous support in this process. In this article, We propose a novel approach to detect SARS-CoV-2 using CT-scan images. Our method is based on a very intuitive and natural idea of analyzing shapes, an attempt to mimic a professional medic. We mainly trace SARS-CoV-2 features by quantifying their topological properties. We primarily use a tool called persistent homology, from Topological Data Analysis (TDA), to compute these topological properties. We train and test our model on the "SARS-CoV-2 CT-scan dataset" i̧tep\soares2020sars\, an open-source dataset, containing 2,481 CT-scans of normal and COVID-19 patients. Our model yielded an overall benchmark F1 score of \$99.42\% \$, accuracy \$99.416\%\$, precision \$99.41\%\$, and recall \$99.42\%\$. The TDA techniques have great potential that can be utilized for efficient and prompt detection of COVID-19. The immense potential of TDA may be exploited in clinics for rapid and safe detection of COVID-19 globally, in particular in the low and middle-income countries where RT-PCR labs and/or kits are in a serious crisis.
  51. Topology Based Data Analysis Identifies a Subgroup of Breast Cancers With a Unique Mutational Profile and Excellent Survival (2011)

    Monica Nicolau, Arnold J. Levine, Gunnar Carlsson
    Abstract High-throughput biological data, whether generated as sequencing, transcriptional microarrays, proteomic, or other means, continues to require analytic methods that address its high dimensional aspects. Because the computational part of data analysis ultimately identifies shape characteristics in the organization of data sets, the mathematics of shape recognition in high dimensions continues to be a crucial part of data analysis. This article introduces a method that extracts information from high-throughput microarray data and, by using topology, provides greater depth of information than current analytic techniques. The method, termed Progression Analysis of Disease (PAD), first identifies robust aspects of cluster analysis, then goes deeper to find a multitude of biologically meaningful shape characteristics in these data. Additionally, because PAD incorporates a visualization tool, it provides a simple picture or graph that can be used to further explore these data. Although PAD can be applied to a wide range of high-throughput data types, it is used here as an example to analyze breast cancer transcriptional data. This identified a unique subgroup of Estrogen Receptor-positive (ER+) breast cancers that express high levels of c-MYB and low levels of innate inflammatory genes. These patients exhibit 100% survival and no metastasis. No supervised step beyond distinction between tumor and healthy patients was used to identify this subtype. The group has a clear and distinct, statistically significant molecular signature, it highlights coherent biology but is invisible to cluster methods, and does not fit into the accepted classification of Luminal A/B, Normal-like subtypes of ER+ breast cancers. We denote the group as c-MYB+ breast cancer.
  52. Topological Data Analysis of Escherichia Coli O157:H7 and Non-O157 Survival in Soils (2014)

    Abasiofiok M. Ibekwe, Jincai Ma, David E. Crowley, Ching-Hong Yang, Alexis M. Johnson, Tanya C. Petrossian, Pek Y. Lum
    Abstract Shiga toxin-producing E. coli O157:H7 and non-O157 have been implicated in many foodborne illnesses caused by the consumption of contaminated fresh produce. However, data on their persistence in soils are limited due to the complexity in datasets generated from different environmental variables and bacterial taxa. There is a continuing need to distinguish the various environmental variables and different bacterial groups to understand the relationships among these factors and the pathogen survival. Using an approach called Topological Data Analysis (TDA); we reconstructed the relationship structure of E. coli O157 and non-O157 survival in 32 soils (16 organic and 16 conventionally managed soils) from California (CA) and Arizona (AZ) with a multi-resolution output. In our study, we took a community approach based on total soil microbiome to study community level survival and examining the network of the community as a whole and the relationship between its topology and biological processes. TDA produces a geometric representation of complex data sets. Network analysis showed that Shiga toxin negative strain E. coli O157:H7 4554 survived significantly longer in comparison to E. coli O157:H7 EDL933, while the survival time of E. coli O157:NM was comparable to that of E. coli O157:H7 strain 933 in all of the tested soils. Two non-O157 strains, E. coli O26:H11 and E. coli O103:H2 survived much longer than E. coli O91:H21 and the three strains of E. coli O157. We show that there are complex interactions between E. coli strain survival, microbial community structures, and soil parameters.
  53. Airway Pathological Heterogeneity in Asthma: Visualization of Disease Microclusters Using Topological Data Analysis (2018)

    Salman Siddiqui, Aarti Shikotra, Matthew Richardson, Emma Doran, David Choy, Alex Bell, Cary D. Austin, Jeffrey Eastham-Anderson, Beverley Hargadon, Joseph R. Arron, Andrew Wardlaw, Christopher E. Brightling, Liam G. Heaney, Peter Bradding
    Abstract Background Asthma is a complex chronic disease underpinned by pathological changes within the airway wall. How variations in structural airway pathology and cellular inflammation contribute to the expression and severity of asthma are poorly understood. Objectives Therefore we evaluated pathological heterogeneity using topological data analysis (TDA) with the aim of visualizing disease clusters and microclusters. Methods A discovery population of 202 adult patients (142 asthmatic patients and 60 healthy subjects) and an external replication population (59 patients with severe asthma) were evaluated. Pathology and gene expression were examined in bronchial biopsy samples. TDA was applied by using pathological variables alone to create pathology-driven visual networks. Results In the discovery cohort TDA identified 4 groups/networks with multiple microclusters/regions of interest that were masked by group-level statistics. Specifically, TDA group 1 consisted of a high proportion of healthy subjects, with a microcluster representing a topological continuum connecting healthy subjects to patients with mild-to-moderate asthma. Three additional TDA groups with moderate-to-severe asthma (Airway Smooth MuscleHigh, Reticular Basement MembraneHigh, and RemodelingLow groups) were identified and contained numerous microclusters with varying pathological and clinical features. Mutually exclusive TH2 and TH17 tissue gene expression signatures were identified in all pathological groups. Discovery and external replication applied to the severe asthma subgroup identified only highly similar “pathological data shapes” through analyses of persistent homology. Conclusions We have identified and replicated novel pathological phenotypes of asthma using TDA. Our methodology is applicable to other complex chronic diseases.
  54. Fast and Accurate Tumor Segmentation of Histology Images Using Persistent Homology and Deep Convolutional Features (2019)

    Talha Qaiser, Yee-Wah Tsang, Daiki Taniyama, Naoya Sakamoto, Kazuaki Nakane, David Epstein, Nasir Rajpoot
    Abstract Tumor segmentation in whole-slide images of histology slides is an important step towards computer-assisted diagnosis. In this work, we propose a tumor segmentation framework based on the novel concept of persistent homology profiles (PHPs). For a given image patch, the homology profiles are derived by efficient computation of persistent homology, which is an algebraic tool from homology theory. We propose an efficient way of computing topological persistence of an image, alternative to simplicial homology. The PHPs are devised to distinguish tumor regions from their normal counterparts by modeling the atypical characteristics of tumor nuclei. We propose two variants of our method for tumor segmentation: one that targets speed without compromising accuracy and the other that targets higher accuracy. The fast version is based on a selection of exemplar image patches from a convolution neural network (CNN) and patch classification by quantifying the divergence between the PHPs of exemplars and the input image patch. Detailed comparative evaluation shows that the proposed algorithm is significantly faster than competing algorithms while achieving comparable results. The accurate version combines the PHPs and high-level CNN features and employs a multi-stage ensemble strategy for image patch labeling. Experimental results demonstrate that the combination of PHPs and CNN features outperform competing algorithms. This study is performed on two independently collected colorectal datasets containing adenoma, adenocarcinoma, signet, and healthy cases. Collectively, the accurate tumor segmentation produces the highest average patch-level F1-score, as compared with competing algorithms, on malignant and healthy cases from both the datasets. Overall the proposed framework highlights the utility of persistent homology for histopathology image analysis.
  55. Topological Data Analysis Reveals Robust Alterations in the Whole-Brain and Frontal Lobe Functional Connectomes in Attention-Deficit/Hyperactivity Disorder (2020)

    Zeus Gracia-Tabuenca, Juan Carlos Díaz-Patiño, Isaac Arelio, Sarael Alcauter
    Abstract Visual Abstract \textlessimg class="highwire-fragment fragment-image" alt="Figure" src="https://www.eneuro.org/content/eneuro/7/3/ENEURO.0543-19.2020/F1.medium.gif" width="369" height="440"/\textgreaterDownload figureOpen in new tabDownload powerpoint Attention-deficit/hyperactivity disorder (ADHD) is a developmental disorder characterized by difficulty to control the own behavior. Neuroimaging studies have related ADHD with the interplay of fronto-parietal attention systems with the default mode network (DMN; Castellanos and Aoki, 2016). However, some results have been inconsistent, potentially due to methodological differences in the analytical strategies when defining the brain functional network, i.e., the functional connectivity threshold and/or the brain parcellation scheme. Here, we make use of topological data analysis (TDA) to explore the brain connectome as a function of the filtration value (i.e., the connectivity threshold), instead of using a static connectivity threshold. Specifically, we characterized the transition from all nodes being isolated to being connected into a single component as a function of the filtration value. We explored the utility of such a method to identify differences between 81 children with ADHD (45 male, age: 7.26–17.61 years old) and 96 typically developing children (TDC; 59 male, age: 7.17–17.96 years old), using a public dataset of resting state (rs)fMRI in human subjects. Results were highly congruent when using four different brain segmentations (atlases), and exhibited significant differences for the brain topology of children with ADHD, both at the whole-brain network and the functional subnetwork levels, particularly involving the frontal lobe and the DMN. Therefore, this is a solid approach that complements connectomics-related methods and may contribute to identify the neurophysio-pathology of ADHD.
  56. MRI and Biomechanics Multidimensional Data Analysis Reveals R2 -R1ρ as an Early Predictor of Cartilage Lesion Progression in Knee Osteoarthritis (2017)

    Valentina Pedoia, Jenny Haefeli, Kazuhito Morioka, Hsiang-Ling Teng, Lorenzo Nardo, Richard B. Souza, Adam R. Ferguson, Sharmila Majumdar
    Abstract PURPOSE: To couple quantitative compositional MRI, gait analysis, and machine learning multidimensional data analysis to study osteoarthritis (OA). OA is a multifactorial disorder accompanied by biochemical and morphological changes in the articular cartilage, modulated by skeletal biomechanics and gait. While we can now acquire detailed information about the knee joint structure and function, we are not yet able to leverage the multifactorial factors for diagnosis and disease management of knee OA. MATERIALS AND METHODS: We mapped 178 subjects in a multidimensional space integrating: demographic, clinical information, gait kinematics and kinetics, cartilage compositional T1ρ and T2 and R2 -R1ρ (1/T2 -1/T1ρ ) acquired at 3T and whole-organ magnetic resonance imaging score morphological grading. Topological data analysis (TDA) and Kolmogorov-Smirnov test were adopted for data integration, analysis, and hypothesis generation. Regression models were used for hypothesis testing. RESULTS: The results of the TDA showed a network composed of three main patient subpopulations, thus potentially identifying new phenotypes. T2 and T1ρ values (T2 lateral femur P = 1.45*10-8 , T1ρ medial tibia P = 1.05*10-5 ), the presence of femoral cartilage defects (P = 0.0013), lesions in the meniscus body (P = 0.0035), and race (P = 2.44*10-4 ) were key markers in the subpopulation classification. Within one of the subpopulations we observed an association between the composite metric R2 -R1ρ and the longitudinal progression of cartilage lesions. CONCLUSION: The analysis presented demonstrates some of the complex multitissue biochemical and biomechanical interactions that define joint degeneration and OA using a multidimensional approach, and potentially indicates that R2 -R1ρ may be an imaging biomarker for early OA. LEVEL OF EVIDENCE: 3 Technical Efficacy: Stage 2 J. Magn. Reson. Imaging 2018;47:78-90.
  57. Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology (2021)

    Nicole Bussola, Bruno Papa, Ombretta Melaiu, Aurora Castellano, Doriana Fruci, Giuseppe Jurman
    Abstract We introduce here a novel machine learning (ML) framework to address the issue of the quantitative assessment of the immune content in neuroblastoma (NB) specimens. First, the EUNet, a U-Net with an EfficientNet encoder, is trained to detect lymphocytes on tissue digital slides stained with the CD3 T-cell marker. The training set consists of 3782 images extracted from an original collection of 54 whole slide images (WSIs), manually annotated for a total of 73,751 lymphocytes. Resampling strategies, data augmentation, and transfer learning approaches are adopted to warrant reproducibility and to reduce the risk of overfitting and selection bias. Topological data analysis (TDA) is then used to define activation maps from different layers of the neural network at different stages of the training process, described by persistence diagrams (PD) and Betti curves. TDA is further integrated with the uniform manifold approximation and projection (UMAP) dimensionality reduction and the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm for clustering, by the deep features, the relevant subgroups and structures, across different levels of the neural network. Finally, the recent TwoNN approach is leveraged to study the variation of the intrinsic dimensionality of the U-Net model. As the main task, the proposed pipeline is employed to evaluate the density of lymphocytes over the whole tissue area of the WSIs. The model achieves good results with mean absolute error 3.1 on test set, showing significant agreement between densities estimated by our EUNet model and by trained pathologists, thus indicating the potentialities of a promising new strategy in the quantification of the immune content in NB specimens. Moreover, the UMAP algorithm unveiled interesting patterns compatible with pathological characteristics, also highlighting novel insights into the dynamics of the intrinsic dataset dimensionality at different stages of the training process. All the experiments were run on the Microsoft Azure cloud platform.
  58. Using Multidimensional Topological Data Analysis to Identify Traits of Hip Osteoarthritis (2018)

    Jasmine Rossi‐deVries, Valentina Pedoia, Michael A. Samaan, Adam R. Ferguson, Richard B. Souza, Sharmila Majumdar
    Abstract Background Osteoarthritis (OA) is a multifaceted disease with many variables affecting diagnosis and progression. Topological data analysis (TDA) is a state-of-the-art big data analytics tool that can combine all variables into multidimensional space. TDA is used to simultaneously analyze imaging and gait analysis techniques. Purpose To identify biochemical and biomechanical biomarkers able to classify different disease progression phenotypes in subjects with and without radiographic signs of hip OA. Study Type Longitudinal study for comparison of progressive and nonprogressive subjects. Population In all, 102 subjects with and without radiographic signs of hip osteoarthritis. Field Strength/Sequence 3T, SPGR 3D MAPSS T1ρ/T2, intermediate-weighted fat-suppressed fast spin-echo (FSE). Assessment Multidimensional data analysis including cartilage composition, bone shape, Kellgren–Lawrence (KL) classification of osteoarthritis, scoring hip osteoarthritis with MRI (SHOMRI), hip disability and osteoarthritis outcome score (HOOS). Statistical Tests Analysis done using TDA, Kolmogorov–Smirnov (KS) testing, and Benjamini-Hochberg to rank P-value results to correct for multiple comparisons. Results Subjects in the later stages of the disease had an increased SHOMRI score (P \textless 0.0001), increased KL (P = 0.0012), and older age (P \textless 0.0001). Subjects in the healthier group showed intact cartilage and less pain. Subjects found between these two groups had a range of symptoms. Analysis of this subgroup identified knee biomechanics (P \textless 0.0001) as an initial marker of the disease that is noticeable before the morphological progression and degeneration. Further analysis of an OA subgroup with femoroacetabular impingement (FAI) showed anterior labral tears to be the most significant marker (P = 0.0017) between those FAI subjects with and without OA symptoms. Data Conclusion The data-driven analysis obtained with TDA proposes new phenotypes of these subjects that partially overlap with the radiographic-based classical disease status classification and also shows the potential for further examination of an early onset biomechanical intervention. Level of Evidence: 2 Technical Efficacy: Stage 2 J. Magn. Reson. Imaging 2018;48:1046–1058.
  59. Uncovering Precision Phenotype-Biomarker Associations in Traumatic Brain Injury Using Topological Data Analysis (2017)

    Jessica L. Nielson, Shelly R. Cooper, John K. Yue, Marco D. Sorani, Tomoo Inoue, Esther L. Yuh, Pratik Mukherjee, Tanya C. Petrossian, Jesse Paquette, Pek Y. Lum, Gunnar E. Carlsson, Mary J. Vassar, Hester F. Lingsma, Wayne A. Gordon, Alex B. Valadka, David O. Okonkwo, Geoffrey T. Manley, Adam R. Ferguson, Track-Tbi Investigators
    Abstract Background Traumatic brain injury (TBI) is a complex disorder that is traditionally stratified based on clinical signs and symptoms. Recent imaging and molecular biomarker innovations provide unprecedented opportunities for improved TBI precision medicine, incorporating patho-anatomical and molecular mechanisms. Complete integration of these diverse data for TBI diagnosis and patient stratification remains an unmet challenge. Methods and findings The Transforming Research and Clinical Knowledge in Traumatic Brain Injury (TRACK-TBI) Pilot multicenter study enrolled 586 acute TBI patients and collected diverse common data elements (TBI-CDEs) across the study population, including imaging, genetics, and clinical outcomes. We then applied topology-based data-driven discovery to identify natural subgroups of patients, based on the TBI-CDEs collected. Our hypothesis was two-fold: 1) A machine learning tool known as topological data analysis (TDA) would reveal data-driven patterns in patient outcomes to identify candidate biomarkers of recovery, and 2) TDA-identified biomarkers would significantly predict patient outcome recovery after TBI using more traditional methods of univariate statistical tests. TDA algorithms organized and mapped the data of TBI patients in multidimensional space, identifying a subset of mild TBI patients with a specific multivariate phenotype associated with unfavorable outcome at 3 and 6 months after injury. Further analyses revealed that this patient subset had high rates of post-traumatic stress disorder (PTSD), and enrichment in several distinct genetic polymorphisms associated with cellular responses to stress and DNA damage (PARP1), and in striatal dopamine processing (ANKK1, COMT, DRD2). Conclusions TDA identified a unique diagnostic subgroup of patients with unfavorable outcome after mild TBI that were significantly predicted by the presence of specific genetic polymorphisms. Machine learning methods such as TDA may provide a robust method for patient stratification and treatment planning targeting identified biomarkers in future clinical trials in TBI patients. Trial Registration ClinicalTrials.gov Identifier NCT01565551
  60. Continuous Indexing of Fibrosis (CIF): Improving the Assessment and Classification of MPN Patients (2022)

    Hosuk Ryou, Korsuk Sirinukunwattana, Alan Aberdeen, Gillian Grindstaff, Bernadette Stolz, Helen Byrne, Heather A. Harrington, Nikolaos Sousos, Anna L. Godfrey, Claire N. Harrison, Bethan Psaila, Adam J. Mead, Gabrielle Rees, Gareth D. H. Turner, Jens Rittscher, Daniel Royston
    Abstract The detection and grading of fibrosis in myeloproliferative neoplasms (MPN) is an important component of disease classification, prognostication and disease monitoring. However, current fibrosis grading systems are only semi-quantitative and fail to capture sample heterogeneity. To improve the detection, quantitation and representation of reticulin fibrosis, we developed a machine learning (ML) approach using bone marrow trephine (BMT) samples (n = 107) from patients diagnosed with MPN or a reactive / nonneoplastic marrow. The resulting Continuous Indexing of Fibrosis (CIF) enhances the detection and monitoring of fibrosis within BMTs, and aids the discrimination of MPN subtypes. When combined with megakaryocyte feature analysis, CIF discriminates between the frequently challenging differential diagnosis of essential thrombocythemia (ET) and pre-fibrotic myelofibrosis (pre-PMF) with high predictive accuracy [area under the curve = 0.94]. CIF also shows significant promise in the identification of MPN patients at risk of disease progression; analysis of samples from 35 patients diagnosed with ET and enrolled in the Primary Thrombocythemia-1 (PT-1) trial identified features predictive of post-ET myelofibrosis (area under the curve = 0.77). In addition to these clinical applications, automated analysis of fibrosis has clear potential to further refine disease classification boundaries and inform future studies of the micro-environmental factors driving disease initiation and progression in MPN and other stem cell disorders. The image analysis methods used to generate CIF can be readily integrated with those of other key morphological features in MPNs, including megakaryocyte morphology, that lie beyond the scope of conventional histological assessment. Key PointsMachine learning enables an objective and quantitative description of reticulin fibrosis within the bone marrow of patients with myeloproliferative neoplasms (MPN),Automated analysis and Continuous Indexing of Fibrosis (CIF) captures heterogeneity within MPN samples and has utility in refined classification and disease monitoringQuantitative fibrosis assessment combined with topological data analysis may help to predict patients at increased risk of progression to post-ET myelofibrosis, and assist in the discrimination of ET and pre-fibrotic PMF (pre-PMF)
  61. Segmentation of Biomedical Images by a Computational Topology Framework (2017)

    Rodrigo Rojas Moraleda, Wei Xiong, Niels Halama, Katja Breitkopf-Heinlein, Steven Steven, Luis Salinas, Dieter W. Heermann, Nektarios A. Valous
    Abstract The segmentation of cell nuclei is an important step towards the automated analysis of histological images. The presence of a large number of nuclei in whole-slide images necessitates methods that are computationally tractable in addition to being effective. In this work, a method is developed for the robust segmentation of cell nuclei in histological images based on the principles of persistent homology. More specifically, an abstract simplicial homology approach for image segmentation is established. Essentially, the approach deals with the persistence of disconnected sets in the image, thus identifying salient regions that express patterns of persistence. By introducing an image representation based on topological features, the task of segmentation is less dependent on variations of color or texture. This results in a novel approach that generalizes well and provides stable performance. The method conceptualizes regions of interest (cell nuclei) pertinent to their topological features in a successful manner. The time cost of the proposed approach is lower-bounded by an almost linear behavior and upper-bounded by O(n2) in a worst-case scenario. Time complexity matches a quasilinear behavior which is O(n1+ɛ) for ε \textless 1. Images acquired from histological sections of liver tissue are used as a case study to demonstrate the effectiveness of the approach. The histological landscape consists of hepatocytes and non-parenchymal cells. The accuracy of the proposed methodology is verified against an automated workflow created by the output of a conventional filter bank (validated by experts) and the supervised training of a random forest classifier. The results are obtained on a per-object basis. The proposed workflow successfully detected both hepatocyte and non-parenchymal cell nuclei with an accuracy of 84.6%, and hepatocyte cell nuclei only with an accuracy of 86.2%. A public histological dataset with supplied ground-truth data is also used for evaluating the performance of the proposed approach (accuracy: 94.5%). Further validations are carried out with a publicly available dataset and ground-truth data from the Gland Segmentation in Colon Histology Images Challenge (GlaS) contest. The proposed method is useful for obtaining unsupervised robust initial segmentations that can be further integrated in image/data processing and management pipelines. The development of a fully automated system supporting a human expert provides tangible benefits in the context of clinical decision-making.
  62. Identification of Copy Number Aberrations in Breast Cancer Subtypes Using Persistence Topology (2015)

    Javier Arsuaga, Tyler Borrman, Raymond Cavalcante, Georgina Gonzalez, Catherine Park
    Abstract DNA copy number aberrations (CNAs) are of biological and medical interest because they help identify regulatory mechanisms underlying tumor initiation and evolution. Identification of tumor-driving CNAs (driver CNAs) however remains a challenging task, because they are frequently hidden by CNAs that are the product of random events that take place during tumor evolution. Experimental detection of CNAs is commonly accomplished through array comparative genomic hybridization (aCGH) assays followed by supervised and/or unsupervised statistical methods that combine the segmented profiles of all patients to identify driver CNAs. Here, we extend a previously-presented supervised algorithm for the identification of CNAs that is based on a topological representation of the data. Our method associates a two-dimensional (2D) point cloud with each aCGH profile and generates a sequence of simplicial complexes, mathematical objects that generalize the concept of a graph. This representation of the data permits segmenting the data at different resolutions and identifying CNAs by interrogating the topological properties of these simplicial complexes. We tested our approach on a published dataset with the goal of identifying specific breast cancer CNAs associated with specific molecular subtypes. Identification of CNAs associated with each subtype was performed by analyzing each subtype separately from the others and by taking the rest of the subtypes as the control. Our results found a new amplification in 11q at the location of the progesterone receptor in the Luminal A subtype. Aberrations in the Luminal B subtype were found only upon removal of the basal-like subtype from the control set. Under those conditions, all regions found in the original publication, except for 17q, were confirmed; all aberrations, except those in chromosome arms 8q and 12q were confirmed in the basal-like subtype. These two chromosome arms, however, were detected only upon removal of three patients with exceedingly large copy number values. More importantly, we detected 10 and 21 additional regions in the Luminal B and basal-like subtypes, respectively. Most of the additional regions were either validated on an independent dataset and/or using GISTIC. Furthermore, we found three new CNAs in the basal-like subtype: a combination of gains and losses in 1p, a gain in 2p and a loss in 14q. Based on these results, we suggest that topological approaches that incorporate multiresolution analyses and that interrogate topological properties of the data can help in the identification of copy number changes in cancer.