🍩 Database of Original & Non-Theoretical Uses of Topology

(found 4 matches in 0.001113s)
  1. Rule Generation for Classifying SLT Failed Parts (2022)

    Ho-Chieh Hsu, Cheng-Che Lu, Shih-Wei Wang, Kelly Jones, Kai-Chiang Wu, Mango C.-T. Chao
    Abstract: System-level test (SLT) has recently gained visibility as integrated circuits become increasingly difficult to test fully due to rising transistor density and circuit design complexity. Although SLT is effective for reducing test escapes, little diagnostic information can be obtained for product improvement. In this paper, we propose an unsupervised learning (UL) method to resolve this issue by discovering correlative, potentially systematic defects during the SLT phase. Toward this end, HDBSCAN [1] is used to cluster SLT failed devices in a low-dimensional space created by UMAP [2]. Decision trees are subsequently applied to explain the HDBSCAN results by generating explainable quantitative rules, e.g., inequality constraints, which provide domain experts additional information for advanced diagnosis. Experiments on industrial data demonstrate that the proposed methodology can effectively cluster SLT failed devices and then explain the clustering results with a promising accuracy above 90%. Our methodology is also scalable and fast, requiring two to five orders of magnitude less runtime than the method presented in [3].
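    The UMAP → HDBSCAN → decision-tree pipeline described above maps onto standard Python tooling. Below is a minimal sketch assuming the umap-learn, hdbscan, and scikit-learn packages; the placeholder data, parameter values, and tree depth are illustrative assumptions, not the authors' settings.

```python
# Hedged sketch of the UMAP -> HDBSCAN -> decision-tree pipeline from the abstract;
# all data and parameters are placeholders, not the paper's configuration.
import umap
import hdbscan
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier, export_text

# X: one row of parametric/SLT measurements per failed device (synthetic stand-in)
X, _ = make_blobs(n_samples=500, n_features=40, centers=4, random_state=0)

# 1) Embed the failed devices in a low-dimensional space with UMAP
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(X)

# 2) Cluster the embedding with HDBSCAN (label -1 marks noise points)
labels = hdbscan.HDBSCAN(min_cluster_size=15).fit_predict(embedding)

# 3) Fit a shallow decision tree on the original features to explain the clusters
#    as quantitative inequality rules over the raw measurements
mask = labels >= 0                      # drop noise points before explanation
tree = DecisionTreeClassifier(max_depth=3).fit(X[mask], labels[mask])
print(export_text(tree))                # thresholds read as "feature <= value" rules
```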
  2. A Data-Driven Workflow for Evaporation Performance Degradation Analysis: A Full-Scale Case Study in the Herbal Medicine Manufacturing Industry (2023)

    Sheng Zhang, Xinyuan Xie, Haibin Qu
    Abstract: The evaporation process is a common step in herbal medicine manufacturing and often lasts a long time. Degradation of evaporation performance is inevitable, leading to higher steam and electricity consumption, and it may also affect the content of thermosensitive components. Recently, a vast amount of evaporation process data has been collected with the aid of industrial information systems, and process knowledge is hidden behind these data, yet they are seldom analyzed in depth. In this work, an exploratory data analysis workflow is proposed to evaluate evaporation performance and to identify the root causes of performance degradation. The workflow consists of 6 steps: data collection, preprocessing, characteristic stage identification, feature extraction, model development and interpretation, and decision making. In the model development and interpretation step, the workflow employs the HDBSCAN clustering algorithm for data annotation and then uses the ccPCA method to compare the differences between clusters for root cause analysis. A full-scale case is presented to verify the effectiveness of the workflow. The evaporation process data of 192 batches in 2018 were collected for the case. Through the steps of the workflow, the features of each batch were extracted, and the batches were clustered into 6 groups. The root causes of the performance degradation were determined by ccPCA to be high Pv,II and high LI. Recommended suggestions for future manufacturing were given according to the results. The proposed workflow can determine the root causes of evaporation performance degradation.
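    The annotation and cluster-comparison step can be sketched as below, assuming per-batch features in a pandas DataFrame and the hdbscan package. The feature names are hypothetical, the data are synthetic, and a plain per-cluster mean-difference ranking stands in for the ccPCA contrast used in the paper.

```python
# Minimal sketch: cluster per-batch evaporation features with HDBSCAN, then
# contrast each cluster against the remaining batches to surface the features
# driving it. The contrast step is a simple mean-difference ranking used as a
# stand-in for ccPCA; feature names and data are made up.
import pandas as pd
import hdbscan
from sklearn.datasets import make_blobs

data, _ = make_blobs(n_samples=192, n_features=5, centers=3, random_state=1)
features = pd.DataFrame(
    data,                                            # 192 batches, 5 extracted features
    columns=["steam_use", "Pv_II", "L_I", "duration", "feed_rate"],
)

labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(features.values)
features["cluster"] = labels

# Rank features by how strongly each cluster deviates from the other batches
for c in sorted(set(labels) - {-1}):
    in_c = features["cluster"] == c
    diff = (features.loc[in_c].mean(numeric_only=True)
            - features.loc[~in_c].mean(numeric_only=True)).drop("cluster")
    print(f"cluster {c}:")
    print(diff.abs().sort_values(ascending=False).head(3))
```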
  3. CCF-GNN: A Unified Model Aggregating Appearance, Microenvironment, and Topology for Pathology Image Classification (2023)

    Hongxiao Wang, Gang Huang, Zhuo Zhao, Liang Cheng, Anna Juncker-Jensen, Máté Levente Nagy, Xin Lu, Xiangliang Zhang, Danny Z. Chen
    Abstract: Pathology images contain rich information on cell appearance, microenvironment, and topology features for cancer analysis and diagnosis. Among such features, topology is becoming increasingly important in analysis for cancer immunotherapy. By analyzing geometric and hierarchically structured cell distribution topology, oncologists can identify densely-packed, cancer-relevant cell communities (CCs) for making decisions. Compared to commonly-used pixel-level Convolutional Neural Network (CNN) features and cell-instance-level Graph Neural Network (GNN) features, CC topology features are at a higher level of granularity and geometry. However, topological features have not been well exploited by recent deep learning (DL) methods for pathology image classification due to the lack of effective topological descriptors for cell distribution and gathering patterns. In this paper, inspired by clinical practice, we analyze and classify pathology images by comprehensively learning cell appearance, microenvironment, and topology in a fine-to-coarse manner. To describe and exploit topology, we design the Cell Community Forest (CCF), a novel graph that represents the hierarchical formation process of big-sparse CCs from small-dense CCs. Using CCF as a new geometric topological descriptor of tumor cells in pathology images, we propose CCF-GNN, a GNN model that successively aggregates heterogeneous features (e.g., appearance, microenvironment) from the cell-instance level and cell-community level to the image level for pathology image classification. Extensive cross-validation experiments show that our method significantly outperforms alternative methods on H&E-stained and immunofluorescence images for disease grading tasks with multiple cancer types. Our proposed CCF-GNN establishes a new topological data analysis (TDA) based method, which facilitates integrating multi-level heterogeneous features of point clouds (e.g., of cells) into a unified DL framework.
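    The core aggregation idea (per-cell features pooled over a cell graph into an image-level prediction) can be sketched with PyTorch Geometric, as below. This generic graph classifier is an assumption-laden stand-in and not the authors' Cell Community Forest construction or CCF-GNN architecture.

```python
# Rough sketch of aggregating per-cell features over a cell graph into an
# image-level prediction with PyTorch Geometric; shapes and layers are assumed.
import torch
from torch import nn
from torch_geometric.nn import GCNConv, global_mean_pool
from torch_geometric.data import Data

class CellGraphClassifier(nn.Module):
    def __init__(self, in_dim=16, hidden=64, n_classes=3):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)   # mix features over neighboring cells
        self.conv2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x, edge_index, batch):
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        h = global_mean_pool(h, batch)         # pool cell embeddings to one image vector
        return self.head(h)

# Toy graph: 5 "cells" with 16-dim appearance/microenvironment features
x = torch.randn(5, 16)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]], dtype=torch.long)
data = Data(x=x, edge_index=edge_index)
batch = torch.zeros(5, dtype=torch.long)       # all cells belong to image 0
logits = CellGraphClassifier()(data.x, data.edge_index, batch)
print(logits.shape)                            # torch.Size([1, 3])
```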
  4. Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology (2021)

    Nicole Bussola, Bruno Papa, Ombretta Melaiu, Aurora Castellano, Doriana Fruci, Giuseppe Jurman
    Abstract: We introduce a novel machine learning (ML) framework to address the quantitative assessment of the immune content in neuroblastoma (NB) specimens. First, the EUNet, a U-Net with an EfficientNet encoder, is trained to detect lymphocytes on tissue digital slides stained with the CD3 T-cell marker. The training set consists of 3782 images extracted from an original collection of 54 whole slide images (WSIs), manually annotated for a total of 73,751 lymphocytes. Resampling strategies, data augmentation, and transfer learning approaches are adopted to warrant reproducibility and to reduce the risk of overfitting and selection bias. Topological data analysis (TDA) is then used to define activation maps from different layers of the neural network at different stages of the training process, described by persistence diagrams (PDs) and Betti curves. TDA is further integrated with uniform manifold approximation and projection (UMAP) dimensionality reduction and the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm to cluster, via the deep features, the relevant subgroups and structures across different levels of the neural network. Finally, the recent TwoNN approach is leveraged to study the variation of the intrinsic dimensionality of the U-Net model. As the main task, the proposed pipeline is employed to evaluate the density of lymphocytes over the whole tissue area of the WSIs. The model achieves good results, with a mean absolute error of 3.1 on the test set and significant agreement between the densities estimated by our EUNet model and by trained pathologists, indicating the potential of a promising new strategy for quantifying the immune content in NB specimens. Moreover, the UMAP algorithm unveiled interesting patterns compatible with pathological characteristics, also highlighting novel insights into the dynamics of the intrinsic dataset dimensionality at different stages of the training process. All experiments were run on the Microsoft Azure cloud platform.
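    The TDA components named above are available in open-source packages. The sketch below assumes giotto-tda for persistence diagrams and Betti curves, umap-learn and hdbscan for clustering deep features, and scikit-dimension for TwoNN; the "activations" are random placeholders, not features from the EUNet model.

```python
# Hedged sketch of the TDA side of the pipeline: persistence diagrams and Betti
# curves for point clouds of deep features, UMAP + HDBSCAN clustering, and a
# TwoNN intrinsic-dimension estimate. Library choices and parameters are
# illustrative assumptions.
import numpy as np
import umap
import hdbscan
import skdim
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import BettiCurve

rng = np.random.default_rng(2)
# Stand-in for flattened activation features: 3 layers x 200 vectors of dim 32
activations = rng.normal(size=(3, 200, 32))

# Persistence diagrams (H0 and H1) and Betti curves per "layer"
diagrams = VietorisRipsPersistence(homology_dimensions=[0, 1]).fit_transform(activations)
betti = BettiCurve().fit_transform(diagrams)
print(betti.shape)                              # (3 layers, 2 homology dims, n_bins)

# UMAP + HDBSCAN on one layer's deep features to find subgroups
emb = umap.UMAP(n_components=2, random_state=0).fit_transform(activations[0])
labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(emb)

# TwoNN intrinsic-dimension estimate for the same features
print(skdim.id.TwoNN().fit(activations[0]).dimension_)
```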