43 results for Penalized regression
Abstract:
Increasingly, semiconductor manufacturers are exploring opportunities for virtual metrology (VM) enabled process monitoring and control as a means of reducing non-value-added metrology and achieving ever more demanding wafer fabrication tolerances. However, developing robust, reliable and interpretable VM models can be very challenging due to the highly correlated input space often associated with the underpinning data sets. A particularly pertinent example is etch rate prediction of plasma etch processes from multichannel optical emission spectroscopy data. This paper proposes a novel input-clustering based forward stepwise regression methodology for VM model building in such highly correlated input spaces. Max Separation Clustering (MSC) is employed as a pre-processing step to identify a reduced set of well-conditioned, representative variables that can then be used as inputs to state-of-the-art model building techniques such as Forward Selection Regression (FSR), Ridge regression, LASSO and Forward Selection Ridge Regression (FSRR). The methodology is validated on a benchmark semiconductor plasma etch dataset and the results obtained are compared with those achieved when the state-of-the-art approaches are applied directly to the data without the MSC pre-processing step. Significant performance improvements are observed when MSC is combined with FSR (13%) and FSRR (8.5%), but not with Ridge Regression (-1%) or LASSO (-32%). The optimal VM results are obtained using the MSC-FSR and MSC-FSRR generated models. © 2012 IEEE.
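As a rough illustration of the forward selection step named in the abstract above, the following Python sketch greedily adds, at each step, the input variable that most reduces the residual sum of squares of an ordinary least-squares fit. It is not the authors' implementation, and the MSC pre-processing step is not reproduced; the function name `forward_selection`, the fixed number of selected variables and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def forward_selection(X, y, n_select):
    """Greedy forward selection (hypothetical helper, not from the paper).

    At each step, add the column of X whose inclusion gives the lowest
    residual sum of squares for a least-squares fit with an intercept.
    """
    n, p = X.shape
    selected = []
    for _ in range(n_select):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            # Design matrix: intercept plus already-selected columns plus candidate j
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

# Synthetic example (assumed data): y depends only on columns 2 and 5
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
y_demo = 3.0 * X_demo[:, 2] - 2.0 * X_demo[:, 5] + 0.01 * rng.normal(size=200)
print(forward_selection(X_demo, y_demo, 2))
```

With strongly correlated inputs, near-duplicate columns compete for selection at every step, which is the situation the clustering pre-processing described in the abstract is intended to mitigate.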
Abstract:
In a Bayesian learning setting, the posterior distribution of a predictive model arises from a trade-off between its prior distribution and the conditional likelihood of observed data. Such distribution functions usually rely on additional hyperparameters which need to be tuned in order to achieve optimum predictive performance; this operation can be efficiently performed in an Empirical Bayes fashion by maximizing the posterior marginal likelihood of the observed data. Since the score function of this optimization problem is in general characterized by the presence of local optima, it is necessary to resort to global optimization strategies, which require a large number of function evaluations. Given that each evaluation is usually computationally intensive and scales poorly with the dataset size, the maximum number of observations that can be treated simultaneously is quite limited. In this paper, we consider the case of hyperparameter tuning in Gaussian process regression. A straightforward implementation of the posterior log-likelihood for this model requires O(N^3) operations for every iteration of the optimization procedure, where N is the number of examples in the input dataset. We derive a novel set of identities that allow, after an initial overhead of O(N^3), the evaluation of the score function, as well as the Jacobian and Hessian matrices, in O(N) operations. We show how the proposed identities, which follow from the eigendecomposition of the kernel matrix, yield a reduction of several orders of magnitude in the computation time for the hyperparameter optimization problem. Notably, the proposed solution provides computational advantages even with respect to state-of-the-art approximations that rely on sparse kernel matrices.
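The cost reduction described in the abstract above can be illustrated with a minimal Python sketch. It assumes, as a simplification not stated in the abstract, that the kernel matrix K is held fixed and that only a signal variance and a noise variance are tuned, so the data covariance is `signal_var * K + noise_var * I`. Under that assumption, a single O(N^3) eigendecomposition of K makes each later evaluation of the Gaussian log marginal likelihood an O(N) sum. The function names and the RBF kernel are hypothetical choices, and the paper's own identities (including those for the Jacobian and Hessian) are not reproduced here.

```python
import numpy as np

def rbf_kernel(X, length_scale=1.0):
    """Squared-exponential kernel matrix (an assumed choice of kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale ** 2)

def precompute(K, y):
    """One-off O(N^3) step: eigendecompose K and rotate the targets."""
    s, U = np.linalg.eigh(K)
    # Clip tiny negative eigenvalues caused by round-off
    return np.maximum(s, 0.0), U.T @ y

def log_marginal_fast(s, y_rot, signal_var, noise_var):
    """O(N) log marginal likelihood for covariance signal_var*K + noise_var*I."""
    d = signal_var * s + noise_var  # eigenvalues of the full covariance
    n = len(s)
    return -0.5 * (np.sum(y_rot ** 2 / d) + np.sum(np.log(d)) + n * np.log(2 * np.pi))

def log_marginal_direct(K, y, signal_var, noise_var):
    """Reference O(N^3) evaluation via a Cholesky factorization."""
    n = len(y)
    L = np.linalg.cholesky(signal_var * K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L, y)
    return -0.5 * (alpha @ alpha) - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)
```

In use, `precompute` would be called once, after which an optimizer can evaluate `log_marginal_fast` for many candidate `(signal_var, noise_var)` pairs without refactorizing the covariance matrix.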
Forward Stepwise Ridge Regression (FSRR) based variable selection for highly correlated input spaces
Abstract:
Virtual metrology (VM) aims to predict metrology values using sensor data from production equipment and physical metrology values of preceding samples. VM is a promising technology for the semiconductor manufacturing industry as it can reduce the frequency of in-line metrology operations and provide supportive information for other operations such as fault detection, predictive maintenance and run-to-run control. The prediction models for VM can be drawn from a large variety of linear and nonlinear regression methods, and the selection of a proper regression method for a specific VM problem is not straightforward, especially when the candidate predictor set is of high dimension, correlated and noisy. Using process data from a benchmark semiconductor manufacturing process, this paper evaluates the performance of four typical regression methods for VM: multiple linear regression (MLR), least absolute shrinkage and selection operator (LASSO), neural networks (NN) and Gaussian process regression (GPR). It is observed that GPR performs the best among the four methods and that, remarkably, the performance of linear regression approaches that of GPR as the subset of selected input variables is increased. The observed competitiveness of high-dimensional linear regression models, which does not hold true in general, is explained in the context of extreme learning machines and functional link neural networks.
Abstract:
A forward and backward least angle regression (LAR) algorithm is proposed to construct the nonlinear autoregressive model with exogenous inputs (NARX) that is widely used to describe a large class of nonlinear dynamic systems. The main objective of this paper is to improve model sparsity and generalization performance of the original forward LAR algorithm. This is achieved by introducing a replacement scheme using an additional backward LAR stage. The backward stage replaces insignificant model terms selected by forward LAR with more significant ones, leading to an improved model in terms of the model compactness and performance. A numerical example to construct four types of NARX models, namely polynomials, radial basis function (RBF) networks, neuro fuzzy and wavelet networks, is presented to illustrate the effectiveness of the proposed technique in comparison with some popular methods.
Abstract:
In many applications, and especially those where batch processes are involved, a target scalar output of interest is often dependent on one or more time series of data. With the exponential growth in data logging in modern industries such time series are increasingly available for statistical modeling in soft sensing applications. In order to exploit time series data for predictive modelling, it is necessary to summarise the information they contain as a set of features to use as model regressors. Typically this is done in an unsupervised fashion using simple techniques such as computing statistical moments, principal components or wavelet decompositions, often leading to significant information loss and hence suboptimal predictive models. In this paper, a functional learning paradigm is exploited in a supervised fashion to derive continuous, smooth estimates of time series data (yielding aggregated local information), while simultaneously estimating a continuous shape function yielding optimal predictions. The proposed Supervised Aggregative Feature Extraction (SAFE) methodology can be extended to support nonlinear predictive models by embedding the functional learning framework in a Reproducing Kernel Hilbert Spaces setting. SAFE has a number of attractive features including closed form solution and the ability to explicitly incorporate first and second order derivative information. Using simulation studies and a practical semiconductor manufacturing case study we highlight the strengths of the new methodology with respect to standard unsupervised feature extraction approaches.
Abstract:
Both polygenicity (many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from a true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of the inflation in test statistics in many GWAS of large sample size.
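For orientation, the relationship between test statistics and LD that the abstract above refers to is commonly written as the linear model below. The symbols are not defined in the abstract and are introduced here as assumptions: N is the sample size, M the number of variants, h^2 the heritability captured by those variants, \ell_j the LD Score of variant j, and a the contribution of confounding biases.

```latex
\mathbb{E}\left[\chi_j^2 \mid \ell_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1
```

Read this way, a polygenic signal appears in the slope with respect to \ell_j, while confounding shifts the intercept N a + 1 away from 1, which is why the intercept can serve as a correction factor.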
Abstract:
Background: Around 10-15% of patients with locally advanced rectal cancer (LARC) undergo a pathologically complete response (TRG4) to neoadjuvant chemoradiotherapy; the rest of patients exhibit a spectrum of tumour regression (TRG1-3). Understanding therapy-related genomic alterations may help us to identify underlying biology or novel targets associated with response that could increase the efficacy of therapy in patients that do not benefit from the current standard of care.
Methods: 48 FFPE rectal cancer biopsies and matched resections were analysed using the WG-DASL HumanHT-12_v4 BeadChip array on the Illumina iScan. Bioinformatic analysis was conducted in Partek Genomics Suite and RStudio. The limma and glmnet packages were used to identify genes differentially expressed between tumour regression grades. Validation of microarray results will be carried out using IHC, RNAscope and RT-PCR.
Results: Supervised analysis of the biopsies identified immune response genes that may have predictive value. Differential gene expression from the resections, as well as pre- and post-therapy analysis, revealed induction of genes in a tumour regression dependent manner. Pathway mapping and Gene Ontology analysis of these genes suggested antigen processing and natural killer cell-mediated cytotoxicity, respectively. The natural killer-like gene signature was switched off in non-responders and on in responders. IHC has confirmed the presence of natural killer cells through CD56+ staining.
Conclusion: Identification of NK cell genes and CD56+ cells in patients responding to neoadjuvant chemoradiotherapy warrants further investigation into their association with tumour regression grade in LARC. NK cells are known to lyse malignant cells and determining whether their presence is a cause or consequence of response is crucial. Interrogation of the cytokines upregulated in our NK-like signature will help guide future in vitro models.
Abstract:
Histone deacetylases (HDACs) are enzymes involved in transcriptional repression. We aimed to examine the significance of HDAC1 and HDAC2 gene expression in the prediction of recurrence and survival in 156 patients with hepatocellular carcinoma (HCC) among a South East Asian population who underwent curative surgical resection in Singapore. We found that HDAC1 and HDAC2 were upregulated in the majority of HCC tissues. The presence of HDAC1 in tumor tissues was correlated with poor tumor differentiation. Notably, HDAC1 expression in adjacent non-tumor hepatic tissues was correlated with the presence of satellite nodules and multiple lesions, suggesting that HDAC1 upregulation within the field of HCC may contribute to tumor spread. Using competing risk regression analysis, we found that increased cancer-specific mortality was significantly associated with HDAC2 expression. Mortality was also increased with high HDAC1 expression. In the liver cancer cell lines, HEP3B, HEPG2, PLC5, and a colorectal cancer cell line, HCT116, the combined knockdown of HDAC1 and HDAC2 increased cell death and reduced cell proliferation as well as colony formation. In contrast, knockdown of either HDAC1 or HDAC2 alone had minimal effects on cell death and proliferation. Taken together, our study suggests that both HDAC1 and HDAC2 exert pro-survival effects in HCC cells, and the combination of isoform-specific HDAC inhibitors against both HDACs may be effective in targeting HCC to reduce mortality.
Characterising granuloma regression and liver recovery in a murine model of schistosomiasis japonica
Abstract:
In hepatic schistosomiasis, the egg-induced granulomatous response and the development of extensive fibrosis are the main pathologies. We used a Schistosoma japonicum-infected mouse model to characterise the multi-cellular pathways associated with the recovery from hepatic fibrosis following clearance of the infection with the anti-schistosomal drug, praziquantel. During liver recovery, splenomegaly, granuloma density and liver fibrosis were all reduced. Inflammatory cell infiltration into the liver was evident, and the numbers of neutrophils, eosinophils and macrophages were significantly decreased. Transcriptomic analysis revealed the up-regulation of fatty acid metabolism genes and identified peroxisome proliferator-activated receptor alpha as the upstream regulator of liver recovery. The aryl hydrocarbon receptor signalling pathway, which regulates xenobiotic metabolism, was also differentially up-regulated. These findings provide a better understanding of the mechanisms associated with the regression of hepatic schistosomiasis.