46 resultados para quantile regression


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Plasma etch is a key process in modern semiconductor manufacturing facilities as it offers process simplification and yet greater dimensional tolerances compared to wet chemical etch technology. The main challenge of operating plasma etchers is to maintain a consistent etch rate spatially and temporally for a given wafer and for successive wafers processed in the same etch tool. Etch rate measurements require expensive metrology steps and therefore in general only limited sampling is performed. Furthermore, the results of measurements are not accessible in real-time, limiting the options for run-to-run control. This paper investigates a Virtual Metrology (VM) enabled Dynamic Sampling (DS) methodology as an alternative paradigm for balancing the need to reduce costly metrology with the need to measure more frequently and in a timely fashion to enable wafer-to-wafer control. Using a Gaussian Process Regression (GPR) VM model for etch rate estimation of a plasma etch process, the proposed dynamic sampling methodology is demonstrated and evaluated for a number of different predictive dynamic sampling rules. © 2013 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Increasingly semiconductor manufacturers are exploring opportunities for virtual metrology (VM) enabled process monitoring and control as a means of reducing non-value added metrology and achieving ever more demanding wafer fabrication tolerances. However, developing robust, reliable and interpretable VM models can be very challenging due to the highly correlated input space often associated with the underpinning data sets. A particularly pertinent example is etch rate prediction of plasma etch processes from multichannel optical emission spectroscopy data. This paper proposes a novel input-clustering based forward stepwise regression methodology for VM model building in such highly correlated input spaces. Max Separation Clustering (MSC) is employed as a pre-processing step to identify a reduced srt of well-conditioned, representative variables that can then be used as inputs to state-of-the-art model building techniques such as Forward Selection Regression (FSR), Ridge regression, LASSO and Forward Selection Ridge Regression (FCRR). The methodology is validated on a benchmark semiconductor plasma etch dataset and the results obtained are compared with those achieved when the state-of-art approaches are applied directly to the data without the MSC pre-processing step. Significant performance improvements are observed when MSC is combined with FSR (13%) and FSRR (8.5%), but not with Ridge Regression (-1%) or LASSO (-32%). The optimal VM results are obtained using the MSC-FSR and MSC-FSRR generated models. © 2012 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In a Bayesian learning setting, the posterior distribution of a predictive model arises from a trade-off between its prior distribution and the conditional likelihood of observed data. Such distribution functions usually rely on additional hyperparameters which need to be tuned in order to achieve optimum predictive performance; this operation can be efficiently performed in an Empirical Bayes fashion by maximizing the posterior marginal likelihood of the observed data. Since the score function of this optimization problem is in general characterized by the presence of local optima, it is necessary to resort to global optimization strategies, which require a large number of function evaluations. Given that the evaluation is usually computationally intensive and badly scaled with respect to the dataset size, the maximum number of observations that can be treated simultaneously is quite limited. In this paper, we consider the case of hyperparameter tuning in Gaussian process regression. A straightforward implementation of the posterior log-likelihood for this model requires O(N^3) operations for every iteration of the optimization procedure, where N is the number of examples in the input dataset. We derive a novel set of identities that allow, after an initial overhead of O(N^3), the evaluation of the score function, as well as the Jacobian and Hessian matrices, in O(N) operations. We prove how the proposed identities, that follow from the eigendecomposition of the kernel matrix, yield a reduction of several orders of magnitude in the computation time for the hyperparameter optimization problem. Notably, the proposed solution provides computational advantages even with respect to state of the art approximations that rely on sparse kernel matrices.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Virtual metrology (VM) aims to predict metrology values using sensor data from production equipment and physical metrology values of preceding samples. VM is a promising technology for the semiconductor manufacturing industry as it can reduce the frequency of in-line metrology operations and provide supportive information for other operations such as fault detection, predictive maintenance and run-to-run control. The prediction models for VM can be from a large variety of linear and nonlinear regression methods and the selection of a proper regression method for a specific VM problem is not straightforward, especially when the candidate predictor set is of high dimension, correlated and noisy. Using process data from a benchmark semiconductor manufacturing process, this paper evaluates the performance of four typical regression methods for VM: multiple linear regression (MLR), least absolute shrinkage and selection operator (LASSO), neural networks (NN) and Gaussian process regression (GPR). It is observed that GPR performs the best among the four methods and that, remarkably, the performance of linear regression approaches that of GPR as the subset of selected input variables is increased. The observed competitiveness of high-dimensional linear regression models, which does not hold true in general, is explained in the context of extreme learning machines and functional link neural networks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A forward and backward least angle regression (LAR) algorithm is proposed to construct the nonlinear autoregressive model with exogenous inputs (NARX) that is widely used to describe a large class of nonlinear dynamic systems. The main objective of this paper is to improve model sparsity and generalization performance of the original forward LAR algorithm. This is achieved by introducing a replacement scheme using an additional backward LAR stage. The backward stage replaces insignificant model terms selected by forward LAR with more significant ones, leading to an improved model in terms of the model compactness and performance. A numerical example to construct four types of NARX models, namely polynomials, radial basis function (RBF) networks, neuro fuzzy and wavelet networks, is presented to illustrate the effectiveness of the proposed technique in comparison with some popular methods.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Quantile normalization (QN) is a technique for microarray data processing and is the default normalization method in the Robust Multi-array Average (RMA) procedure, which was primarily designed for analysing gene expression data from Affymetrix arrays. Given the abundance of Affymetrix microarrays and the popularity of the RMA method, it is crucially important that the normalization procedure is applied appropriately. In this study we carried out simulation experiments and also analysed real microarray data to investigate the suitability of RMA when it is applied to dataset with different groups of biological samples. From our experiments, we showed that RMA with QN does not preserve the biological signal included in each group, but rather it would mix the signals between the groups. We also showed that the Median Polish method in the summarization step of RMA has similar mixing effect. RMA is one of the most widely used methods in microarray data processing and has been applied to a vast volume of data in biomedical research. The problematic behaviour of this method suggests that previous studies employing RMA could have been misadvised or adversely affected. Therefore we think it is crucially important that the research community recognizes the issue and starts to address it. The two core elements of the RMA method, quantile normalization and Median Polish, both have the undesirable effects of mixing biological signals between different sample groups, which can be detrimental to drawing valid biological conclusions and to any subsequent analyses. Based on the evidence presented here and that in the literature, we recommend exercising caution when using RMA as a method of processing microarray gene expression data, particularly in situations where there are likely to be unknown subgroups of samples.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In many applications, and especially those where batch processes are involved, a target scalar output of interest is often dependent on one or more time series of data. With the exponential growth in data logging in modern industries such time series are increasingly available for statistical modeling in soft sensing applications. In order to exploit time series data for predictive modelling, it is necessary to summarise the information they contain as a set of features to use as model regressors. Typically this is done in an unsupervised fashion using simple techniques such as computing statistical moments, principal components or wavelet decompositions, often leading to significant information loss and hence suboptimal predictive models. In this paper, a functional learning paradigm is exploited in a supervised fashion to derive continuous, smooth estimates of time series data (yielding aggregated local information), while simultaneously estimating a continuous shape function yielding optimal predictions. The proposed Supervised Aggregative Feature Extraction (SAFE) methodology can be extended to support nonlinear predictive models by embedding the functional learning framework in a Reproducing Kernel Hilbert Spaces setting. SAFE has a number of attractive features including closed form solution and the ability to explicitly incorporate first and second order derivative information. Using simulation studies and a practical semiconductor manufacturing case study we highlight the strengths of the new methodology with respect to standard unsupervised feature extraction approaches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Both polygenicity (many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from a true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of the inflation in test statistics in many GWAS of large sample size.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Around 10-15% of patients with locally advanced rectal cancer (LARC) undergo a pathologically complete response (TRG4) to neoadjuvant chemoradiotherapy; the rest of patients exhibit a spectrum of tumour regression (TRG1-3). Understanding therapy-related genomic alterations may help us to identify underlying biology or novel targets associated with response that could increase the efficacy of therapy in patients that do not benefit from the current standard of care.
Methods: 48 FFPE rectal cancer biopsies and matched resections were analysed using the WG-DASL HumanHT-12_v4 Beadchip array on the illumina iScan. Bioinformatic analysis was conducted in Partek genomics suite and R studio. Limma and glmnet packages were used to identify genes differentially expressed between tumour regression grades. Validation of microarray results will be carried out using IHC, RNAscope and RT-PCR.
Results: Immune response genes were observed from supervised analysis of the biopsies which may have predictive value. Differential gene expression from the resections as well as pre and post therapy analysis revealed induction of genes in a tumour regression dependent manner. Pathway mapping and Gene Ontology analysis of these genes suggested antigen processing and natural killer mediated cytotoxicity respectively. The natural killer-like gene signature was switched off in non-responders and on in the responders. IHC has confirmed the presence of Natural killer cells through CD56+ staining.
Conclusion: Identification of NK cell genes and CD56+ cells in patients responding to neoadjuvant chemoradiotherapy warrants further investigation into their association with tumour regression grade in LARC. NK cells are known to lyse malignant cells and determining whether their presence is a cause or consequence of response is crucial. Interrogation of the cytokines upregulated in our NK-like signature will help guide future in vitro models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Histone deacetylases (HDACs) are enzymes involved in transcriptional repression. We aimed to examine the significance of HDAC1 and HDAC2 gene expression in the prediction of recurrence and survival in 156 patients with hepatocellular carcinoma (HCC) among a South East Asian population who underwent curative surgical resection in Singapore. We found that HDAC1 and HDAC2 were upregulated in the majority of HCC tissues. The presence of HDAC1 in tumor tissues was correlated with poor tumor differentiation. Notably, HDAC1 expression in adjacent non-tumor hepatic tissues was correlated with the presence of satellite nodules and multiple lesions, suggesting that HDAC1 upregulation within the field of HCC may contribute to tumor spread. Using competing risk regression analysis, we found that increased cancer-specific mortality was significantly associated with HDAC2 expression. Mortality was also increased with high HDAC1 expression. In the liver cancer cell lines, HEP3B, HEPG2, PLC5, and a colorectal cancer cell line, HCT116, the combined knockdown of HDAC1 and HDAC2 increased cell death and reduced cell proliferation as well as colony formation. In contrast, knockdown of either HDAC1 or HDAC2 alone had minimal effects on cell death and proliferation. Taken together, our study suggests that both HDAC1 and HDAC2 exert pro-survival effects in HCC cells, and the combination of isoform-specific HDAC inhibitors against both HDACs may be effective in targeting HCC to reduce mortality.