963 results for predictive regression
Abstract:
Land-use regression (LUR) is a technique that can improve the accuracy of air pollution exposure assessment in epidemiological studies. Most LUR models are developed for single cities, which places limitations on their applicability to other locations. We sought to develop a model to predict nitrogen dioxide (NO2) concentrations with national coverage of Australia by using satellite observations of tropospheric NO2 columns combined with other predictor variables. We used a generalised estimating equation (GEE) model to predict annual and monthly average ambient NO2 concentrations measured by a national monitoring network from 2006 through 2011. The best annual model explained 81% of spatial variation in NO2 (absolute RMS error = 1.4 ppb), while the best monthly model explained 76% (absolute RMS error = 1.9 ppb). We applied our models to predict NO2 concentrations at the ~350,000 census mesh blocks across the country (a mesh block is the smallest spatial unit in the Australian census). National population-weighted average concentrations ranged from 7.3 ppb (2006) to 6.3 ppb (2011). We found that a simple approach using tropospheric NO2 column data yielded models with slightly better predictive ability than those produced using a more involved approach that required simulation of surface-to-column ratios. The models were capable of capturing within-urban variability in NO2, and offer the ability to estimate ambient NO2 concentrations at monthly and annual time scales across Australia from 2006 to 2011. We are making our model predictions freely available for research.
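As a hedged illustration of the modelling approach described above, the sketch below fits a generalised estimating equation with an exchangeable working correlation to simulated site-level data; the predictor names (sat_no2, road_density, site_id) and effect sizes are assumptions for illustration, not the study's variables or results.

```python
# Minimal GEE sketch: predict annual NO2 from a satellite column proxy and a
# land-use predictor, with repeated annual observations grouped by monitor site.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_sites, n_years = 60, 6
df = pd.DataFrame({
    "site_id": np.repeat(np.arange(n_sites), n_years),
    "sat_no2": rng.gamma(3.0, 1.0, n_sites * n_years),       # satellite column proxy
    "road_density": rng.uniform(0, 10, n_sites * n_years),   # land-use predictor
})
df["no2_ppb"] = 2.0 + 1.1 * df["sat_no2"] + 0.3 * df["road_density"] + rng.normal(0, 1.4, len(df))

X = sm.add_constant(df[["sat_no2", "road_density"]])
# Annual observations at the same monitor are correlated, hence a GEE with an
# exchangeable working correlation structure grouped by site.
gee = sm.GEE(df["no2_ppb"], X, groups=df["site_id"],
             family=sm.families.Gaussian(),
             cov_struct=sm.cov_struct.Exchangeable())
res = gee.fit()
print(res.summary())
print("Predicted NO2 (ppb):", res.predict(X.iloc[:3]))
```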
Abstract:
The functions of the volunteer functions inventory were combined with the constructs of the theory of planned behaviour (i.e., attitudes, subjective norms, and perceived behavioural control) to establish whether a stronger, single explanatory model prevailed. Undertaken in the context of episodic, skilled volunteering by individuals who were retired or approaching retirement (N = 186), the research advances prior studies, which either examined the predictive capacity of each model independently or compared their explanatory value. Using hierarchical regression analysis, the functions of the volunteer functions inventory (when controlling for demographic variables) explained an additional 7.0% of the variability in individuals' willingness to volunteer over and above that accounted for by the theory of planned behaviour. Significant predictors in the final model included attitudes, subjective norms and perceived behavioural control from the theory of planned behaviour, and the understanding function from the volunteer functions inventory. It is proposed that the items comprising the understanding function may represent a deeper psychological construct (e.g., self-actualisation) not accounted for by the theory of planned behaviour. The findings highlight the potential benefit of combining these two prominent models in terms of improving understanding of volunteerism and providing a single, parsimonious model for raising rates of this important behaviour.
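The hierarchical regression step can be illustrated with a short sketch on simulated data: a baseline model with a control variable and theory of planned behaviour predictors is compared with a model that adds a volunteer functions inventory function, and the increment in R-squared is reported. Column names and coefficients are hypothetical.

```python
# Hierarchical regression sketch: compare R^2 before and after adding the VFI block.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 186
df = pd.DataFrame(rng.normal(size=(n, 5)),
                  columns=["age", "attitude", "subj_norm", "pbc", "understanding"])
df["willingness"] = (0.4 * df["attitude"] + 0.3 * df["subj_norm"]
                     + 0.2 * df["pbc"] + 0.25 * df["understanding"]
                     + rng.normal(scale=0.8, size=n))

def r2(cols):
    X = sm.add_constant(df[cols])
    return sm.OLS(df["willingness"], X).fit().rsquared

r2_tpb = r2(["age", "attitude", "subj_norm", "pbc"])                    # controls + TPB
r2_full = r2(["age", "attitude", "subj_norm", "pbc", "understanding"])  # + VFI function
print(f"Delta R^2 attributable to the VFI block: {r2_full - r2_tpb:.3f}")
```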
Abstract:
There are numerous load estimation methods available, some of which are captured in various online tools. However, most estimators are subject to large statistical biases, and their associated uncertainties are often not reported. This makes interpretation difficult and makes it impossible to assess trends or determine optimal sampling regimes. In this paper, we first propose two indices for measuring the extent of sampling bias, and then provide steps for obtaining reliable load estimates by minimizing the biases and making use of possible predictive variables. The load estimation procedure can be summarized by the following four steps:
- (i) output the flow rates at regular time intervals (e.g. 10 minutes) using a time series model that captures all the peak flows;
- (ii) output the predicted flow rates as in (i) at the concentration sampling times, if the corresponding flow rates were not collected;
- (iii) establish a predictive model for the concentration data that incorporates all possible predictor variables, and output the predicted concentrations at the regular time intervals as in (i); and
- (iv) obtain the sum of the products of the predicted flow and the predicted concentration over the regular time intervals as an estimate of the load.
The key step in this approach is the development of an appropriate predictive model for concentration. This is achieved using a generalized regression (rating-curve) approach with additional predictors that capture unique features of the flow data, namely the concept of the first flush, the location of the event on the hydrograph (e.g. rise or fall) and cumulative discounted flow. The latter may be thought of as a measure of constituent exhaustion occurring during flood events. The model can also accommodate autocorrelation in model errors resulting from intensive sampling during floods. Incorporating this additional information can significantly improve the predictability of concentration, and ultimately the precision with which the pollutant load is estimated. We also provide a measure of the standard error of the load estimate that incorporates model, spatial and/or temporal errors, and the method can also incorporate measurement error incurred through the sampling of flow. We illustrate this approach using concentrations of total suspended sediment (TSS) and nitrogen oxide (NOx) and gauged flow data from the Burdekin River, a catchment delivering to the Great Barrier Reef. The sampling biases for NOx concentrations range from a factor of 2 to a factor of 10, indicating severe bias. As expected, the traditional averaging and extrapolation methods produce much higher estimates than those obtained when sampling bias is taken into account.
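Steps (iii) and (iv) can be sketched as follows, assuming a log-scale rating-curve regression with a crude cumulative discounted flow predictor; the simulated flows, concentrations and units are illustrative only and do not reproduce the Burdekin analysis.

```python
# Sketch of steps (iii)-(iv): regress concentration on flow plus an extra predictor,
# then accumulate predicted flow x predicted concentration over regular intervals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
dt_s = 600.0                                    # regular 10-minute interval, in seconds
q = rng.gamma(2.0, 50.0, 1000)                  # predicted flow at regular intervals (m^3/s)
cdf_flow = np.cumsum(q * np.exp(-0.001 * np.arange(q.size)))  # crude discounted-flow proxy

# Sparse concentration samples (mg/L) taken at a subset of the regular time steps
idx = rng.choice(q.size, 80, replace=False)
conc = np.exp(0.5 + 0.4 * np.log(q[idx]) - 1e-5 * cdf_flow[idx] + rng.normal(0, 0.3, 80))

# Rating-curve regression on the log scale with the additional predictor
X_fit = sm.add_constant(np.column_stack([np.log(q[idx]), cdf_flow[idx]]))
fit = sm.OLS(np.log(conc), X_fit).fit()

# Predict concentration at every regular interval and accumulate the load
X_all = sm.add_constant(np.column_stack([np.log(q), cdf_flow]))
conc_hat = np.exp(fit.predict(X_all))           # back-transform; bias correction omitted
load_kg = np.sum(q * conc_hat * dt_s) / 1e3     # m^3/s * mg/L * s -> g, then /1e3 -> kg
print(f"Estimated load: {load_kg:.1f} kg")
```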
Abstract:
The low predictive power of implied volatility in forecasting subsequently realized volatility is a well-documented empirical puzzle. As suggested by, for example, Feinstein (1989), Jackwerth and Rubinstein (1996), and Bates (1997), we test whether unrealized expectations of jumps in volatility could explain this phenomenon. Our findings show that expectations of infrequently occurring jumps in volatility are indeed priced in implied volatility. This has two important consequences. First, implied volatility is actually expected to exceed realized volatility over long periods of time, only to fall far below realized volatility during infrequently occurring periods of very high volatility. Second, the slope coefficient in the classic forecasting regression of realized volatility on implied volatility is very sensitive to the discrepancy between ex ante expected and ex post realized jump frequencies. If the in-sample frequency of positive volatility jumps is lower than ex ante assessed by the market, the classic regression test tends to reject the hypothesis of informational efficiency even if markets are informationally efficient.
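The forecasting regression at the centre of this argument can be sketched on simulated data in which implied volatility carries a premium for jumps that are rarely realized in sample; all numbers are illustrative assumptions, not the paper's estimates.

```python
# Classic forecasting regression RV_t = a + b * IV_t + e_t on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
base = 0.15 + 0.02 * rng.normal(size=n)            # slowly varying volatility level
iv = base + 0.03                                   # IV carries a premium for expected jumps
jump = (rng.random(n) < 0.01) * rng.gamma(2.0, 0.15, n)  # jumps realized in ~1% of periods
rv = base + jump + 0.02 * rng.normal(size=n)

ols = sm.OLS(rv, sm.add_constant(iv)).fit()
print(ols.params)   # if jumps realize less often than priced, intercept and slope
                    # drift away from the (0, 1) benchmark of informational efficiency
```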
Abstract:
Based on an extensive theoretical review, the aim of this paper is to carry out a closer examination of the differences between exporters according to their commitment to the international market. Once the main disparities are identified by means of a non-parametric test, a logistic analysis based upon data collected from small and medium-sized manufacturing firms is conducted in order to construct a classificatory model.
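A minimal sketch of a logistic classification model of the kind described, using simulated firm data; the features and the coding of export commitment are hypothetical, not the paper's variables.

```python
# Logistic classification sketch: separate committed exporters from the rest.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 300
X = np.column_stack([
    rng.normal(size=n),          # e.g. firm size (standardised)
    rng.normal(size=n),          # e.g. export experience (standardised)
])
y = (0.9 * X[:, 0] + 0.7 * X[:, 1] + rng.normal(size=n) > 0).astype(int)  # 1 = committed exporter

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("Hold-out accuracy:", clf.score(X_te, y_te))
```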
Abstract:
Background: Limited information is available about predictors of short-term outcomes in patients with exacerbation of chronic obstructive pulmonary disease (eCOPD) attending an emergency department (ED). Such information could help stratify these patients and guide medical decision-making. The aim of this study was to develop a clinical prediction rule for short-term mortality during hospital admission or within a week after the index ED visit. Methods: This was a prospective cohort study of patients with eCOPD attending the EDs of 16 participating hospitals. Recruitment started in June 2008 and ended in September 2010. Information on possible predictor variables was recorded while the patient was evaluated in the ED, at the time a decision was made to admit the patient to the hospital or discharge home, and during follow-up. The main short-term outcomes were death during hospital admission or within 1 week of discharge home from the ED, as well as death within 1 month of the index ED visit. Multivariate logistic regression models were developed in a derivation sample and validated in a validation sample. The score was compared with other published prediction rules for patients with stable COPD. Results: In total, 2,487 patients were included in the study. Predictors of death during hospital admission or within 1 week of discharge home from the ED were patient age, baseline dyspnea, previous need for long-term home oxygen therapy or non-invasive mechanical ventilation, altered mental status, and use of inspiratory accessory muscles or paradoxical breathing upon ED arrival (area under the curve (AUC) = 0.85). Addition of arterial blood gas parameters (oxygen and carbon dioxide partial pressures (PO2 and PCO2) and pH) did not improve the model. The same variables were predictors of death at 1 month (AUC = 0.85). Compared with other commonly used tools for predicting the severity of COPD in stable patients, our rule was significantly better. Conclusions: Five clinical predictors easily available in the ED, and also in the primary care setting, can be used to create a simple and easily obtained score that allows clinicians to stratify patients with eCOPD upon ED arrival and guide the medical decision-making process.
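A hedged sketch of the derivation/validation workflow: a multivariate logistic model is fitted on a derivation sample and its discrimination assessed by the AUC on a validation sample. The predictor names mirror those reported above, but the data, coefficients and sample split are simulated assumptions.

```python
# Derive a logistic prediction rule and check discrimination on a validation sample.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n = 2487
df = pd.DataFrame({
    "age": rng.normal(72, 9, n),
    "baseline_dyspnea": rng.integers(0, 5, n),
    "home_oxygen_or_niv": rng.integers(0, 2, n),
    "altered_mental_status": rng.integers(0, 2, n),
    "accessory_muscles": rng.integers(0, 2, n),
})
logit = (-8 + 0.05 * df["age"] + 0.5 * df["baseline_dyspnea"] + 0.8 * df["home_oxygen_or_niv"]
         + 1.2 * df["altered_mental_status"] + 0.9 * df["accessory_muscles"])
df["death"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_der, X_val, y_der, y_val = train_test_split(df.drop(columns="death"), df["death"],
                                              test_size=0.4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_der, y_der)
print("Validation AUC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
```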
Abstract:
We introduce a new regression framework, Gaussian process regression networks (GPRN), which combines the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian processes. This model accommodates input-dependent signal and noise correlations between multiple response variables, input-dependent length-scales and amplitudes, and heavy-tailed predictive distributions. We derive both efficient Markov chain Monte Carlo and variational Bayes inference procedures for this model. We apply GPRN as a multiple-output regression and multivariate volatility model, demonstrating substantially improved performance over eight popular multiple-output (multi-task) Gaussian process models and three multivariate volatility models on benchmark datasets, including a 1000-dimensional gene expression dataset.
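The GPRN construction can be illustrated by sampling from an assumed prior in which both the latent functions and the mixing weights are Gaussian process draws, giving outputs with input-dependent correlations; this sketch does not implement the paper's MCMC or variational inference.

```python
# Rough sketch of the GPRN structure y(x) = W(x) f(x) + noise, where the latent
# functions f_j and the mixing weights W_pj are all draws from Gaussian processes.
import numpy as np

def rbf_kernel(x, lengthscale):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(6)
x = np.linspace(0, 1, 200)
P, Q = 3, 2                                    # P outputs, Q latent functions
K = rbf_kernel(x, 0.1) + 1e-6 * np.eye(x.size)
L = np.linalg.cholesky(K)

f = L @ rng.normal(size=(x.size, Q))           # latent GP functions, one per column
W = (L @ rng.normal(size=(x.size, P * Q))).reshape(x.size, P, Q)  # input-dependent weights
y = np.einsum("npq,nq->np", W, f) + 0.05 * rng.normal(size=(x.size, P))
print(y.shape)   # (200, 3): correlated outputs with input-dependent correlations
```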
Abstract:
INTRODUCTION: Platinum agents can cause the formation of DNA adducts and induce apoptosis to eliminate tumor cells. The aim of the present study was to investigate the influence of genetic variants of MDM2 on chemotherapy-related toxicities and clinical outcomes in patients with advanced non-small-cell lung cancer (NSCLC). MATERIALS AND METHODS: We recruited 663 patients with advanced NSCLC who had been treated with first-line platinum-based chemotherapy. Five tagging single nucleotide polymorphisms (SNPs) in MDM2 were genotyped in these patients. The associations of these SNPs with clinical toxicities and outcomes were evaluated using logistic regression and Cox regression analyses. RESULTS: Two SNPs (rs1470383 and rs1690924) showed significant associations with chemotherapy-related toxicities (i.e., overall, hematologic, and gastrointestinal toxicity). Compared with carriers of the wild-type AA genotype, patients with the GG genotype of rs1470383 had an increased risk of overall toxicity (odds ratio [OR], 3.28; 95% confidence interval [CI], 1.34-8.02; P = .009) and hematologic toxicity (OR, 4.10; 95% CI, 1.73-9.71; P = .001). Likewise, patients with the AG genotype of rs1690924 were more susceptible to gastrointestinal toxicity than those with the wild-type homozygote GG (OR, 2.32; 95% CI, 1.30-4.14; P = .004). Stratified survival analysis revealed significant associations between rs1470383 genotypes and overall survival in patients without overall or hematologic toxicity (P = .007 and P = .0009, respectively). CONCLUSION: The results of our study suggest that SNPs in MDM2 might be used to predict the toxicities of platinum-based chemotherapy and overall survival in patients with advanced NSCLC. Additional validation of these associations is warranted.
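The two analyses named above can be sketched as follows: a logistic regression for a toxicity odds ratio by genotype and a Cox proportional hazards model for overall survival. The genotype coding, covariates and effect sizes are simulated stand-ins, not the study data.

```python
# Logistic regression (toxicity odds ratio) and Cox model (overall survival) sketch.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n = 663
df = pd.DataFrame({
    "gg_genotype": rng.integers(0, 2, n),    # 1 = GG at the SNP of interest (illustrative coding)
    "age": rng.normal(60, 8, n),
})
df["hematologic_tox"] = (rng.random(n) < 0.2 + 0.15 * df["gg_genotype"]).astype(int)
df["surv_months"] = rng.exponential(12 / (1 + 0.4 * df["gg_genotype"]))
df["death"] = rng.integers(0, 2, n)

# Logistic regression: odds ratio for hematologic toxicity by genotype, adjusted for age
logit = sm.Logit(df["hematologic_tox"], sm.add_constant(df[["gg_genotype", "age"]])).fit(disp=0)
print("OR (GG vs other):", np.exp(logit.params["gg_genotype"]))

# Cox proportional hazards model for overall survival
cph = CoxPHFitter().fit(df[["surv_months", "death", "gg_genotype", "age"]],
                        duration_col="surv_months", event_col="death")
cph.print_summary()
```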
Abstract:
Plasma etch is a key process in modern semiconductor manufacturing facilities as it offers process simplification and yet greater dimensional tolerances compared to wet chemical etch technology. The main challenge of operating plasma etchers is to maintain a consistent etch rate spatially and temporally for a given wafer and for successive wafers processed in the same etch tool. Etch rate measurements require expensive metrology steps and therefore in general only limited sampling is performed. Furthermore, the results of measurements are not accessible in real-time, limiting the options for run-to-run control. This paper investigates a Virtual Metrology (VM) enabled Dynamic Sampling (DS) methodology as an alternative paradigm for balancing the need to reduce costly metrology with the need to measure more frequently and in a timely fashion to enable wafer-to-wafer control. Using a Gaussian Process Regression (GPR) VM model for etch rate estimation of a plasma etch process, the proposed dynamic sampling methodology is demonstrated and evaluated for a number of different predictive dynamic sampling rules.
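A minimal sketch of VM-enabled dynamic sampling under assumed settings: a Gaussian process regression model predicts etch rate from sensor features, and a wafer is flagged for real metrology only when the predictive uncertainty exceeds a threshold. The threshold and features are illustrative, not the dynamic sampling rules evaluated in the paper.

```python
# GPR virtual metrology with an uncertainty-triggered sampling decision.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(8)
X_hist = rng.normal(size=(50, 3))                       # sensor features for measured wafers
y_hist = 1.0 + X_hist @ np.array([0.5, -0.3, 0.2]) + 0.05 * rng.normal(size=50)  # etch rate

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X_hist, y_hist)

threshold = 0.08                                        # assumed uncertainty threshold
for x_new in rng.normal(size=(5, 3)):                   # stream of newly processed wafers
    mean, std = gpr.predict(x_new.reshape(1, -1), return_std=True)
    decision = "measure" if std[0] > threshold else "use VM estimate"
    print(f"predicted etch rate {mean[0]:.3f} +/- {std[0]:.3f} -> {decision}")
```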
Abstract:
In a Bayesian learning setting, the posterior distribution of a predictive model arises from a trade-off between its prior distribution and the conditional likelihood of observed data. Such distribution functions usually rely on additional hyperparameters which need to be tuned in order to achieve optimum predictive performance; this operation can be efficiently performed in an Empirical Bayes fashion by maximizing the posterior marginal likelihood of the observed data. Since the score function of this optimization problem is in general characterized by the presence of local optima, it is necessary to resort to global optimization strategies, which require a large number of function evaluations. Given that each evaluation is usually computationally intensive and scales poorly with the dataset size, the maximum number of observations that can be treated simultaneously is quite limited. In this paper, we consider the case of hyperparameter tuning in Gaussian process regression. A straightforward implementation of the posterior log-likelihood for this model requires O(N^3) operations for every iteration of the optimization procedure, where N is the number of examples in the input dataset. We derive a novel set of identities that allow, after an initial overhead of O(N^3), the evaluation of the score function, as well as the Jacobian and Hessian matrices, in O(N) operations. We prove that the proposed identities, which follow from the eigendecomposition of the kernel matrix, yield a reduction of several orders of magnitude in the computation time for the hyperparameter optimization problem. Notably, the proposed solution provides computational advantages even with respect to state-of-the-art approximations that rely on sparse kernel matrices.
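The flavour of the cost reduction can be illustrated for the special case of amplitude and noise hyperparameters: after a one-off O(N^3) eigendecomposition of a fixed base kernel, each evaluation of the negative log marginal likelihood costs O(N). This is only a simplified sketch under that assumption; the paper's identities additionally cover the Jacobian and Hessian of the score function.

```python
# O(N) evaluation of the GP negative log marginal likelihood for K = sf2*K0 + sn2*I,
# reusing a single eigendecomposition of the base kernel K0.
import numpy as np

def nll_from_eig(eigvals, proj, sig_f2, sig_n2):
    """O(N) negative log marginal likelihood given eig(K0) and proj = Q^T y."""
    lam = sig_f2 * eigvals + sig_n2
    quad = np.sum(proj ** 2 / lam)        # y^T K^{-1} y computed in the eigenbasis
    logdet = np.sum(np.log(lam))          # log|K| as a sum over eigenvalues
    return 0.5 * (quad + logdet + eigvals.size * np.log(2 * np.pi))

rng = np.random.default_rng(9)
x = np.sort(rng.uniform(0, 5, 300))
K0 = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)        # fixed-lengthscale base kernel
y = rng.multivariate_normal(np.zeros(x.size), 1.5 * K0 + 0.1 * np.eye(x.size))

eigvals, Q = np.linalg.eigh(K0)                           # one-off O(N^3) overhead
proj = Q.T @ y                                            # projections of the targets
for sf2, sn2 in [(1.0, 0.1), (1.5, 0.1), (2.0, 0.05)]:    # cheap O(N) evaluations
    print(sf2, sn2, nll_from_eig(eigvals, proj, sf2, sn2))
```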
Abstract:
Process monitoring and Predictive Maintenance (PdM) are gaining increasing attention in most manufacturing environments as a means of reducing maintenance-related costs and downtime. This is especially true in industries that are data intensive, such as semiconductor manufacturing. In this paper, an adaptive, PdM-based flexible maintenance scheduling decision support system, which pays particular attention to the associated opportunity and risk costs, is presented. The proposed system, which employs Machine Learning and regularized regression methods, exploits new information as it becomes available from newly processed components to refine remaining useful life estimates and the associated costs and risks. The system has been validated on a real industrial dataset relating to an Ion Beam Etching process for semiconductor manufacturing.
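As a loose illustration of the regularized regression component, the sketch below fits a LASSO model to estimate remaining useful life from simulated health indicators and refits it as new components are processed; the features, data and update policy are assumptions, not the system described in the paper.

```python
# Regularized (LASSO) regression for remaining useful life, refit as data arrives.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 4))                        # e.g. drift, vibration, power, cycles
rul = 100 - 20 * X[:, 0] - 10 * X[:, 3] + 5 * rng.normal(size=200)

model = Lasso(alpha=0.1).fit(X[:150], rul[:150])     # initial model on historical components
for i in range(150, 200):                            # refine as new observations arrive
    model.fit(X[:i + 1], rul[:i + 1])
print("Current RUL estimate:", model.predict(X[-1:])[0])
```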
Abstract:
Virtual metrology (VM) aims to predict metrology values using sensor data from production equipment and physical metrology values of preceding samples. VM is a promising technology for the semiconductor manufacturing industry as it can reduce the frequency of in-line metrology operations and provide supportive information for other operations such as fault detection, predictive maintenance and run-to-run control. The prediction models for VM can be drawn from a large variety of linear and nonlinear regression methods, and the selection of a proper regression method for a specific VM problem is not straightforward, especially when the candidate predictor set is high-dimensional, correlated and noisy. Using process data from a benchmark semiconductor manufacturing process, this paper evaluates the performance of four typical regression methods for VM: multiple linear regression (MLR), least absolute shrinkage and selection operator (LASSO), neural networks (NN) and Gaussian process regression (GPR). It is observed that GPR performs the best among the four methods and that, remarkably, the performance of linear regression approaches that of GPR as the subset of selected input variables is increased. The observed competitiveness of high-dimensional linear regression models, which does not hold true in general, is explained in the context of extreme learning machines and functional link neural networks.
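A hedged sketch of this kind of comparison on simulated, correlated, high-dimensional data, scoring MLR, LASSO, a neural network and GPR by cross-validated RMSE; the data, hyperparameters and scores are illustrative and not the benchmark study.

```python
# Compare four typical VM regression methods by cross-validated RMSE.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.neural_network import MLPRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
n, p = 200, 50
X = rng.normal(size=(n, p)) @ np.linalg.cholesky(0.5 * np.eye(p) + 0.5)   # correlated sensors
y = X[:, :5] @ rng.normal(size=5) + 0.5 * rng.normal(size=n)              # few informative inputs

models = {
    "MLR": LinearRegression(),
    "LASSO": Lasso(alpha=0.05),
    "NN": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    "GPR": GaussianProcessRegressor(normalize_y=True),
}
for name, m in models.items():
    rmse = -cross_val_score(m, X, y, cv=5, scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: CV RMSE = {rmse:.3f}")
```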
Abstract:
OBJECTIVE: To examine a panel of 28 biomarkers for prediction of cardiovascular disease (CVD) and non-CVD mortality in a population-based cohort of men.
METHODS: Starting in 1979, middle-aged men in Caerphilly underwent detailed medical examination. Subsequently, 2171 men were re-examined during 1989-1993, and fasting blood samples were obtained from 1911 men (88%). Fibrinogen, viscosity and white cell count (WCC), routine biochemistry tests and lipids were analysed using fresh samples. Stored aliquots were later analysed for novel biomarkers. Statistical analysis of CVD and non-CVD mortality during follow-up used competing-risk Cox regression models, with biomarkers coded in thirds and tested at the 1% significance level after covariate adjustment.
RESULTS: During an average of 15.4 years of follow-up, troponin (subhazard ratio per third 1.71, 95% CI 1.46-1.99) and B-natriuretic peptide (BNP) (subhazard ratio per third 1.54, 95% CI 1.34-1.78) showed strong trends with CVD death but not with non-CVD death. WCC and fibrinogen showed similar but weaker findings. Plasma viscosity, growth differentiation factor 15 (GDF-15) and interleukin-6 (IL-6) were associated positively with both CVD death and non-CVD death, while total cholesterol was associated positively with CVD death but negatively with non-CVD death. C-reactive protein (C-RP), alkaline phosphatase, gamma-glutamyltransferase (GGT), retinol binding protein 4 (RBP-4) and vitamin B6 were significantly associated only with non-CVD death, the last two negatively. Troponin, BNP and IL-6 showed evidence of diminishing associations with CVD mortality through follow-up.
CONCLUSION: Biomarkers for cardiac necrosis were strong, specific predictors of CVD mortality while many inflammatory markers were equally predictive of non-CVD mortality.
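A simplified sketch of the survival analysis described in the methods above. For brevity it uses a cause-specific Cox model (treating deaths from the competing cause as censored) in place of the competing-risk subhazard model reported in the abstract, with a biomarker coded in thirds; the data are simulated, not the Caerphilly cohort.

```python
# Cause-specific Cox regression sketch with a biomarker coded in thirds.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(12)
n = 1911
biomarker = rng.lognormal(size=n)
third = pd.qcut(biomarker, 3, labels=False)               # biomarker thirds (0, 1, 2)
time = rng.exponential(15 / (1 + 0.5 * third))             # follow-up time in years
event_cvd = (rng.random(n) < 0.3).astype(int)              # 1 = CVD death, 0 = censored/other cause

df = pd.DataFrame({"time": time, "cvd_death": event_cvd, "third": third,
                   "age": rng.normal(65, 5, n)})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="cvd_death")
print(cph.hazard_ratios_)                                  # hazard ratio per third, age-adjusted
```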
Abstract:
In many applications, and especially those where batch processes are involved, a target scalar output of interest is often dependent on one or more time series of data. With the exponential growth of data logging in modern industries, such time series are increasingly available for statistical modelling in soft sensing applications. In order to exploit time series data for predictive modelling, it is necessary to summarise the information they contain as a set of features to use as model regressors. Typically this is done in an unsupervised fashion using simple techniques such as computing statistical moments, principal components or wavelet decompositions, often leading to significant information loss and hence suboptimal predictive models. In this paper, a functional learning paradigm is exploited in a supervised fashion to derive continuous, smooth estimates of time series data (yielding aggregated local information), while simultaneously estimating a continuous shape function yielding optimal predictions. The proposed Supervised Aggregative Feature Extraction (SAFE) methodology can be extended to support nonlinear predictive models by embedding the functional learning framework in a Reproducing Kernel Hilbert Space setting. SAFE has a number of attractive features, including a closed-form solution and the ability to explicitly incorporate first- and second-order derivative information. Using simulation studies and a practical semiconductor manufacturing case study, we highlight the strengths of the new methodology with respect to standard unsupervised feature extraction approaches.
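For contrast with SAFE, the sketch below implements only the standard unsupervised baseline mentioned above: statistical moments of each batch time series are used as regressors in a ridge regression. It is not an implementation of SAFE itself, and the simulated data are illustrative.

```python
# Unsupervised feature-extraction baseline: summary statistics of each batch series
# feeding a ridge regression for the scalar target.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(14)
n_batches, n_time = 120, 400
series = rng.normal(size=(n_batches, n_time)).cumsum(axis=1)    # one time series per batch
y = 0.01 * series[:, -1] + 0.02 * series.mean(axis=1) + 0.1 * rng.normal(size=n_batches)

# Summary features: mean, standard deviation, min and max of each series
feats = np.column_stack([series.mean(axis=1), series.std(axis=1),
                         series.min(axis=1), series.max(axis=1)])
score = cross_val_score(Ridge(alpha=1.0), feats, y, cv=5, scoring="r2").mean()
print(f"Cross-validated R^2 with moment features: {score:.2f}")
```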
Abstract:
Background: Around 10-15% of patients with locally advanced rectal cancer (LARC) achieve a pathologically complete response (TRG4) to neoadjuvant chemoradiotherapy; the remaining patients exhibit a spectrum of tumour regression (TRG1-3). Understanding therapy-related genomic alterations may help us to identify the underlying biology or novel targets associated with response, which could increase the efficacy of therapy in patients who do not benefit from the current standard of care.
Methods: 48 FFPE rectal cancer biopsies and matched resections were analysed using the WG-DASL HumanHT-12_v4 BeadChip array on the Illumina iScan. Bioinformatic analysis was conducted in Partek Genomics Suite and RStudio. The limma and glmnet packages were used to identify genes differentially expressed between tumour regression grades. Validation of the microarray results will be carried out using IHC, RNAscope and RT-PCR.
Results: Supervised analysis of the biopsies identified immune response genes that may have predictive value. Differential gene expression from the resections, as well as pre- and post-therapy analysis, revealed induction of genes in a tumour regression-dependent manner. Pathway mapping and Gene Ontology analysis of these genes suggested antigen processing and natural killer-mediated cytotoxicity, respectively. The natural killer-like gene signature was switched off in non-responders and on in responders. IHC confirmed the presence of natural killer cells through CD56+ staining.
Conclusion: The identification of NK cell genes and CD56+ cells in patients responding to neoadjuvant chemoradiotherapy warrants further investigation into their association with tumour regression grade in LARC. NK cells are known to lyse malignant cells, and determining whether their presence is a cause or consequence of response is crucial. Interrogation of the cytokines upregulated in our NK-like signature will help guide future in vitro models.
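The differential-expression modelling above uses the R packages limma and glmnet; as a loosely analogous, hedged Python sketch, an elastic-net penalized logistic model selects genes separating complete responders (TRG4) from the rest. The gene matrix, labels and penalty settings are simulated placeholders, not results from the 48 biopsies.

```python
# Elastic-net penalized logistic model selecting genes associated with TRG4 response.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(13)
n_samples, n_genes = 48, 500
expr = rng.normal(size=(n_samples, n_genes))                  # log-expression matrix
trg4 = (rng.random(n_samples) < 0.15).astype(int)             # 1 = pathologically complete response
expr[trg4 == 1, :10] += 1.5                                   # a small "immune" gene block shifts up

enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=0.5, max_iter=5000)
enet.fit(expr, trg4)
selected = np.flatnonzero(enet.coef_[0])
print(f"{selected.size} genes retained by the elastic-net model")
```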