40 results for Cross-validation
Abstract:
The validation of variable-density flow models simulating seawater intrusion in coastal aquifers requires information about the concentration distribution in groundwater. Electrical resistivity tomography (ERT) provides relevant data for this purpose. However, inverse geo-electrical modeling is limited by the non-uniqueness of its solutions. Such difficulties in evaluating seawater intrusion can be overcome by coupling geophysical data and groundwater modeling. First, the resistivity distribution is established by inverse geo-electrical modeling. Second, a 3-D variable-density flow hydrogeological model is developed. Third, using Archie's law, an electrical resistivity model is deduced from the simulated salt concentrations and compared to the previously interpreted geo-electrical model. Finally, beyond that usual comparison-validation, the theoretical geophysical response of the concentrations simulated with the groundwater model can be compared to the field-measured resistivity data. This constitutes a cross-validation of both the inverse geo-electrical model and the groundwater model.
[Comte, J.-C., and O. Banton (2007), Cross-validation of geo-electrical and hydrogeological models to evaluate seawater intrusion in coastal aquifers, Geophys. Res. Lett., 34, L10402, doi:10.1029/2007GL029981.]
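As a minimal illustration of the third step, the sketch below converts simulated salt concentrations into bulk resistivities via Archie's law so they can be compared, cell by cell, with the inverted ERT model. The porosity, cementation exponent, and salinity-conductivity coefficient are hypothetical placeholders, not values from the study.

```python
import numpy as np

def fluid_resistivity(concentration_g_per_l, k=0.7):
    """Approximate pore-water resistivity (ohm.m) from salt concentration,
    using the common linear approximation sigma_w [S/m] ~ k * TDS [g/L].
    k is an assumed, site-specific calibration constant."""
    return 1.0 / (k * concentration_g_per_l)

def archie_bulk_resistivity(rho_w, porosity=0.3, a=1.0, m=2.0):
    """Archie's law for a fully saturated medium: rho_b = a * phi**(-m) * rho_w.
    porosity, a and m are illustrative values only."""
    return a * porosity ** (-m) * rho_w

# Hypothetical concentrations simulated by the variable-density flow model (g/L)
c = np.array([0.5, 5.0, 20.0, 35.0])
rho_model = archie_bulk_resistivity(fluid_resistivity(c))
print(rho_model)  # compare cell by cell with the inverted ERT resistivities
```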
Abstract:
The paper addresses the issue of bandwidth choice in the semiparametric estimation of the long memory parameter of a univariate time series process. The focus is on the properties of forecasts from the long memory model. A variety of cross-validation methods based on out-of-sample forecasting performance are proposed. These procedures are used for the choice of bandwidth and subsequent model selection. Simulation evidence is presented that demonstrates the advantage of the proposed methodology.
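A minimal sketch of how such a forecast-based bandwidth choice can work, under simplifying assumptions: d is estimated by log-periodogram (GPH) regression for each candidate bandwidth m, one-step forecasts are formed from the truncated AR(∞) representation of (1 − B)^d, and the bandwidth with the smallest out-of-sample squared error wins. The candidate bandwidths and the truncation order p are arbitrary illustrations, not the procedures proposed in the paper.

```python
import numpy as np

def gph_estimate(x, m):
    """Log-periodogram (GPH) estimate of the long-memory parameter d,
    using the first m Fourier frequencies (m is the bandwidth)."""
    n = len(x)
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / (2 * np.pi * n)
    slope = np.polyfit(np.log(4 * np.sin(lam / 2) ** 2), np.log(I), 1)[0]
    return -slope

def ar_coefficients(d, p):
    """First p coefficients of the AR(infinity) form of (1 - B)^d x_t = e_t."""
    c = np.ones(p + 1)
    for j in range(1, p + 1):
        c[j] = c[j - 1] * (j - 1 - d) / j
    return -c[1:]                     # x_t = sum_j coef_j * x_{t-j} + e_t

def cv_bandwidth(x, bandwidths, n_train, p=50):
    """Choose the bandwidth whose fitted model best forecasts the hold-out
    segment one step ahead. Assumes n_train >= p."""
    mse = {}
    for m in bandwidths:
        coef = ar_coefficients(gph_estimate(x[:n_train], m), p)
        pred = np.array([np.dot(coef, x[t - 1::-1][:p])
                         for t in range(n_train, len(x))])
        mse[m] = np.mean((x[n_train:] - pred) ** 2)
    return min(mse, key=mse.get), mse
```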
Abstract:
Purpose: Current prognostic factors are poor at identifying patients at risk of disease recurrence after surgery for stage II colon cancer. Here we describe a DNA microarray-based prognostic assay using clinically relevant formalin-fixed paraffin-embedded (FFPE) samples. Patients and Methods: A gene signature was developed from a balanced set of 73 patients with recurrent disease (high risk) and 142 patients with no recurrence (low risk) within 5 years of surgery. Results: The 634-probe-set signature identified high-risk patients with a hazard ratio (HR) of 2.62 (P < .001) during cross-validation of the training set. In an independent validation set of 144 samples, the signature identified high-risk patients with an HR of 2.53 (P < .001) for recurrence and an HR of 2.21 (P = .0084) for cancer-related death. Additionally, the signature was shown to perform independently of known prognostic factors (P < .001). Conclusion: This gene signature represents a novel prognostic biomarker for patients with stage II colon cancer that can be applied to FFPE tumor samples.
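For readers unfamiliar with how a signature is cross-validated against survival outcomes, the following sketch shows the general pattern: risk calls are made only on held-out folds, and the hazard ratio is then estimated for the resulting groups. It uses a plain logistic model as a stand-in for the 634-probe-set signature and assumes the lifelines package for the Cox model; none of this reproduces the paper's actual pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter  # assumed available (pip install lifelines)

def cross_validated_hazard_ratio(X, time, event, recurred, n_splits=5, seed=0):
    """Make high/low-risk calls only on held-out folds, then estimate the
    hazard ratio between the resulting groups. All inputs are numpy arrays."""
    high_risk = np.zeros(len(X), dtype=int)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X, recurred):
        clf = LogisticRegression(max_iter=1000).fit(X[train], recurred[train])
        high_risk[test] = clf.predict(X[test])        # out-of-fold calls only
    df = pd.DataFrame({"time": time, "event": event, "high_risk": high_risk})
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    return np.exp(cph.params_["high_risk"])           # hazard ratio
```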
Abstract:
In this paper, NOx emissions modelling for real-time operation and control of a 200 MWe coal-fired power generation plant is studied. Three model types are compared. For the first model, the fundamentals governing the NOx formation mechanisms and a system identification technique are used to develop a grey-box model. Then a linear AutoRegressive with eXogenous inputs (ARX) model and a non-linear ARX (NARX) model are built. Plant operation data are used for modelling and validation. Model cross-validation tests show that the developed grey-box model consistently produces better overall long-term prediction performance than the other two models.
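A minimal sketch of the ARX part of such a comparison: the model is fitted by least squares on one data segment and judged by free-run (long-term) prediction on another, which is the kind of cross-validation test the abstract refers to. The model orders and the data split are illustrative assumptions.

```python
import numpy as np

def fit_arx(u, y, na=2, nb=2):
    """Least-squares fit of an ARX model:
    y[t] = a1*y[t-1] + ... + a_na*y[t-na] + b1*u[t-1] + ... + b_nb*u[t-nb]."""
    n = max(na, nb)
    Phi = np.array([np.concatenate((y[t - na:t][::-1], u[t - nb:t][::-1]))
                    for t in range(n, len(y))])
    theta, *_ = np.linalg.lstsq(Phi, y[n:], rcond=None)
    return theta[:na], theta[na:]

def simulate_arx(u, y_seed, a, b):
    """Free-run (long-term) prediction: model outputs are fed back as
    regressors instead of measured outputs."""
    na, nb = len(a), len(b)
    y = list(y_seed)                  # seed with the first max(na, nb) samples
    for t in range(len(y_seed), len(u)):
        y.append(np.dot(a, y[-1:-na - 1:-1]) + np.dot(b, u[t - nb:t][::-1]))
    return np.array(y)

# Hypothetical cross-validation: fit on the first half, free-run on the rest
# a, b = fit_arx(u[:n // 2], y[:n // 2])
# y_sim = simulate_arx(u, y[:2], a, b)   # compare y_sim[n//2:] with y[n//2:]
```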
Abstract:
This is the first paper to introduce a nonlinearity test for principal component models. The methodology involves the division of the data space into disjunct regions that are analysed by principal component analysis in conjunction with the cross-validation principle. Several toy examples have been successfully analysed, and the nonlinearity test has subsequently been applied to data from an internal combustion engine.
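A toy version of the idea, under assumed details: the data are partitioned into disjunct regions along the first principal score, a PCA model is fitted per region, and each model's reconstruction error is evaluated in every region. For linear data the in-region and out-of-region errors stay comparable; systematic discrepancies signal nonlinearity. The region count and error measure are illustrative, not the accuracy bounds developed in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def regional_pca_errors(X, n_regions=3, n_components=1):
    """Split the data into disjunct regions along the first principal score,
    fit a PCA model in each region, and evaluate its reconstruction error in
    every region. Large off-diagonal/diagonal discrepancies in the returned
    matrix suggest a nonlinear data structure."""
    score = PCA(n_components=1).fit_transform(X)[:, 0]
    edges = np.quantile(score, np.linspace(0, 1, n_regions + 1))
    regions = [X[(score >= lo) & (score <= hi)]
               for lo, hi in zip(edges[:-1], edges[1:])]
    err = np.zeros((n_regions, n_regions))
    for i, Xi in enumerate(regions):
        pca = PCA(n_components=n_components).fit(Xi)
        for j, Xj in enumerate(regions):
            R = pca.inverse_transform(pca.transform(Xj))
            err[i, j] = np.mean((Xj - R) ** 2)
    return err   # compare off-diagonal entries with the diagonal ones
```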
Abstract:
A comparative molecular field analysis (CoMFA) of alkanoic acid 3-oxo-cyclohex-1-enyl ester and 2-acylcyclohexane-1,3-dione derivatives of 4-hydroxyphenylpyruvate dioxygenase (HPPD) inhibitors has been performed to determine the factors required for the activity of these compounds. The substrate conformation extracted from dynamic modeling of the enzyme-substrate complex was used to build the initial structures of the inhibitors. Satisfactory results were obtained after an all-space searching procedure, performing a leave-one-out (LOO) cross-validation study with cross-validated q² and conventional r² values of 0.779 and 0.989, respectively. The results provide tools for predicting the affinity of related compounds and for guiding the design and synthesis of new HPPD ligands with predetermined affinities.
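The q² statistic quoted here is the standard leave-one-out figure of merit, q² = 1 − PRESS/SS; the conventional r² is the same quantity computed on the full-data fit. A minimal sketch follows, with PLS (the regression engine behind CoMFA) as the model; the descriptor matrix X and the component count are placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut

def loo_q2(X, y, n_components=3):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS. X would hold the
    CoMFA field descriptors; n_components is a placeholder choice."""
    press = 0.0
    for train, test in LeaveOneOut().split(X):
        model = PLSRegression(n_components=n_components).fit(X[train], y[train])
        press += ((y[test] - model.predict(X[test]).ravel()) ** 2).item()
    return 1.0 - press / np.sum((y - y.mean()) ** 2)
```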
Abstract:
The identification of non-linear systems using only observed finite datasets has become a mature research area over the last two decades. A class of linear-in-the-parameter models with universal approximation capabilities has been intensively studied and widely used due to the availability of many linear learning algorithms and their inherent convergence conditions. This article presents a systematic overview of basic research on model selection approaches for linear-in-the-parameter models. One of the fundamental problems in non-linear system identification is to find the minimal model with the best generalisation performance from observational data only. The important concepts for achieving good model generalisation used in various non-linear system-identification algorithms are first reviewed, including Bayesian parameter regularisation and model selection criteria based on cross-validation and experimental design. A significant advance in machine learning has been the development of the support vector machine as a means of identifying kernel models based on the structural risk minimisation principle. Developments in convex-optimisation-based model construction algorithms, including support vector regression, are outlined. Input selection algorithms and on-line system identification algorithms are also included in this review. Finally, some industrial applications of non-linear models are discussed.
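As one concrete instance of cross-validation-based model selection for linear-in-the-parameter models, the sketch below greedily adds candidate regressors while the closed-form leave-one-out PRESS keeps decreasing. This is a generic illustration of the principle, not any specific algorithm from the works reviewed.

```python
import numpy as np

def press_statistic(Phi, y):
    """Leave-one-out PRESS for a linear-in-the-parameters model y ~ Phi @ theta,
    computed in closed form from the hat matrix (no refitting needed).
    Assumes more samples than columns so the hat diagonal stays below 1."""
    H = Phi @ np.linalg.pinv(Phi)     # hat matrix Phi (Phi^T Phi)^-1 Phi^T
    e = y - H @ y
    return np.sum((e / (1.0 - np.diag(H))) ** 2)

def forward_select(candidates, y, max_terms=10):
    """Greedy forward selection: at each step add the candidate regressor
    that most reduces the LOO PRESS; stop when PRESS stops improving."""
    chosen, best_press = [], np.inf
    while len(chosen) < max_terms:
        scores = {k: press_statistic(candidates[:, chosen + [k]], y)
                  for k in range(candidates.shape[1]) if k not in chosen}
        k, p = min(scores.items(), key=lambda kv: kv[1])
        if p >= best_press:
            break
        chosen.append(k)
        best_press = p
    return chosen, best_press
```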
Abstract:
Ground-penetrating radar (GPR) is a rapid geophysical technique that we have used to assess four illegally buried waste locations in Northern Ireland. GPR allowed informed positioning of the less rapid, if more accurate, electrical resistivity imaging (ERI). In conductive waste, GPR signal loss can be used to map the areal extent of the waste, allowing ERI survey lines to be positioned. In less conductive waste, the geometry of the burial can be ascertained from GPR alone, allowing rapid assessment. In both circumstances, the conjunctive use of GPR and ERI is considered best practice for cross-validating results and enhancing data interpretation.
Abstract:
On small islands, a freshwater lens can develop due to the recharge induced by rain. The magnitude and spatial distribution of this recharge control the elevation of the freshwater and the depth of its interface with salt water. Therefore, the study of lens morphology gives useful information on both the recharge and the water uptake due to evapotranspiration by vegetation. Electrical resistivity tomography was applied on a small coral reef island, giving relevant information on the lens structure. Variable-density groundwater flow models were then applied to simulate freshwater behavior. Cross-validation of the geoelectrical model and the groundwater model showed that recharge exceeds water uptake in dunes with little vegetation, allowing the lens to develop. Conversely, in the low-lying and densely vegetated sectors, where water uptake exceeds recharge, the lens cannot develop and seawater intrusion occurs. This combined modeling method constitutes an original approach to evaluating effective groundwater recharge in such environments.
[Comte, J.-C., O. Banton, J.-L. Join, and G. Cabioch (2010), Evaluation of effective groundwater recharge of freshwater lens in small islands by the combined modeling of geoelectrical data and water heads, Water Resour. Res., 46, W06601, doi:10.1029/2009WR008058.]
Abstract:
Schizophrenia is a common psychotic mental disorder that is believed to result from the effects of multiple genetic and environmental factors. In this study, we explored gene-gene interactions and main effects in both case-control (657 cases and 411 controls) and family-based (273 families, 1350 subjects) datasets of English or Irish ancestry. Fifty-three markers in 8 genes were genotyped in the family sample, and 44 markers in 7 genes were genotyped in the case-control sample. The Multifactor Dimensionality Reduction Pedigree Disequilibrium Test (MDR-PDT) was used to examine epistasis in the family dataset, and a 3-locus model was identified (permuted p = 0.003). The 3-locus model involved the IL3 (rs2069803), RGS4 (rs2661319), and DTNBP1 (rs21319539) genes. We used MDR to analyze the case-control dataset containing the same markers typed in the RGS4, IL3 and DTNBP1 genes and found evidence of a joint effect between IL3 (rs31400) and DTNBP1 (rs760761) (cross-validation consistency 4/5, balanced prediction accuracy = 56.84%, p = 0.019). While this is not a direct replication, the results obtained from both the family and case-control samples collectively suggest that IL3 and DTNBP1 are likely to interact and jointly contribute to increased risk for schizophrenia. We also observed a significant main effect in DTNBP1, which survived correction for multiple comparisons, and numerous nominally significant effects in several genes.
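A simplified sketch of the MDR quantities quoted above (cross-validation consistency and balanced prediction accuracy), assuming 0/1/2-coded genotypes and a binary case/control status vector; the real MDR-PDT additionally handles family structure, which this toy version does not.

```python
import numpy as np
from itertools import combinations
from sklearn.model_selection import StratifiedKFold

def balanced_accuracy(geno, status, pair, fit_idx, eval_idx):
    """One MDR step: a 2-locus genotype cell is labelled high-risk when its
    case proportion in the fitting fold exceeds the overall case proportion;
    the resulting classifier is scored by balanced accuracy on eval_idx."""
    g = geno[:, pair[0]] * 3 + geno[:, pair[1]]    # 9 cells for 0/1/2 coding
    overall = status[fit_idx].mean()
    high = {c: status[fit_idx][g[fit_idx] == c].mean() > overall
            for c in np.unique(g[fit_idx])}
    pred = np.array([high.get(c, False) for c in g[eval_idx]])
    y = status[eval_idx].astype(bool)
    sens = pred[y].mean() if y.any() else 0.0
    spec = (~pred[~y]).mean() if (~y).any() else 0.0
    return (sens + spec) / 2

def mdr_cv(geno, status, n_splits=5, seed=0):
    """Per fold, pick the SNP pair with the best training balanced accuracy,
    then score it on the test fold. Cross-validation consistency is how often
    the same pair wins across the folds."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    winners, test_accs = [], []
    for train, test in folds.split(geno, status):
        pairs = combinations(range(geno.shape[1]), 2)
        best = max(pairs, key=lambda p: balanced_accuracy(geno, status, p,
                                                          train, train))
        winners.append(best)
        test_accs.append(balanced_accuracy(geno, status, best, train, test))
    return winners, test_accs
```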
Abstract:
It is convenient and effective to solve nonlinear problems with a model that has a linear-in-the-parameters (LITP) structure. However, the nonlinear parameters (e.g. the width of a Gaussian function) of each model term need to be pre-determined, either from expert experience or through exhaustive search. An alternative approach is to optimize them by a gradient-based technique (e.g. Newton's method). Unfortunately, all of these methods remain computationally expensive. Recently, the extreme learning machine (ELM) has shown its advantages in terms of fast learning from data, but the sparsity of the constructed model cannot be guaranteed. This paper proposes a novel algorithm for the automatic construction of a nonlinear system model based on the extreme learning machine. This is achieved by effectively integrating the ELM and leave-one-out (LOO) cross-validation with our two-stage stepwise construction procedure [1]. The main objective is to improve the compactness and generalization capability of the model constructed by the ELM method. Numerical analysis shows that the proposed algorithm involves only about half of the computation of the orthogonal least squares (OLS) based method. Simulation examples are included to confirm the efficacy and superiority of the proposed technique.
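A bare-bones sketch of the ELM-plus-LOO ingredient: random sigmoid hidden units give a linear-in-the-parameters model, so the leave-one-out error is available in closed form from the hat matrix and can steer model size. This selects only the number of hidden units and is a stand-in for, not a reproduction of, the two-stage stepwise construction in [1].

```python
import numpy as np

def elm_features(X, n_hidden, rng):
    """Random sigmoid hidden layer of an extreme learning machine."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def loo_mse(H, y):
    """Closed-form leave-one-out error for a linear-in-the-parameters model:
    e_loo_i = e_i / (1 - h_ii), with h_ii the hat-matrix diagonal.
    Assumes fewer hidden units than training samples."""
    Hat = H @ np.linalg.pinv(H)
    e = y - Hat @ y
    return np.mean((e / (1.0 - np.diag(Hat))) ** 2)

def select_elm_size(X, y, sizes=(5, 10, 20, 40, 80), seed=0):
    """Keep the hidden-layer size with the smallest LOO error; the candidate
    sizes are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    scores = {n: loo_mse(elm_features(X, n, rng), y) for n in sizes}
    return min(scores, key=scores.get), scores
```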
Abstract:
Nitrogen dioxide (NO2) is known to act as an environmental trigger for many respiratory illnesses. As a pollutant it is difficult to map accurately, as concentrations can vary greatly over small distances. In this study, three geostatistical techniques were compared, producing maps of NO2 concentrations in the United Kingdom (UK). The primary data source for each technique was NO2 point data, generated from background automatic monitoring and background diffusion tubes, which are analysed by different laboratories on behalf of local councils and authorities in the UK. The techniques used were simple kriging (SK), ordinary kriging (OK) and simple kriging with a locally varying mean (SKlm). SK and OK make use of the primary variable only. SKlm differs in that it utilises additional data to inform prediction, and hence potentially reduces uncertainty. The secondary data source was oxides of nitrogen (NOx) derived from dispersion modelling outputs at 1 km × 1 km resolution for the UK. These data were used to define the locally varying mean in SKlm, using two regression approaches: (i) global regression (GR) and (ii) geographically weighted regression (GWR). Based upon summary statistics and cross-validation prediction errors, SKlm using GWR-derived local means produced the most accurate predictions. Therefore, using GWR to inform SKlm was beneficial in this study.
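A sketch of the cross-validation used to rank such techniques, for ordinary kriging only: each monitoring site is re-estimated from all the others and the prediction errors are summarised. It assumes the pykrige package; the variogram model is a placeholder, and the SKlm/GWR machinery is beyond this toy example.

```python
import numpy as np
from pykrige.ok import OrdinaryKriging  # assumed available (pip install pykrige)

def loo_cv_ok(x, y, z, variogram_model="spherical"):
    """Leave-one-out cross-validation of ordinary kriging: predict each
    monitoring site from all the others and return the prediction errors."""
    errors = np.empty(len(z))
    for i in range(len(z)):
        mask = np.arange(len(z)) != i
        ok = OrdinaryKriging(x[mask], y[mask], z[mask],
                             variogram_model=variogram_model)
        pred, _ = ok.execute("points", np.atleast_1d(x[i]), np.atleast_1d(y[i]))
        errors[i] = pred[0] - z[i]
    return errors   # summarise with e.g. RMSE to compare SK, OK and SKlm

# rmse = np.sqrt(np.mean(loo_cv_ok(x, y, no2) ** 2))   # hypothetical usage
```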
Abstract:
In 2004, nineteen scientists from fourteen institutions in seven countries collaborated in the landmark study described in chapter 2 (Thomas et al., 2004a). This chapter provides an overview of the results of studies published subsequently and assesses how much, and why, new results differ from those of Thomas et al.
Some species distribution modeling (SDM) studies are directly comparable to the Thomas et al. estimates. Others, using somewhat different methods, nonetheless illuminate whether the original estimates were of the right order of magnitude. Climate similarity models (Williams et al., 2007; Williams and Jackson, 2007) and biome and vegetation dynamics models (Perry and Enright, 2006) have also been applied in the context of climate change, providing interesting opportunities for comparison and cross-validation with results from SDMs.
This chapter concludes with an assessment of whether the range of extinction risk estimates presented in 2004 can be narrowed, and whether the mean estimate should be revised upward or downward. To set the stage for these analyses, the chapter begins with brief reviews of advances in climate modeling and species modeling since 2004.
Abstract:
Health care research includes many studies that combine quantitative and qualitative methods. In this paper, we revisit the quantitative-qualitative debate and review the arguments for and against using mixed methods. In addition, we discuss the implications stemming from our view that the paradigms upon which the methods are based hold different views of reality and therefore different views of the phenomenon under study. Because the two paradigms do not study the same phenomena, quantitative and qualitative methods cannot be combined for cross-validation or triangulation purposes. However, they can be combined for complementary purposes. Future standards for mixed-methods research should clearly reflect this recommendation.