917 results for Cross-validation


Relevance: 100.00%

Abstract:

Estimation of the number of mixture components (k) is an unsolved problem. Available methods for estimating k include bootstrapping the likelihood ratio test statistic and optimizing a variety of validity functionals such as AIC, BIC/MDL, and ICOMP. We investigate minimizing the distance between the fitted mixture model and the true density as a method for estimating k. The distances considered are Kullback-Leibler (KL) and $L_2$. We estimate these distances using cross validation. A reliable estimate of k is obtained by voting among B estimates of k corresponding to B cross validation estimates of distance. With the KL distance, this estimation method is very similar to the Monte Carlo cross validated likelihood methods discussed by Smyth (2000). Focusing on univariate normal mixtures, we present simulation studies that compare the cross validated distance method with AIC, BIC/MDL, and ICOMP. We also apply the cross validation estimate of distance, along with AIC, BIC/MDL, and ICOMP, to data from an osteoporosis drug trial in order to find groups that respond differentially to treatment.
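
The voting scheme in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. It relies on the fact that the KL distance from the true density to a fitted model differs from the negative expected held-out log-likelihood only by a term that does not depend on k, so each random split votes for the k with the highest mean held-out log-likelihood. The function names, the plain EM fit, the quantile initialisation, and the variance floor are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_mixture(data, k, iters=50, min_var=1e-3):
    """Fit a univariate normal mixture with k components by plain EM (hypothetical helper)."""
    srt = sorted(data)
    n = len(srt)
    # Initialise means at evenly spaced quantiles; every component starts with the overall variance.
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(srt) / n
    var0 = max(sum((x - mean) ** 2 for x in srt) / n, min_var)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) + 1e-300
                    for w, m, v in zip(weights, mus, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj,
                min_var,
            )
    return weights, mus, variances

def mean_loglik(params, data):
    """Mean log density of held-out data; maximising it minimises the estimated KL distance."""
    weights, mus, variances = params
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, mus, variances)) + 1e-300)
        for x in data
    ) / len(data)

def vote_for_k(data, candidates=(1, 2, 3), B=11, test_fraction=0.5, seed=0):
    """Each of B random splits votes for the best k; return the winner and the vote counts."""
    rng = random.Random(seed)
    votes = Counter()
    n_test = max(1, int(len(data) * test_fraction))
    for _ in range(B):
        shuffled = list(data)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores = {k: mean_loglik(fit_mixture(train, k), test) for k in candidates}
        votes[max(scores, key=scores.get)] += 1
    return votes.most_common(1)[0][0], votes
```

Calling `vote_for_k(data)` on a list of floats returns the winning k together with the per-k vote counts. The $L_2$ variant mentioned in the abstract would need a different held-out score and is not covered here.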

Relevance: 100.00%

Abstract:

Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$_m$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures. The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample. Only the random split estimator is conditionally (on $\beta$) unbiased; however, MSEP$_m$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables. To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment.
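
As a rough illustration of the leave-one-out statistic recommended above, the sketch below computes PRESS for a least-squares line in two ways: by refitting with each observation left out, and by the standard leverage shortcut, in which each ordinary residual is divided by one minus its leverage. The abstract concerns several stochastic regressors; the single-predictor setting and the function names here are simplifying assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - slope * xbar, slope

def press_by_refitting(xs, ys):
    """PRESS: sum of squared errors when each point is predicted by a fit that excludes it."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total

def press_by_leverage(xs, ys):
    """Same statistic from a single fit, using leverage h_i = 1/n + (x_i - xbar)^2 / Sxx."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = 1.0 / n + (x - xbar) ** 2 / sxx
        total += ((y - (a + b * x)) / (1.0 - h)) ** 2
    return total
```

Both functions take plain Python lists and should agree up to floating-point error, which is why PRESS can be obtained from the full-sample fit without any refitting.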

Relevance: 100.00%

Abstract:

This study aimed to replicate and cross-validate the Rapid Screen of Concussion (RSC) for diagnosing mild TBI (mTBI). One hundred (81 male, 19 female) cases of mTBI and 35 (23 male, 12 female) cases of orthopaedic injuries were tested within 24 hr of injury. Double cross-validation was used to examine whether total RSC scores obtained in the current sample generalised to a previously reported sample. In the new sample, mTBI patients answered fewer orientation questions, recalled fewer words on the learning trial and after a delay, judged fewer sentences in 2 min, and completed fewer symbols in the Digit Symbol Substitution Test than orthopaedic controls. The formulae and cut-offs developed on the original and new samples produced similar sensitivity and overall correct classification rates. Inclusion of the Digit Symbol Substitution Test performance of the new sample improved the sensitivity (80.2%) and specificity (82.6%) in males. It did not improve the correct classification rate in females, for whom sensitivity was 89.5% and specificity 91.7% before the inclusion of the Digit Symbol Substitution Test. Taken together, these results indicate that a combined score on this 12-min screen yields a measure of level of brain impairment up to 24 hr after mTBI.
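
Double cross-validation, as used in this abstract, derives a classification rule on one sample and tests it on the other, then swaps the roles. The sketch below illustrates the idea with a single cut-off on a total score. It assumes that lower scores indicate impairment and that the cut-off is chosen to maximise correct classification; the function names are hypothetical and the actual RSC formulae are not reproduced.

```python
def best_cutoff(scores, labels):
    """Cut-off maximising correct classification; a score <= cut-off is called a case."""
    best, best_correct = None, -1
    for c in sorted(set(scores)):
        correct = sum((s <= c) == bool(y) for s, y in zip(scores, labels))
        if correct > best_correct:
            best, best_correct = c, correct
    return best

def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity of the rule 'score <= cutoff means case'."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s <= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s > cutoff)
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    return tp / pos, tn / neg

def double_cross_validate(sample_a, sample_b):
    """Derive the cut-off on each sample in turn and evaluate it on the other one."""
    results = {}
    for name, (dev, val) in {"A->B": (sample_a, sample_b),
                             "B->A": (sample_b, sample_a)}.items():
        cutoff = best_cutoff(dev[0], dev[1])
        results[name] = (cutoff,) + sensitivity_specificity(val[0], val[1], cutoff)
    return results
```

Each sample is a pair `(scores, labels)` with label 1 for a case and 0 for a control. Similar sensitivity and specificity in the two directions would suggest that the rule generalises across samples.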

Relevance: 100.00%

Abstract:

It is known theoretically that an algorithm cannot be good for an arbitrary prior. We show that in practical terms this also applies to the technique of "cross validation", which has been widely regarded as defying this general rule. Numerical examples are analysed in detail. Their implications for research on learning algorithms are discussed.

Relevance: 100.00%

Abstract:

Poster presented at the First International Congress of CiiEM - From Basic Sciences To Clinical Research. Egas Moniz, Caparica, Portugal, 27-28 November 2015.

Relevance: 100.00%

Abstract:

The aim of this paper was to obtain evidence of the validity of the LSB-50 (de Rivera & Abuín, 2012), a screening measure of psychopathology, in Argentinean adolescents. The sample consisted of 1002 individuals (49.7% male; 50.3% female) between 12 and 18 years old (M = 14.98; SD = 1.99). A cross-validation study and factorial invariance studies were performed in samples divided by sex and age to test whether a seven-factor structure corresponding to seven clinical scales (Hypersensitivity, Obsessive-Compulsive, Anxiety, Hostility, Somatization, Depression, and Sleep disturbance) was adequate for the LSB-50. The seven-factor structure proved to be suitable for all the subsamples. Next, the fit of the seven-factor structure was studied simultaneously in the aforementioned subsamples through hierarchical models that imposed different equivalence constraints. Results indicated the invariance of the seven clinical dimensions of the LSB-50. Ordinal alphas showed good internal consistency for all the scales. Finally, the correlations with a diagnostic measure of psychopathology (PAI-A) indicated moderate convergence. It is concluded that the analyses performed provide robust evidence of construct validity for the LSB-50.

Relevance: 70.00%

Abstract:

Body composition of 292 males aged between 18 and 65 years was measured using the deuterium oxide dilution technique. Participants were divided into development (n = 146) and cross-validation (n = 146) groups. Stature, body weight, skinfold thickness at eight sites, girth at five sites, and bone breadth at four sites were measured, and body mass index (BMI), waist-to-hip ratio (WHR), and waist-to-stature ratio (WSR) were calculated. Equations were developed using multiple regression analyses with skinfolds, breadth and girth measures, BMI, and other indices as independent variables and percentage body fat (%BF) determined from the deuterium dilution technique as the reference. All equations were then tested in the cross-validation group. Results from the reference method were also compared with existing prediction equations by Durnin and Womersley (1974), Davidson et al. (2011), and Gurrici et al. (1998). The proposed prediction equations were valid in our cross-validation samples, with r = 0.77-0.86, bias 0.2-0.5%, and pure error 2.8-3.6%. The strongest equation was generated from skinfolds, with r = 0.83, SEE 3.7%, and AIC 377.2. The Durnin and Womersley (1974) and Davidson et al. (2011) equations significantly (p < 0.001) underestimated %BF by 1.0 and 6.9% respectively, whereas the Gurrici et al. (1998) equation significantly (p < 0.001) overestimated %BF by 3.3% in our cross-validation samples compared to the reference. Results suggest that the proposed prediction equations are useful in the estimation of %BF in Indonesian men.
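
The bias and pure error figures quoted in this abstract are typically computed on the cross-validation group as sketched below. The sketch assumes the conventional definitions: bias is the mean of predicted minus reference values, and pure error is the square root of the mean squared difference. The abstract does not state its formulas, and the function name is hypothetical.

```python
import math

def bias_and_pure_error(predicted, reference):
    """Return (bias, pure error) for predictions checked against a reference method.

    Assumed definitions: bias = mean(predicted - reference), so a negative value
    means underestimation; pure error = sqrt(mean((predicted - reference)^2)).
    """
    n = len(reference)
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = sum(diffs) / n
    pure_error = math.sqrt(sum(d * d for d in diffs) / n)
    return bias, pure_error
```

Both inputs are equal-length lists of %BF values for the same participants: one from the prediction equation, one from the reference method.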

Relevance: 70.00%

Abstract:

Stormwater pollution is linked to stream ecosystem degradation. In predicting stormwater pollution, various types of modelling techniques are adopted. The accuracy of predictions provided by these models depends on the data quality, appropriate estimation of model parameters, and the validation undertaken. It is well understood that available water quality datasets in urban areas, unlike water quantity data, span only relatively short time scales, which limits the applicability of the developed models in engineering and ecological assessment of urban waterways. This paper presents the application of leave-one-out (LOO) and Monte Carlo cross validation (MCCV) procedures in a Monte Carlo framework for the validation and estimation of uncertainty associated with pollutant wash-off when models are developed using a limited dataset. It was found that the application of MCCV is likely to result in a more realistic measure of model coefficients than LOO. Most importantly, MCCV and LOO were found to be effective in model validation when dealing with a small sample size, which hinders detailed model validation and can undermine the effectiveness of stormwater quality management strategies.
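
The two resampling schemes compared in this abstract differ only in how the held-out observations are chosen. The sketch below generates index splits for each. The function names and the fixed test-set size for MCCV are assumptions, and the wash-off model itself is not shown.

```python
import random

def loo_splits(n):
    """Leave-one-out: n splits, each holding out exactly one observation."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def mccv_splits(n, n_test, repeats, seed=0):
    """Monte Carlo cross validation: repeated random train/test partitions.

    Test sets are drawn independently on each repeat, so they may overlap.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    for _ in range(repeats):
        test = sorted(rng.sample(idx, n_test))
        held_out = set(test)
        yield [j for j in idx if j not in held_out], test
```

LOO always yields exactly n splits, whereas MCCV can yield as many random splits as desired, so refitting a model on each training set gives a larger collection of coefficient estimates from which to judge their variability.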

Relevance: 70.00%

Abstract:

Objective: To develop bioelectrical impedance analysis (BIA) equations to predict total body water (TBW) and fat-free mass (FFM) of Sri Lankan children. Subjects/Methods: Data were collected from 5- to 15-year-old healthy children. They were randomly assigned to validation (M/F: 105/83) and cross-validation (M/F: 53/41) groups. Height, weight and BIA were measured. TBW was assessed using the isotope dilution method (D2O). Multiple regression analysis was used to develop preliminary equations, which were cross-validated on an independent group. The final prediction equation was constructed by combining the two groups and validated using the PRESS (prediction sum of squares) statistic. Impedance index (height²/impedance; cm²/Ω), weight and sex code (male = 1; female = 0) were used as variables. Results: Independent variables of the final prediction equation for TBW were able to predict 86.3% of variance, with a root mean squared error (RMSE) of 2.1 L. The PRESS statistic was 2.1 L, with PRESS residuals of 1.2 L. Independent variables were able to predict 86.9% of variance of FFM, with an RMSE of 2.7 kg. The PRESS statistic was 2.8 kg, with PRESS residuals of 1.4 kg. The Bland-Altman technique showed that the majority of the residuals were within mean bias ± 1.96 s.d. Conclusions: Results of this study provide BIA equations for the prediction of TBW and FFM in Sri Lankan children. To the best of our knowledge there are no published BIA prediction equations validated on South Asian populations. Results of this study need to be affirmed by more studies on other closely related populations using multi-component body composition assessment.
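
For orientation, the form of the prediction equation described in this abstract can be written out as below, assuming a linear multiple regression on the three stated variables. Here $H$ is height (cm), $Z$ is impedance (Ω), $W$ is weight, and $S$ is the sex code (male = 1; female = 0). The coefficients $b_0, \dots, b_3$ are placeholders, since the abstract does not report their values, and an equation of the same form with its own coefficients would apply to FFM.

```latex
\[
\mathrm{TBW} = b_0 + b_1\,\frac{H^2}{Z} + b_2\,W + b_3\,S
\]
```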

Relevance: 70.00%

Abstract:

This paper proposes a sparse modeling approach to solve ordinal regression problems using Gaussian processes (GP). Designing a sparse GP model is important from training time and inference time viewpoints. We first propose a variant of the Gaussian process ordinal regression (GPOR) approach, leave-one-out GPOR (LOO-GPOR). It performs model selection using the leave-one-out cross-validation (LOO-CV) technique. We then provide an approach to design a sparse model for GPOR. The sparse GPOR model reduces computational time and storage requirements. Further, it provides faster inference. We compare the proposed approaches with the state-of-the-art GPOR approach on some benchmark data sets. Experimental results show that the proposed approaches are competitive.

Relevance: 70.00%

Abstract:

Purpose: Current prognostic factors are poor at identifying patients at risk of disease recurrence after surgery for stage II colon cancer. Here we describe a DNA microarray-based prognostic assay using clinically relevant formalin-fixed paraffin-embedded (FFPE) samples. Patients and Methods: A gene signature was developed from a balanced set of 73 patients with recurrent disease (high risk) and 142 patients with no recurrence (low risk) within 5 years of surgery. Results: The 634-probe set signature identified high-risk patients with a hazard ratio (HR) of 2.62 (P < .001) during cross validation of the training set. In an independent validation set of 144 samples, the signature identified high-risk patients with an HR of 2.53 (P < .001) for recurrence and an HR of 2.21 (P = .0084) for cancer-related death. Additionally, the signature was shown to perform independently from known prognostic factors (P < .001). Conclusion: This gene signature represents a novel prognostic biomarker for patients with stage II colon cancer that can be applied to FFPE tumor samples. © 2011 by American Society of Clinical Oncology.

Relevance: 70.00%

Abstract:

Analyzing and modeling relationships between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects in chemical datasets is a challenging task for scientific researchers in the field of cheminformatics. Therefore, (Q)SAR model validation is essential to ensure future model predictivity on unseen compounds. Proper validation is also one of the requirements that regulatory authorities impose before approving a model's use in real-world scenarios as an alternative testing method. However, at the same time, the question of how to validate a (Q)SAR model is still under discussion. In this work, we empirically compare k-fold cross-validation with external test set validation. The introduced workflow allows the built and validated models to be applied to large amounts of unseen data, and the performance of the different validation approaches to be compared. Our experimental results indicate that cross-validation produces (Q)SAR models with higher predictivity than external test set validation and reduces the variance of the results. Statistical validation is important to evaluate the performance of (Q)SAR models, but does not support the user in better understanding the properties of the model or the underlying correlations. We present the 3D molecular viewer CheS-Mapper (Chemical Space Mapper) that arranges compounds in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity by selecting which features to employ in the process. The tool can use and calculate different kinds of features, like structural fragments as well as quantitative chemical descriptors. Comprehensive functionalities including clustering, alignment of compounds according to their 3D structure, and feature highlighting help the chemist to better understand patterns and regularities and relate the observations to established scientific knowledge.
Even though visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow for the investigation of model validation results are still lacking. We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. New functionalities in CheS-Mapper 2.0 facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. Our approach reveals whether the endpoint is modeled too specifically or too generically and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org.
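
The two validation designs compared in this abstract, k-fold cross-validation and a single external test set, can be sketched in terms of how they partition compound indices. This is a minimal illustration with hypothetical function names and an assumed 20% external test fraction; it does not reproduce the authors' workflow.

```python
import random

def external_split(n, test_fraction=0.2, seed=0):
    """Single hold-out: one random test set; the model never sees it during training."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = max(1, int(round(n * test_fraction)))
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def k_fold_splits(n, k, seed=0):
    """k-fold: every index is held out exactly once across the k splits."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, sorted(folds[i])
```

With k-fold splitting, every compound contributes to both training and testing, whereas a single external split bases the performance estimate on one subset only, which is one plausible reason for the lower variance reported above.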

Relevance: 70.00%

Abstract:

This paper describes informatics for cross-sample analysis with comprehensive two-dimensional gas chromatography (GCxGC) and high-resolution mass spectrometry (HRMS). GCxGC-HRMS analysis produces large data sets that are rich with information, but highly complex. The size of the data and volume of information require automated processing for comprehensive cross-sample analysis, but the complexity poses a challenge for developing robust methods. The approach developed here analyzes GCxGC-HRMS data from multiple samples to extract a feature template that comprehensively captures the pattern of peaks detected in the retention-times plane. Then, for each sample chromatogram, the template is geometrically transformed to align with the detected peak pattern and generate a set of feature measurements for cross-sample analyses such as sample classification and biomarker discovery. The approach avoids the intractable problem of comprehensive peak matching by using a few reliable peaks for alignment and peak-based retention-plane windows to define comprehensive features that can be reliably matched for cross-sample analysis. The informatics are demonstrated with a set of 18 samples from breast-cancer tumors, each from a different individual, six each for Grades 1-3. The features allow classification that matches grading by a cancer pathologist with 78% success in leave-one-out cross-validation experiments. The HRMS signatures of the features of interest can be examined for determining elemental compositions and identifying compounds.

Relevance: 70.00%

Abstract:

Recently, the target function for crystallographic refinement has been improved through a maximum likelihood analysis, which makes proper allowance for the effects of data quality, model errors, and incompleteness. The maximum likelihood target reduces the significance of false local minima during the refinement process, but it does not completely eliminate them, necessitating the use of stochastic optimization methods such as simulated annealing for poor initial models. It is shown that the combination of maximum likelihood with cross-validation, which reduces overfitting, and simulated annealing by torsion angle molecular dynamics, which simplifies the conformational search problem, results in a major improvement of the radius of convergence of refinement and the accuracy of the refined structure. Torsion angle molecular dynamics and the maximum likelihood target function interact synergistically, the combination of both methods being significantly more powerful than each method individually. This is demonstrated in realistic test cases at two typical minimum Bragg spacings (dmin = 2.0 and 2.8 Å, respectively), illustrating the broad applicability of the combined method. In an application to the refinement of a new crystal structure, the combined method automatically corrected a mistraced loop in a poor initial model, moving the backbone by 4 Å.

Relevance: 70.00%

Abstract:

Background: Published birthweight references in Australia do not fully take into account constitutional factors that influence birthweight and therefore may not provide an accurate reference to identify the infant with abnormal growth. Furthermore, studies in other regions that have derived adjusted (customised) birthweight references have applied untested assumptions in the statistical modelling. Aims: To validate the customised birthweight model and to produce a reference set of coefficients for estimating a customised birthweight that may be useful for maternity care in Australia and for future research. Methods: De-identified data were extracted from the clinical database for all births at the Mater Mother's Hospital, Brisbane, Australia, between January 1997 and June 2005. Births with missing data for the variables under study were excluded. In addition, the following were excluded: multiple pregnancies, births at less than 37 completed weeks' gestation, stillbirths, and major congenital abnormalities. Multivariate analysis was undertaken. A double cross-validation procedure was used to validate the model. Results: The study of 42 206 births demonstrated that, for statistical purposes, birthweight is normally distributed. Coefficients for the derivation of customised birthweight in an Australian population were developed and the statistical model is demonstrably robust. Conclusions: This study provides empirical data as to the robustness of the model to determine customised birthweight. Further research is required to define where normal physiology ends and pathology begins, and which segments of the population should be included in the construction of a customised birthweight standard.