950 resultados para CROSS-VALIDATION


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this paper was to obtain evidence of the validity of the LSB-50 (de Rivera & Abuín, 2012), a screening measure of psychopathology, in Argentinean adolescents. The sample consisted of 1002 individuals (49.7% male; 50.3% female) between 12 and 18 years-old (M = 14.98; SD = 1.99). A cross-validation study and factorial invariance studies were performed in samples divided by sex and age to test if a seven-factor structure that corresponds to seven clinical scales (Hypersensitivity, Obsessive-Compulsive, Anxiety, Hostility, Somatization, Depression, and Sleep disturbance) was adequate for the LSB-50. The seven-factor structure proved to be suitable for all the subsamples. Next, the fit of the seven-factor structure was studied simultaneously? in the aforementioned subsamples through hierarchical models that imposed different constrains of equivalency?. Results indicated the invariance of the seven clinical dimensions of the LSB-50. Ordinal alphas showed good internal consistency for all the scales. Finally, the correlations with a diagnostic measure of psychopathology (PAI-A) indicated moderate convergence. It is concluded that the analyses performed provide robust evidence of construct validity for the LSB-50

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The objective of this paper is to compare the performance of twopredictive radiological models, logistic regression (LR) and neural network (NN), with five different resampling methods. One hundred and sixty-seven patients with proven calvarial lesions as the only known disease were enrolled. Clinical and CT data were used for LR and NN models. Both models were developed with cross validation, leave-one-out and three different bootstrap algorithms. The final results of each model were compared with error rate and the area under receiver operating characteristic curves (Az). The neural network obtained statistically higher Az than LR with cross validation. The remaining resampling validation methods did not reveal statistically significant differences between LR and NN rules. The neural network classifier performs better than the one based on logistic regression. This advantage is well detected by three-fold cross-validation, but remains unnoticed when leave-one-out or bootstrap algorithms are used.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Recent studies have started to use media data to measure party positions and issue salience. The aim of this article is to compare and cross-validate this alternative approach with the more commonly used party manifestos, expert judgments and mass surveys. To this purpose, we present two methods to generate indicators of party positions and issue salience from media coverage: the core sentence approach and political claims analysis. Our cross-validation shows that with regard to party positions, indicators derived from the media converge with traditionally used measurements from party manifestos, mass surveys and expert judgments, but that salience indicators measure different underlying constructs. We conclude with a discussion of specific research questions for which media data offer potential advantages over more established methods.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Analyzing and modeling relationships between the structure of chemical compounds, their physico-chemical properties, and biological or toxic effects in chemical datasets is a challenging task for scientific researchers in the field of cheminformatics. Therefore, (Q)SAR model validation is essential to ensure future model predictivity on unseen compounds. Proper validation is also one of the requirements of regulatory authorities in order to approve its use in real-world scenarios as an alternative testing method. However, at the same time, the question of how to validate a (Q)SAR model is still under discussion. In this work, we empirically compare a k-fold cross-validation with external test set validation. The introduced workflow allows to apply the built and validated models to large amounts of unseen data, and to compare the performance of the different validation approaches. Our experimental results indicate that cross-validation produces (Q)SAR models with higher predictivity than external test set validation and reduces the variance of the results. Statistical validation is important to evaluate the performance of (Q)SAR models, but does not support the user in better understanding the properties of the model or the underlying correlations. We present the 3D molecular viewer CheS-Mapper (Chemical Space Mapper) that arranges compounds in 3D space, such that their spatial proximity reflects their similarity. The user can indirectly determine similarity, by selecting which features to employ in the process. The tool can use and calculate different kinds of features, like structural fragments as well as quantitative chemical descriptors. Comprehensive functionalities including clustering, alignment of compounds according to their 3D structure, and feature highlighting aid the chemist to better understand patterns and regularities and relate the observations to established scientific knowledge. Even though visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allows for the investigation of model validation results are still lacking. We propose visual validation, as an approach for the graphical inspection of (Q)SAR model validation results. New functionalities in CheS-Mapper 2.0 facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. Our approach reveals if the endpoint is modeled too specific or too generic and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper describes informatics for cross-sample analysis with comprehensive two-dimensional gas chromatography (GCxGC) and high-resolution mass spectrometry (HRMS). GCxGC-HRMS analysis produces large data sets that are rich with information, but highly complex. The size of the data and volume of information requires automated processing for comprehensive cross-sample analysis, but the complexity poses a challenge for developing robust methods. The approach developed here analyzes GCxGC-HRMS data from multiple samples to extract a feature template that comprehensively captures the pattern of peaks detected in the retention-times plane. Then, for each sample chromatogram, the template is geometrically transformed to align with the detected peak pattern and generate a set of feature measurements for cross-sample analyses such as sample classification and biomarker discovery. The approach avoids the intractable problem of comprehensive peak matching by using a few reliable peaks for alignment and peak-based retention-plane windows to define comprehensive features that can be reliably matched for cross-sample analysis. The informatics are demonstrated with a set of 18 samples from breast-cancer tumors, each from different individuals, six each for Grades 1-3. The features allow classification that matches grading by a cancer pathologist with 78% success in leave-one-out cross-validation experiments. The HRMS signatures of the features of interest can be examined for determining elemental compositions and identifying compounds.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Recently, the target function for crystallographic refinement has been improved through a maximum likelihood analysis, which makes proper allowance for the effects of data quality, model errors, and incompleteness. The maximum likelihood target reduces the significance of false local minima during the refinement process, but it does not completely eliminate them, necessitating the use of stochastic optimization methods such as simulated annealing for poor initial models. It is shown that the combination of maximum likelihood with cross-validation, which reduces overfitting, and simulated annealing by torsion angle molecular dynamics, which simplifies the conformational search problem, results in a major improvement of the radius of convergence of refinement and the accuracy of the refined structure. Torsion angle molecular dynamics and the maximum likelihood target function interact synergistically, the combination of both methods being significantly more powerful than each method individually. This is demonstrated in realistic test cases at two typical minimum Bragg spacings (dmin = 2.0 and 2.8 Å, respectively), illustrating the broad applicability of the combined method. In an application to the refinement of a new crystal structure, the combined method automatically corrected a mistraced loop in a poor initial model, moving the backbone by 4 Å.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Background: Published birthweight references in Australia do not fully take into account constitutional factors that influence birthweight and therefore may not provide an accurate reference to identify the infant with abnormal growth. Furthermore, studies in other regions that have derived adjusted (customised) birthweight references have applied untested assumptions in the statistical modelling. Aims: To validate the customised birthweight model and to produce a reference set of coefficients for estimating a customised birthweight that may be useful for maternity care in Australia and for future research. Methods: De-identified data were extracted from the clinical database for all births at the Mater Mother's Hospital, Brisbane, Australia, between January 1997 and June 2005. Births with missing data for the variables under study were excluded. In addition the following were excluded: multiple pregnancies, births less than 37 completed week's gestation, stillbirths, and major congenital abnormalities. Multivariate analysis was undertaken. A double cross-validation procedure was used to validate the model. Results: The study of 42 206 births demonstrated that, for statistical purposes, birthweight is normally distributed. Coefficients for the derivation of customised birthweight in an Australian population were developed and the statistical model is demonstrably robust. Conclusions: This study provides empirical data as to the robustness of the model to determine customised birthweight. Further research is required to define where normal physiology ends and pathology begins, and which segments of the population should be included in the construction of a customised birthweight standard.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Hydrophobicity as measured by Log P is an important molecular property related to toxicity and carcinogenicity. With increasing public health concerns for the effects of Disinfection By-Products (DBPs), there are considerable benefits in developing Quantitative Structure and Activity Relationship (QSAR) models capable of accurately predicting Log P. In this research, Log P values of 173 DBP compounds in 6 functional classes were used to develop QSAR models, by applying 3 molecular descriptors, namely, Energy of the Lowest Unoccupied Molecular Orbital (ELUMO), Number of Chlorine (NCl) and Number of Carbon (NC) by Multiple Linear Regression (MLR) analysis. The QSAR models developed were validated based on the Organization for Economic Co-operation and Development (OECD) principles. The model Applicability Domain (AD) and mechanistic interpretation were explored. Considering the very complex nature of DBPs, the established QSAR models performed very well with respect to goodness-of-fit, robustness and predictability. The predicted values of Log P of DBPs by the QSAR models were found to be significant with a correlation coefficient R2 from 81% to 98%. The Leverage Approach by Williams Plot was applied to detect and remove outliers, consequently increasing R 2 by approximately 2% to 13% for different DBP classes. The developed QSAR models were statistically validated for their predictive power by the Leave-One-Out (LOO) and Leave-Many-Out (LMO) cross validation methods. Finally, Monte Carlo simulation was used to assess the variations and inherent uncertainties in the QSAR models of Log P and determine the most influential parameters in connection with Log P prediction. The developed QSAR models in this dissertation will have a broad applicability domain because the research data set covered six out of eight common DBP classes, including halogenated alkane, halogenated alkene, halogenated aromatic, halogenated aldehyde, halogenated ketone, and halogenated carboxylic acid, which have been brought to the attention of regulatory agencies in recent years. Furthermore, the QSAR models are suitable to be used for prediction of similar DBP compounds within the same applicability domain. The selection and integration of various methodologies developed in this research may also benefit future research in similar fields.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Ce mémoire s’intéresse à l’étude du critère de validation croisée pour le choix des modèles relatifs aux petits domaines. L’étude est limitée aux modèles de petits domaines au niveau des unités. Le modèle de base des petits domaines est introduit par Battese, Harter et Fuller en 1988. C’est un modèle de régression linéaire mixte avec une ordonnée à l’origine aléatoire. Il se compose d’un certain nombre de paramètres : le paramètre β de la partie fixe, la composante aléatoire et les variances relatives à l’erreur résiduelle. Le modèle de Battese et al. est utilisé pour prédire, lors d’une enquête, la moyenne d’une variable d’intérêt y dans chaque petit domaine en utilisant une variable auxiliaire administrative x connue sur toute la population. La méthode d’estimation consiste à utiliser une distribution normale, pour modéliser la composante résiduelle du modèle. La considération d’une dépendance résiduelle générale, c’est-à-dire autre que la loi normale donne une méthodologie plus flexible. Cette généralisation conduit à une nouvelle classe de modèles échangeables. En effet, la généralisation se situe au niveau de la modélisation de la dépendance résiduelle qui peut être soit normale (c’est le cas du modèle de Battese et al.) ou non-normale. L’objectif est de déterminer les paramètres propres aux petits domaines avec le plus de précision possible. Cet enjeu est lié au choix de la bonne dépendance résiduelle à utiliser dans le modèle. Le critère de validation croisée sera étudié à cet effet.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Current research on achievement goals acknowledges that students can manifest different goal patterns. This study aimed to adapt and validate a self-report scale to assess the goal orientations of Portuguese students. A total of 2675 (age range 9–24 years) Portuguese students completed the Goal Orientations Scale (GOS). Through a cross-validation procedure, confirmatory factor analysis and descriptive statistics supports the existence of four different goal orientations: task, self-enhancing, self-defeating and avoidance orientations. The reliability and the internal validity estimates confirm that the GOS is an adequate instrument in assessing student goal orientations.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Soils are an important component in the biogeochemical cycle of carbon, storing about four times more carbon than biomass plants and nearly three times more than the atmosphere. Moreover, the carbon content is directly related on the capacity of water retention, fertility. among other properties. Thus, soil carbon quantification in field conditions is an important challenge related to carbon cycle and global climatic changes. Nowadays. Laser Induced Breakdown Spectroscopy (LIBS) can be used for qualitative elemental analyses without previous treatment of samples and the results are obtained quickly. New optical technologies made possible the portable LIBS systems and now, the great expectation is the development of methods that make possible quantitative measurements with LIBS. The goal of this work is to calibrate a portable LIBS system to carry out quantitative measures of carbon in whole tropical soil sample. For this, six samples from the Brazilian Cerrado region (Argisoil) were used. Tropical soils have large amounts of iron in their compositions, so the carbon line at 247.86 nm presents strong interference of this element (iron lines at 247.86 and 247.95). For this reason, in this work the carbon line at 193.03 nm was used. Using methods of statistical analysis as a simple linear regression, multivariate linear regression and cross-validation were possible to obtain correlation coefficients higher than 0.91. These results show the great potential of using portable LIBS systems for quantitative carbon measurements in tropical soils. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Objective: We carry out a systematic assessment on a suite of kernel-based learning machines while coping with the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of the criteria of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely. Gaussian and exponential radial basis functions) were considered as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the features extracted. Four wavelet basis functions were considered in this study. Then, we provide the average accuracy (i.e., cross-validation error) values delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations whereby one can visually inspect their levels of sensitiveness to the type of feature and to the kernel function/parameter value. Conclusions: Overall, the results evidence that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value as well as the choice of the feature extractor are critical decisions to be taken, albeit the choice of the wavelet family seems not to be so relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile has emerged among all types of machines, involving some regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Our objective was to develop a methodology to predict soil fertility using visible near-infrared (vis-NIR) diffuse reflectance spectra and terrain attributes derived from a digital elevation model (DEM). Specifically, our aims were to: (i) assemble a minimum data set to develop a soil fertility index for sugarcane (Sarcharum officinarum L.) (SFI-SC) for biofuel production in tropical soils; (ii) construct a model to predict the SFI-SC using soil vis-NIR spectra and terrain attributes; and (iii) produce a soil fertility map for our study area and assess it by comparing it with a green vegetation index (GVI). The study area was 185 ha located in sao Paulo State, Brazil. In total, 184 soil samples were collected and analyzed for a range of soil chemical and physical properties. Their vis-NIR spectra were collected from 400 to 2500 nm. The Shuttle Radar Topographic Mission 3-arcsec (90-m resolution) DEM of the area was used to derive 17 terrain attributes. A minimum data set of soil properties was selected to develop the SFI-SC. The SFI-SC consisted of three classes: Class 1, the highly fertile soils; Class 2, the fertile soils; and Class 3, the least fertile soils. It was derived heuristically with conditionals and using expert knowledge. The index was modeled with the spectra and terrain data using cross-validated decision trees. The cross-validation of the model correctly predicted Class 1 in 75% of cases, Class 2 in 61%, and Class 3 in 65%. A fertility map was derived for the study area and compared with a map of the GVI. Our approach offers a methodology that incorporates expert knowledge to derive the SFI-SC and uses a versatile spectro-spatial methodology that may be implemented for rapid and accurate determination of soil fertility and better exploration of areas suitable for production.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The DSSAT/CANEGRO model was parameterized and its predictions evaluated using data from five sugarcane (Sacchetrum spp.) experiments conducted in southern Brazil. The data used are from two of the most important Brazilian cultivars. Some parameters whose values were either directly measured or considered to be well known were not adjusted. Ten of the 20 parameters were optimized using a Generalized Likelihood Uncertainty Estimation (GLUE) algorithm using the leave-one-out cross-validation technique. Model predictions were evaluated using measured data of leaf area index (LA!), stalk and aerial dry mass, sucrose content, and soil water content, using bias, root mean squared error (RMSE), modeling efficiency (Eff), correlation coefficient, and agreement index. The Decision Support System for Agrotechnology Transfer (DSSAT)/CANEGRO model simulated the sugarcane crop in southern Brazil well, using the parameterization reported here. The soil water content predictions were better for rainfed (mean RMSE = 0.122mm) than for irrigated treatment (mean RMSE = 0.214mm). Predictions were best for aerial dry mass (Eff = 0.850), followed by stalk dry mass (Eff = 0.765) and then sucrose mass (Eff = 0.170). Number of green leaves showed the worst fit (Eff = -2.300). The cross-validation technique permits using multiple datasets that would have limited use if used independently because of the heterogeneity of measures and measurement strategies.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Tuberculosis is an infection caused mainly by Mycobacterium tuberculosis. A first-line antimycobacterial drug is pyrazinamide (PZA), which acts partially as a prodrug activated by a pyrazinamidase releasing the active agent, pyrazinoic acid (POA). As pyrazinoic acid presents some difficulty to cross the mycobacterial cell wall, and also the pyrazinamide-resistant strains do not express the pyrazinamidase, a set of pyrazinoic acid esters have been evaluated as antimycobacterial agents. In this work, a QSAR approach was applied to a set of forty-three pyrazinoates against M. tuberculosis ATCC 27294, using genetic algorithm function and partial least squares regression (WOLF 5.5 program). The independent variables selected were the Balaban index (I), calculated n-octanol/water partition coefficient (ClogP), van-der-Waals surface area, dipole moment, and stretching-energy contribution. The final QSAR model (N = 32, r(2) = 0.68, q(2) = 0.59, LOF = 0.25, and LSE = 0.19) was fully validated employing leave-N-out cross-validation and y-scrambling techniques. The test set (N = 11) presented an external prediction power of 73%. In conclusion, the QSAR model generated can be used as a valuable tool to optimize the activity of future pyrazinoic acid esters in the designing of new antituberculosis agents.