950 results for CROSS-VALIDATION
Abstract:
In occupational exposure assessment of airborne contaminants, exposure levels can either be estimated through repeated measurements of the pollutant concentration in air, expert judgment or through exposure models that use information on the conditions of exposure as input. In this report, we propose an empirical hierarchical Bayesian model to unify these approaches. Prior to any measurement, the hygienist conducts an assessment to generate prior distributions of exposure determinants. Monte-Carlo samples from these distributions feed two level-2 models: a physical, two-compartment model, and a non-parametric, neural network model trained with existing exposure data. The outputs of these two models are weighted according to the expert's assessment of their relevance to yield predictive distributions of the long-term geometric mean and geometric standard deviation of the worker's exposure profile (level-1 model). Bayesian inferences are then drawn iteratively from subsequent measurements of worker exposure. Any traditional decision strategy based on a comparison with occupational exposure limits (e.g. mean exposure, exceedance strategies) can then be applied. Data on 82 workers exposed to 18 contaminants in 14 companies were used to validate the model with cross-validation techniques. A user-friendly program running the model is available upon request.
Abstract:
OBJECTIVE: Mild neurocognitive disorders (MND) affect a subset of HIV+ patients under effective combination antiretroviral therapy (cART). In this study, we used an innovative multi-contrast magnetic resonance imaging (MRI) approach at high field to assess the presence of micro-structural brain alterations in MND+ patients. METHODS: We enrolled 17 MND+ and 19 MND- patients with undetectable HIV-1 RNA and 19 healthy controls (HC). MRI acquisitions at 3T included: MP2RAGE for T1 relaxation times, Magnetization Transfer (MT), T2* and Susceptibility Weighted Imaging (SWI) to probe micro-structural integrity and iron deposition in the brain. Statistical analysis used permutation-based tests and correction for family-wise error rate. Multiple regression analysis was performed between MRI data and (i) neuropsychological results and (ii) HIV infection characteristics. A linear discriminant analysis (LDA) based on MRI data was performed between MND+ and MND- patients and cross-validated with a leave-one-out test. RESULTS: Our data revealed loss of structural integrity and micro-oedema in MND+ compared to HC in the global white and cortical gray matter, as well as in the thalamus and basal ganglia. Multiple regression analysis showed a significant influence of sub-cortical nuclei alterations on the executive index of MND+ patients (p = 0.04 and R(2) = 95.2). The LDA distinguished MND+ and MND- patients with a classification quality of 73% after cross-validation. CONCLUSION: Our study shows micro-structural brain tissue alterations in MND+ patients under effective therapy and suggests that multi-contrast MRI at high field is a powerful approach to discriminate between HIV+ patients on cART with and without mild neurocognitive deficits.
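The leave-one-out test mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a nearest-centroid classifier stands in for the LDA, the function names are hypothetical, and the six two-feature samples are made-up toy values rather than MRI data.

```python
from statistics import mean


def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose training centroid is closest (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and return the fraction classified correctly."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy, invented feature vectors (NOT data from the study).
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 1.1], [0.9, 1.2], [1.2, 0.9]]
labels = ["MND-", "MND-", "MND-", "MND+", "MND+", "MND+"]
accuracy = leave_one_out_accuracy(features, labels)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Because every sample is classified by a model that never saw it, the resulting accuracy is an estimate of out-of-sample performance, which is the sense in which the 73% figure above is reported "after cross-validation".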
Abstract:
BACKGROUND Ovarian carcinoma is the most important cause of gynecological cancer-related mortality in Western societies. Despite the improved median overall survival in patients receiving chemotherapy regimens such as the paclitaxel and carboplatin combination, relapse still occurs in most patients with advanced disease. Increased angiogenesis is associated with rapid recurrence and decreased survival in ovarian cancer. This study was planned to identify an angiogenesis-related gene expression profile with prognostic value in advanced ovarian carcinoma patients. METHODOLOGY/PRINCIPAL FINDINGS RNAs were collected from formalin-fixed paraffin-embedded samples of 61 patients with III/IV FIGO stage ovarian cancer who underwent surgical cytoreduction and received a carboplatin plus paclitaxel regimen. Expression levels of 82 angiogenesis-related genes were measured by quantitative real-time polymerase chain reaction using TaqMan low-density arrays. A 34-gene profile that was able to predict the overall survival of ovarian carcinoma patients was identified. After a leave-one-out cross-validation, the profile distinguished two groups of patients with different outcomes. Median overall survival and progression-free survival for the high-risk group were 28.3 and 15.0 months, respectively, and were not reached by patients in the low-risk group at the end of follow-up. Moreover, the profile maintained an independent prognostic value in the multivariate analysis. The hazard ratio for death was 2.3 (95% CI, 1.5 to 3.2; p<0.001). CONCLUSIONS/SIGNIFICANCE It is possible to generate a prognostic model for advanced ovarian carcinoma based on angiogenesis-related genes using formalin-fixed paraffin-embedded samples. The present results are consistent with the increasing weight of angiogenesis genes in the prognosis of ovarian carcinoma.
Abstract:
BACKGROUND Functional brain images such as Single-Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET) have been widely used to guide clinicians in the diagnosis of Alzheimer's Disease (AD). However, the subjectivity involved in their evaluation has favoured the development of Computer Aided Diagnosis (CAD) systems. METHODS A novel combination of feature extraction techniques is proposed to improve the diagnosis of AD. Firstly, Regions of Interest (ROIs) are selected by means of a t-test carried out on 3D Normalised Mean Square Error (NMSE) features restricted to be located within a predefined brain activation mask. In order to address the small sample-size problem, the dimension of the feature space was further reduced by: Large Margin Nearest Neighbours using a rectangular matrix (LMNN-RECT), Principal Component Analysis (PCA) or Partial Least Squares (PLS) (the latter two also analysed with an LMNN transformation). Regarding the classifiers, kernel Support Vector Machines (SVMs) and LMNN using Euclidean, Mahalanobis and Energy-based metrics were compared. RESULTS Several experiments were conducted in order to evaluate the proposed LMNN-based feature extraction algorithms and their benefits as: i) linear transformation of the PLS or PCA reduced data, ii) feature reduction technique, and iii) classifier (with Euclidean, Mahalanobis or Energy-based methodology). The system was evaluated by means of k-fold cross-validation, yielding accuracy, sensitivity and specificity values of 92.78%, 91.07% and 95.12% (for SPECT) and 90.67%, 88% and 93.33% (for PET), respectively, when an NMSE-PLS-LMNN feature extraction method was used in combination with an SVM classifier, thus outperforming recently reported baseline methods. CONCLUSIONS All the proposed methods turned out to be a valid solution for the presented problem. One of the advances is the robustness of the LMNN algorithm, which not only provides a higher separation rate between the classes but also makes (in combination with NMSE and PLS) the variation of this rate more stable. In addition, their generalization ability is another advance, since several experiments were performed on two image modalities (SPECT and PET).
Abstract:
The overall survival of patients with pancreatic ductal adenocarcinoma is extremely low. Although gemcitabine is the standard chemotherapy for this disease, clinical outcomes do not reflect significant improvements, not even when it is combined with adjuvant treatments. There is an urgent need for prognostic markers. The aim of this study was to analyze the potential value of serum cytokines to find a profile that can predict the clinical outcome in patients with pancreatic cancer and to establish a practical prognosis index that significantly predicts patients' outcomes. We conducted an extensive analysis of serum prognosis biomarkers using an antibody array comprising 507 human cytokines. Overall survival was estimated using the Kaplan-Meier method. Univariate and multivariate Cox proportional hazards models were used to analyze prognosis factors. To determine the extent to which survival could be predicted based on this index, we used the leave-one-out cross-validation model. The multivariate model showed a better performance and could represent a novel panel of serum cytokines that correlates with poor prognosis in pancreatic cancer. B7-1/CD80, EG-VEGF/PK1, IL-29, NRG1-beta1/HRG1-beta1, and PD-ECGF expressions portend a poor prognosis for patients with pancreatic cancer, and these cytokines could represent novel therapeutic targets for this disease.
Abstract:
Topological indices have been applied to build QSAR models for a set of 20 antimalarial cyclic peroxy ketals. In order to evaluate the reliability of the proposed linear models, leave-n-out and Internal Test Sets (ITS) approaches have been considered. The proposed procedure resulted in a robust, consensus prediction equation, and it is shown here why it is superior to the standard cross-validation algorithms employed with multilinear regression models.
Abstract:
Reliable estimates of heavy-truck volumes are important in a number of transportation applications. Estimates of truck volumes are necessary for pavement design and pavement management. Truck volumes are important in traffic safety. The number of trucks on the road also influences roadway capacity and traffic operations. Additionally, heavy vehicles pollute at higher rates than passenger vehicles. Consequently, reliable estimates of heavy-truck vehicle miles traveled (VMT) are important in creating accurate inventories of on-road emissions. This research evaluated three different methods to calculate heavy-truck annual average daily traffic (AADT), which can subsequently be used to estimate VMT. Traffic data from continuous count stations provided by the Iowa DOT were used to estimate AADT for two different truck groups (single-unit and multi-unit) using the three methods. The first method developed monthly and daily expansion factors for each truck group. The second and third methods created general expansion factors for all vehicles. Accuracy of the three methods was compared using n-fold cross-validation. In n-fold cross-validation, data are split into n partitions; each partition in turn is held out and used to validate the model fitted to the remaining data. A comparison of the accuracy of the three methods was made using the estimates of prediction error obtained from cross-validation. The prediction error was determined by averaging the squared error between the estimated AADT and the actual AADT. Overall, the prediction error was the lowest for the method that developed expansion factors separately for the different truck groups, for both single- and multi-unit trucks. This indicates that use of expansion factors specific to heavy trucks results in better estimates of AADT, and, subsequently, VMT, than using aggregate expansion factors and applying a percentage of trucks. Monthly, daily, and weekly traffic patterns were also evaluated. Significant variation exists in the temporal and seasonal patterns of heavy trucks as compared to passenger vehicles. This suggests that the use of aggregate expansion factors fails to adequately describe truck travel patterns.
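The comparison described in this abstract (n-fold cross-validation with the mean squared error as the prediction error) can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the two "methods" are simple stand-ins (a group-specific mean versus one aggregate mean), all function names are hypothetical, and the counts are invented toy numbers rather than Iowa DOT data.

```python
from statistics import mean


def k_fold_mse(records, fit, predict, k):
    """Average squared prediction error over k folds; each fold is held out exactly once."""
    squared_errors = []
    for fold in range(k):
        train = [r for i, r in enumerate(records) if i % k != fold]
        test = [r for i, r in enumerate(records) if i % k == fold]
        model = fit(train)
        squared_errors += [(predict(model, group) - value) ** 2 for group, value in test]
    return mean(squared_errors)


# Stand-in method A: one aggregate estimate for all vehicles.
def fit_overall(train):
    return mean(value for _, value in train)


def predict_overall(model, group):
    return model


# Stand-in method B: a separate estimate per truck group.
def fit_by_group(train):
    groups = {group for group, _ in train}
    return {g: mean(value for group, value in train if group == g) for g in groups}


def predict_by_group(model, group):
    return model[group]


# Invented (group, daily count) pairs, for illustration only.
records = [("single", 100.0), ("multi", 300.0), ("single", 110.0), ("multi", 310.0),
           ("single", 90.0), ("multi", 290.0), ("single", 105.0), ("multi", 295.0)]

mse_overall = k_fold_mse(records, fit_overall, predict_overall, k=4)
mse_by_group = k_fold_mse(records, fit_by_group, predict_by_group, k=4)
print(f"aggregate: {mse_overall:.1f}  group-specific: {mse_by_group:.1f}")
```

Folds are assigned by index modulo k to keep the sketch deterministic; with real count data the records would normally be shuffled first.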
Abstract:
Aim This study used data from temperate forest communities to assess: (1) five different stepwise selection methods with generalized additive models, (2) the effect of weighting absences to ensure a prevalence of 0.5, (3) the effect of limiting absences beyond the environmental envelope defined by presences, (4) four different methods for incorporating spatial autocorrelation, and (5) the effect of integrating an interaction factor defined by a regression tree on the residuals of an initial environmental model. Location State of Vaud, western Switzerland. Methods Generalized additive models (GAMs) were fitted using the grasp package (generalized regression analysis and spatial predictions, http://www.cscf.ch/grasp). Results Model selection based on cross-validation appeared to be the best compromise between model stability and performance (parsimony) among the five methods tested. Weighting absences returned models that perform better than models fitted with the original sample prevalence. This appeared to be mainly due to the impact of very low prevalence values on evaluation statistics. Removing zeroes beyond the range of presences on main environmental gradients changed the set of selected predictors, and potentially their response curve shape. Moreover, removing zeroes slightly improved model performance and stability when compared with the baseline model on the same data set. Incorporating a spatial trend predictor improved model performance and stability significantly. Even better models were obtained when including local spatial autocorrelation. A novel approach to include interactions proved to be an efficient way to account for interactions between all predictors at once. 
Main conclusions Models and spatial predictions of 18 forest communities were significantly improved by using either: (1) cross-validation as a model selection method, (2) weighted absences, (3) limited absences, (4) predictors accounting for spatial autocorrelation, or (5) a factor variable accounting for interactions between all predictors. The final choice of model strategy should depend on the nature of the available data and the specific study aims. Statistical evaluation is useful in searching for the best modelling practice. However, one should not neglect to consider the shapes and interpretability of response curves, as well as the resulting spatial predictions in the final assessment.
Abstract:
The n-octanol/water partition coefficient (log Po/w) is a key physicochemical parameter for drug discovery, design, and development. Here, we present a physics-based approach that shows a strong linear correlation between the computed solvation free energy in implicit solvents and the experimental log Po/w on a cleansed data set of more than 17,500 molecules. After internal validation by five-fold cross-validation and data randomization, the predictive power of the most interesting multiple linear model, based on two GB/SA parameters solely, was tested on two different external sets of molecules. On the Martel druglike test set, the predictive power of the best model (N = 706, r = 0.64, MAE = 1.18, and RMSE = 1.40) is similar to six well-established empirical methods. On the 17-drug test set, our model outperformed all compared empirical methodologies (N = 17, r = 0.94, MAE = 0.38, and RMSE = 0.52). The physical basis of our original GB/SA approach together with its predictive capacity, computational efficiency (1 to 2 s per molecule), and tridimensional molecular graphics capability lay the foundations for a promising predictor, the implicit log P method (iLOGP), to complement the portfolio of drug design tools developed and provided by the SIB Swiss Institute of Bioinformatics.
Abstract:
The most widely used formula for estimating glomerular filtration rate (eGFR) in children is the Schwartz formula. It was revised in 2009 using iohexol clearances with measured GFR (mGFR) ranging between 15 and 75 ml/min per 1.73 m(2). Here we assessed the accuracy of the Schwartz formula for children with less renal impairment, using the inulin clearance (iGFR) method and comparing 551 iGFRs of 392 children with their Schwartz eGFRs. Serum creatinine was measured using the compensated Jaffe method. In order to find the best relationship between iGFR and eGFR, a linear quadratic regression model was fitted and a more accurate formula was derived. This quadratic formula was: 0.68 × (height (cm)/serum creatinine (mg/dl)) - 0.0008 × (height (cm)/serum creatinine (mg/dl))^2 + 0.48 × age (years) - (21.53 in males or 25.68 in females). This formula was validated using a split-half cross-validation technique and also externally validated with a new cohort of 127 children. Results show that the Schwartz formula is accurate up to a height (Ht)/serum creatinine value of 251, corresponding to an iGFR of 103 ml/min per 1.73 m(2), but significantly unreliable for higher values. For an accuracy of 20 percent, the quadratic formula was significantly better than the Schwartz formula for all patients and for patients with a Ht/serum creatinine of 251 or greater. Thus, the new quadratic formula could replace the revised Schwartz formula, which is accurate for children with moderate renal failure but not for those with less renal impairment or hyperfiltration.
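As a worked illustration, the quadratic formula quoted in this abstract can be transcribed directly into a small function. The function and argument names are hypothetical, the coefficients are copied from the text as printed, and the example patient is invented.

```python
def quadratic_egfr(height_cm, serum_creatinine_mg_dl, age_years, sex):
    """Quadratic eGFR as printed in the abstract, in ml/min per 1.73 m^2.

    `sex` must be "male" or "female"; the names here are illustrative assumptions.
    """
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    ratio = height_cm / serum_creatinine_mg_dl  # height (cm) / serum creatinine (mg/dl)
    sex_offset = 21.53 if sex == "male" else 25.68
    return 0.68 * ratio - 0.0008 * ratio ** 2 + 0.48 * age_years - sex_offset


# Invented example: an 8-year-old boy, 120 cm tall, serum creatinine 0.5 mg/dl.
# ratio = 240, so 163.2 - 46.08 + 3.84 - 21.53
example = quadratic_egfr(120, 0.5, 8, "male")
print(round(example, 2))
```

The negative quadratic term is what makes the estimate flatten at high height/creatinine ratios, the range in which the abstract reports the Schwartz formula to be unreliable.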
Abstract:
Neuroimaging studies typically compare experimental conditions using average brain responses, thereby overlooking the stimulus-related information conveyed by distributed spatio-temporal patterns of single-trial responses. Here, we take advantage of this rich information at a single-trial level to decode stimulus-related signals in two event-related potential (ERP) studies. Our method models the statistical distribution of the voltage topographies with a Gaussian Mixture Model (GMM), which reduces the dataset to a number of representative voltage topographies. The degree of presence of these topographies across trials at specific latencies is then used to classify experimental conditions. We tested the algorithm using a cross-validation procedure in two independent EEG datasets. In the first ERP study, we classified left- versus right-hemifield checkerboard stimuli for upper and lower visual hemifields. In a second ERP study, when functional differences cannot be assumed, we classified initial versus repeated presentations of visual objects. With minimal a priori information, the GMM model provides neurophysiologically interpretable features - vis à vis voltage topographies - as well as dynamic information about brain function. This method can in principle be applied to any ERP dataset testing the functional relevance of specific time periods for stimulus processing, the predictability of subject's behavior and cognitive states, and the discrimination between healthy and clinical populations.
Abstract:
In this paper we present a Bayesian image reconstruction algorithm with entropy prior (FMAPE) that uses a space-variant hyperparameter. The spatial variation of the hyperparameter allows different degrees of resolution in areas of different statistical characteristics, thus avoiding the large residuals resulting from algorithms that use a constant hyperparameter. In the first implementation of the algorithm, we begin by segmenting a Maximum Likelihood Estimator (MLE) reconstruction. The segmentation method is based on a wavelet decomposition and a self-organizing neural network. The result is a predetermined number of extended regions plus a small region for each star or bright object. To assign a different value of the hyperparameter to each extended region and star, we use either feasibility tests or cross-validation methods. Once the set of hyperparameters is obtained, we carry out the final Bayesian reconstruction, leading to a reconstruction with decreased bias and excellent visual characteristics. The method has been applied to data from the non-refurbished Hubble Space Telescope. The method can also be applied to ground-based images.
Abstract:
BACKGROUND AND OBJECTIVES: The estimated GFR (eGFR) is important in clinical practice. To find the best formula for eGFR, this study assessed the best model of correlation between sinistrin clearance (iGFR) and the solely or combined cystatin C (CysC)- and serum creatinine (SCreat)-derived models. It also evaluated the accuracy of the combined Schwartz formula across all GFR levels. DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS: Two hundred thirty-eight iGFRs performed between January 2012 and April 2013 for 238 children were analyzed. Regression techniques were used to fit the different equations used for eGFR (i.e., logarithmic, inverse, linear, and quadratic). The performance of each model was evaluated using the Cohen κ correlation coefficient and the percentage reaching 30% accuracy was calculated. RESULTS: The best model of correlation between iGFRs and CysC is linear; however, it presents a low κ coefficient (0.24) and is far below the Kidney Disease Outcomes Quality Initiative targets to be validated, with only 84% of eGFRs reaching accuracy of 30%. SCreat and iGFRs showed the best correlation in a fitted quadratic model with a κ coefficient of 0.53 and 93% accuracy. Adding CysC significantly (P<0.001) increased the κ coefficient to 0.56 and the quadratic model accuracy to 97%. Therefore, a combined SCreat and CysC quadratic formula was derived and internally validated using the cross-validation technique. This quadratic formula significantly outperformed the combined Schwartz formula, which was biased for an iGFR≥91 ml/min per 1.73 m(2). CONCLUSIONS: This study allowed deriving a new combined SCreat and CysC quadratic formula that could replace the combined Schwartz formula, which is accurate only for children with moderate chronic kidney disease.
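The "percentage reaching 30% accuracy" reported in this abstract can be sketched as below, assuming it means the share of estimates that fall within ±30% of the measured value (that reading is an assumption, as are the function name and the invented numbers).

```python
def percent_within(estimated, measured, tolerance=0.30):
    """Percentage of estimates lying within +/- tolerance (as a fraction) of the measured value."""
    if len(estimated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    within = sum(1 for e, m in zip(estimated, measured) if abs(e - m) <= tolerance * m)
    return 100.0 * within / len(measured)


# Invented measured and estimated GFR values, for illustration only.
measured_gfr = [100.0, 80.0, 60.0, 120.0]
estimated_gfr = [110.0, 50.0, 65.0, 130.0]
p30 = percent_within(estimated_gfr, measured_gfr)
print(p30)
```

Here three of the four estimates are within 30% of the measured value; the second misses because its error of 30 exceeds the allowed 24.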
Abstract:
In the areas where irrigated rice is grown in the south of Brazil, few studies have been carried out to investigate the spatial variability structure of soil properties and to establish new forms of soil management as well as determine soil corrective and fertilizer applications. In this sense, this study had the objective of evaluating the spatial variability of chemical, physical and biological soil properties in a lowland area under irrigated rice cultivation in the conventional till system. For this purpose, a 10 x 10 m grid of 100 points was established, in an experimental field of the Embrapa Clima Temperado, in the County of Capão do Leão, State of Rio Grande do Sul. The spatial variability structure was evaluated by geostatistical tools and the number of subsamples required to represent each soil property in future studies was calculated using classical statistics. Results showed that the spatial variability structure of sand, silt, SMP index, cation exchange capacity (pH 7.0), Al3+ and total N properties could be detected by geostatistical analysis. A pure nugget effect was observed for the nutrients K, S and B, as well as macroporosity, mean weighted diameter of aggregates, and soil water storage. The cross validation procedure, based on linear regression and the determination coefficient, was more efficient to evaluate the quality of the adjusted mathematical model than the degree of spatial dependence. It was also concluded that the combination of classical with geostatistics can in many cases simplify the soil sampling process without losing information quality.
Abstract:
Unavailable soil variables can be estimated from other, related measured variables by means of pedotransfer functions (PTFs), mainly saving time and reducing cost. Great differences among soils, however, can yield undesirable results when applying this method. This study discusses the application of PTFs developed by several authors, using a variety of soils with different characteristics, to evaluate soil water contents of two Brazilian lowland soils. Comparisons are made between PTF-evaluated data and field-measured data, using statistical and geostatistical tools, such as mean error, root mean square error, semivariogram, cross-validation, and regression coefficient. The eight PTFs tested to evaluate gravimetric soil water contents (Ug) at the tensions of 33 kPa and 1,500 kPa showed a tendency to overestimate Ug 33 kPa and underestimate Ug 1,500 kPa. The PTFs were ranked according to their performance and also with respect to their potential in describing the structure of the spatial variability of the set of measured values. Although none of the PTFs changed the distribution pattern of the data, all resulted in means and variances statistically different from those observed for all measured values. The PTFs that presented the best predictive values of Ug 33 kPa and Ug 1,500 kPa were not the same as those that performed best in reproducing the structure of spatial variability of these variables.