36 results for cross validation

in BORIS: Bern Open Repository and Information System - Bern - Switzerland


Relevance:

70.00%

Abstract:

This paper describes informatics for cross-sample analysis with comprehensive two-dimensional gas chromatography (GCxGC) and high-resolution mass spectrometry (HRMS). GCxGC-HRMS analysis produces large data sets that are rich with information, but highly complex. The size of the data and volume of information require automated processing for comprehensive cross-sample analysis, but the complexity poses a challenge for developing robust methods. The approach developed here analyzes GCxGC-HRMS data from multiple samples to extract a feature template that comprehensively captures the pattern of peaks detected in the retention-times plane. Then, for each sample chromatogram, the template is geometrically transformed to align with the detected peak pattern and generate a set of feature measurements for cross-sample analyses such as sample classification and biomarker discovery. The approach avoids the intractable problem of comprehensive peak matching by using a few reliable peaks for alignment and peak-based retention-plane windows to define comprehensive features that can be reliably matched for cross-sample analysis. The informatics are demonstrated with a set of 18 samples from breast-cancer tumors, each from a different individual, six each for Grades 1-3. The features allow classification that matches grading by a cancer pathologist with 78% success in leave-one-out cross-validation experiments. The HRMS signatures of the features of interest can be examined for determining elemental compositions and identifying compounds.
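The leave-one-out procedure mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the nearest-centroid classifier, the function names, and the toy feature values are all assumptions, since the abstract does not say which classifier was used.

```python
from statistics import mean

def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (assumed toy classifier)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))

def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, score the held-out prediction."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

# Invented feature vectors: three grades, two samples each.
X = [[1.0, 1.1], [0.9, 1.0], [3.0, 3.2], [3.1, 2.9], [5.0, 5.1], [5.2, 4.9]]
y = [1, 1, 2, 2, 3, 3]
print(leave_one_out_accuracy(X, y))
```

Each sample is predicted by a model that never saw it, so the reported fraction estimates performance on unseen samples rather than fit to the training data.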

Relevance:

60.00%

Abstract:

To check the effectiveness of campaigns preventing drug abuse or indicating local effects of efforts against drug trafficking, it is beneficial to know consumed amounts of substances in a high spatial and temporal resolution. The analysis of drugs of abuse in wastewater (WW) has the potential to provide this information. In this study, the reliability of WW drug consumption estimates is assessed and a novel method presented to calculate the total uncertainty in observed WW cocaine (COC) and benzoylecgonine (BE) loads. Specifically, uncertainties resulting from discharge measurements, chemical analysis and the applied sampling scheme were addressed and three approaches presented. These consist of (i) a generic model-based procedure to investigate the influence of the sampling scheme on the uncertainty of observed or expected drug loads, (ii) a comparative analysis of two analytical methods (high performance liquid chromatography-tandem mass spectrometry and gas chromatography-mass spectrometry), including an extended cross-validation by influent profiling over several days, and (iii) monitoring COC and BE concentrations in WW of the largest Swiss sewage treatment plants. In addition, the COC and BE loads observed in the sewage treatment plant of the city of Berne were used to back-calculate the COC consumption. The estimated mean daily consumed amount was 107 ± 21 g of pure COC, corresponding to 321 g of street-grade COC.

Relevance:

60.00%

Abstract:

Climate and environmental reconstructions from natural archives are important for the interpretation of current climatic change. Few quantitative high-resolution reconstructions exist for South America, which is the only land mass extending from the tropics to the southern high latitudes at 56°S. We analyzed sediment cores from two adjacent lakes in Northern Chilean Patagonia, Lago Castor (45°36′S, 71°47′W) and Laguna Escondida (45°31′S, 71°49′W). Radiometric dating (210Pb, 137Cs, 14C-AMS) suggests that the cores reach back to c. 900 BC (Laguna Escondida) and c. 1900 BC (Lago Castor). Both lakes show similarities and reproducibility in sedimentation rate changes and tephra layer deposition. We found eight macroscopic tephras (0.2–5.5 cm thick) dated at 1950 BC, 1700 BC, 300 BC, 50 BC, 90 AD, 160 AD, 400 AD and 900 AD. These can be used as regional time-synchronous stratigraphic markers. The two thickest tephras represent known, well-dated explosive eruptions of Hudson volcano around 1950 and 300 BC. Biogenic silica flux revealed a climate signal in both lakes and a correlation with annual temperature reanalysis data (calibration 1900–2006 AD; Lago Castor r = 0.37; Laguna Escondida r = 0.42, seven-year filtered data). We used a linear inverse regression plus scaling model for calibration and leave-one-out cross-validation (RMSEv = 0.56 °C) to reconstruct sub-decadal-scale temperature variability for Laguna Escondida back to AD 400. The lower part of the core from Laguna Escondida prior to AD 400 and the core of Lago Castor are strongly influenced by primary and secondary tephras and are therefore not used for the temperature reconstruction. The temperature reconstruction from Laguna Escondida shows cold conditions in the 5th century (relative to the 20th century mean), warmer temperatures from AD 600 to AD 1150 and colder temperatures from AD 1200 to AD 1450.
From AD 1450 to AD 1700 our reconstruction shows a period with stronger variability and on average higher values than the 20th century mean. Until AD 1900 the temperature values decrease but stay slightly above the 20th century mean. Most of the centennial-scale features are reproduced in the few other natural climate archives in the region. The early onset of cool conditions from c. AD 1200 onward seems to be confirmed for this region.
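A leave-one-out validation error such as the RMSEv quoted in this abstract can be computed along the following lines. This is a sketch under stated assumptions: ordinary least squares stands in for the authors' inverse regression plus scaling model, and the proxy and temperature values are invented.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x (stand-in for the calibration model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loo_rmse(xs, ys):
    """Root-mean-square error of predictions for samples left out of the fit."""
    sq_errors = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        sq_errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return sqrt(sum(sq_errors) / len(sq_errors))

# Invented calibration data: a proxy value and a temperature in degrees C.
proxy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
temperature = [8.1, 8.4, 9.1, 9.3, 10.2, 10.4]
print(round(loo_rmse(proxy, temperature), 3))
```

The result is in the units of the predicted variable, which is why the abstract can state the validation error directly in °C.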

Relevance:

60.00%

Abstract:

Nitazoxanide (2-acetolyloxy-N-(5-nitro 2-thiazolyl) benzamide; NTZ) represents the parent compound of a novel class of broad-spectrum anti-parasitic compounds named thiazolides. NTZ is active against a wide variety of intestinal and tissue-dwelling helminths, protozoa, enteric bacteria and a number of viruses infecting animals and humans. This broad activity poses a problem in practice, since such non-selectivity can lead to undesired side effects in both humans and animals. In this study, we used real-time PCR to determine the in vitro activities of 29 different thiazolides (NTZ-derivatives), which carry distinct modifications on both the thiazole and the benzene moieties, against the tachyzoite stage of the intracellular protozoan Neospora caninum. The goal was to identify a highly active compound lacking the undesirable nitro group, which would have a more specific applicability, such as in food animals. By applying self-organizing molecular field analysis (SOMFA), these data were used to develop a predictive model for future drug design. SOMFA performs self-alignment of the molecules, and takes into account the steric and electrostatic properties, in order to determine 3D quantitative structure-activity relationship models. The best model was obtained by overlay of the thiazole moieties. Plotting of predicted versus experimentally determined activity produced an r² value of 0.8052, and cross-validation using the "leave one out" methodology resulted in a q² value of 0.7987. A master grid map showed that large steric groups at the R2 position, the nitrogen of the amide bond and position Y could greatly reduce activity, and that the presence of large steric groups placed at positions X, R4 and surrounding the oxygen atom of the amide bond may increase the activity of thiazolides against Neospora caninum tachyzoites. The model obtained here will be an important predictive tool for future development of this important class of drugs.
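The relation between a fitted r² and a leave-one-out q² can be illustrated with a small sketch. It assumes the common definition q² = 1 − PRESS / SS_total, where PRESS is the sum of squared leave-one-out prediction errors. A one-descriptor linear fit stands in for the SOMFA model, and the descriptor and activity values are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x (stand-in for the QSAR model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r2_and_q2(xs, ys):
    """Return (r2 from the full fit, q2 from leave-one-out predictions)."""
    n = len(ys)
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)

    a, b = fit_line(xs, ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a_i + b_i * xs[i])) ** 2

    return 1 - ss_res / ss_tot, 1 - press / ss_tot

# Invented descriptor values and activities for seven hypothetical compounds.
descriptor = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
activity = [1.2, 1.9, 3.2, 3.8, 5.3, 5.9, 7.1]
r2, q2 = r2_and_q2(descriptor, activity)
print(round(r2, 4), round(q2, 4))
```

For a least-squares fit q² cannot exceed r², so a small gap between the two, as in the values reported above, suggests the model is not simply memorizing the training compounds.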

Relevance:

60.00%

Abstract:

The early detection of subjects with probable Alzheimer's disease (AD) is crucial for the effective application of treatment strategies. Here we explored the ability of a multitude of linear and non-linear classification algorithms to discriminate between the electroencephalograms (EEGs) of patients with varying degrees of AD and their age-matched control subjects. Absolute and relative spectral power, distribution of spectral power, and measures of spatial synchronization were calculated from recordings of resting eyes-closed continuous EEGs of 45 healthy controls, 116 patients with mild AD and 81 patients with moderate AD, recruited in two different centers (Stockholm, New York). The applied classification algorithms were: principal component linear discriminant analysis (PC LDA), partial least squares LDA (PLS LDA), principal component logistic regression (PC LR), partial least squares logistic regression (PLS LR), bagging, random forest, support vector machines (SVM) and feed-forward neural network. Based on 10-fold cross-validation runs, it could be demonstrated that even though modern computer-intensive classification algorithms such as random forests, SVM and neural networks show a slight superiority, more classical classification algorithms performed nearly equally well. Using random forest classification, a sensitivity of up to 85% and a specificity of 78% were reached for the classification of patients with only mild AD, whereas for the comparison of moderate AD vs. controls, using SVM and neural networks, values of 89% and 88% for sensitivity and specificity were achieved. Such a remarkable performance proves the value of these classification algorithms for clinical diagnostics.
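The way sensitivity and specificity are obtained from k-fold cross-validation can be sketched as follows. This is an illustration only: the single-feature threshold classifier, the function names, and the invented data are assumptions and do not correspond to any of the algorithms listed in the abstract.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_threshold(values, labels):
    """Toy classifier: midpoint between the two class means of one feature."""
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def cross_validated_sens_spec(values, labels, k=10):
    """Pool held-out predictions over all folds, then compute both rates."""
    tp = fn = tn = fp = 0
    for fold in k_fold_indices(len(values), k):
        held_out = set(fold)
        train = [i for i in range(len(values)) if i not in held_out]
        thr = fit_threshold([values[i] for i in train], [labels[i] for i in train])
        for i in fold:
            pred = 1 if values[i] > thr else 0  # assumes patients score higher
            if labels[i] == 1:
                tp += pred
                fn += 1 - pred
            else:
                fp += pred
                tn += 1 - pred
    return tp / (tp + fn), tn / (tn + fp)

# Invented, well-separated feature values: 20 controls (0) and 20 patients (1).
values = [i * 0.1 for i in range(20)] + [3.0 + i * 0.1 for i in range(20)]
labels = [0] * 20 + [1] * 20
print(cross_validated_sens_spec(values, labels))
```

Sensitivity is the fraction of patients correctly flagged and specificity the fraction of controls correctly cleared. Both are computed only from samples that were held out when the threshold was fitted.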

Relevance:

60.00%

Abstract:

BACKGROUND: Many HIV-infected patients on highly active antiretroviral therapy (HAART) experience metabolic complications including dyslipidaemia and insulin resistance, which may increase their coronary heart disease (CHD) risk. We developed a prognostic model for CHD tailored to the changes in risk factors observed in patients starting HAART. METHODS: Data from five cohort studies (British Regional Heart Study, Caerphilly and Speedwell Studies, Framingham Offspring Study, Whitehall II) on 13,100 men aged 40-70 and 114,443 years of follow up were used. CHD was defined as myocardial infarction or death from CHD. Model fit was assessed using the Akaike Information Criterion; generalizability across cohorts was examined using internal-external cross-validation. RESULTS: A parametric model based on the Gompertz distribution generalized best. Variables included in the model were systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, triglyceride, glucose, diabetes mellitus, body mass index and smoking status. Compared with patients not on HAART, the estimated CHD hazard ratio (HR) for patients on HAART was 1.46 (95% CI 1.15-1.86) for moderate and 2.48 (95% CI 1.76-3.51) for severe metabolic complications. CONCLUSIONS: The change in the risk of CHD in HIV-infected men starting HAART can be estimated based on typical changes in risk factors, assuming that HRs estimated using data from non-infected men are applicable to HIV-infected men. Based on this model the risk of CHD is likely to increase, but increases may often be modest, and could be offset by lifestyle changes.
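Internal-external cross-validation, as named in this abstract, is commonly understood as leaving out one whole cohort at a time: the model is fitted on the remaining cohorts and evaluated on the one held out. The sketch below assumes that reading. The cohort names, the event-rate "model", and the Brier score are hypothetical placeholders for the parametric survival models used in the study.

```python
def internal_external_cv(cohorts, fit, score):
    """Hold out each cohort in turn; fit on the others and score on the held-out one."""
    results = {}
    for held_out in cohorts:
        train = [row for name, rows in cohorts.items() if name != held_out for row in rows]
        model = fit(train)
        results[held_out] = score(model, cohorts[held_out])
    return results

def fit_event_rate(outcomes):
    """Placeholder model: the overall event rate in the training cohorts."""
    return sum(outcomes) / len(outcomes)

def brier_score(rate, outcomes):
    """Mean squared difference between predicted rate and observed 0/1 outcomes."""
    return sum((rate - y) ** 2 for y in outcomes) / len(outcomes)

# Hypothetical cohorts with invented binary outcomes (1 = event).
cohorts = {
    "cohort_a": [0, 0, 1, 0],
    "cohort_b": [0, 1, 1, 0],
    "cohort_c": [0, 0, 0, 1],
}
for name, value in internal_external_cv(cohorts, fit_event_rate, brier_score).items():
    print(name, round(value, 4))
```

Because each test set is an entire study rather than a random subset, the per-cohort scores indicate how well a model transfers to a population it was not developed on.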

Relevance:

60.00%

Abstract:

In this study, we demonstrate the power of applying complementary DNA (cDNA) microarray technology to identifying candidate loci that exhibit subtle differences in expression levels associated with a complex trait in natural populations of a nonmodel organism. Using a highly replicated experimental design involving 180 cDNA microarray experiments, we measured gene-expression levels from 1098 transcript probes in 90 individuals originating from six brown trout (Salmo trutta) and one Atlantic salmon (Salmo salar) population, which follow either a migratory or a sedentary life history. We identified several candidate genes associated with preparatory adaptations to different life histories in salmonids, including genes encoding for transaldolase 1, constitutive heat-shock protein HSC70-1 and endozepine. Some of these genes clustered into functional groups, providing insight into the physiological pathways potentially involved in the expression of life-history related phenotypic differences. Such differences included the down-regulation of genes involved in the respiratory system of future migratory individuals. In addition, we used linear discriminant analysis to identify a set of 12 genes that correctly classified immature individuals as migratory or sedentary with high accuracy. Using the expression levels of these 12 genes, 17 out of 18 individuals used for cross-validation were correctly assigned to their respective life-history phenotype. Finally, we found various candidate genes associated with physiological changes that are likely to be involved in preadaptations to seawater in anadromous populations of the genus Salmo, one of which was identified to encode for nucleophosmin 1. Our findings thus provide new molecular insights into salmonid life-history variation, opening new perspectives in the study of this complex trait.

Relevance:

60.00%

Abstract:

High-resolution and highly precise age models for recent lake sediments (last 100–150 years) are essential for quantitative paleoclimate research. These are particularly important for sedimentological and geochemical proxies, where transfer functions cannot be established and calibration must be based upon the relation of sedimentary records to instrumental data. High-precision dating for the calibration period is most critical as it determines directly the quality of the calibration statistics. Here, as an example, we compare radionuclide age models obtained on two high-elevation glacial lakes in the Central Chilean Andes (Laguna Negra: 33°38′S/70°08′W, 2,680 m a.s.l. and Laguna El Ocho: 34°02′S/70°19′W, 3,250 m a.s.l.). We show the different numerical models that produce accurate age-depth chronologies based on 210Pb profiles, and we explain how to obtain reduced age-error bars at the bottom part of the profiles, i.e., typically around the end of the 19th century. In order to constrain the age models, we propose a method with five steps: (i) sampling at irregularly-spaced intervals for 226Ra, 210Pb and 137Cs depending on the stratigraphy and microfacies, (ii) a systematic comparison of numerical models for the calculation of 210Pb-based age models: constant flux constant sedimentation (CFCS), constant initial concentration (CIC), constant rate of supply (CRS) and sediment isotope tomography (SIT), (iii) numerical constraining of the CRS and SIT models with the 137Cs chronomarker of AD 1964 and, (iv) step-wise cross-validation with independent diagnostic environmental stratigraphic markers of known age (e.g., volcanic ash layer, historical flood and earthquakes). In both examples, we also use airborne pollutants such as spheroidal carbonaceous particles (reflecting the history of fossil fuel emissions), excess atmospheric Cu deposition (reflecting the production history of a large local Cu mine), and turbidites related to historical earthquakes. 
Our results show that the SIT model constrained with the 137Cs AD 1964 peak performs best over the entire chronological profile (last 100–150 years) and yields the smallest standard deviations for the sediment ages. Such precision is critical for the calibration statistics, and ultimately, for the quality of the quantitative paleoclimate reconstruction. The systematic comparison of CRS and SIT models also helps to validate the robustness of the chronologies in different sections of the profile. Although surprisingly poorly known and under-explored in paleolimnological research, the SIT model has great potential in paleoclimatological reconstructions based on lake sediments.

Relevance:

60.00%

Abstract:

Peatlands are widely exploited archives of paleoenvironmental change. We developed and compared multiple transfer functions to infer peatland depth to the water table (DWT) and pH based on testate amoeba (percentages, or presence/absence), bryophyte presence/absence, and vascular plant presence/absence data from sub-alpine peatlands in the SE Swiss Alps in order to 1) compare the performance of single-proxy vs. multi-proxy models and 2) assess the performance of presence/absence models. Bootstrap cross-validation showed that the best-performing single-proxy transfer functions for both DWT and pH were those based on bryophytes. The best-performing transfer functions overall for DWT were those based on combined testate amoebae percentages, bryophytes and vascular plants; and, for pH, those based on testate amoebae and bryophytes. The comparison of DWT and pH inferred from testate amoeba percentages and presence/absence data showed similar general patterns but differences in the magnitude and timing of some shifts. These results show new directions for paleoenvironmental research, 1) suggesting that it is possible to build well-performing transfer functions using presence/absence data, although with some loss of accuracy, and 2) supporting the idea that multi-proxy inference models may improve paleoecological reconstruction. The performance of multi-proxy and single-proxy transfer functions should be further compared in paleoecological data.
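Bootstrap cross-validation of a transfer function can be sketched as below: the model is fitted on a resample drawn with replacement and evaluated on the samples that were not drawn (the out-of-bag samples). This is an assumed, simplified variant. A one-variable linear fit stands in for the transfer functions of the study, and the data are invented.

```python
import random
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x (stand-in for a transfer function)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def bootstrap_rmsep(xs, ys, n_boot=200, seed=0):
    """Root-mean-square error of prediction over out-of-bag samples."""
    rng = random.Random(seed)
    n = len(xs)
    sq_errors = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        in_bag = set(sample)
        out_of_bag = [i for i in range(n) if i not in in_bag]
        # Skip degenerate resamples: nothing to test on, or no spread in x.
        if not out_of_bag or len({xs[i] for i in sample}) < 2:
            continue
        a, b = fit_line([xs[i] for i in sample], [ys[i] for i in sample])
        sq_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in out_of_bag)
    return sqrt(sum(sq_errors) / len(sq_errors))

# Invented data: a biological index and depth to water table in cm.
index = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dwt_cm = [4.8, 7.1, 9.3, 10.6, 13.2, 15.1, 16.7, 19.4]
print(round(bootstrap_rmsep(index, dwt_cm), 3))
```

Comparing this error across candidate models, for example single-proxy against multi-proxy, is one way to rank them on data they were not fitted to.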

Relevance:

60.00%

Abstract:

Dynamic changes in ERP topographies can be conveniently analyzed by means of microstates, the so-called "atoms of thoughts", that represent brief periods of quasi-stable synchronized network activation. Comparing temporal microstate features such as on- and offset or duration between groups and conditions therefore allows a precise assessment of the timing of cognitive processes. So far, this has been achieved by assigning the individual time-varying ERP maps to spatially defined microstate templates obtained from clustering the grand mean data into predetermined numbers of topographies (microstate prototypes). Features obtained from these individual assignments were then statistically compared. This has the problem that the individual noise dilutes the match between individual topographies and templates leading to lower statistical power. We therefore propose a randomization-based procedure that works without assigning grand-mean microstate prototypes to individual data. In addition, we propose a new criterion to select the optimal number of microstate prototypes based on cross-validation across subjects. After a formal introduction, the method is applied to a sample data set of an N400 experiment and to simulated data with varying signal-to-noise ratios, and the results are compared to existing methods. In a first comparison with previously employed statistical procedures, the new method showed an increased robustness to noise, and a higher sensitivity for more subtle effects of microstate timing. We conclude that the proposed method is well-suited for the assessment of timing differences in cognitive processes. The increased statistical power allows identifying more subtle effects, which is particularly important in small and scarce patient populations.

Relevance:

60.00%

Abstract:

Soil spectroscopy was applied for predicting soil organic carbon (SOC) in the highlands of Ethiopia. Soil samples were acquired from Ethiopia’s National Soil Testing Centre and direct field sampling. The reflectance of samples was measured using a FieldSpec 3 diffuse reflectance spectrometer. Outliers and sample relation were evaluated using principal component analysis (PCA) and models were developed through partial least squares regression (PLSR). For the nine watersheds sampled, 20% of the samples were set aside to test prediction and 80% were used to develop calibration models. Depending on the number of samples per watershed, cross-validation or independent validation was used. The stability of models was evaluated using the coefficient of determination (R2), root mean square error (RMSE), and the ratio of performance to deviation (RPD). The R2 (%), RMSE (%), and RPD, respectively, for validation were Anjeni (88, 0.44, 3.05), Bale (86, 0.52, 2.7), Basketo (89, 0.57, 3.0), Benishangul (91, 0.30, 3.4), Kersa (82, 0.44, 2.4), Kola tembien (75, 0.44, 1.9), Maybar (84, 0.57, 2.5), Megech (85, 0.15, 2.6), and Wondo Genet (86, 0.52, 2.7), indicating that the models were stable. Models performed better for areas with high SOC values than for areas with lower SOC values. Overall, soil spectroscopy performance ranged from very good to good.
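The three validation statistics listed in this abstract can be computed from observed and predicted values as sketched below. The definitions are assumptions, since the abstract does not spell them out: R2 is taken as 1 − SS_res / SS_tot, and RPD as the standard deviation of the observed values divided by the RMSE. The SOC values are invented.

```python
from math import sqrt
from statistics import stdev

def validation_stats(observed, predicted):
    """Return R2, RMSE and RPD for a set of validation samples."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = sqrt(ss_res / n)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": rmse,
        "rpd": stdev(observed) / rmse,  # spread of the data relative to the error
    }

# Invented SOC values (%) for four validation samples.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
stats = validation_stats(observed, predicted)
print({name: round(value, 3) for name, value in stats.items()})
```

RPD puts the prediction error in proportion to the natural variability of the samples, which makes it easier to compare watersheds whose SOC ranges differ.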

Relevance:

60.00%

Abstract:

This paper studied two different regression techniques for pelvic shape prediction, i.e., partial least squares regression (PLSR) and principal component regression (PCR). Three different predictors, namely surface landmarks, morphological parameters, and surface models of neighboring structures, were used in a cross-validation study to predict the pelvic shape. Results obtained from applying these two regression techniques were compared to the population mean model. In almost all the prediction experiments, both regression techniques generated better results than the population mean model, while the difference in prediction accuracy between the two regression methods was not statistically significant (α=0.01).

Relevance:

60.00%

Abstract:

We present an independent calibration model for the determination of biogenic silica (BSi) in sediments, developed from analysis of synthetic sediment mixtures and application of Fourier transform infrared spectroscopy (FTIRS) and partial least squares regression (PLSR) modeling. In contrast to current FTIRS applications for quantifying BSi, this new calibration is independent of conventional wet-chemical techniques and their associated measurement uncertainties. This approach also removes the need for developing internal calibrations between the two methods for individual sediment records. For the independent calibration, we produced six series of different synthetic sediment mixtures using two purified diatom extracts, with one extract mixed with quartz sand, calcite, 60/40 quartz/calcite and two different natural sediments, and a second extract mixed with one of the natural sediments. A total of 306 samples (51 samples per series) yielded BSi contents ranging from 0 to 100 %. The resulting PLSR calibration model between the FTIR spectral information and the defined BSi concentration of the synthetic sediment mixtures exhibits a strong cross-validated correlation (R²cv = 0.97) and a low root-mean-square error of cross-validation (RMSECV = 4.7 %). Application of the independent calibration to natural lacustrine and marine sediments yields robust BSi reconstructions. At present, the synthetic mixtures do not include the variation in organic matter that occurs in natural samples, which may explain the somewhat lower prediction accuracy of the calibration model for organic-rich samples.

Relevance:

60.00%

Abstract:

OBJECTIVES This study aimed to update the Logistic Clinical SYNTAX score to predict 3-year survival after percutaneous coronary intervention (PCI) and compare the performance with the SYNTAX score alone. BACKGROUND The SYNTAX score is a well-established angiographic tool to predict long-term outcomes after PCI. The Logistic Clinical SYNTAX score, developed by combining clinical variables with the anatomic SYNTAX score, has been shown to perform better than the SYNTAX score alone in predicting 1-year outcomes after PCI. However, the ability of this score to predict long-term survival is unknown. METHODS Patient-level data (N = 6,304, 399 deaths within 3 years) from 7 contemporary PCI trials were analyzed. We revised the overall risk and the predictor effects in the core model (SYNTAX score, age, creatinine clearance, and left ventricular ejection fraction) using Cox regression analysis to predict mortality at 3 years. We also updated the extended model by combining the core model with additional independent predictors of 3-year mortality (i.e., diabetes mellitus, peripheral vascular disease, and body mass index). RESULTS The revised Logistic Clinical SYNTAX models showed better discriminative ability than the anatomic SYNTAX score for the prediction of 3-year mortality after PCI (c-index: SYNTAX score, 0.61; core model, 0.71; and extended model, 0.73 in a cross-validation procedure). The extended model in particular performed better in differentiating low- and intermediate-risk groups. CONCLUSIONS Risk scores combining clinical characteristics with the anatomic SYNTAX score substantially better predict 3-year mortality than the SYNTAX score alone and should be used for long-term risk stratification of patients undergoing PCI.
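The c-index quoted in this abstract measures how often a model assigns the higher risk to the patient who dies earlier. The sketch below assumes Harrell's concordance index, a common definition for censored survival data. The abstract does not state which variant was used, and the follow-up times, event flags and risk scores here are invented.

```python
def concordance_index(times, events, risks):
    """Harrell-style c-index.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before subject j's follow-up time. The pair is
    concordant when the earlier event also has the higher predicted risk;
    tied risks count as half. Ties in time are ignored in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Invented follow-up data: years to death or censoring, event flag, risk score.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported improvement from 0.61 to 0.73 in context.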