27 results for Lanczos, Linear systems, Generalized cross validation

in BORIS: Bern Open Repository and Information System - Bern - Switzerland


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Index tracking has become one of the most common strategies in asset management. The index-tracking problem consists of constructing a portfolio that replicates the future performance of an index by including only a subset of the index constituents in the portfolio. Finding the most representative subset is challenging when the number of stocks in the index is large. We introduce a new three-stage approach that first identifies promising subsets by employing data-mining techniques, then determines the stock weights in the subsets using mixed-binary linear programming, and finally evaluates the subsets based on cross-validation. The best subset is returned as the tracking portfolio. Our approach outperforms state-of-the-art methods in terms of out-of-sample performance and running times.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes informatics for cross-sample analysis with comprehensive two-dimensional gas chromatography (GCxGC) and high-resolution mass spectrometry (HRMS). GCxGC-HRMS analysis produces large data sets that are rich with information but highly complex. The size of the data and the volume of information require automated processing for comprehensive cross-sample analysis, but the complexity poses a challenge for developing robust methods. The approach developed here analyzes GCxGC-HRMS data from multiple samples to extract a feature template that comprehensively captures the pattern of peaks detected in the retention-times plane. Then, for each sample chromatogram, the template is geometrically transformed to align with the detected peak pattern and generate a set of feature measurements for cross-sample analyses such as sample classification and biomarker discovery. The approach avoids the intractable problem of comprehensive peak matching by using a few reliable peaks for alignment and peak-based retention-plane windows to define comprehensive features that can be reliably matched for cross-sample analysis. The informatics are demonstrated with a set of 18 samples from breast-cancer tumors, each from a different individual, six each for Grades 1-3. The features allow classification that matches grading by a cancer pathologist with 78% success in leave-one-out cross-validation experiments. The HRMS signatures of the features of interest can be examined for determining elemental compositions and identifying compounds.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Climate and environmental reconstructions from natural archives are important for the interpretation of current climatic change. Few quantitative high-resolution reconstructions exist for South America, which is the only land mass extending from the tropics to the southern high latitudes at 56°S. We analyzed sediment cores from two adjacent lakes in Northern Chilean Patagonia, Lago Castor (45°36′S, 71°47′W) and Laguna Escondida (45°31′S, 71°49′W). Radiometric dating (210Pb, 137Cs, 14C-AMS) suggests that the cores reach back to c. 900 BC (Laguna Escondida) and c. 1900 BC (Lago Castor). Both lakes show similarities and reproducibility in sedimentation rate changes and tephra layer deposition. We found eight macroscopic tephras (0.2–5.5 cm thick) dated at 1950 BC, 1700 BC, 300 BC, 50 BC, 90 AD, 160 AD, 400 AD and 900 AD. These can be used as regional time-synchronous stratigraphic markers. The two thickest tephras represent known well-dated explosive eruptions of Hudson volcano around 1950 and 300 BC. Biogenic silica flux revealed in both lakes a climate signal and correlation with annual temperature reanalysis data (calibration 1900–2006 AD; Lago Castor r = 0.37; Laguna Escondida r = 0.42, seven-year filtered data). We used a linear inverse regression plus scaling model for calibration and leave-one-out cross-validation (RMSEv = 0.56 °C) to reconstruct sub-decadal-scale temperature variability for Laguna Escondida back to AD 400. The lower part of the core from Laguna Escondida prior to AD 400 and the core of Lago Castor are strongly influenced by primary and secondary tephras and are therefore not used for the temperature reconstruction. The temperature reconstruction from Laguna Escondida shows cold conditions in the 5th century (relative to the 20th century mean), warmer temperatures from AD 600 to AD 1150 and colder temperatures from AD 1200 to AD 1450.
From AD 1450 to AD 1700 our reconstruction shows a period with stronger variability and on average higher values than the 20th century mean. Until AD 1900 the temperature values decrease but stay slightly above the 20th century mean. Most of the centennial-scale features are reproduced in the few other natural climate archives in the region. The early onset of cool conditions from c. AD 1200 onward seems to be confirmed for this region.
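
The leave-one-out cross-validation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain least-squares line of temperature on a single proxy, which only stands in for the authors' inverse regression plus scaling model (the abstract does not give its details). The function names and the sample numbers are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_rmse(xs, ys):
    """Root-mean-square error of leave-one-out predictions."""
    squared_errors = []
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical proxy values (e.g. biogenic silica flux) and temperatures.
proxy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
temperature = [7.9, 8.4, 8.6, 9.3, 9.4, 10.1]
print(loo_rmse(proxy, temperature))
```

Because every prediction comes from a model that never saw the held-out sample, the resulting error is a less optimistic estimate than the in-sample fit error.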

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The early detection of subjects with probable Alzheimer's disease (AD) is crucial for effective application of treatment strategies. Here we explored the ability of a multitude of linear and non-linear classification algorithms to discriminate between the electroencephalograms (EEGs) of patients with varying degrees of AD and their age-matched control subjects. Absolute and relative spectral power, distribution of spectral power, and measures of spatial synchronization were calculated from recordings of resting eyes-closed continuous EEGs of 45 healthy controls, 116 patients with mild AD and 81 patients with moderate AD, recruited in two different centers (Stockholm, New York). The applied classification algorithms were: principal component linear discriminant analysis (PC LDA), partial least squares LDA (PLS LDA), principal component logistic regression (PC LR), partial least squares logistic regression (PLS LR), bagging, random forest, support vector machines (SVM) and feed-forward neural network. Based on 10-fold cross-validation runs, it could be demonstrated that even though modern computer-intensive classification algorithms such as random forests, SVM and neural networks show a slight superiority, more classical classification algorithms performed nearly equally well. Using random forest classification, a sensitivity of up to 85% and a specificity of 78% were reached for the detection of even mild AD patients, whereas for the comparison of moderate AD vs. controls, using SVM and neural networks, values of 89% and 88% for sensitivity and specificity were achieved. Such a remarkable performance proves the value of these classification algorithms for clinical diagnostics.
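
As a reminder of how the sensitivity and specificity figures quoted above are defined, here is a small sketch. The abstract does not report the underlying confusion matrices, so the counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of patients correctly flagged
    specificity = tn / (tn + fp)  # share of controls correctly cleared
    return sensitivity, specificity


# Hypothetical counts: 100 patients and 100 controls in the test folds.
sens, spec = sensitivity_specificity(tp=85, fn=15, tn=78, fp=22)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a 10-fold cross-validation these counts would typically be accumulated over the ten held-out folds before the ratios are taken.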

Relevance:

100.00% 100.00%

Publisher:

Abstract:

BACKGROUND: Many HIV-infected patients on highly active antiretroviral therapy (HAART) experience metabolic complications including dyslipidaemia and insulin resistance, which may increase their coronary heart disease (CHD) risk. We developed a prognostic model for CHD tailored to the changes in risk factors observed in patients starting HAART. METHODS: Data from five cohort studies (British Regional Heart Study, Caerphilly and Speedwell Studies, Framingham Offspring Study, Whitehall II) on 13,100 men aged 40-70 and 114,443 years of follow up were used. CHD was defined as myocardial infarction or death from CHD. Model fit was assessed using the Akaike Information Criterion; generalizability across cohorts was examined using internal-external cross-validation. RESULTS: A parametric model based on the Gompertz distribution generalized best. Variables included in the model were systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, triglyceride, glucose, diabetes mellitus, body mass index and smoking status. Compared with patients not on HAART, the estimated CHD hazard ratio (HR) for patients on HAART was 1.46 (95% CI 1.15-1.86) for moderate and 2.48 (95% CI 1.76-3.51) for severe metabolic complications. CONCLUSIONS: The change in the risk of CHD in HIV-infected men starting HAART can be estimated based on typical changes in risk factors, assuming that HRs estimated using data from non-infected men are applicable to HIV-infected men. Based on this model the risk of CHD is likely to increase, but increases may often be modest, and could be offset by lifestyle changes.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this study, we demonstrate the power of applying complementary DNA (cDNA) microarray technology to identifying candidate loci that exhibit subtle differences in expression levels associated with a complex trait in natural populations of a nonmodel organism. Using a highly replicated experimental design involving 180 cDNA microarray experiments, we measured gene-expression levels from 1098 transcript probes in 90 individuals originating from six brown trout (Salmo trutta) and one Atlantic salmon (Salmo salar) population, which follow either a migratory or a sedentary life history. We identified several candidate genes associated with preparatory adaptations to different life histories in salmonids, including genes encoding for transaldolase 1, constitutive heat-shock protein HSC70-1 and endozepine. Some of these genes clustered into functional groups, providing insight into the physiological pathways potentially involved in the expression of life-history related phenotypic differences. Such differences included the down-regulation of genes involved in the respiratory system of future migratory individuals. In addition, we used linear discriminant analysis to identify a set of 12 genes that correctly classified immature individuals as migratory or sedentary with high accuracy. Using the expression levels of these 12 genes, 17 out of 18 individuals used for cross-validation were correctly assigned to their respective life-history phenotype. Finally, we found various candidate genes associated with physiological changes that are likely to be involved in preadaptations to seawater in anadromous populations of the genus Salmo, one of which was identified to encode for nucleophosmin 1. Our findings thus provide new molecular insights into salmonid life-history variation, opening new perspectives in the study of this complex trait.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Extraction of surface models of a hip joint from CT data is a pre-requisite step for computer assisted diagnosis and planning (CADP) of periacetabular osteotomy (PAO). Most existing CADP systems are based on manual segmentation, which is time-consuming and makes reproducible results hard to achieve. In this paper, we present a Fully Automatic CT Segmentation (FACTS) approach to simultaneously extract both pelvic and femoral models. Our approach works by combining fast random forest (RF) regression based landmark detection and multi-atlas based segmentation with articulated statistical shape model (aSSM) based fitting. The two fundamental contributions of our approach are: (1) an improved fast Gaussian transform (IFGT) is used within the RF regression framework for fast and accurate landmark detection, which then allows for a fully automatic initialization of the multi-atlas based segmentation; and (2) aSSM based fitting is used to preserve hip joint structure and to avoid penetration between the pelvic and femoral models. Taking manual segmentation as the ground truth, we evaluated the present approach on 30 hip CT images (60 hips) with a 6-fold cross-validation. When the present approach was compared to manual segmentation, mean segmentation accuracies of 0.40, 0.36, and 0.36 mm were found for the pelvis, the left proximal femur, and the right proximal femur, respectively. When the models derived from both segmentations were used to compute the PAO diagnosis parameters, differences of 2.0 ± 1.5°, 2.1 ± 1.6°, and 3.5 ± 2.3% were found for anteversion, inclination, and acetabular coverage, respectively. The achieved accuracy is regarded as clinically accurate enough for our target applications.
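
To make the 6-fold evaluation on 30 images concrete, the following sketch splits 30 item indices into six folds, so that each round holds out 5 images and uses the other 25 for training. The abstract does not say how images were assigned to folds; the contiguous assignment and the function name here are assumptions (in practice the order is usually shuffled first).

```python
def k_fold_indices(n_items, k):
    """Yield (train, test) index lists for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(30, 6))
for train, test in folds:
    print(len(train), len(test))
```

Each image appears in exactly one test fold, so every image contributes one held-out accuracy measurement.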

Relevance:

100.00% 100.00%

Publisher:

Abstract:

To check the effectiveness of campaigns preventing drug abuse or indicating local effects of efforts against drug trafficking, it is beneficial to know consumed amounts of substances in a high spatial and temporal resolution. The analysis of drugs of abuse in wastewater (WW) has the potential to provide this information. In this study, the reliability of WW drug consumption estimates is assessed and a novel method presented to calculate the total uncertainty in observed WW cocaine (COC) and benzoylecgonine (BE) loads. Specifically, uncertainties resulting from discharge measurements, chemical analysis and the applied sampling scheme were addressed and three approaches presented. These consist of (i) a generic model-based procedure to investigate the influence of the sampling scheme on the uncertainty of observed or expected drug loads, (ii) a comparative analysis of two analytical methods (high performance liquid chromatography-tandem mass spectrometry and gas chromatography-mass spectrometry), including an extended cross-validation by influent profiling over several days, and (iii) monitoring COC and BE concentrations in WW of the largest Swiss sewage treatment plants. In addition, the COC and BE loads observed in the sewage treatment plant of the city of Berne were used to back-calculate the COC consumption. The estimated mean daily consumed amount was 107 ± 21 g of pure COC, corresponding to 321 g of street-grade COC.
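
The two consumption figures at the end of this abstract imply a street-grade purity that can be recovered with simple arithmetic. The sketch below only uses the numbers quoted above; treating the purity as exact when scaling the uncertainty is an assumption, since the abstract does not state how the street-grade figure was derived.

```python
# Figures quoted in the abstract (grams per day).
pure_g_per_day = 107.0
pure_uncertainty_g = 21.0
street_g_per_day = 321.0

# Purity implied by the two reported amounts.
implied_purity = pure_g_per_day / street_g_per_day

# Assumption: purity is exact, so the uncertainty scales by the same factor.
street_uncertainty_g = pure_uncertainty_g / implied_purity

print(f"implied purity: {implied_purity:.1%}")
print(f"street-grade uncertainty: {street_uncertainty_g:.0f} g")
```

Any uncertainty in the purity itself would widen the street-grade interval beyond this simple scaling.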

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Objective: To compare clinical outcomes after laparoscopic cholecystectomy (LC) for acute cholecystitis performed at various time-points after hospital admission. Background: Symptomatic gallstones represent an important public health problem, with LC the treatment of choice. LC is increasingly offered for acute cholecystitis; however, the optimal time-point for LC in this setting remains a matter of debate. Methods: Analysis was based on the prospective database of the Swiss Association of Laparoscopic and Thoracoscopic Surgery and included patients undergoing emergency LC for acute cholecystitis between 1995 and 2006, grouped according to the time-points of LC since hospital admission (admission day (d0), d1, d2, d3, d4/5, d ≥6). Linear and generalized linear regression models assessed the effect of timing of LC on intra- or postoperative complications, conversion and reoperation rates and length of postoperative hospital stay. Results: Of 4113 patients, 52.8% were female; median age was 59.8 years. Delaying LC resulted in significantly higher conversion rates (from 11.9% at d0 to 27.9% at d ≥6 days after admission, P < 0.001), surgical postoperative complications (5.7% to 13%, P < 0.001) and re-operation rates (0.9% to 3%, P = 0.007), with a significantly longer postoperative hospital stay (P < 0.001). Conclusions: Delaying LC for acute cholecystitis has no advantages, resulting in significantly increased conversion/re-operation rate, postoperative complications and longer postoperative hospital stay. This investigation, one of the largest in the literature, provides compelling evidence that acute cholecystitis merits surgery within 48 hours of hospital admission if impact on the patient and health care system is to be minimized.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Nitazoxanide (2-acetolyloxy-N-(5-nitro 2-thiazolyl) benzamide; NTZ) represents the parent compound of a novel class of broad-spectrum anti-parasitic compounds named thiazolides. NTZ is active against a wide variety of intestinal and tissue-dwelling helminths, protozoa, enteric bacteria and a number of viruses infecting animals and humans. While potent, this poses a problem in practice, since this obvious non-selectivity can lead to undesired side effects in both humans and animals. In this study, we used real time PCR to determine the in vitro activities of 29 different thiazolides (NTZ-derivatives), which carry distinct modifications on both the thiazole- and the benzene moieties, against the tachyzoite stage of the intracellular protozoan Neospora caninum. The goal was to identify a highly active compound lacking the undesirable nitro group, which would have a more specific applicability, such as in food animals. By applying self-organizing molecular field analysis (SOMFA), these data were used to develop a predictive model for future drug design. SOMFA performs self-alignment of the molecules, and takes into account the steric and electrostatic properties, in order to determine 3D-quantitative structure activity relationship models. The best model was obtained by overlay of the thiazole moieties. Plotting of predicted versus experimentally determined activity produced an r2 value of 0.8052 and cross-validation using the "leave one out" methodology resulted in a q2 value of 0.7987. A master grid map showed that large steric groups at the R2 position, the nitrogen of the amide bond and position Y could greatly reduce activity, and the presence of large steric groups placed at positions X, R4 and surrounding the oxygen atom of the amide bond, may increase the activity of thiazolides against Neospora caninum tachyzoites. The model obtained here will be an important predictive tool for future development of this important class of drugs.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

High-resolution and highly precise age models for recent lake sediments (last 100–150 years) are essential for quantitative paleoclimate research. These are particularly important for sedimentological and geochemical proxies, where transfer functions cannot be established and calibration must be based upon the relation of sedimentary records to instrumental data. High-precision dating for the calibration period is most critical as it determines directly the quality of the calibration statistics. Here, as an example, we compare radionuclide age models obtained on two high-elevation glacial lakes in the Central Chilean Andes (Laguna Negra: 33°38′S/70°08′W, 2,680 m a.s.l. and Laguna El Ocho: 34°02′S/70°19′W, 3,250 m a.s.l.). We show the different numerical models that produce accurate age-depth chronologies based on 210Pb profiles, and we explain how to obtain reduced age-error bars at the bottom part of the profiles, i.e., typically around the end of the 19th century. In order to constrain the age models, we propose a method with five steps: (i) sampling at irregularly-spaced intervals for 226Ra, 210Pb and 137Cs depending on the stratigraphy and microfacies, (ii) a systematic comparison of numerical models for the calculation of 210Pb-based age models: constant flux constant sedimentation (CFCS), constant initial concentration (CIC), constant rate of supply (CRS) and sediment isotope tomography (SIT), (iii) numerical constraining of the CRS and SIT models with the 137Cs chronomarker of AD 1964 and, (iv) step-wise cross-validation with independent diagnostic environmental stratigraphic markers of known age (e.g., volcanic ash layer, historical flood and earthquakes). In both examples, we also use airborne pollutants such as spheroidal carbonaceous particles (reflecting the history of fossil fuel emissions), excess atmospheric Cu deposition (reflecting the production history of a large local Cu mine), and turbidites related to historical earthquakes. 
Our results show that the SIT model constrained with the 137Cs AD 1964 peak performs best over the entire chronological profile (last 100–150 years) and yields the smallest standard deviations for the sediment ages. Such precision is critical for the calibration statistics, and ultimately, for the quality of the quantitative paleoclimate reconstruction. The systematic comparison of CRS and SIT models also helps to validate the robustness of the chronologies in different sections of the profile. Although surprisingly poorly known and under-explored in paleolimnological research, the SIT model has great potential in paleoclimatological reconstructions based on lake sediments.
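
For readers unfamiliar with the constant rate of supply (CRS) model named in this abstract, here is a minimal sketch of its standard textbook form, in which the age at a given depth is ln(A_total / A_below) / λ, with A_below the unsupported 210Pb inventory below that depth. It is not the authors' implementation and it omits the 137Cs constraint they describe; the half-life value, the function name, and the sample inventories are assumptions.

```python
from math import log

HALF_LIFE_210PB_YEARS = 22.3  # commonly cited value, treated as an assumption
DECAY_CONSTANT = log(2) / HALF_LIFE_210PB_YEARS


def crs_ages(layer_inventories):
    """Age (years before coring) at the top of each layer, CRS model.

    layer_inventories: unsupported 210Pb inventory per layer, listed from
    the sediment surface downward (any consistent unit, e.g. Bq per m^2).
    """
    total = sum(layer_inventories)
    remaining = total  # inventory at and below the current layer top
    ages = []
    for inventory in layer_inventories:
        ages.append(log(total / remaining) / DECAY_CONSTANT)
        remaining -= inventory
    return ages


# Hypothetical inventories for three layers, surface first.
print(crs_ages([50.0, 25.0, 25.0]))
```

The age diverges as the remaining inventory approaches zero, which is one reason error bars grow toward the bottom of a 210Pb profile and why independent markers are useful there.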

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Peatlands are widely exploited archives of paleoenvironmental change. We developed and compared multiple transfer functions to infer peatland depth to the water table (DWT) and pH based on testate amoeba (percentages, or presence/absence), bryophyte presence/absence, and vascular plant presence/absence data from sub-alpine peatlands in the SE Swiss Alps in order to 1) compare the performance of single-proxy vs. multi-proxy models and 2) assess the performance of presence/absence models. Bootstrap cross-validation showed that the best-performing single-proxy transfer functions for both DWT and pH were those based on bryophytes. The best-performing transfer functions overall for DWT were those based on combined testate amoeba percentages, bryophytes and vascular plants; and, for pH, those based on testate amoebae and bryophytes. The comparison of DWT and pH inferred from testate amoeba percentages and presence/absence data showed similar general patterns but differences in the magnitude and timing of some shifts. These results show new directions for paleoenvironmental research, 1) suggesting that it is possible to build well-performing transfer functions using presence/absence data, although with some loss of accuracy, and 2) supporting the idea that multi-proxy inference models may improve paleoecological reconstruction. The performance of multi-proxy and single-proxy transfer functions should be further compared in paleoecological data.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Dynamic changes in ERP topographies can be conveniently analyzed by means of microstates, the so-called "atoms of thoughts", that represent brief periods of quasi-stable synchronized network activation. Comparing temporal microstate features such as on- and offset or duration between groups and conditions therefore allows a precise assessment of the timing of cognitive processes. So far, this has been achieved by assigning the individual time-varying ERP maps to spatially defined microstate templates obtained from clustering the grand mean data into predetermined numbers of topographies (microstate prototypes). Features obtained from these individual assignments were then statistically compared. This has the problem that the individual noise dilutes the match between individual topographies and templates leading to lower statistical power. We therefore propose a randomization-based procedure that works without assigning grand-mean microstate prototypes to individual data. In addition, we propose a new criterion to select the optimal number of microstate prototypes based on cross-validation across subjects. After a formal introduction, the method is applied to a sample data set of an N400 experiment and to simulated data with varying signal-to-noise ratios, and the results are compared to existing methods. In a first comparison with previously employed statistical procedures, the new method showed an increased robustness to noise, and a higher sensitivity for more subtle effects of microstate timing. We conclude that the proposed method is well-suited for the assessment of timing differences in cognitive processes. The increased statistical power allows identifying more subtle effects, which is particularly important in small and scarce patient populations.