38 results for "Dirichlet Regression compositional model".


Relevance:

30.00%

Abstract:

This paper investigates the use of ensembles of predictors to improve the performance of spatial prediction methods. Support vector regression (SVR), a popular method from the field of statistical machine learning, is used as the base predictor. Several instances of SVR are combined using different data sampling schemes (bagging and boosting). Bagging performs well: it reduces error and proves more computationally efficient than training a single SVR model. Boosting, however, does not improve results on this specific problem.
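Bagging (bootstrap aggregating) fits each ensemble member on a bootstrap resample of the training set, drawn with replacement, and averages the members' predictions. The sketch below illustrates only that sampling-and-averaging loop in plain Python. To stay dependency-free it swaps SVR for a simple k-nearest-neighbour regressor as the base learner; this substitution, and the helper names `knn_predict` and `bagged_predict`, are hypothetical and not the paper's setup.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]                # (x, y) spatial coordinates
Sample = Tuple[Point, float]               # (location, measured value)


def knn_predict(train: Sequence[Sample], query: Point, k: int = 3) -> float:
    """Hypothetical stand-in base learner: mean value of the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)


def bagged_predict(
    train: Sequence[Sample],
    query: Point,
    n_models: int = 25,
    seed: int = 0,
    base: Callable[[Sequence[Sample], Point], float] = knn_predict,
) -> float:
    """Average the predictions of base learners fitted on bootstrap resamples."""
    rng = random.Random(seed)              # fixed seed makes the ensemble reproducible
    predictions: List[float] = []
    for _ in range(n_models):
        # Bootstrap resample: len(train) draws with replacement.
        sample = [rng.choice(train) for _ in range(len(train))]
        predictions.append(base(sample, query))
    return sum(predictions) / len(predictions)


# Usage on a toy grid where the "measurement" is simply x + y.
grid = [((float(x), float(y)), float(x + y)) for x in range(10) for y in range(10)]
print(bagged_predict(grid, (4.5, 4.5)))
```

In a real study, `base` would wrap an actual SVR fit; the averaging step is unchanged.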

Relevance:

30.00%

Abstract:

RATIONALE: An objective and simple prognostic model for patients with pulmonary embolism could help guide the initial intensity of treatment. OBJECTIVES: To develop a clinical prediction rule that accurately classifies patients with pulmonary embolism into categories of increasing risk of mortality and other adverse medical outcomes. METHODS: We randomly allocated 15,531 inpatient discharges with pulmonary embolism from 186 Pennsylvania hospitals to derivation (67%) and internal validation (33%) samples. We derived our prediction rule using logistic regression, with 30-day mortality as the primary outcome and patient demographic and clinical data routinely available at presentation as potential predictor variables. We externally validated the rule in 221 inpatients with pulmonary embolism from Switzerland and France. MEASUREMENTS: We compared mortality and nonfatal adverse medical outcomes across the derivation and two validation samples. MAIN RESULTS: The prediction rule is based on 11 simple patient characteristics that were independently associated with mortality, and it stratifies patients with pulmonary embolism into five severity classes, with 30-day mortality rates of 0-1.6% in class I, 1.7-3.5% in class II, 3.2-7.1% in class III, 4.0-11.4% in class IV, and 10.0-24.5% in class V across the derivation and validation samples. Inpatient death and nonfatal complications occurred in ≤ 1.1% of patients in class I and ≤ 1.9% of patients in class II. CONCLUSIONS: Our rule accurately classifies patients with pulmonary embolism into classes of increasing risk of mortality and other adverse medical outcomes. Further validation of the rule is important before its implementation as a decision aid to guide the initial management of patients with pulmonary embolism.

Relevance:

30.00%

Abstract:

Spatial data analysis, mapping, and visualization are of great importance in various fields: the environment, pollution, natural hazards and risks, epidemiology, spatial econometrics, etc. A basic task of spatial mapping is to make predictions from empirical data (measurements). A number of state-of-the-art methods can be used for this task: deterministic interpolation; geostatistical methods, such as the family of kriging estimators (Deutsch and Journel, 1997); machine learning algorithms, such as artificial neural networks (ANN) of different architectures; hybrid ANN-geostatistics models (Kanevski and Maignan, 2004; Kanevski et al., 1996); and others. Environmental empirical data are always contaminated by noise, often of unknown nature. This is one reason why deterministic models can be inconsistent: they treat the measurements as values of an unknown function that should be interpolated. Kriging estimators instead treat the measurements as a realization of a spatial random process. To obtain a kriging estimate, one has to model the spatial structure of the data with a spatial correlation function or (semi-)variogram. This task can be difficult when the number of measurements is insufficient, and the variogram is sensitive to outliers and extreme values. ANN is a powerful tool, but it also has a number of drawbacks. ANNs of a special type, multilayer perceptrons, are often used as a detrending tool in hybrid (ANN+geostatistics) models (Kanevski and Maignan, 2004). Therefore, it is of great importance to develop and adapt a method that is nonlinear, robust to noise in the measurements, able to deal with small empirical datasets, and built on a solid mathematical background. The present paper deals with such a model, based on Statistical Learning Theory (SLT): Support Vector Regression.
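The (semi-)variogram that kriging relies on is usually estimated from the data before a model is fitted to it. The sketch below computes the classical (Matheron) empirical semivariogram, γ(h) = Σ [z(x_i) − z(x_j)]² / (2N(h)), summed over the N(h) point pairs whose separation falls in a lag bin around h. The fixed-width lag bins and the function name are illustrative assumptions. Because every term is a squared difference, a single outlier can noticeably inflate γ(h), which is the sensitivity mentioned above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def empirical_semivariogram(
    coords: Sequence[Tuple[float, float]],
    values: Sequence[float],
    lag_width: float,
    n_lags: int,
) -> Dict[int, float]:
    """Classical (Matheron) estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)).

    Pairs are grouped into lag bins [k * lag_width, (k + 1) * lag_width);
    returns {bin index: semivariance} for bins containing at least one pair.
    """
    sums: List[float] = [0.0] * n_lags
    counts: List[int] = [0] * n_lags
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):              # each unordered pair counted once
            h = math.dist(coords[i], coords[j])
            k = int(h // lag_width)
            if k < n_lags:                     # ignore pairs beyond the last bin
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return {k: sums[k] / (2 * counts[k]) for k in range(n_lags) if counts[k] > 0}


# Usage: four points on a line with a linear trend in the measured value.
print(empirical_semivariogram([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
                              [0.0, 1.0, 2.0, 3.0], lag_width=1.0, n_lags=4))
```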
SLT is a general mathematical framework devoted to the problem of estimating dependencies from empirical data (Hastie et al, 2004; Vapnik, 1998). SLT models for classification, Support Vector Machines (SVM), have shown good results on various machine learning tasks. The results of SVM classification of spatial data are also promising (Kanevski et al, 2002). The properties of SVM for regression, Support Vector Regression (SVR), are less studied. The first results of applying SVR to the spatial mapping of physical quantities were obtained by the authors for mapping medium porosity (Kanevski et al, 1999) and for mapping radioactively contaminated territories (Kanevski and Canu, 2000). The present paper aims to further the understanding of the properties of the SVR model for spatial data analysis and mapping. A detailed description of SVR theory can be found in (Cristianini and Shawe-Taylor, 2000; Smola, 1996), and the basic equations for nonlinear modeling are given in Section 2. Section 3 discusses the application of SVR to spatial data mapping in a real case study: soil pollution by the Cs137 radionuclide. Section 4 discusses the properties of the model applied to noisy data or data with outliers.
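For orientation, the standard ε-insensitive SVR formulation from the SVR literature is summarized below. The notation (feature map φ, kernel K, penalty C, tube width ε, slacks ξ, ξ*, dual coefficients α, α*) is the conventional one and is not copied from Section 2 of the paper, and the Gaussian RBF kernel is shown only as a common choice for spatial inputs. Only training points with α_i ≠ α_i* (the support vectors) contribute to f(x), and errors smaller than ε are not penalized, which is the source of SVR's tolerance to small measurement noise.

```latex
% epsilon-insensitive loss: deviations smaller than epsilon cost nothing
|y - f(x)|_{\varepsilon} = \max\bigl(0,\ |y - f(x)| - \varepsilon\bigr)

% primal problem in the feature space defined by \varphi
\min_{w,\,b,\,\xi,\,\xi^*}\ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*)
\quad \text{s.t.} \quad
y_i - \langle w, \varphi(x_i) \rangle - b \le \varepsilon + \xi_i,\quad
\langle w, \varphi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*,\quad
\xi_i,\ \xi_i^* \ge 0

% resulting nonlinear regression function (from the dual solution)
f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\, K(x_i, x) + b,
\qquad 0 \le \alpha_i,\ \alpha_i^* \le C,
\qquad \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0

% a common kernel choice (assumed here): Gaussian RBF
K(x, x') = \exp\!\left( -\frac{\lVert x - x' \rVert^2}{2\sigma^2} \right)
```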

Relevance:

30.00%

Abstract:

This paper presents general regression neural networks (GRNN) as a nonlinear regression method for interpolating monthly wind speeds in complex Alpine orography. GRNN is trained on data from Swiss meteorological networks to learn the statistical relationship between topographic features and wind speed. Terrain convexity, slope, and exposure are taken into account by extracting features from the digital elevation model at different spatial scales using specialised convolution filters. A database of gridded monthly wind speeds for the period 1968-2008 is then constructed by applying GRNN in prediction mode. This study demonstrates that using topographic features as GRNN inputs significantly reduces cross-validation errors compared with low-dimensional models that integrate only geographical coordinates and terrain height for the interpolation of wind speed. The spatial predictability of wind speed is found to be lower in summer than in winter, owing to more complex and weaker wind-topography relationships. The relevance of these relationships is studied using an adaptive version of the GRNN algorithm, which selects the useful terrain features by eliminating the noisy ones. This research provides a framework for extending low-dimensional interpolation models to high-dimensional spaces by integrating additional features that account for topographic conditions at multiple spatial scales. Copyright (c) 2012 Royal Meteorological Society.
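A GRNN prediction is a Gaussian-kernel-weighted average of the training targets (the Nadaraya-Watson estimator). The sketch below assumes one smoothing parameter σ per input feature, which is one common way to build an "adaptive" GRNN: a very large σ makes a feature irrelevant to the distance, so tuning the σ values can reveal which terrain features are noise. The function name, the per-feature σ scheme, and the toy data are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence


def grnn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    query: Sequence[float],
    sigmas: Sequence[float],
) -> float:
    """GRNN / Nadaraya-Watson estimate with one Gaussian bandwidth per feature."""
    weights: List[float] = []
    for xi in train_x:
        # Squared distance scaled feature by feature (anisotropic kernel).
        d2 = sum(((q - v) / s) ** 2 for q, v, s in zip(query, xi, sigmas))
        weights.append(math.exp(-0.5 * d2))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigmas")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Usage: inputs are hypothetical (elevation index, slope index) pairs.
xs = [[0.0, 0.2], [1.0, 0.5], [2.0, 0.1]]
ys = [3.0, 5.0, 7.0]
print(grnn_predict(xs, ys, [1.2, 0.3], sigmas=[0.5, 1e9]))  # slope effectively ignored
```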

Relevance:

30.00%

Abstract:

The efficacy of antitumoral responses can be increased using combinatorial vaccine strategies. We recently showed that vaccination could be optimized by local administration of diverse molecular or bacterial agents to target and augment antitumoral CD8 T cells in the genital mucosa (GM) and increase regression of cervical cancer in an animal model. Non-muscle-invasive bladder cancer is another disease that is readily amenable to local therapies. In contrast to the data obtained in the GM, in this study we show that intravesical (IVES) instillation of synthetic toll-like receptor (TLR) agonists only modestly induced recruitment of CD8 T cells to the bladder. However, IVES administration of Ty21a, a live bacterial vaccine against typhoid fever, was much more effective and increased the number of total and vaccine-specific CD8 T cells in the bladder approximately 10-fold. Comparison of the chemokines induced in the bladder by either CpG (a TLR-9 agonist) or Ty21a highlighted the preferential increase in complement component 5a, CXCL5, CXCL2, CCL8, and CCL5 by Ty21a, suggesting their involvement in attracting T cells to the bladder. IVES treatment with Ty21a after vaccination also significantly increased tumor regression compared with vaccination alone, resulting in 90% survival in an orthotopic murine model of bladder cancer expressing a prototype tumor antigen. Our data demonstrate that combining vaccination with local immunostimulation may be an effective treatment strategy for different types of cancer, and they also highlight the great potential of the Ty21a vaccine, which is routinely used worldwide, in such combinatorial therapies.

Relevance:

30.00%

Abstract:

Background. We developed a model that predicts the centiles of the 25(OH)D distribution, taking seasonal variation into account. Methods. Data from two Swiss population-based studies were used to generate (CoLaus) and validate (Bus Santé) the model. Serum 25(OH)D was measured by ultra-high-pressure LC-MS/MS and by immunoassay. Linear regression models on square-root-transformed 25(OH)D values were used to predict centiles of the 25(OH)D distribution. To assess replication, the distribution functions of the replication-set observations, as predicted by the model, were inspected. Results. Overall, 4,912 and 2,537 Caucasians were included in the original and replication sets, respectively. In the original study, mean (SD) 25(OH)D, age, and BMI were 47.5 (22.1) nmol/L, 49.8 (8.5) years, and 25.6 (4.1) kg/m², respectively, and 49.3% of participants were men. The best model included gender, BMI, and sine-cosine functions of the measurement day. Sex- and BMI-specific 25(OH)D centile curves as a function of measurement date were generated. For given values of sex, BMI, and date, the model estimates any centile of the 25(OH)D distribution, as well as the quantile corresponding to a given 25(OH)D measurement. Conclusions. We generated and validated centile curves of 25(OH)D in the general adult Caucasian population. These curves can help rank an individual's vitamin D status by centile independently of when 25(OH)D is measured.
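The modelling idea can be sketched as follows: a linear predictor for √25(OH)D built from sex, BMI, and sine-cosine terms of the measurement day. Under the added assumption that residuals on the square-root scale are normal with a constant SD σ, the p-th centile is c_p = (μ + z_p·σ)² and a measured value y corresponds to the quantile Φ((√y − μ)/σ). All coefficient values below are hypothetical placeholders, not the published estimates, and the 365.25-day seasonal period is also an assumption.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class SqrtVitDModel:
    # Hypothetical coefficients on the sqrt(nmol/L) scale; NOT the published values.
    intercept: float
    male: float       # shift for male sex
    bmi: float        # change per kg/m^2
    sin_day: float    # seasonal sine term
    cos_day: float    # seasonal cosine term
    resid_sd: float   # residual SD on the sqrt scale (assumed constant)

    def mean_sqrt(self, is_male: bool, bmi: float, day_of_year: float) -> float:
        """Linear predictor for sqrt(25(OH)D) with an annual sine-cosine cycle."""
        angle = 2 * math.pi * day_of_year / 365.25
        return (self.intercept + self.male * (1.0 if is_male else 0.0) + self.bmi * bmi
                + self.sin_day * math.sin(angle) + self.cos_day * math.cos(angle))

    def centile(self, p: float, is_male: bool, bmi: float, day_of_year: float) -> float:
        """25(OH)D value (nmol/L) at centile p (0 < p < 1), back-transformed by squaring."""
        z = NormalDist().inv_cdf(p)
        root = self.mean_sqrt(is_male, bmi, day_of_year) + z * self.resid_sd
        return max(root, 0.0) ** 2   # clip at 0 so the back-transform stays monotone

    def quantile_of(self, value: float, is_male: bool, bmi: float, day_of_year: float) -> float:
        """Quantile (0-1) corresponding to an observed 25(OH)D value in nmol/L."""
        mu = self.mean_sqrt(is_male, bmi, day_of_year)
        return NormalDist(mu, self.resid_sd).cdf(math.sqrt(value))


# Usage with placeholder coefficients: median for a man with BMI 25 on day 100.
model = SqrtVitDModel(intercept=6.5, male=0.2, bmi=-0.05, sin_day=0.3, cos_day=-0.6, resid_sd=1.5)
print(model.centile(0.5, True, 25.0, 100))
```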

Relevance:

30.00%

Abstract:

PURPOSE: According to estimates, around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics, and to develop mapping and predictive tools to improve local radon prediction. METHOD: About 240,000 indoor radon concentration (IRC) measurements in about 150,000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering using pairwise Kolmogorov distances between the IRC distributions of lithological units. For IRC mapping and prediction, we used random forests and Bayesian additive regression trees (BART). RESULTS: The automated classification groups lithological units well in terms of their IRC characteristics. In particular, the method reveals IRC differences within metamorphic rocks such as gneiss. The maps produced by random forests soundly represent the regional differences in IRC across Switzerland and improve spatial detail compared with existing approaches. Random forests explained 33% of the variation in the IRC data. Additionally, the variable importance evaluated by random forests shows that building characteristics are less important predictors of IRC than spatial/geological influences. BART explained 29% of IRC variability and produced maps that indicate the prediction uncertainty. CONCLUSION: Ensemble regression trees are a powerful tool for modeling and understanding the multidimensional influences on IRC. Automatic clustering of lithological units complements this method by facilitating the interpretation of the radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider further variables, such as soil gas radon measurements, as well as more detailed geological information.
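The classification step pairs a distance between distributions with k-medoids clustering. In the sketch below, the two-sample Kolmogorov distance is the largest vertical gap between two empirical CDFs, and a simple alternating k-medoids (assign each unit to its nearest medoid, then move each medoid to the member with the smallest total within-cluster distance) runs on the resulting distance matrix. The naive initialization, iteration cap, and function names are assumptions; the abstract does not state which k-medoids variant the study used.

```python
import bisect
from typing import Dict, List, Sequence


def ks_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov distance: sup_x |F_a(x) - F_b(x)|."""
    sa, sb = sorted(a), sorted(b)
    best = 0.0
    for x in sa + sb:  # the supremum is attained at an observed value
        fa = bisect.bisect_right(sa, x) / len(sa)
        fb = bisect.bisect_right(sb, x) / len(sb)
        best = max(best, abs(fa - fb))
    return best


def k_medoids(dist: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Alternating k-medoids on a precomputed distance matrix; returns medoid indices."""
    n = len(dist)
    medoids = list(range(k))  # naive initialization: the first k items
    for _ in range(max_iter):
        clusters: Dict[int, List[int]] = {m: [] for m in medoids}
        for i in range(n):
            # A medoid always belongs to its own cluster, so no cluster is empty.
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # converged: medoids no longer move
        medoids = new_medoids
    return sorted(medoids)


# Usage: rows would be IRC samples for hypothetical lithological units.
units = [[20, 35, 50, 80], [25, 40, 55, 90], [200, 300, 450, 600], [220, 310, 400, 650]]
D = [[ks_distance(u, v) for v in units] for u in units]
print(k_medoids(D, k=2))
```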

Relevance:

30.00%

Abstract:

As increasingly large molecular data sets are collected for phylogenomics, conflicting phylogenetic signal among gene trees makes it challenging to resolve some difficult nodes of the Tree of Life. Among these nodes, the phylogenetic position of the honey bees (Apini) within the corbiculate bee group remains controversial, despite its considerable importance for understanding the emergence and maintenance of eusociality. Here, we show that this controversy stems in part from pervasive phylogenetic conflicts among GC-rich gene trees. GC-rich genes typically show high heterogeneity in nucleotide composition among species, which can induce topological conflicts among gene trees. When we retain only the most GC-homogeneous genes or use a nonhomogeneous model of sequence evolution, our analyses reveal a monophyletic group of the three lineages with a eusocial lifestyle (honey bees, bumble bees, and stingless bees). These phylogenetic relationships strongly suggest a single origin of eusociality in the corbiculate bees, with no reversal to solitary living in this group. To accurately reconstruct other important evolutionary steps across the Tree of Life, we suggest removing GC-rich and GC-heterogeneous genes from large phylogenomic data sets. Interpreted as a consequence of genome-wide variation in recombination rates, this GC effect can affect all taxa featuring GC-biased gene conversion, which is common in eukaryotes.
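A minimal sketch of the gene-filtering idea: compute each taxon's GC content for every gene, summarize how much GC content varies across taxa, and keep the most homogeneous fraction of genes. The heterogeneity measure used here (the population standard deviation of per-taxon GC content), the retained fraction, and all function names are assumptions for illustration; the abstract does not specify the paper's metric, and this sketch does not separately filter on overall GC richness.

```python
from statistics import pstdev
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (gaps and Ns ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_heterogeneity(alignment: Dict[str, str]) -> float:
    """Assumed proxy: population SD of GC content across the taxa of one gene."""
    return pstdev([gc_content(s) for s in alignment.values()])


def most_homogeneous_genes(genes: Dict[str, Dict[str, str]], keep_fraction: float = 0.5) -> List[str]:
    """Names of the keep_fraction of genes with the lowest GC heterogeneity."""
    ranked = sorted(genes, key=lambda g: gc_heterogeneity(genes[g]))
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Usage with toy alignments: {gene: {taxon: aligned sequence}}.
toy = {
    "geneA": {"Apis": "ATATGC", "Bombus": "ATATGC"},
    "geneB": {"Apis": "GGGGCC", "Bombus": "ATATAT"},
}
print(most_homogeneous_genes(toy))
```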