Biblioteca Digital

945 results for multivariate regression tree

Selection bias and the cross-validation of regression models for prediction

Relevance:

30.00%

Publisher:

Abstract:

Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric ($\mathrm{MSEP}_m$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures. The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample. Only the random split estimator is conditionally (on $\beta$) unbiased; however, $\mathrm{MSEP}_m$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, $\mathrm{MSEP}_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables. To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment.
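As an illustration of the leave-one-out statistic recommended in this abstract, the sketch below computes PRESS by refitting an ordinary least-squares model with each case held out in turn and summing the squared prediction errors. It is a minimal standard-library sketch, not the authors' code: the helper names (`solve`, `fit_ols`, `predict`, `press`, `rss`) and the simulated data are assumptions made for the example.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, x):
    """Predicted response for one case x given coefficients beta."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def press(X, y):
    """PRESS: sum of squared errors when each case is predicted by a model fit without it."""
    total = 0.0
    for i in range(len(y)):
        beta = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - predict(beta, X[i])) ** 2
    return total

def rss(X, y):
    """In-sample residual sum of squares of the model fit to all cases."""
    beta = fit_ols(X, y)
    return sum((yi - predict(beta, xi)) ** 2 for xi, yi in zip(X, y))

# Simulated data (an assumption for illustration): two normal regressors plus noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
y = [1.0 + 2.0 * a - 0.5 * b + rng.gauss(0, 1) for a, b in X]
print("in-sample RSS:", round(rss(X, y), 3))
print("PRESS:", round(press(X, y), 3))
```

PRESS is never smaller than the in-sample residual sum of squares, which is why it gives a less optimistic view of predictive error. The brute-force refit is used here for clarity; for a fixed-form linear model the same value can be obtained from a single fit using the leverages.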

Genetic variants in the mammalian target of rapamycin (mTOR) signaling pathway as predictors of survival and clinical response in women with ovarian cancer

Relevance:

30.00%

Publisher:

Abstract:

Background. The mTOR pathway is commonly altered in human tumors and promotes cell survival and proliferation. Preliminary evidence suggests this pathway's involvement in chemoresistance to platinum and taxanes, the first-line therapy for epithelial ovarian cancer. A pathway-based approach was used to identify individual germline single nucleotide polymorphisms (SNPs) and cumulative effects of multiple genetic variants in mTOR pathway genes and their association with clinical outcome in women with ovarian cancer. Methods. The case series was restricted to 319 non-Hispanic white women with high grade ovarian cancer treated with surgery and platinum-based chemotherapy. 135 SNPs in 20 representative genes in the mTOR pathway were genotyped. Hazard ratios (HRs) for death and odds ratios (ORs) for failure to respond to primary therapy were estimated for each SNP using the multivariate Cox proportional hazards model and multivariate logistic regression model, respectively, while adjusting for age, stage, histology and treatment sequence. A survival tree analysis of SNPs with a statistically significant association (p<0.05) was performed to identify higher order gene-gene interactions and their association with overall survival. Results. There was no statistically significant difference in survival by tumor histology or treatment regimen. The median survival for the cohort was 48.3 months. Seven SNPs were significantly associated with decreased survival. Compared to those with no unfavorable genotypes, the HR for death increased significantly with the increasing number of unfavorable genotypes, and women in the highest risk category had an HR of 4.06 (95% CI 2.29–7.21). The survival tree analysis also identified patients with different survival patterns based on their genetic profiles. 13 SNPs on five different genes were found to be significantly associated with a treatment response, defined as no evidence of disease after completion of primary therapy. The rare homozygous genotype of SNP rs6973428 showed a 5.5-fold increased risk compared to the wild type carrying genotypes. In the cumulative effect analysis, the highest risk group (individuals with ≥8 unfavorable genotypes) was significantly less likely to respond to chemotherapy (OR=8.40, 95% CI 3.10–22.75) compared to the low risk group (≤4 unfavorable genotypes). Conclusions. A pathway-based approach can demonstrate cumulative effects of multiple genetic variants on clinical response to chemotherapy and survival. Therapy targeting the mTOR pathway may modify outcome in select patients.

A method to incorporate the effect of taxonomic uncertainty on multivariate analyses of ecological data

Relevance:

30.00%

Publisher:

Abstract:

Researchers in ecology commonly use multivariate analyses (e.g. redundancy analysis, canonical correspondence analysis, Mantel correlation, multivariate analysis of variance) to interpret patterns in biological data and relate these patterns to environmental predictors. There has been, however, little recognition of the errors associated with biological data and the influence that these may have on predictions derived from ecological hypotheses. We present a permutational method that assesses the effects of taxonomic uncertainty on the multivariate analyses typically used in the analysis of ecological data. The procedure is based on iterative randomizations that randomly re-assign unidentified species in each site to any of the other species found in the remaining sites. After each re-assignment of species identities, the multivariate method in question is run and a parameter of interest is calculated. Consequently, one can estimate a range of plausible values for the parameter of interest under different scenarios of re-assigned species identities. We demonstrate the use of our approach in the calculation of two parameters with an example involving tropical tree species from western Amazonia: 1) the Mantel correlation between compositional similarity and environmental distances between pairs of sites; and 2) the variance explained by environmental predictors in redundancy analysis (RDA). We also investigated the effects of increasing taxonomic uncertainty (i.e. number of unidentified species), and the taxonomic resolution at which morphospecies are determined (genus-resolution, family-resolution, or fully undetermined species) on the uncertainty range of these parameters. To achieve this, we performed simulations on a tree dataset from southern Mexico by randomly selecting a portion of the species contained in the dataset and classifying them as unidentified at each level of decreasing taxonomic resolution. An analysis of covariance showed that both taxonomic uncertainty and resolution significantly influence the uncertainty range of the resulting parameters. Increasing taxonomic uncertainty expands our uncertainty of the parameters estimated both in the Mantel test and RDA. The effects of increasing taxonomic resolution, however, are not as evident. The method presented in this study improves on the traditional approaches to studying compositional change in ecological communities by accounting for some of the uncertainty inherent to biological data. We hope that this approach can be routinely used to estimate any parameter of interest obtained from compositional data tables when faced with taxonomic uncertainty.
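The randomization procedure described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: sites are presence/absence sets, Jaccard similarity and a single environmental variable stand in for the real similarity and distance measures, the statistic is a plain Pearson correlation between the two sets of pairwise values (the Mantel correlation without its significance test), and all function names and the toy data are hypothetical.

```python
import random
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of species names."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pearson(u, v):
    """Pearson correlation; returns 0.0 when undefined (a simplification of this sketch)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def mantel_r(sites, env):
    """Correlation between pairwise compositional similarity and environmental distance."""
    pairs = list(combinations(range(len(sites)), 2))
    sim = [jaccard(sites[i], sites[j]) for i, j in pairs]
    dist = [abs(env[i] - env[j]) for i, j in pairs]
    return pearson(sim, dist)

def reassign(sites, is_unidentified, rng):
    """Replace each unidentified taxon at a site with an identified species from the other sites."""
    new_sites = []
    for i, site in enumerate(sites):
        pool = sorted({s for j, other in enumerate(sites) if j != i
                       for s in other if not is_unidentified(s)})
        new_site = set()
        for s in sorted(site):  # sorted for reproducible draws
            if is_unidentified(s) and pool:
                new_site.add(rng.choice(pool))
            else:
                new_site.add(s)
        new_sites.append(new_site)
    return new_sites

def uncertainty_range(sites, env, is_unidentified, n_iter=500, seed=0):
    """Range of the statistic over repeated random re-assignments of species identities."""
    rng = random.Random(seed)
    values = [mantel_r(reassign(sites, is_unidentified, rng), env)
              for _ in range(n_iter)]
    return min(values), max(values)

def is_morphospecies(name):
    """Toy convention (an assumption): unidentified taxa are labelled 'sp...'."""
    return name.startswith("sp")

# Toy data: four sites along one environmental gradient.
sites = [{"A", "B", "sp1"}, {"A", "C"}, {"C", "D", "sp2"}, {"D", "E"}]
env = [0.0, 1.0, 2.0, 3.0]
low, high = uncertainty_range(sites, env, is_morphospecies)
print("plausible range of the correlation:", round(low, 3), "to", round(high, 3))
```

The width of the returned range is the quantity of interest: with no unidentified taxa it collapses to a single value, and it would be expected to widen as more taxa are left unidentified.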

Using ecological niche models to support tree species selection for forest restoration planning in largely deforested regions

Relevance:

30.00%

Publisher:

Abstract:

Species selection for forest restoration is often supported by expert knowledge on local distribution patterns of native tree species. This approach is not applicable to largely deforested regions unless enough data on pre-human tree species distribution is available. In such regions, ecological niche models may provide essential information to support species selection in the framework of forest restoration planning. In this study we used ecological niche models to predict habitat suitability for native tree species in the "Tierra de Campos" region, an almost totally deforested area of the Duero Basin (Spain). Previously available models provide habitat suitability predictions for dominant native tree species, but including non-dominant tree species in forest restoration planning may be desirable to promote biodiversity, especially in largely deforested areas where nearby seed sources are not expected. We used the Forest Map of Spain as the species occurrence data source to maximize the number of modeled tree species. Penalized logistic regression was used to train models using climate and lithological predictors. Using model predictions, a set of tools was developed to support species selection in forest restoration planning. Model predictions were used to build ordered lists of suitable species for each cell of the study area. The suitable species lists were summarized by drawing maps that showed the two most suitable species for each cell. Additionally, potential distribution maps of the suitable species for the study area were drawn. For a scenario with two dominant species, the models predicted a mixed forest (Quercus ilex and a coniferous tree species) for almost one half of the study area. According to the models, 22 non-dominant native tree species are suitable for the study area, with up to six suitable species per cell. The model predictions pointed to Crataegus monogyna, Juniperus communis, J. oxycedrus and J. phoenicea as the most suitable non-dominant native tree species in the study area. Our results encourage further use of ecological niche models for forest restoration planning in largely deforested regions.

Modelling seed germination in forest tree species through survival analysis. The Pinus pinea L. case study

Relevance:

30.00%

Publisher:

Abstract:

The direct application of existing models for seed germination may often be inadequate in the context of ecology and forestry germination experiments. This is because basic model assumptions are violated and variables available to forest managers are rarely used. In this paper, we present a method which addresses the aforementioned shortcomings. The approach is illustrated through a case study of Pinus pinea L. Our findings also shed light on the role of germination in the general failure of natural regeneration in managed forests of this species. The presented technique consists of a mixed regression model based on survival analysis. Climate and stand covariates were tested. Data for fitting the model were gathered from a 5-year germination experiment in a mature, managed P. pinea stand in the Northern Plateau of Spain in which two different stand densities can be found. The model predictions proved to be unbiased and highly accurate when compared with the training data. Germination in P. pinea was controlled through thermal variables at stand level. At microsite level, low densities negatively affected the probability of germination. A time-lag in the response was also detected. Overall, the proposed technique provides a reliable alternative for germination modelling in ecology and forestry studies by using accessible and suitable variables. The P. pinea case study highlights the importance of producing unbiased predictions. In this species, the occurrence and timing of germination suggest a very different regeneration strategy from that understood by forest managers until now, which may explain the high failure rate of natural regeneration in managed stands. In addition, these findings provide valuable information for the management of P. pinea under climate-change conditions.

Artificial analysis of molecular marker loci linked to tree resistance response by an artificial neural network

Relevance:

30.00%

Publisher:

Abstract:

One of the biggest challenges that software developers face is making an accurate estimate of project effort. In this work, radial basis function neural networks were applied to software effort estimation using a NASA dataset. This paper evaluates and compares the radial basis function network against a regression model. The results show that the radial basis function neural network obtained a lower mean squared error than the regression method.

Missing Rings in Pinus halepensis – The Missing Link to Relate the Tree-Ring Record to Extreme Climatic Events

Relevance:

30.00%

Publisher:

Abstract:

Climate predictions for the Mediterranean Basin include increased temperatures, decreased precipitation, and increased frequency of extreme climatic events (ECE). These conditions are associated with decreased tree growth and increased vulnerability to pests and diseases. The anatomy of tree rings responds to these environmental conditions. Quantitatively, the width of a tree ring is largely determined by the rate and duration of cell division by the vascular cambium. In the Mediterranean climate, this division may occur throughout almost the entire year. Alternatively, cell division may cease during relatively cool and dry winters, only to resume in the same calendar year with milder temperatures and increased availability of water. Under particularly adverse conditions, no xylem may be produced in parts of the stem, resulting in a missing ring (MR). A dendrochronological network of Pinus halepensis was used to determine the relationship of MR to ECE. The network consisted of 113 sites, 1,509 trees, 2,593 cores, and 225,428 tree rings throughout the distribution range of the species. A total of 4,150 MR were identified. Binomial logistic regression analysis determined that MR frequency increased with increased cambial age. Spatial analysis indicated that the geographic areas of south-eastern Spain and northern Algeria contained the greatest frequency of MR. Dendroclimatic regression analysis indicated a non-linear relationship of MR to total monthly precipitation and mean temperature. MR are strongly associated with the combination of monthly mean temperature from previous October till current February and total precipitation from previous September till current May. They are likely to occur with total precipitation lower than 50 mm and temperatures higher than 5°C. This conclusion is global and can be applied to every site across the distribution area. 
Rather than simply being a complication for dendrochronology, MR formation is a fundamental response of trees to adverse environmental conditions. The demonstrated relationship of MR formation to ECE across this dendrochronological network in the Mediterranean basin shows the potential of MR analysis to reconstruct the history of past climatic extremes and to predict future forest dynamics in a changing climate.

Coreferentiality: A New Method for the Hypothesis-Based Analysis of Phenotypes Characterized by Multivariate Data

Relevance:

30.00%

Publisher:

Abstract:

Many multifactorial biologic effects, particularly in the context of complex human diseases, are still poorly understood. At the same time, the systematic acquisition of multivariate data has become increasingly easy. The use of such data to analyze and model complex phenotypes, however, remains a challenge. Here, a new analytic approach is described, termed coreferentiality, together with an appropriate statistical test. Coreferentiality is the indirect relation of two variables of functional interest with respect to whether they parallel each other in their respective relatedness to multivariate reference data, which can be informative for a complex effect or phenotype. It is shown that the power of coreferentiality testing is comparable to multiple regression analysis, sufficient even when reference data are informative only to a relatively small extent of 2.5%, and clearly exceeding the power of simple bivariate correlation testing. Thus, coreferentiality testing draws on the increased power of multivariate analysis, but does so in order to address a more straightforwardly interpretable bivariate relatedness. Systematic application of this approach could substantially improve the analysis and modeling of complex phenotypes, particularly in the context of human studies, where addressing functional hypotheses by direct experimentation is often difficult.

Multivariate statistical analysis of highway accident and highway conditions. Final report.

Relevance:

30.00%

Publisher:

Abstract:

Transportation Department, Office of University Research, Washington, D.C.

Multiple classification analysis ; a report on a computer program for multiple regression using categorical predictors /

Relevance:

30.00%

Publisher:

Abstract:

Bibliographical footnotes.

A vulnerability analysis of the temperate forests of south central Chile. Biological Conservation, 122 (2005): 9-21.

Relevance:

30.00%

Publisher:

Abstract:

Areas of the landscape that are priorities for conservation should be those that are both vulnerable to threatening processes and that, if lost or degraded, will result in conservation targets being compromised. While much attention is directed towards understanding the patterns of biodiversity, much less is given to determining the areas of the landscape most vulnerable to threats. We assessed the relative vulnerability of remaining areas of native forest to conversion to plantations in the ecologically significant temperate rainforest region of south central Chile. The area of the study region is 4.2 million ha and the extent of plantations is approximately 200000 ha. First, the spatial distribution of native forest conversion to plantations was determined. The variables related to the spatial distribution of this threatening process were identified through the development of a classification tree and the generation of a multivariate, spatially explicit, statistical model. The model of native forest conversion explained 43% of the deviance and the discrimination ability of the model was high. Predictions were made of where native forest conversion is likely to occur in the future. Due to patterns of climate, topography, soils and proximity to infrastructure and towns, remaining forest areas differ in their relative risk of being converted to plantations. Another factor that may increase the vulnerability of remaining native forest in a subset of the study region is the proposed construction of a highway. We found that 90% of the area of existing plantations within this region is within 2.5 km of roads. When the predictions of native forest conversion were recalculated accounting for the construction of this highway, it was found that approximately 27000 ha of native forest had an increased probability of conversion. The areas of native forest identified to be vulnerable to conversion are outside of the existing reserve network. (C) 2004 Elsevier Ltd. All rights reserved.

Modelling pre-clearing vegetation distribution using GIS-integrated statistical, ecological and data models: A case study from the wet tropics of Northeastern Australia

Relevance:

30.00%

Publisher:

Abstract:

Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach, including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation and assessment of the scale of model predictions. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated ($r^2 = 0.83$) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; and a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.

Toward prediction of class II mouse major histocompatibility complex peptide binding affinity: in silico bioinformatic evaluation using partial least squares, a robust multivariate statistical technique

Relevance:

30.00%

Publisher:

Abstract:

The accurate identification of T-cell epitopes remains a principal goal of bioinformatics within immunology. As the immunogenicity of peptide epitopes is dependent on their binding to major histocompatibility complex (MHC) molecules, the prediction of binding affinity is a prerequisite to the reliable prediction of epitopes. The iterative self-consistent (ISC) partial-least-squares (PLS)-based additive method is a recently developed bioinformatic approach for predicting class II peptide−MHC binding affinity. The ISC−PLS method overcomes many of the conceptual difficulties inherent in the prediction of class II peptide−MHC affinity, such as the binding of a mixed population of peptide lengths due to the open-ended class II binding site. The method has applications in both the accurate prediction of class II epitopes and the manipulation of affinity for heteroclitic and competitor peptides. The method is applied here to six class II mouse alleles (I-Ab, I-Ad, I-Ak, I-As, I-Ed, and I-Ek) and included peptides up to 25 amino acids in length. A series of regression equations highlighting the quantitative contributions of individual amino acids at each peptide position was established. The initial model for each allele exhibited only moderate predictivity. Once the set of selected peptide subsequences had converged, the final models exhibited a satisfactory predictive power. Convergence was reached between the 4th and 17th iterations, and the leave-one-out cross-validation statistical terms ($q^2$, SEP, and NC) ranged between 0.732 and 0.925, 0.418 and 0.816, and 1 and 6, respectively. The non-cross-validated statistical terms $r^2$ and SEE ranged between 0.98 and 0.995 and 0.089 and 0.180, respectively. The peptides used in this study are available from the AntiJen database (http://www.jenner.ac.uk/AntiJen). The PLS method is available commercially in the SYBYL molecular modeling software package. The resulting models, which can be used for accurate T-cell epitope prediction, will be made freely available online (http://www.jenner.ac.uk/MHCPred).

Correlating liposomal adjuvant characteristics to in-vivo cell-mediated immunity using a novel Mycobacterium tuberculosis fusion protein: a multivariate analysis study

Relevance:

30.00%

Publisher:

Abstract:

Objective: In this study, we have used a chemometrics-based method to correlate key liposomal adjuvant attributes with in-vivo immune responses based on multivariate analysis. Methods: The liposomal adjuvant composed of the cationic lipid dimethyldioctadecylammonium bromide (DDA) and trehalose 6,6-dibehenate (TDB) was modified with 1,2-distearoyl-sn-glycero-3-phosphocholine at a range of mol% ratios, and the main liposomal characteristics (liposome size and zeta potential) were measured along with their immunological performance as an adjuvant for the novel, postexposure fusion tuberculosis vaccine, Ag85B-ESAT-6-Rv2660c (H56 vaccine). Partial least squares regression analysis was applied to correlate and cluster liposomal adjuvant particle characteristics with in-vivo derived immunological performances (IgG, IgG1, IgG2b, spleen proliferation, IL-2, IL-5, IL-6, IL-10, IFN-γ). Key findings: While a range of factors varied in the formulations, decreasing the 1,2-distearoyl-sn-glycero-3-phosphocholine content (and subsequent zeta potential) together built the strongest variables in the model. Enhanced DDA and TDB content (and subsequent zeta potential) stimulated a response skewed towards a cell mediated immunity, with the model identifying correlations with IFN-γ, IL-2 and IL-6. Conclusion: This study demonstrates the application of chemometrics-based correlations and clustering, which can inform liposomal adjuvant design.

Comparison of Multivariate and Univariate Models for Genetic Evaluation of Milk Yield based on Test Day Data

Relevance:

30.00%

Publisher:

Abstract:

2000 Mathematics Subject Classification: 62H12, 62P99
