876 results for Boosted regression trees


Relevance:

30.00%

Abstract:

Background: The accumulation of mutations after long-lasting exposure to a failing combination antiretroviral therapy (cART) is problematic and severely reduces the options for further successful treatments. Methods: We studied patients from the Swiss HIV Cohort Study who failed cART with nucleoside reverse transcriptase inhibitors (NRTIs) and either a ritonavir-boosted PI (PI/r) or a non-nucleoside reverse transcriptase inhibitor (NNRTI). The loss of genotypic activity <3, 3–6, and >6 months after virological failure was analyzed with the Stanford algorithm. Risk factors associated with early emergence of drug resistance mutations (<6 months after failure) were identified with multivariable logistic regression. Results: Ninety-nine genotypic resistance tests from PI/r-treated and 129 from NNRTI-treated patients were analyzed. The risk of losing the activity of ≥1 NRTIs was lower among PI/r- compared to NNRTI-treated individuals <3, 3–6, and >6 months after failure: 8.8% vs. 38.2% (p = 0.009), 7.1% vs. 46.9% (p<0.001) and 18.9% vs. 60.9% (p<0.001). The percentages of patients who had lost PI/r activity were 2.9%, 3.6% and 5.4% at <3, 3–6, and >6 months after failure, compared to 41.2%, 49.0% and 63.0% of those who had lost NNRTI activity (all p<0.001). The risk of accumulating an early NRTI mutation was strongly associated with NNRTI-containing cART (adjusted odds ratio: 13.3 (95% CI: 4.1–42.8), p<0.001). Conclusions: The loss of activity of PIs and NRTIs was low among patients treated with PI/r, even after long-lasting exposure to a failing cART. Thus, more options remain for second-line therapy. This finding is potentially of high relevance, in particular for settings with poor or lacking virological monitoring.

Relevance:

30.00%

Abstract:

Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), which is core to the implementation of RC, under different spatial patterns and diameter distributions revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands except those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed. 
I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. Effects, especially in the application phase, of the variance estimate justify suggested future efforts of improving the accuracy of the variance estimate. The second technique implemented and evaluated is a survival regression model that accounts for the time dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.

Relevance:

30.00%

Abstract:

Most of the modern developments with classification trees are aimed at improving their predictive capacity. This article considers a curiously neglected aspect of classification trees, namely the reliability of predictions that come from a given classification tree. In the sense that a node of a tree represents a point in the predictor space in the limit, the aim of this article is the development of localized assessment of the reliability of prediction rules. A classification tree may be used either to provide a probability forecast, where for each node the membership probabilities for each class constitute the prediction, or a true classification where each new observation is predictively assigned to a unique class. Correspondingly, two types of reliability measure will be derived, namely prediction reliability and classification reliability. We use bootstrapping methods as the main tool to construct these measures. We also provide a suite of graphical displays by which they may be easily appreciated. In addition to providing some estimate of the reliability of specific forecasts of each type, these measures can also be used to guide future data collection to improve the effectiveness of the tree model. The motivating example we give has a binary response, namely the presence or absence of a species of Eucalypt, Eucalyptus cloeziana, at a given sampling location in response to a suite of environmental covariates (although the methods are not restricted to binary response data).

Relevance:

30.00%

Abstract:

Risk assessment systems for introduced species are being developed and applied globally, but methods for rigorously evaluating them are still in their infancy. We explore classification and regression tree models as an alternative to the current Australian Weed Risk Assessment system, and demonstrate how the performance of screening tests for unwanted alien species may be quantitatively compared using receiver operating characteristic (ROC) curve analysis. The optimal classification tree model for predicting weediness included just four out of a possible 44 attributes of introduced plants examined, namely: (i) intentional human dispersal of propagules; (ii) evidence of naturalization beyond native range; (iii) evidence of being a weed elsewhere; and (iv) a high level of domestication. Intentional human dispersal of propagules in combination with evidence of naturalization beyond a plant's native range led to the strongest prediction of weediness. A high level of domestication in combination with no evidence of naturalization mitigated the likelihood of an introduced plant becoming a weed resulting from intentional human dispersal of propagules. Unlikely intentional human dispersal of propagules combined with no evidence of being a weed elsewhere led to the lowest predicted probability of weediness. The failure to include intrinsic plant attributes in the model suggests that either these attributes are not useful general predictors of weediness, or data and analysis were inadequate to elucidate the underlying relationship(s). This concurs with the historical pessimism that we will ever be able to accurately predict invasive plants. Given the apparent importance of propagule pressure (the number of individuals of a species released), future attempts at evaluating screening model performance for identifying unwanted plants need to account for propagule pressure when collating and/or analysing datasets. 
The classification tree had a cross-validated sensitivity of 93.6% and specificity of 36.7%. Based on the area under the ROC curve, the performance of the classification tree in correctly classifying plants as weeds or non-weeds was slightly inferior (Area under ROC curve = 0.83 +/- 0.021 (+/- SE)) to that of the current risk assessment system in use (Area under ROC curve = 0.89 +/- 0.018 (+/- SE)), although it requires many fewer questions to be answered.
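
For readers unfamiliar with the two screening metrics quoted in this abstract, the following is a minimal sketch of how sensitivity and specificity are computed from a confusion matrix. The function name and the counts used in the example are hypothetical and chosen only for illustration; they are not the study's data, which the abstract does not report.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: weeds correctly flagged as weeds
    fn: weeds wrongly passed as non-weeds
    tn: non-weeds correctly passed
    fp: non-weeds wrongly flagged as weeds
    """
    sensitivity = tp / (tp + fn)  # share of true weeds that are detected
    specificity = tn / (tn + fp)  # share of non-weeds that are cleared
    return sensitivity, specificity


# Hypothetical counts (NOT from the study): 100 weeds and 100 non-weeds.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=40, fp=60)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

A screen with high sensitivity and low specificity, as reported above, rejects most true weeds at the cost of also rejecting many harmless species.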

Relevance:

20.00%

Abstract:

In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subject to upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.

Relevance:

20.00%

Abstract:

Conventional reflectance spectroscopy (NIRS) and hyperspectral imaging (HI) in the near-infrared region (1000-2500 nm) are evaluated and compared, using, as the case study, the determination of relevant properties related to the quality of natural rubber. Mooney viscosity (MV) and plasticity indices (PI) (PI0 - original plasticity, PI30 - plasticity after accelerated aging, and PRI - the plasticity retention index after accelerated aging) of rubber were determined using multivariate regression models. Two hundred and eighty-six samples of rubber were measured using conventional and hyperspectral near-infrared imaging reflectance instruments in the range of 1000-2500 nm. The sample set was split into regression (n = 191) and external validation (n = 95) sub-sets. Three instruments were employed for data acquisition: a line scanning hyperspectral camera and two conventional FT-NIR spectrometers. Sample heterogeneity was evaluated using hyperspectral images obtained with a resolution of 150 × 150 μm and principal component analysis. The probed sample area (5 cm²; 24,000 pixels) to achieve representativeness was found to be equivalent to the average of 6 spectra for a 1 cm diameter probing circular window of one FT-NIR instrument. The other spectrophotometer can probe the whole sample in only one measurement. The results show that the rubber properties can be determined with very similar accuracy and precision by Partial Least Squares (PLS) regression models regardless of whether HI-NIR or conventional FT-NIR produces the spectral datasets. The best Root Mean Square Errors of Prediction (RMSEPs) of external validation for MV, PI0, PI30, and PRI were 4.3, 1.8, 3.4, and 5.3%, respectively. 
Though the quantitative results provided by the three instruments can be considered equivalent, the hyperspectral imaging instrument presents a number of advantages, being about 6 times faster than conventional bulk spectrometers, producing robust spectral data by ensuring sample representativeness, and minimizing the effect of the presence of contaminants.

Relevance:

20.00%

Abstract:

The purpose of this study was to develop and validate equations to estimate the aboveground phytomass of a 30-year-old plot of Atlantic Forest. In two plots of 100 m², a total of 82 trees were cut down at ground level. For each tree, height and diameter were measured. Leaves and woody material were separated in order to determine their fresh weights in field conditions. Samples of each fraction were oven dried at 80 °C to constant weight to determine their dry weight. Tree data were divided into two random samples. One sample was used for the development of the regression equations, and the other for validation. The models were developed using simple linear regression analysis, where the dependent variable was the dry mass, and the independent variables were height (h), diameter (d) and d²h. The validation was carried out using the Pearson correlation coefficient, the paired Student's t-test and the standard error of estimation. The best equations to estimate aboveground phytomass were: lnDW = -3.068 + 2.522 ln d (r² = 0.91; s y/x = 0.67) and lnDW = -3.676 + 0.951 ln d²h (r² = 0.94; s y/x = 0.56).
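
The two fitted equations in this abstract can be evaluated directly. The sketch below simply exponentiates the log-linear models; the function names are hypothetical, and the units are an assumption (the abstract does not state them, so diameter in cm, height in m and dry weight in kg are assumed here). Any correction for back-transformation bias from the log scale is also omitted, since the abstract does not say whether one was applied.

```python
import math


def dry_weight_from_diameter(d):
    """Dry weight from ln DW = -3.068 + 2.522 ln d.

    d: stem diameter (unit assumed to be cm; not stated in the abstract).
    """
    return math.exp(-3.068 + 2.522 * math.log(d))


def dry_weight_from_d2h(d, h):
    """Dry weight from ln DW = -3.676 + 0.951 ln(d^2 * h).

    d: stem diameter, h: tree height (units assumed, see above).
    """
    return math.exp(-3.676 + 0.951 * math.log(d * d * h))


# Illustrative tree (values are made up, not taken from the study).
d, h = 10.0, 8.0
print(dry_weight_from_diameter(d))
print(dry_weight_from_d2h(d, h))
```

Because both models are linear on the log scale, predicted dry weight grows as a power of the predictor, for example proportionally to d raised to 2.522 in the first equation.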

Relevance:

20.00%

Abstract:

This study aimed to develop predictive models of aboveground phytomass for the tree vegetation of the Low Restinga Forest. A total of 102 trees from 29 species occurring in the study area and 102 individuals of jerivá (Syagrus romanzoffiana (Cham.) Glassman) were selected, distributed proportionally among the diameter classes of the tree vegetation. The trees were cut at ground level, and the total height and diameter at breast height (DBH) of each tree were measured. Leaves were separated from the wood, and the fresh mass of the woody and leaf portions was measured separately. Samples of each fraction were dried at 70 °C to constant weight to determine the dry mass of the trees. The models were developed using linear regression analysis, with tree dry mass (MS) as the dependent variable and height (h), diameter at breast height (d), and the relations d²h and d²h multiplied by wood density (ρd²h) as independent variables. The models developed indicate that diameter explains much of the variability in the phytomass of restinga trees, and that height is the explanatory variable in the equation specific to jerivá. The selected models were: ln MS (kg) = -1.352 + 2.009 ln d (R² = 0.96; s yx = 0.34) for the plant community without jerivá, ln MS (kg) = -2.052 + 0.801 ln d²h (R² = 0.94; s yx = 0.38) for the community including jerivá, and ln MS (kg) = -0.884 + 2.40 ln h (R² = 0.92; s yx = 0.49) for jerivá.

Relevance:

20.00%

Abstract:

The troglobitic armored catfish, Ancistrus cryptophthalmus (Loricariidae, Ancistrinae), is known from four caves in the São Domingos karst area, upper rio Tocantins basin, Central Brazil. These populations differ in general body shape and degree of reduction of eyes and of pigmentation. The small Passa Três population (around 1,000 individuals) presents the most reduced eyes, which are not externally visible in adults. A small group of Passa Três catfish, one male and three females, reproduced spontaneously three times in the laboratory, at the end of summertime in 2000, 2003 and 2004. Herein we describe the reproductive behavior during the 2003 event, as well as the early development of the 2003 and 2004 offspring, with focus on body growth and ontogenetic regression of eyes. The parental care by the male, which includes defense of the rock shelter where the egg clutch is laid, cleaning and oxygenation of eggs, is typical of many loricariids. On the other hand, the slow development, including delayed eye degeneration, low body growth rates and high estimated longevity (15 years or more), is characteristic of precocial, or K-selected, life cycles. In the absence of comparable data for close epigean relatives (Ancistrus spp.), it is not possible to establish whether these features are an autapomorphic specialization of the troglobitic A. cryptophthalmus or a plesiomorphic trait already present in the epigean ancestor, possibly favoring the adoption of life in the food-poor cave environment. We briefly discuss the current hypotheses on eye regression in troglobitic vertebrates.

Relevance:

20.00%

Abstract:

The tree Gmelina arborea has been widely introduced in Costa Rica for commercial purposes. These new conditions for melina cause variations in the anatomy of the secondary xylem of trees growing in plantations. The objective of the present research was to determine the variation in xylem anatomy caused by variation in ecological conditions. Fiber dimensions, axial parenchyma percentage in cross sections, and vessel and ray parameters were measured. The results showed that some anatomical characteristics remained stable despite variations in ecological conditions, especially radial parenchyma and anatomical features which were less affected by altitude. On the other hand, the vessels, axial parenchyma and fiber were less stable because they were affected significantly by longitude, latitude, altitude and precipitation. Latitude significantly affected vessel percentage, and the length and diameter of the fiber and lumen. Longitude affected vessel percentage and fiber diameter. Altitude had a significant correlation with the number of cells in ray height. Annual average precipitation affected vessel percentage and the diameter not only of the fiber, but also of the lumen. These results suggest that the new growth conditions of G. arborea trees in Costa Rica have produced an anatomical adaptation.

Relevance:

20.00%

Abstract:

The heartwood of the candeia tree is a source of essential oil rich in alpha-bisabolol, a substance widely used in the cosmetic and pharmaceutical industries. Bearing in mind the economic importance of alpha-bisabolol, this work aimed to evaluate the influence of tree age on the yield and content of alpha-bisabolol present in essential oil from candeia, considering two distinct reliefs and three diameter classes, in the Aiuruoca region, south Minas Gerais state. The two distinct reliefs correspond respectively to one section of the stand growing at 1,000 m of altitude (Area 1) and another section growing at 1,100 m of altitude (Area 2). In each section, 15 trees were felled from among 3 different diameter classes. Discs were removed from the base of each tree to estimate their age by counting growth rings. Soil samples were taken and subjected to physical and chemical analysis. The logs were reduced into chips and random samples were taken for distillation to extract essential oil. The method used was steam distillation at a pressure of 2 kgf/cm² for 2.5 h. The chemical analysis was performed in a gas chromatograph (GC) based on the alpha-bisabolol standard reference. The yield of essential oil from trees in Area 1 was higher than that from trees in Area 2, with the same pattern of influence for older trees. In Area 2, the alpha-bisabolol content was higher in younger trees. No differences were found between the relevant parameters in relation to diameter classes.

Relevance:

20.00%

Abstract:

Mature weight breeding values were estimated using a multi-trait animal model (MM) and a random regression animal model (RRM). Data consisted of 82 064 weight records from 8 145 animals, recorded from birth to eight years of age. Weights at standard ages were considered in the MM. All models included contemporary groups as fixed effects, and age of dam (linear and quadratic effects) and animal age as covariates. In the RRM, mean trends were modelled through a cubic regression on orthogonal polynomials of animal age and genetic maternal and direct and maternal permanent environmental effects were also included as random. Legendre polynomials of orders 4, 3, 6 and 3 were used for animal and maternal genetic and permanent environmental effects, respectively, considering five classes of residual variances. Mature weight (five years) direct heritability estimates were 0.35 (MM) and 0.38 (RRM). Rank correlation between sires' breeding values estimated by MM and RRM was 0.82. However, when selecting the top 2% (12) or 10% (62) of the young sires based on the MM predicted breeding values, only 71% and 80%, respectively, of the same sires would be selected if RRM estimates were used instead. The RRM modelled the changes in the (co)variances with age adequately, and larger breeding value accuracies can be expected using this model.

Relevance:

20.00%

Abstract:

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford-Fulkerson algorithm in polynomial time in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.

Relevance:

20.00%

Abstract:

Loebl, Komlós, and Sós conjectured that if at least half the vertices of a graph G have degree at least some k ∈ ℕ, then every tree with at most k edges is a subgraph of G. We prove the conjecture for all trees of diameter at most 5 and for a class of caterpillars. Our result implies a bound on the Ramsey number r(T, T') of trees T, T' from the above classes.

Relevance:

20.00%

Abstract:

To test whether plant species influence greenhouse gas production in diverse ecosystems, we measured wet season soil CO₂ and N₂O fluxes close to ~300 large (>35 cm in diameter at breast height (DBH)) trees of 15 species at three clay-rich forest sites in central Amazonia. We found that soil CO₂ fluxes were 38% higher near large trees than at control sites >10 m away from any tree (P < 0.0001). After adjusting for large tree presence, a multiple linear regression of soil temperature, bulk density, and liana DBH explained 19% of remaining CO₂ flux variability. Soil N₂O fluxes adjacent to Caryocar villosum, Lecythis lurida, Schefflera morototoni, and Manilkara huberi were 84%-196% greater than Erisma uncinatum and Vochysia maxima, both Vochysiaceae. Tree species identity was the most important explanatory factor for N₂O fluxes, accounting for more than twice the N₂O flux variability as all other factors combined. Two observations suggest a mechanism for this finding: (1) sugar addition increased N₂O fluxes near C. villosum twice as much (P < 0.05) as near Vochysiaceae and (2) species mean N₂O fluxes were strongly negatively correlated with tree growth rate (P = 0.002). These observations imply that through enhanced belowground carbon allocation liana and tree species can stimulate soil CO₂ and N₂O fluxes (by enhancing denitrification when carbon limits microbial metabolism). Alternatively, low N₂O fluxes potentially result from strong competition of tree species with microbes for nutrients. Species-specific patterns in CO₂ and N₂O fluxes demonstrate that plant species can influence soil biogeochemical processes in a diverse tropical forest.