956 results for random forest regression
Abstract:
OBJECTIVE The current Ebola epidemic massively affected the Macenta district in Forest Guinea. We aimed at investigating its impact on general and HIV care at the only HIV care facility in the district. DESIGN Prospective observational single-facility study. METHODS Routinely collected data on use of general hospital services and HIV care were linked to Ebola surveillance data published by the Guinea Ministry of Health. In addition, we compared retention among HIV-infected patients enrolled into care in the first semesters of 2013 and 2014. RESULTS Throughout 2014, service offer was continuous and unaltered at the facility. During the main epidemic period (August-December 2014), compared with the same period of 2013, there were important reductions in attendance at the primary care outpatient clinic (-40%), in HIV tests done (-46%), in new diagnoses of tuberculosis (-53%) and in patients enrolled into HIV care (-47%). There was a smaller reduction in attendance at the HIV follow-up clinic (-11%). Kaplan-Meier estimates of retention were similar among the patients enrolled into care in 2014 and 2013. In a multivariable Cox regression analysis, the year of enrolment was not associated with attrition (hazard ratio 1.02; 95% confidence interval: 0.72-1.43). CONCLUSION The Ebola epidemic resulted in an important decrease in utilization of the facility despite unaltered service offer. Effects on care of HIV-positive patients enrolled prior to the epidemic were limited. HIV care in such circumstances is challenging, but not impossible.
Abstract:
Climatic relationships were established in two 210Pb dated pollen sequences from small mires closely surrounded by forest just below actual forest limits (but about 300 m below potential climatic forest limits) in the northern Swiss Alps (suboceanic in climate; mainly with Picea) and the central Swiss Alps (subcontinental; mainly Pinus cembra and Larix) at annual or near-annual resolution from AD 1901 to 1996. Effects of vegetational succession were removed by splitting the time series into early and late periods and by linear detrending. Both pollen concentrations detrended by the depth-age model and modified percentages (in which counts of dominant pollen types are down-weighted) are correlated by simple linear regression with smoothed climatic parameters with one- and two-year time lags, including average monthly and April/September daylight air temperatures and with seasonal and annual precipitation sums. Results from detrended pollen concentrations suggest that peat accumulation is favoured in the northern-Alpine mire either by early snowmelt or by summer precipitation, but in the central-Alpine mire by increased precipitation and cooler summers, suggesting a position of the northern-Alpine mire near the upper altitudinal limit of peat formation, but of the central-Alpine mire near the lower limit. Results from modified pollen percentages indicate that pollen production by plants growing near their upper altitudinal limit is limited by insufficient warmth in summer, and pollen production by plants growing near their lower altitudinal limit is limited by too-high temperatures. Only weakly significant pollen/climate relationships were found for Pinus cembra and Larix, probably because they experience little climatic stress growing 300 m below the potential climatic forest limit.
Abstract:
Let $Y_i = f(x_i) + E_i$ $(1 \le i \le n)$ with given covariates $x_1 < x_2 < \cdots < x_n$, an unknown regression function $f$ and independent random errors $E_i$ with median zero. It is shown how to apply several linear rank test statistics simultaneously in order to test monotonicity of $f$ in various regions and to identify its local extrema.
Abstract:
Charcoal in unlaminated sediments dated by 210Pb was analysed by the pollen-slide and thin-section methods. The results were compared with the number and area of forest fires on different spatial scales in the area around Lago di Origlio as listed in the wildfire database of southern Switzerland since AD 1920. The influx of the number of charcoal particles > 75 µm² in pollen slides correlates well with the number of annual forest fires recorded within a distance of 20-50 km from the coring site. Hence a size-class distinction or an area measurement by image analysis may not be absolutely necessary for the reconstruction of regional fire history. A regression equation was computed and tested against an independent data set. Its use makes it possible to estimate the charcoal area influx (or concentration) from the particle number influx (or concentration). Local fires within a radius of 2 km around the coring site correlate well with the area influx of charcoal particles estimated by the thin-section method measuring the area of charcoal particles larger than 20 000 µm² or longer than 50 µm. Pollen percentages and influx values suggest that intensive agriculture and Castanea sativa cultivation were reduced 30-40 years ago, followed by an increase of forest area and a development to more natural woodlands. The traditional Castanea sativa cultivation was characterized by a complete use of the biomass produced, so abandonment of chestnut led to an increasing accumulation of dead biomass, thereby raising the fire risk. On the other hand, the pollen record of the regional vegetation does not show any clear response to the increase of fire frequency during the last three decades in this area.
Abstract:
We present a framework for fitting multiple random walks to animal movement paths consisting of ordered sets of step lengths and turning angles. Each step and turn is assigned to one of a number of random walks, each characteristic of a different behavioral state. Behavioral state assignments may be inferred purely from movement data or may include the habitat type in which the animals are located. Switching between different behavioral states may be modeled explicitly using a state transition matrix estimated directly from data, or switching probabilities may take into account the proximity of animals to landscape features. Model fitting is undertaken within a Bayesian framework using the WinBUGS software. These methods allow for identification of different movement states using several properties of observed paths and lead naturally to the formulation of movement models. Analysis of relocation data from elk released in east-central Ontario, Canada, suggests a biphasic movement behavior: elk are either in an "encamped" state in which step lengths are small and turning angles are high, or in an "exploratory" state, in which daily step lengths are several kilometers and turning angles are small. Animals encamp in open habitat (agricultural fields and opened forest), but the exploratory state is not associated with any particular habitat type.
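As a rough illustration of the biphasic behaviour described above, the following Python sketch simulates a path that switches between an "encamped" and an "exploratory" state through a state transition matrix. Every numeric value here (transition probabilities, mean step lengths, turning-angle spreads) and the choice of exponential step lengths and normal turning angles are hypothetical assumptions made for the example; the study fits its models in WinBUGS, and that fitting step is not reproduced.

```python
import math
import random

# Hypothetical parameters, not taken from the study.
STATES = ("encamped", "exploratory")
# TRANSITION[s][t]: probability of moving from state s to state t in one step.
TRANSITION = {
    "encamped": {"encamped": 0.9, "exploratory": 0.1},
    "exploratory": {"encamped": 0.2, "exploratory": 0.8},
}
MEAN_STEP_KM = {"encamped": 0.1, "exploratory": 3.0}
# Spread of turning angles (radians): wide when encamped, narrow when exploring.
TURN_SD = {"encamped": 2.0, "exploratory": 0.3}


def simulate_path(n_steps, seed=0):
    """Return a list of (state, x, y) tuples for a two-state random walk."""
    rng = random.Random(seed)
    state, heading, x, y = STATES[0], 0.0, 0.0, 0.0
    path = [(state, x, y)]
    for _ in range(n_steps):
        # Draw the next behavioural state from the current row of the matrix.
        if rng.random() < TRANSITION[state]["encamped"]:
            state = "encamped"
        else:
            state = "exploratory"
        # Step length: exponential with a state-specific mean (an assumption).
        step = rng.expovariate(1.0 / MEAN_STEP_KM[state])
        # Turning angle: normal around zero with a state-specific spread.
        heading += rng.gauss(0.0, TURN_SD[state])
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        path.append((state, x, y))
    return path
```

Calling `simulate_path(200, seed=1)` returns 201 points; plotting them coloured by state shows tight clusters for the encamped phase joined by long, fairly straight exploratory legs.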
Abstract:
Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed. In addition, a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. Random Forests can deal with large-scale data sets without rigorous data preprocessing. It has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. This study suggests that Random Forests is well suited to such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. We also found that the mean decrease of accuracy is a more reliable variable ranking measurement than the mean decrease of Gini.
Abstract:
Gastroesophageal reflux disease is a common condition affecting 25 to 40% of the population and causes significant morbidity in the U.S., accounting for at least 9 million office visits to physicians with estimated annual costs of $10 billion. Previous research has not clearly established whether infection with Helicobacter pylori, a known cause of peptic ulcer, atrophic gastritis and non-cardia adenocarcinoma of the stomach, is associated with gastroesophageal reflux disease. This study is a secondary analysis of data collected in a cross-sectional study of a random sample of adult residents of Ciudad Juarez, Mexico, that was conducted in 2004 (Prevalence and Determinants of Chronic Atrophic Gastritis Study or CAG study, Dr. Victor M. Cardenas, Principal Investigator). In this study, the presence of gastroesophageal reflux disease was based on responses to the previously validated Spanish Language Dyspepsia Questionnaire. Responses to this questionnaire indicating the presence of gastroesophageal reflux symptoms and disease were compared with the presence of H. pylori infection as measured by culture, histology and rapid urease test, and with findings of upper endoscopy (i.e., hiatus hernia and erosive and atrophic esophagitis). The prevalence ratio was calculated using bivariate, stratified and multivariate negative binomial logistic regression analyses in order to assess the relation between active H. pylori infection and the prevalence of gastroesophageal reflux typical syndrome and disease, while controlling for known risk factors of gastroesophageal reflux disease such as obesity. In a random sample of 174 adults, 48 (27.6%) of the study participants had typical reflux syndrome and only 5% (9/174) had gastroesophageal reflux disease per se according to the Montreal consensus, which defines reflux syndromes and disease based on whether the symptoms are perceived as troublesome by the subject. There was no association between H. pylori infection and typical reflux syndrome or gastroesophageal reflux disease. However, we found that in this Northern Mexican population, there was a moderate association (Prevalence Ratio=2.5; 95% CI=1.3, 4.7) between obesity (≥30 kg/m²) and typical reflux syndrome. Management and prevention of obesity will significantly curb the growing numbers of persons affected by gastroesophageal reflux symptoms and disease in Northern Mexico.
Abstract:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric ($\mathrm{MSEP}_m$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures. The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample. Only the random split estimator is conditionally (on $\beta$) unbiased; however, $\mathrm{MSEP}_m$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, $\mathrm{MSEP}_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables. To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment.
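To make the leave-one-out statistic recommended above more concrete, here is a minimal Python sketch of PRESS for a straight-line fit. It is a simplification under stated assumptions: a single predictor rather than the multivariable models of the study, and made-up data. It computes PRESS twice, once from a single fit using the leverages $h_{ii}$ (each leave-one-out residual equals $e_i / (1 - h_{ii})$) and once by explicitly refitting with each observation removed; the two should agree up to rounding.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def press_shortcut(xs, ys):
    """PRESS from one fit, using the leverage of each observation."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - x_bar) ** 2 / sxx
        residual = y - (a + b * x)
        total += (residual / (1.0 - leverage)) ** 2
    return total


def press_refit(xs, ys):
    """PRESS by refitting the line with each observation left out in turn."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total


# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
print(press_shortcut(xs, ys), press_refit(xs, ys))
```

The shortcut is what makes PRESS cheap: the whole sample is used for fitting, and no data are set aside for validation.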
Abstract:
A Bayesian approach to estimation of the regression coefficients of a multinomial logit model with ordinal scale response categories is presented. A Monte Carlo method is used to construct the posterior distribution of the link function. The link function is treated as an arbitrary scalar function. Then the Gauss-Markov theorem is used to determine a function of the link which produces a random vector of coefficients. The posterior distribution of the random vector of coefficients is used to estimate the regression coefficients. The method described is referred to as a Bayesian generalized least squares (BGLS) analysis. Two cases involving multinomial logit models are described. Case I involves a cumulative logit model and Case II involves a proportional-odds model. All inferences about the coefficients for both cases are described in terms of the posterior distribution of the regression coefficients. The results from the BGLS method are compared to maximum likelihood estimates of the regression coefficients. The BGLS method avoids the nonlinear problems encountered when estimating the regression coefficients of a generalized linear model. The method is not complex or computationally intensive. The BGLS method offers several advantages over Bayesian approaches.
Abstract:
Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to eliminate all individuals for whom any information is missing. This deletion approach does not make efficient use of available information and often introduces bias. Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates including partially observed binary covariates. The data were assumed missing at random (MAR). One method (PD) used the predictive distribution as a weight to average the logistic regressions performed over all possible values of the missing observations, and the second method (RS) used a variant of a resampling technique. Seven additional methods were compared with these two approaches in a simulation study. They were: (1) Analysis based on only the complete cases, (2) Substituting the mean of the observed values for the missing value, (3) An imputation technique based on the proportions of observed data, (4) Regressing the partially observed covariates on the remaining continuous covariates, (5) Regressing the partially observed covariates on the remaining continuous covariates conditional on response variable, (6) Regressing the partially observed covariates on the remaining continuous covariates and response variable, and (7) EM algorithm. Both proposed methods showed smaller standard errors (s.e.) for the coefficient involving the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; thus for analysis of large data sets with partially observed covariates, further refinement of these approaches is needed.
Abstract:
Persistence and abundance of species is determined by habitat availability and the ability to disperse and colonize habitats at contrasting spatial scales. Favourable habitat fragments are also heterogeneous in quality, providing differing opportunities for establishment and affecting the population dynamics of a species. Based on these principles, we suggest that the presence and abundance of epiphytes may reflect their dispersal ability, which is primarily determined by the spatial structure of host trees, but also by host quality. To our knowledge there has been no explicit test of the importance of host tree spatial pattern for epiphytes in Mediterranean forests. We hypothesized that performance and host occupancy in a favourable habitat depend on the spatial pattern of host trees, because this pattern affects the dispersal ability of each epiphyte and it also determines the availability of suitable sites for establishment. We tested this hypothesis using new point pattern analysis tools and generalized linear mixed models to investigate the spatial distribution and performance of the epiphytic lichen Lobaria pulmonaria, which inhabits two types of host trees (beeches and Iberian oaks). We tested the effects on L. pulmonaria distribution of tree size, spatial configuration, and host tree identity. We built a model including tree size, stand structure, and several neighbourhood predictors to understand the effect of host tree on L. pulmonaria. We also investigated the relative importance of spatial patterning on the presence and abundance of the species, independently of the host tree configuration. L. pulmonaria distribution was highly dependent on habitat quality for successful establishment, i.e., tree species identity, tree diameter, and several forest stand structure surrogates. 
For beech trees, tree diameter was the main factor influencing presence and cover of the lichen, although larger lichen-colonized trees were located close to focal trees, i.e., young trees. However, oak diameter was not an important factor, suggesting that bark roughness at all diameters favoured lichen establishment. Our results indicate that L. pulmonaria dispersal is not spatially restricted, but it is dependent on habitat quality. Furthermore, new spatial analysis tools suggested that L. pulmonaria cover exhibits a distinct pattern, although the spatial pattern of tree position and size was random.
Abstract:
Species selection for forest restoration is often supported by expert knowledge on local distribution patterns of native tree species. This approach is not applicable to largely deforested regions unless enough data on pre-human tree species distribution is available. In such regions, ecological niche models may provide essential information to support species selection in the framework of forest restoration planning. In this study we used ecological niche models to predict habitat suitability for native tree species in the "Tierra de Campos" region, an almost totally deforested area of the Duero Basin (Spain). Previously available models provide habitat suitability predictions for dominant native tree species, but including non-dominant tree species in forest restoration planning may be desirable to promote biodiversity, especially in largely deforested areas where nearby seed sources are not expected. We used the Forest Map of Spain as the species occurrence data source to maximize the number of modeled tree species. Penalized logistic regression was used to train models using climate and lithological predictors. Using model predictions, a set of tools was developed to support species selection in forest restoration planning. Model predictions were used to build ordered lists of suitable species for each cell of the study area. The suitable species lists were summarized by drawing maps that showed the two most suitable species for each cell. Additionally, potential distribution maps of the suitable species for the study area were drawn. For a scenario with two dominant species, the models predicted a mixed forest (Quercus ilex and a coniferous tree species) for almost one half of the study area. According to the models, 22 non-dominant native tree species are suitable for the study area, with up to six suitable species per cell. The model predictions pointed to Crataegus monogyna, Juniperus communis, J. oxycedrus and J. phoenicea as the most suitable non-dominant native tree species in the study area. Our results encourage further use of ecological niche models for forest restoration planning in largely deforested regions.
Abstract:
The direct application of existing models for seed germination may often be inadequate in the context of ecology and forestry germination experiments. This is because basic model assumptions are violated and variables available to forest managers are rarely used. In this paper, we present a method which addresses the aforementioned shortcomings. The approach is illustrated through a case study of Pinus pinea L. Our findings will also shed light on the role of germination in the general failure of natural regeneration in managed forests of this species. The presented technique consists of a mixed regression model based on survival analysis. Climate and stand covariates were tested. Data for fitting the model were gathered from a 5-year germination experiment in a mature, managed P. pinea stand in the Northern Plateau of Spain in which two different stand densities can be found. The model predictions proved to be unbiased and highly accurate when compared with the training data. Germination in P. pinea was controlled through thermal variables at stand level. At microsite level, low densities negatively affected the probability of germination. A time-lag in the response was also detected. Overall, the proposed technique provides a reliable alternative to germination modelling in ecology/forestry studies by using accessible/suitable variables. The P. pinea case study highlights the importance of producing unbiased predictions. In this species, the occurrence and timing of germination suggest a very different regeneration strategy from that understood by forest managers until now, which may explain the high failure rate of natural regeneration in managed stands. In addition, these findings provide valuable information for the management of P. pinea under climate-change conditions.
Abstract:
An understanding of spatial patterns of plant species diversity and the factors that drive those patterns is critical for the development of appropriate biodiversity management in forest ecosystems. We studied the spatial organization of plant species in human-modified and managed oak forests (primarily, Quercus faginea) in the Central Pre-Pyrenees, Spain. To test whether plant community assemblages varied non-randomly across the spatial scales, we used multiplicative diversity partitioning based on a nested hierarchical design of three increasingly coarser spatial scales (transect, stand, region). To quantify the importance of the structural, spatial, and topographical characteristics of stands in patterning plant species assemblages and identify the determinants of plant diversity patterns, we used canonical ordination. We observed a high contribution of β-diversity to total γ-diversity and found β-diversity to be higher and α-diversity to be lower than expected by random distributions of individuals at different spatial scales. Results, however, partly depended on the weighting of rare and abundant species. Variables expressing the historical management intensities of the stand such as mean stand age, the abundance of the dominant tree species (Q. faginea), age structure of the stand, and stand size were the main factors that explained the compositional variation in plant communities. The results indicate that (1) the structural, spatial, and topographical characteristics of the forest stands have the greatest effect on diversity patterns, (2) forests in landscapes that have different land use histories are environmentally heterogeneous and, therefore, can experience high levels of compositional differentiation, even at local scales (e.g., within the same stand). Maintaining habitat heterogeneity at multiple spatial scales should be considered in the development of management plans for enhancing plant diversity and related functions in human-altered forests.
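The multiplicative diversity partitioning mentioned above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: it uses species richness only (so rare and abundant species count equally, whereas the abstract notes that results depend on that weighting), a single level of sampling units rather than the nested transect/stand/region design, and hypothetical species lists. Gamma is the total richness, alpha the mean richness per unit, and beta their ratio, so that alpha times beta equals gamma.

```python
def multiplicative_partition(samples):
    """Return (alpha, beta, gamma) for species richness.

    samples: list of sets of species names, one set per sampling unit.
    """
    # Gamma: number of distinct species across all units.
    gamma = len(set().union(*samples))
    # Alpha: mean number of species per unit.
    alpha = sum(len(s) for s in samples) / len(samples)
    # Beta: effective number of compositionally distinct units.
    beta = gamma / alpha
    return alpha, beta, gamma


# Hypothetical transects, for illustration only.
transects = [
    {"Quercus faginea", "Buxus sempervirens", "Crataegus monogyna"},
    {"Quercus faginea", "Juniperus communis"},
    {"Quercus faginea", "Buxus sempervirens", "Rosa canina", "Juniperus communis"},
]
alpha, beta, gamma = multiplicative_partition(transects)
print(alpha, beta, gamma)
```

A beta of 1 would mean every transect holds the same species; a beta equal to the number of transects would mean they share none.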