155 results for random forest regression
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Abstract:
Objective: To develop a model to predict the bleeding source and identify the cohort amongst patients with acute gastrointestinal bleeding (GIB) who require urgent intervention, including endoscopy. Patients with acute GIB, an unpredictable event, are most commonly evaluated and managed by non-gastroenterologists. Rapid and consistently reliable risk stratification of patients with acute GIB for urgent endoscopy may potentially improve outcomes amongst such patients by targeting scarce health-care resources to those who need them the most. Design and methods: Using ICD-9 codes for acute GIB, 189 patients with acute GIB and all available data variables required to develop and test models were identified from a hospital medical records database. Data on 122 patients were utilized for development of the model and data on 67 patients were utilized to perform comparative analysis of the models. Clinical data such as presenting signs and symptoms, demographic data, presence of co-morbidities, laboratory data and corresponding endoscopic diagnosis and outcomes were collected. Clinical data and endoscopic diagnosis collected for each patient were utilized to retrospectively ascertain optimal management for each patient. Clinical presentations and corresponding treatment were utilized as training examples. Eight mathematical models including artificial neural network (ANN), support vector machine (SVM), k-nearest neighbor, linear discriminant analysis (LDA), shrunken centroid (SC), random forest (RF), logistic regression, and boosting were trained and tested. The performance of these models was compared using standard statistical analysis and ROC curves. Results: Overall the random forest model best predicted the source, need for resuscitation, and disposition with accuracies of approximately 80% or higher (accuracy for endoscopy was greater than 75%). The area under ROC curve for RF was greater than 0.85, indicating excellent performance by the random forest model. Conclusion: While most mathematical models are effective as a decision support system for evaluation and management of patients with acute GIB, in our testing, the RF model consistently demonstrated the best performance. Amongst patients presenting with acute GIB, mathematical models may facilitate the identification of the source of GIB and the need for intervention, and allow optimization of care and healthcare resource allocation; these however require further validation. (c) 2007 Elsevier B.V. All rights reserved.
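The ROC comparison mentioned in the abstract above can be illustrated with a minimal sketch. The labels and scores are invented for illustration and are not data from the study, and the function name `roc_auc` is hypothetical. The sketch computes the area under the ROC curve as the fraction of positive/negative pairs in which the positive case receives the higher score, counting ties as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = needs urgent intervention, 0 = does not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cases, which is acceptable for a test set of a few dozen patients; a rank-based implementation would be preferable for large samples.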
Abstract:
We analyze data obtained from a study designed to evaluate training effects on the performance of certain motor activities of Parkinson's disease patients. Maximum likelihood methods were used to fit beta-binomial/Poisson regression models tailored to evaluate the effects of training on the numbers of attempted and successful specified manual movements in 1 min periods, controlling for disease stage and use of the preferred hand. We extend models previously considered by other authors in univariate settings to account for the repeated measures nature of the data. The results suggest that the expected numbers of attempts and successes increase with training, except for patients with advanced stages of the disease using the non-preferred hand. Copyright (c) 2008 John Wiley & Sons, Ltd.
Abstract:
The use of remote sensing is necessary for monitoring forest carbon stocks at large scales. Optical remote sensing, although not the most suitable technique for the direct estimation of stand biomass, offers the advantage of providing large temporal and spatial datasets. In particular, information on canopy structure is encompassed in stand reflectance time series. This study focused on the example of Eucalyptus forest plantations, which have recently attracted much attention as a result of their high expansion rate in many tropical countries. Stand scale time-series of Normalized Difference Vegetation Index (NDVI) were obtained from MODIS satellite data after a procedure involving un-mixing and interpolation, on about 15,000 ha of plantations in southern Brazil. The comparison of the planting date of the current rotation (and therefore the age of the stands) estimated from these time series with real values provided by the company showed that the root mean square error was 35.5 days. Age alone explained more than 82% of stand wood volume variability and 87% of stand dominant height variability. Age variables were combined with other variables derived from the NDVI time series and simple bioclimatic data by means of linear (Stepwise) or nonlinear (Random Forest) regressions. The nonlinear regressions gave r-square values of 0.90 for volume and 0.92 for dominant height, and an accuracy of about 25 m³/ha for volume (15% of the volume average value) and about 1.6 m for dominant height (8% of the height average value). The improvement from including NDVI and bioclimatic data arises because the cumulative NDVI since planting date integrates the interannual variability of leaf area index (LAI), light interception by the foliage and growth due, for example, to variations in seasonal water stress. The accuracy of biomass and height predictions was strongly improved by using the NDVI integrated over the first two years after planting, which are critical for stand establishment. These results open perspectives for cost-effective monitoring of biomass at large scales in intensively-managed plantation forests. (C) 2011 Elsevier Inc. All rights reserved.
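As a rough illustration of the NDVI-based predictors described in the abstract above, the sketch below computes NDVI from near-infrared and red reflectance using the standard definition, then integrates an NDVI time series over the first two years after planting. The function names, the trapezoidal rule, the 730-day horizon and all numeric values are assumptions made for illustration; the abstract does not state how the study computed its integrated NDVI.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / total


def integrated_ndvi(days, values, horizon_days):
    """Trapezoidal integral of NDVI over days since planting.

    Only intervals ending at or before horizon_days are included; no
    interpolation is done at the horizon (a simplifying assumption).
    """
    area = 0.0
    for i in range(1, len(days)):
        if days[i] > horizon_days:
            break
        area += 0.5 * (values[i] + values[i - 1]) * (days[i] - days[i - 1])
    return area


# Hypothetical stand: NDVI sampled once a year for three years after planting.
days_since_planting = [0, 365, 730, 1095]
ndvi_series = [0.2, 0.6, 0.8, 0.8]
two_year_ndvi = integrated_ndvi(days_since_planting, ndvi_series, 730)
print(f"NDVI integrated over the first two years: {two_year_ndvi:.1f}")
```

A value such as `two_year_ndvi` could then serve as one input, alongside stand age and bioclimatic variables, to a stepwise or random forest regression.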
Abstract:
The purpose of this study was to develop and validate equations to estimate the aboveground phytomass of a 30-year-old plot of Atlantic Forest. In two plots of 100 m², a total of 82 trees were cut down at ground level. For each tree, height and diameter were measured. Leaves and woody material were separated in order to determine their fresh weights in field conditions. Samples of each fraction were oven dried at 80 °C to constant weight to determine their dry weight. Tree data were divided into two random samples. One sample was used for the development of the regression equations, and the other for validation. The models were developed using simple linear regression analysis, where the dependent variable was the dry mass, and the independent variables were height (h), diameter (d) and d²h. The validation was carried out using the Pearson correlation coefficient, the paired Student's t-test and the standard error of estimation. The best equations to estimate aboveground phytomass were: lnDW = -3.068+2.522lnd (r² = 0.91; s y/x = 0.67) and lnDW = -3.676+0.951ln d²h (r² = 0.94; s y/x = 0.56).
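The two fitted equations reported in the abstract above can be applied as in the following sketch. The coefficients are taken from the abstract; the function names are hypothetical, the units of d, h and DW are not stated in the abstract and must be taken from the original paper, and the back-transformation here is a plain exponential with no correction for log-transformation bias.

```python
import math


def dry_weight_from_diameter(d):
    """Dry weight from diameter alone: ln DW = -3.068 + 2.522 ln d."""
    return math.exp(-3.068 + 2.522 * math.log(d))


def dry_weight_from_d2h(d, h):
    """Dry weight from diameter and height: ln DW = -3.676 + 0.951 ln(d^2 h)."""
    return math.exp(-3.676 + 0.951 * math.log(d * d * h))


# Hypothetical tree; units follow whatever the original study used.
diameter, height = 10.0, 8.0
print(dry_weight_from_diameter(diameter))
print(dry_weight_from_d2h(diameter, height))
```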
Abstract:
Mature weight breeding values were estimated using a multi-trait animal model (MM) and a random regression animal model (RRM). Data consisted of 82 064 weight records from 8 145 animals, recorded from birth to eight years of age. Weights at standard ages were considered in the MM. All models included contemporary groups as fixed effects, and age of dam (linear and quadratic effects) and animal age as covariates. In the RRM, mean trends were modelled through a cubic regression on orthogonal polynomials of animal age, and direct and maternal genetic effects, together with animal and maternal permanent environmental effects, were also included as random. Legendre polynomials of orders 4, 3, 6 and 3 were used for animal and maternal genetic and permanent environmental effects, respectively, considering five classes of residual variances. Mature weight (five years) direct heritability estimates were 0.35 (MM) and 0.38 (RRM). Rank correlation between sires' breeding values estimated by MM and RRM was 0.82. However, when the top 2% (12) or 10% (62) of the young sires were selected based on the MM predicted breeding values, 71% and 80% of the same sires, respectively, would be selected if RRM estimates were used instead. The RRM modelled the changes in the (co)variances with age adequately, and larger breeding value accuracies can be expected using this model.
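Random regression models of the kind described above use Legendre polynomials of standardized age as covariates. The sketch below shows one common way to build them: age is mapped to the interval [-1, 1] and the polynomials are generated with Bonnet's recurrence. The function names are hypothetical, the polynomials are left unnormalized, and the relation between the "order" reported in the abstract and the polynomial degree used here is an assumption that should be checked against the original paper.

```python
def standardize(t, t_min, t_max):
    """Map an age t in [t_min, t_max] linearly onto [-1, 1]."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)


def legendre_basis(x, degree):
    """Return [P_0(x), ..., P_degree(x)] using Bonnet's recurrence:
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x).
    """
    values = [1.0]
    if degree >= 1:
        values.append(x)
    for n in range(1, degree):
        nxt = ((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1)
        values.append(nxt)
    return values


# Ages from birth (0) to eight years, as in the abstract; five years is
# the age taken there as mature weight.
x = standardize(5.0, 0.0, 8.0)
covariates = legendre_basis(x, 3)
print(x, covariates)
```

Each animal's random regression coefficients multiply these covariates, so the fitted effect becomes a smooth function of age rather than a separate value per standard age.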
Abstract:
The objective of the present study was to estimate milk yield genetic parameters applying random regression models and parametric correlation functions combined with a variance function to model animal permanent environmental effects. A total of 152,145 test-day milk yields from 7,317 first lactations of Holstein cows belonging to herds located in the southeastern region of Brazil were analyzed. Test-day milk yields were divided into 44 weekly classes of days in milk. Contemporary groups were defined by herd-test-day comprising a total of 2,539 classes. The model included direct additive genetic, permanent environmental, and residual random effects. The following fixed effects were considered: contemporary group, age of cow at calving (linear and quadratic regressions), and the population average lactation curve modeled by a fourth-order orthogonal Legendre polynomial. Additive genetic effects were modeled by random regression on orthogonal Legendre polynomials of days in milk, whereas permanent environmental effects were estimated using a stationary or nonstationary parametric correlation function combined with a variance function of different orders. The structure of residual variances was modeled using a step function containing 6 variance classes. The genetic parameter estimates obtained with the model using a stationary correlation function associated with a variance function to model permanent environmental effects were similar to those obtained with models employing orthogonal Legendre polynomials for the same effect. A model using a sixth-order polynomial for additive effects and a stationary parametric correlation function associated with a seventh-order variance function to model permanent environmental effects would be sufficient for data fitting.
Abstract:
A total of 152,145 weekly test-day milk yield records from 7,317 first lactations of Holstein cows distributed in 93 herds in southeastern Brazil were analyzed. Test-day milk yields were classified into 44 weekly classes of days in milk (DIM). The contemporary groups were defined as herd-year-week of test-day. The model included direct additive genetic, permanent environmental and residual effects as random, and the fixed effects of contemporary group and age of cow at calving (linear and quadratic effects) as a covariable. Mean trends were modeled by a cubic regression on orthogonal polynomials of DIM. Additive genetic and permanent environmental random effects were estimated by random regression on orthogonal Legendre polynomials. Residual variances were modeled using third- to seventh-order variance functions or a step function with 1, 6, 13, 17 and 44 variance classes. Results from Akaike's information criterion and Schwarz's Bayesian information criterion suggested that a model considering a 7th-order Legendre polynomial for the additive effect, a 12th-order polynomial for the permanent environmental effect and a step function with 6 classes for residual variances fitted best. However, a parsimonious model, with a 6th-order Legendre polynomial for additive effects and a 7th-order polynomial for permanent environmental effects, yielded very similar genetic parameter estimates. (C) 2008 Elsevier B.V. All rights reserved.
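The model comparison described above relies on two information criteria, which can be sketched as follows using their textbook definitions: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of observations and ln L the maximized log-likelihood. The candidate models, log-likelihoods and parameter counts below are invented for illustration, and the exact definition of n used with restricted maximum likelihood may differ from this sketch.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's information criterion (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Schwarz's Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical candidates: name -> (maximized log-likelihood, parameter count).
candidates = {
    "parsimonious": (-1000.0, 10),
    "rich": (-990.0, 30),
}
n_obs = 1000  # hypothetical number of records

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_obs))
print(best_by_aic, best_by_bic)
```

Because the BIC penalty grows with ln n, it tends to favour more parsimonious models than AIC on large data sets such as test-day records.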
Abstract:
Every year, autochthonous cases of Plasmodium vivax malaria occur in low-endemicity areas of Vale do Ribeira in the south-eastern part of the Atlantic Forest, state of São Paulo, where Anopheles cruzii and Anopheles bellator are considered the primary vectors. However, other species in the subgenus Nyssorhynchus of Anopheles (e.g., Anopheles marajoara) are abundant and may participate in the dynamics of malarial transmission in that region. The objectives of the present study were to assess the spatial distribution of An. cruzii, An. bellator and An. marajoara and to associate the presence of these species with malaria cases in the municipalities of the Vale do Ribeira. Potential habitat suitability modelling was applied both to determine the spatial distribution of An. cruzii, An. bellator and An. marajoara and to establish the density of each species. Poisson regression was utilized to associate malaria cases with estimated vector densities. As a result, An. cruzii was correlated with the forested slopes of the Serra do Mar, An. bellator with the coastal plain and An. marajoara with the deforested areas. Moreover, both An. marajoara and An. cruzii were positively associated with malaria cases. Considering that An. marajoara was demonstrated to be a primary vector of human Plasmodium in the rural areas of the state of Amapá, more attention should be given to the species in the deforested areas of the Atlantic Forest, where it might be a secondary vector.
Abstract:
We consider the problem of interaction neighborhood estimation from the partial observation of a finite number of realizations of a random field. We introduce a model selection rule to choose estimators of conditional probabilities among natural candidates. Our main result is an oracle inequality satisfied by the resulting estimator. We then use this selection rule in a two-step procedure to evaluate the interacting neighborhoods. The selection rule selects a small prior set of possible interacting points, and a cutting step removes the irrelevant points from this prior set. We also prove that the Ising models satisfy the assumptions of the main theorems, without restrictions on the temperature, on the structure of the interacting graph or on the range of the interactions. This therefore provides a large class of applications for our results. We give a computationally efficient procedure in these models. We finally show the practical efficiency of our approach in a simulation study.
Abstract:
To test whether plant species influence greenhouse gas production in diverse ecosystems, we measured wet season soil CO₂ and N₂O fluxes close to ~300 large (>35 cm in diameter at breast height (DBH)) trees of 15 species at three clay-rich forest sites in central Amazonia. We found that soil CO₂ fluxes were 38% higher near large trees than at control sites >10 m away from any tree (P < 0.0001). After adjusting for large tree presence, a multiple linear regression of soil temperature, bulk density, and liana DBH explained 19% of remaining CO₂ flux variability. Soil N₂O fluxes adjacent to Caryocar villosum, Lecythis lurida, Schefflera morototoni, and Manilkara huberi were 84%-196% greater than those adjacent to Erisma uncinatum and Vochysia maxima, both Vochysiaceae. Tree species identity was the most important explanatory factor for N₂O fluxes, accounting for more than twice as much N₂O flux variability as all other factors combined. Two observations suggest a mechanism for this finding: (1) sugar addition increased N₂O fluxes near C. villosum twice as much (P < 0.05) as near Vochysiaceae and (2) species mean N₂O fluxes were strongly negatively correlated with tree growth rate (P = 0.002). These observations imply that through enhanced belowground carbon allocation liana and tree species can stimulate soil CO₂ and N₂O fluxes (by enhancing denitrification when carbon limits microbial metabolism). Alternatively, low N₂O fluxes potentially result from strong competition of tree species with microbes for nutrients. Species-specific patterns in CO₂ and N₂O fluxes demonstrate that plant species can influence soil biogeochemical processes in a diverse tropical forest.
Abstract:
Using data from a logging experiment in the eastern Brazilian Amazon region, we develop a matrix growth and yield model that captures the dynamic effects of harvest system choice on forest structure and composition. Multinomial logistic regression is used to estimate the growth transition parameters for a 10-year time step, while a Poisson regression model is used to estimate recruitment parameters. The model is designed to be easily integrated with an economic model of decision-making to perform tropical forest policy analysis. The model is used to compare the long-run structure and composition of a stand arising from the choice of implementing either conventional logging techniques or more carefully planned and executed reduced-impact logging (RIL) techniques, contrasted against a baseline projection of an unlogged forest. Results from log-and-leave scenarios show that a stand logged according to Brazilian management requirements will require well over 120 years to recover its initial commercial volume, regardless of the logging technique employed. Implementing RIL, however, accelerates this recovery. Scenarios imposing a 40-year cutting cycle raise the possibility of sustainable harvest volumes, although at significantly lower levels than are implied by current regulations. Meeting current Brazilian forest policy goals may require an increase in the planned total area of permanent production forest or the widespread adoption of silvicultural practices that increase stand recovery and volume accumulation rates after RIL harvests. Published by Elsevier B.V.
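The general shape of a matrix growth model like the one described above can be sketched as a repeated projection: the vector of tree counts per size class is multiplied by a transition matrix and a recruitment vector is added at each time step. Everything below is an illustrative assumption: the three size classes, the transition probabilities, the constant recruitment (in the study, recruitment comes from a Poisson regression) and the function name `project_stand`.

```python
def project_stand(stand, transition, recruitment, steps):
    """Project tree counts per size class forward by a number of time steps.

    transition[i][j] is the probability that a tree in class j is found in
    class i after one step; column sums below 1 represent mortality.
    recruitment[i] is the number of new trees entering class i per step.
    """
    state = list(stand)
    size = len(state)
    for _ in range(steps):
        state = [
            sum(transition[i][j] * state[j] for j in range(size)) + recruitment[i]
            for i in range(size)
        ]
    return state


# Hypothetical stand with three diameter classes (small, medium, large).
initial_stand = [100.0, 50.0, 10.0]
transition = [
    [0.7, 0.0, 0.0],  # small trees that stay small
    [0.2, 0.8, 0.0],  # small -> medium, medium staying medium
    [0.0, 0.1, 0.9],  # medium -> large, large staying large
]
recruitment = [5.0, 0.0, 0.0]

# One step stands for one 10-year period, matching the abstract's time step.
after_one_step = project_stand(initial_stand, transition, recruitment, 1)
print(after_one_step)
```

A harvest scenario could be represented by subtracting removed and damaged trees from the state vector before projecting, which is how a cutting cycle would enter a model of this form.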
Abstract:
This paper addresses investment decisions in the presence of financial constraints for 373 large Brazilian firms from 1997 to 2004, using panel data. A Bayesian econometric model was used, with ridge regression to handle multicollinearity among the variables in the model. Prior distributions are assumed for the parameters, classifying the model into random or fixed effects. We used a Bayesian approach to estimate the parameters, considering normal and Student t distributions for the error, and assumed that the initial values for the lagged dependent variable are not fixed, but generated by a random process. The recursive predictive density criterion was used for model comparisons. Twenty models were tested and the results indicated that multicollinearity does influence the value of the estimated parameters. Controlling for capital intensity, financial constraints are found to be more important for capital-intensive firms, probably due to their lower profitability indexes, higher fixed costs and higher degree of property diversification.
Effects of roads, topography, and land use on forest cover dynamics in the Brazilian Atlantic Forest
Abstract:
Roads and topography can determine patterns of land use and distribution of forest cover, particularly in tropical regions. We evaluated how road density, land use, and topography affected forest fragmentation, deforestation and forest regrowth in a Brazilian Atlantic Forest region near the city of Sao Paulo. We mapped roads and land use/land cover for three years (1962, 1981 and 2000) from historical aerial photographs, and summarized the distribution of roads, land use/land cover and topography within a grid of 94 non-overlapping 100 ha squares. We used generalized least squares regression models for data analysis. Our models showed that forest fragmentation and deforestation depended on topography, land use and road density, whereas forest regrowth depended primarily on land use. However, the relationships between these variables and forest dynamics changed in the two studied periods; land use and slope were the strongest predictors from 1962 to 1981, and past (1962) road density and land use were the strongest predictors for the following period (1981-2000). Roads had the strongest relationship with deforestation and forest fragmentation when the expansions of agriculture and buildings were limited to already deforested areas, and when there was a rapid expansion of development, under influence of Sao Paulo city. Furthermore, the past (1962) road network was more important than the recent road network (1981) when explaining forest dynamics between 1981 and 2000, suggesting a long-term effect of roads. Roads are permanent scars on the landscape and facilitate deforestation and forest fragmentation due to increased accessibility and land valorization, which control land-use and land-cover dynamics. Topography directly affected deforestation, agriculture and road expansion, mainly between 1962 and 1981. Forests are thus in peril where there are more roads, and long-term conservation strategies should consider ways to mitigate roads as permanent landscape features and as drivers and facilitators of deforestation and forest fragmentation. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
The degree to which habitat fragmentation affects bird incidence is species specific and may depend on varying spatial scales. Selecting the correct scale of measurement is essential to appropriately assess the effects of habitat fragmentation on bird occurrence. Our objective was to determine which spatial scale of landscape measurement best describes the incidence of three bird species (Pyriglena leucoptera, Xiphorhynchus fuscus and Chiroxiphia caudata) in the fragmented Brazilian Atlantic forest and test if multi-scalar models perform better than single-scalar ones. Bird incidence was assessed in 80 forest fragments. The surrounding landscape structure was described with four indices measured at four spatial scales (400-, 600-, 800- and 1,000-m buffers around the sample points). The explanatory power of each scale in predicting bird incidence was assessed using logistic regression, bootstrapped with 1,000 repetitions. The best results varied between species (1,000-m radius for P. leucoptera; 800-m for X. fuscus and 600-m for C. caudata), probably due to their distinct feeding habits and foraging strategies. Multi-scale models always resulted in better predictions than single-scale models, suggesting that different aspects of the landscape structure are related to different ecological processes influencing bird incidence. In particular, our results suggest that local extinction and (re)colonisation processes might simultaneously act at different scales. Thus, single-scale models may not be good enough to properly describe complex pattern-process relationships. Selecting variables at multiple ecologically relevant scales is a reasonable procedure to optimise the accuracy of species incidence models.
Abstract:
Chromosomes of the South American geckos Gymnodactylus amarali and G. geckoides from open and dry areas of the Cerrado and Caatinga biomes in Brazil, respectively, were studied for the first time, after conventional and AgNOR staining, CBG- and RBG-banding, and FISH with telomeric sequences. Comparative analyses between the karyotypes of open areas and the previously studied Atlantic forest species G. darwinii were also performed. The chromosomal polymorphisms detected in populations of G. amarali from the states of Goias and Tocantins are the result of centric fusions (2n = 38, 39 and 40), suggesting a differentiation from a 2n = 40 ancestral karyotype and the presence of supernumerary chromosomes. The CBG- and RBG-banding patterns of the Bs are described. G. geckoides has 40 chromosomes with gradually decreasing sizes, but it is distinct from the 2n = 40 karyotypes of G. amarali and G. darwinii due to the occurrence of pericentric inversions or centromere repositioning. NOR location seems to be a marker for Gymnodactylus, as G. amarali and G. geckoides share a medium-sized subtelocentric NOR-bearing pair, while G. darwinii has NORs at the secondary constriction of the long arm of pair 1. The comparative analyses indicate a non-random nature of the Robertsonian rearrangements in the genus Gymnodactylus. Copyright (C) 2010 S. Karger AG, Basel