956 results for random forest regression
Abstract:
We propose alternative approaches to analyze residuals in binary regression models based on random effect components. Our preferred model does not depend on any tuning parameter and is completely automatic. Although the focus is mainly on accommodating outliers, the proposed methodology is also able to detect them. Our approach consists of evaluating the posterior distribution of random effects included in the linear predictor. Evaluating the posterior distributions of interest involves cumbersome integration, which is easily handled through stochastic simulation methods. We also discuss different specifications of prior distributions for the random effects. The potential of these strategies is compared on a real data set. The main finding is that including extra variability accommodates the outliers, substantially improving the fit of the model while also correctly indicating the possible outliers.
Abstract:
To estimate soil-water transit time from the variation in 18O values, a statistical model was used. This model is based on linear regression analysis applied to the values observed for soil water and rain water. The time obtained from these correlations represents the mean time necessary for the water to run from one collecting point to the next.
Abstract:
To evaluate the flying capacity and nest site selection of Angiopolybia pallens (Lepeletier, 1836), we made 17 incursions (136 hours of sampling effort) in Atlantic Rain Forest environments in Bahia state. Our data show that this wasp prefers to nest on wide leaves of bushes and short trees (nests between 0.30 and 3 m from the ground) located in half-shaded environments (clearings and shaded cultivated areas). The logistic regression model fitted with a Quasi-Newton method provided a good description of the flying capacity observed in A. pallens (χ² = 91.52; p ≪ 0.001). According to the logistic regression model, the flight autonomy of A. pallens is low: it flies short distances, with an effective radius of action of about 24 m measured from the nest, which means a foraging area of nearly 1,800 m².
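The foraging area quoted above follows from the 24 m radius if the area is treated as a circle centered on the nest. That circular shape is an assumption made here for illustration; the abstract does not state how the area was derived. A minimal check of the arithmetic:

```python
import math

def foraging_area(radius_m: float) -> float:
    """Area in square metres of a circle with the given radius in metres."""
    return math.pi * radius_m ** 2

# Effective radius of action reported in the abstract: about 24 m.
area = foraging_area(24.0)
print(f"{area:.0f} m^2")  # prints "1810 m^2", consistent with "nearly 1,800 m²"
```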
Abstract:
The Optimum-Path Forest (OPF) classifier is a recent and promising method for pattern recognition, with a fast training algorithm and good accuracy. Investigating a method for combining classifiers of this kind can therefore be important for many applications. In this paper we report a fast method to combine OPF-based classifiers trained with disjoint training subsets. Given a fixed number of subsets, the algorithm draws random samples, without replacement, from the original training set. The accuracy of each subset is improved by a learning procedure. The final decision is given by majority vote. Experiments with simulated and real data sets showed that the proposed combining method is more efficient and effective than the naive approach, provided some conditions hold. It was also shown that the OPF training step runs faster for a series of small subsets than for the whole training set. The combining scheme was also designed to support parallel or distributed processing, speeding up the procedure even more. © 2011 Springer-Verlag.
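The combining scheme described above (disjoint subsets drawn without replacement, one classifier per subset, majority vote) can be sketched as follows. This is only an illustration under stated assumptions: a simple nearest-centroid learner stands in for OPF, the per-subset learning procedure mentioned in the abstract is omitted, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def split_disjoint(samples, n_subsets, seed=0):
    """Shuffle once and deal the samples into n_subsets disjoint subsets
    (equivalent to sampling without replacement)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]

def train_centroids(subset):
    """Stand-in learner (NOT OPF): one mean feature vector per class."""
    sums, counts = {}, Counter()
    for features, label in subset:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Label of the closest centroid by squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist(centroids[label]))

def majority_vote(models, features):
    """Final decision: the label predicted by most of the models."""
    votes = Counter(predict(m, features) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated Gaussian classes.
rng = random.Random(42)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], "a") for _ in range(60)]
data += [([rng.gauss(5, 1), rng.gauss(5, 1)], "b") for _ in range(60)]

subsets = split_disjoint(data, n_subsets=3)
models = [train_centroids(s) for s in subsets]
print(majority_vote(models, [0.2, -0.1]), majority_vote(models, [4.8, 5.3]))
```

Because the subsets are disjoint, each model can be trained independently, which is what makes a scheme of this shape easy to run in parallel.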
Abstract:
Graduate Program in Agronomy (Energy in Agriculture) - FCA
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
This paper addresses the investment decisions of 373 large Brazilian firms from 1997 to 2004 in the presence of financial constraints, using panel data. A Bayesian econometric model was used, with ridge regression to handle multicollinearity among the variables in the model. Prior distributions are assumed for the parameters, classifying the model into random or fixed effects. We used a Bayesian approach to estimate the parameters, considering normal and Student t distributions for the error, and assumed that the initial values for the lagged dependent variable are not fixed but generated by a random process. The recursive predictive density criterion was used for model comparisons. Twenty models were tested and the results indicated that multicollinearity does influence the value of the estimated parameters. Controlling for capital intensity, financial constraints are found to be more important for capital-intensive firms, probably due to their lower profitability indexes, higher fixed costs and higher degree of property diversification.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Lianas can change forest dynamics, slowing down forest regeneration after a perturbation. In these cases, it may be necessary to manage these woody climbers. Our aim was to simulate two management strategies: (1) focusing on abundant liana species and (2) focusing on the largest lianas, and contrast them with the random removal of lianas. We applied mathematical simulations for liana removal in three different vegetation types in southeastern Brazil: a Rainforest, a Seasonal Tropical Forest, and a Woodland Savanna. Using these samples, we performed simulations based on two liana removal procedures and compared them with random removal. We also used regression analysis with quasi-Poisson distribution to test whether larger lianas were aggressive, i.e., if they climbed into many trees. The procedure of cutting larger lianas was as effective as cutting them randomly and proved not to be a good method for liana management. Moreover, most of the lianas climbed into one or two trees, i.e., were not aggressive. Cutting the most abundant lianas proved to be a more effective method than cutting lianas randomly. This method could maintain liana richness and presumably should accelerate forest regeneration.
Abstract:
The design of a network is a solution to several engineering and science problems. Several network design problems are known to be NP-hard, and population-based metaheuristics like evolutionary algorithms (EAs) have been widely investigated for such problems. Such optimization methods simultaneously generate a large number of potential solutions to investigate the search space in breadth and, consequently, to avoid local optima. Obtaining a potential solution usually involves the construction and maintenance of several spanning trees or, more generally, spanning forests. To efficiently explore the search space, special data structures have been developed to provide operations that manipulate a set of spanning trees (population). For a tree with n nodes, the most efficient data structures available in the literature require time O(n) to generate a new spanning tree that modifies an existing one and to store the new solution. We propose a new data structure, called node-depth-degree representation (NDDR), and we demonstrate that using this encoding, generating a new spanning forest requires average time O(√n). Experiments with an EA based on NDDR applied to large-scale instances of the degree-constrained minimum spanning tree problem have shown that the implementation adds small constants and lower order terms to the theoretical bound.
Abstract:
An extension of some standard likelihood-based procedures to heteroscedastic nonlinear regression models under scale mixtures of skew-normal (SMSN) distributions is developed. This novel class of models provides a useful generalization of the heteroscedastic symmetrical nonlinear regression models (Cysneiros et al., 2010), since the random term distributions cover symmetric as well as asymmetric and heavy-tailed distributions such as skew-t, skew-slash and skew-contaminated normal, among others. A simple EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters is presented and the observed information matrix is derived analytically. To examine the performance of the proposed methods, some simulation studies are presented to show the robustness of this flexible class against outlying and influential observations and that the maximum likelihood estimates based on the EM-type algorithm have good asymptotic properties. Furthermore, local influence measures and the one-step approximations of the estimates in the case-deletion model are obtained. Finally, an illustration of the methodology is given considering a data set previously analyzed under the homoscedastic skew-t nonlinear regression model. © 2012 Elsevier B.V. All rights reserved.
Abstract:
Investigating the spatial patterns of trees by size class and by most abundant species can provide evidence about the structure of the plant community, since spatial pattern is a key question in forest ecology studies. The spatial organization of trees depends on several ecological processes and on the specific characteristics of each environment, so a better understanding of this framework provides important elements for knowledge about forest formations. This paper aimed to study tree spatial patterns by diameter class and for the four most abundant species in different forests, in order to provide evidence regarding the ecology of each plant community. The spatial pattern in each forest formation was described using Ripley's K function. The forest formations studied were: Ombrophilous Forest, Cerradão, Seasonal Forest and Restinga Forest. A 10.24 ha plot was installed in each forest formation, and every tree with a circumference at breast height (CBH) larger than 15 cm was measured, georeferenced and identified. The data highlight the aggregated character of tropical forests, as observed in every forest studied. The Cerradão and Restinga forest trees showed similar aggregated patterns. In the Ombrophilous forest, the aggregated pattern was significant at all distance scales. In the Seasonal forest, a tendency toward randomness was observed, although significant aggregation was observed at all short distances. The spatial pattern by diameter class was generally aggregated for trees smaller than 10 cm in diameter and between 10 and 20 cm, and random for the others, showing that young trees tend to be more aggregated than old ones. The spatial pattern of the dominant species is always strongly similar to the general pattern of each forest formation. 
The differences in the spatial patterns of two or three shared species among the forest formations indicate that their pattern is influenced by each environment. This highlights the importance of their autecology and of the ecological processes intrinsic to each community, which can explain the observed patterns.
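To make the role of Ripley's K function concrete, the sketch below computes a naive estimate, K(r) = A · (number of ordered pairs within distance r) / (n(n − 1)), and compares it with the value πr² expected under complete spatial randomness. It is an illustration only: it applies no edge correction, the point patterns are simulated rather than taken from the study, and the plot is assumed to be a 320 m × 320 m square (which gives 10.24 ha, although the abstract does not state the plot shape).

```python
import math
import random

def ripley_k(points, radius, area):
    """Naive Ripley's K estimate without edge correction."""
    n = len(points)
    pairs = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) <= radius:
                pairs += 1
    # Each unordered pair counts twice in the ordered-pair sum.
    return area * 2 * pairs / (n * (n - 1))

side = 320.0          # assumed square plot: 320 m x 320 m = 10.24 ha
area = side * side
rng = random.Random(1)

# Simulated random pattern: 500 uniformly placed trees.
random_pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(500)]

# Simulated aggregated pattern: 25 clusters of 20 trees each.
clustered_pts = []
for _ in range(25):
    cx, cy = rng.uniform(0, side), rng.uniform(0, side)
    clustered_pts += [(rng.gauss(cx, 3.0), rng.gauss(cy, 3.0)) for _ in range(20)]

r = 10.0
k_random = ripley_k(random_pts, r, area)
k_clustered = ripley_k(clustered_pts, r, area)
csr_reference = math.pi * r ** 2  # expected K(r) under complete spatial randomness

print(f"CSR reference: {csr_reference:.1f}")
print(f"random pattern: {k_random:.1f}")
print(f"clustered pattern: {k_clustered:.1f}")
```

An estimate well above πr² at a given distance indicates aggregation at that scale, while a value close to πr² is consistent with a random pattern.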
Abstract:
Effects of roads on wildlife and its habitat have been measured using metrics such as the nearest road distance, road density, and effective mesh size. In this work we introduce two new indices: (1) Integral Road Effect (IRE), which measures the summed effects of the points along a road at a fixed point in the forest; and (2) Average Value of the Infinitesimal Road Effect (AVIRE), which measures the average of the effects of roads at this point. IRE is formally defined as the line integral of a special function (the infinitesimal road effect) along the curves that model the roads, whereas AVIRE is the quotient of IRE by the length of the roads. Combining tools of ArcGIS software with a numerical algorithm, we calculated these and other road and habitat cover indices in a sample of points in a human-modified landscape in the Brazilian Atlantic Forest, where data on the abundance of two groups of small mammals (forest specialists and habitat generalists) were collected in the field. We then used the Akaike Information Criterion (AIC) to compare a set of candidate regression models to explain the variation in small mammal abundance, including models with our two new road indices (AVIRE and IRE), models with other road effect indices (nearest road distance, mesh size, and road density), and reference models (containing only habitat indices, or only the intercept without the effect of any variable). Compared to other road effect indices, AVIRE showed the best performance in explaining the abundance of forest specialist species, whereas the nearest road distance performed best for generalist species. AVIRE and habitat together were included in the best model for both small mammal groups; that is, higher abundance of specialist and generalist small mammals occurred where there was lower average road effect (lower AVIRE) and more habitat. 
Moreover, unlike the other road effect indices except mesh size, AVIRE was not significantly correlated with the habitat cover of specialists and generalists, which allows the effect of roads to be separated from the effect of habitat on small mammal communities. We suggest that the proposed indices and GIS procedures could also be useful to describe other spatial ecological phenomena, such as edge effect in habitat fragments. © 2012 Elsevier B.V. All rights reserved.
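The definitions of IRE (a line integral along the roads) and AVIRE (IRE divided by total road length) can be approximated numerically as sketched below. The abstract does not give the form of the infinitesimal road effect, so the exponential decay with distance used here is purely a hypothetical placeholder, as are the function names, the 10 m decay scale and the single straight road. Roads are modeled as polylines and integrated with a midpoint rule.

```python
import math

def road_effect_indices(roads, point, effect, step=1.0):
    """Return (IRE, AVIRE) at `point` for `roads`, a list of polylines.

    `effect` maps the distance from a road point to `point` to an effect
    value; its form is an assumption, not taken from the paper.
    """
    px, py = point
    ire = 0.0
    total_length = 0.0
    for polyline in roads:
        for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
            seg_len = math.hypot(x1 - x0, y1 - y0)
            if seg_len == 0.0:
                continue
            n = max(1, math.ceil(seg_len / step))
            ds = seg_len / n
            for k in range(n):
                t = (k + 0.5) / n  # midpoint of the k-th sub-segment
                x = x0 + t * (x1 - x0)
                y = y0 + t * (y1 - y0)
                ire += effect(math.hypot(x - px, y - py)) * ds
            total_length += seg_len
    avire = ire / total_length if total_length > 0 else 0.0
    return ire, avire

def decay(distance_m):
    """Hypothetical infinitesimal road effect: exponential decay, 10 m scale."""
    return math.exp(-distance_m / 10.0)

# One straight 100 m road and a sample point at its midpoint.
roads = [[(0.0, 0.0), (100.0, 0.0)]]
ire, avire = road_effect_indices(roads, (50.0, 0.0), decay)
print(f"IRE = {ire:.3f}, AVIRE = {avire:.4f}")
```

For this configuration the exact integral is 20(1 − e⁻⁵) ≈ 19.865, which gives a quick sanity check on the numerical approximation.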
Abstract:
The Atlantic Forest is one of the most threatened tropical biomes, with much of the standing forest in small (less than 50 ha), disturbed and isolated patches. The pattern of land-use and land-cover change (LULCC) that has resulted in this critical scenario has not yet been fully investigated. Here, we describe the LULCC in three fragmented Atlantic Forest landscapes (São Paulo, Brazil) between the 1960s and 1980s and between the 1980s and 2000s. The three studied landscapes differ in their current proportion of forest cover, having 10%, 30% and 50% respectively. Between the 1960s and 1980s, forest cover in two landscapes was reduced while forest cover in the third landscape increased slightly. The opposite trend was observed between the 1980s and 2000s: forest regeneration was greater than deforestation in the landscapes with 10% and 50% forest cover and, as a consequence, forest cover increased. By contrast, the percentage of forest cover in the landscape with 30% forest cover was drastically reduced between the 1980s and 2000s. LULCC deviated from a random trajectory, was not constant through time in two study landscapes and was not constant across space in a given time period. This landscape dynamism in single locations over small temporal scales is a key factor to be considered in models of LULCC to accurately simulate future changes for the Atlantic Forest. In general, forest patches became more isolated when deforestation was greater than forest regeneration and became more connected when forest regeneration was greater than deforestation. As a result of the dynamics experienced by the study landscapes, individual forest patches currently consist of a mosaic of different forest age classes, which is likely to impact biodiversity. 
Furthermore, landscape dynamics suggest the beginning of a forest transition in some Atlantic Forest regions, which could be of great importance for biodiversity conservation due to the potential effects of young secondary forests in reducing forest isolation and maintaining a significant amount of the original biodiversity. © 2012 Elsevier B.V. All rights reserved.
Abstract:
The aim of this study was to estimate the stock of biomass and organic carbon in a montane mixed shade forest located near General Carneiro, PR. Twenty plots of 12 m x 12 m were installed, in which all trees with a CBH (circumference at breast height) >= 31.4 cm were felled. From these the following information was obtained: total height, commercial height (taken as the morphological inversion point in the natural forest and the height of the first live branch), CBH, species identification and herbarium specimens. To quantify biomass in the understory and roots, three 1 m x 1 m subunits were installed in each 12 m x 12 m sampling unit, arranged along the diagonal at the lower left corner, the center and the upper right corner. To quantify accumulated litter, eight samples were collected at random in each 12 m x 12 m sampling unit, using a metal device measuring 0.25 m x 0.25 m. More than 85% of the total biomass and total organic carbon of the montane mixed shade forest is stored in above-ground plant structures. The total stock of organic carbon found in this study (104.7 Mg ha⁻¹) demonstrates the importance of maintaining and preserving natural ecosystems as a way of keeping this stock of organic carbon fixed in plant biomass.