971 results for tree-ensemble models
Abstract:
The majority of past and current individual-tree growth modelling methodologies have failed to characterise and incorporate structured stochastic components. Rather, they have relied on deterministic predictions or have added an unstructured random component to predictions. In particular, spatial stochastic structure has been neglected, despite being present in most applications of individual-tree growth models. Spatial stochastic structure (also called spatial dependence or spatial autocorrelation) eventuates when spatial influences such as competition and micro-site effects are not fully captured in models. Temporal stochastic structure (also called temporal dependence or temporal autocorrelation) eventuates when a sequence of measurements is taken on an individual-tree over time, and variables explaining temporal variation in these measurements are not included in the model. Nested stochastic structure eventuates when measurements are combined across sampling units and differences among the sampling units are not fully captured in the model. This review examines spatial, temporal, and nested stochastic structure and instances where each has been characterised in the forest biometry and statistical literature. Methodologies for incorporating stochastic structure in growth model estimation and prediction are described. Benefits from incorporation of stochastic structure include valid statistical inference, improved estimation efficiency, and more realistic and theoretically sound predictions. It is proposed in this review that individual-tree modelling methodologies need to characterise and include structured stochasticity. Possibilities for future research are discussed. (C) 2001 Elsevier Science B.V. All rights reserved.
Predicting the growth response to thinning for Scots pine stands using individual-tree growth models
Abstract:
Summary
Abstract:
The medium-term hydropower scheduling (MTHS) problem consists of determining, for each stage of the planning period, the amount of generation at each hydro plant that maximizes the expected future benefits over the planning period while respecting plant operational constraints. These decisions have been based mainly on advance knowledge of inflows. Computational intelligence approaches can be used to forecast the inflows of a given basin. This paper considers Dynamic Programming (DP) with the inflows represented by their average values, which turns the problem into a deterministic one whose solution can be obtained by deterministic DP (DDP). The performance of the DDP technique on the MTHS problem was assessed by simulation using ensemble prediction models. Features and sensitivities of these models are discussed. © 2012 IEEE.
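As an illustration of the deterministic DP idea described in this abstract, the following is a minimal sketch for a single reservoir, not the paper's implementation. The function name `ddp_schedule`, the integer storage grid, the spill rule, and the price-times-release benefit are all assumptions made for the example.

```python
def ddp_schedule(mean_inflows, s0, s_max, r_max, benefit):
    """Backward deterministic DP over integer storage levels 0..s_max.

    mean_inflows: average inflow per stage (integers, an assumption here).
    benefit(t, r): hypothetical benefit of releasing r units at stage t.
    Returns the best total benefit and the release schedule from storage s0.
    """
    n_stages = len(mean_inflows)
    # value[t][s] = best benefit from stage t onward when storage is s
    value = [[0.0] * (s_max + 1) for _ in range(n_stages + 1)]
    policy = [[0] * (s_max + 1) for _ in range(n_stages)]

    for t in range(n_stages - 1, -1, -1):
        for s in range(s_max + 1):
            available = s + mean_inflows[t]
            best_val, best_r = float("-inf"), 0
            for r in range(min(r_max, available) + 1):
                s_next = min(available - r, s_max)  # water above s_max is spilled
                v = benefit(t, r) + value[t + 1][s_next]
                if v > best_val:
                    best_val, best_r = v, r
            value[t][s] = best_val
            policy[t][s] = best_r

    # Forward pass: follow the stored policy from the initial storage.
    releases, s = [], s0
    for t in range(n_stages):
        r = policy[t][s]
        releases.append(r)
        s = min(s + mean_inflows[t] - r, s_max)
    return value[0][s0], releases


# Hypothetical three-stage example with stage-dependent prices.
prices = [1, 3, 2]
total, releases = ddp_schedule(
    [2, 1, 0], s0=1, s_max=3, r_max=2, benefit=lambda t, r: prices[t] * r
)
print(total, releases)
```

Replacing the stochastic inflows with their averages is what makes a single backward pass sufficient; a stochastic variant would instead take an expectation over inflow scenarios at each stage.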
Abstract:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of patients surviving more than 5 years after diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. To evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short- vs. long-term survival), and a small subset of genes whose expression values can be used to predict survival time is selected. Next, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Because of the large heterogeneity observed within the prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death directly to the gene expression profile. We propose a Bayesian ensemble model for survival prediction that is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model, which is flexible enough to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy by using a latent variable formulation to model the covariate effects, which decreases computation time for model fitting.
Our proposed models also provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares the results obtained by the Random Forest and Bayesian ensemble methods from biological and clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
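The abstract does not say how false discovery rates are controlled. Purely as an illustration, the sketch below shows one common Bayesian-style rule: rank markers by a posterior inclusion probability and keep the largest set whose average posterior probability of being a false discovery stays below a target. The function name, the gene labels, and the probabilities are hypothetical, and this rule is an assumption rather than the authors' procedure.

```python
def select_by_bayesian_fdr(inclusion_probs, alpha=0.05):
    """Select markers while keeping the estimated Bayesian FDR <= alpha.

    inclusion_probs: dict mapping marker name -> posterior probability that
    the marker is truly predictive (hypothetical inputs for this sketch).
    """
    ranked = sorted(inclusion_probs.items(), key=lambda kv: kv[1], reverse=True)
    selected, expected_false = [], 0.0
    for name, prob in ranked:
        candidate = expected_false + (1.0 - prob)
        # Estimated FDR = expected number of false discoveries / set size.
        if candidate / (len(selected) + 1) > alpha:
            break
        expected_false = candidate
        selected.append(name)
    return selected


# Hypothetical posterior inclusion probabilities for four genes.
example_probs = {"geneA": 0.99, "geneB": 0.97, "geneC": 0.90, "geneD": 0.40}
print(select_by_bayesian_fdr(example_probs, alpha=0.05))
```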
Abstract:
The goal of this paper is to introduce a class of tree-structured models that combines aspects of regression trees and smooth transition regression models. The model is called the Smooth Transition Regression Tree (STR-Tree). The main idea relies on specifying a multiple-regime parametric model through a tree-growing procedure with smooth transitions among different regimes. Decisions about splits are entirely based on a sequence of Lagrange Multiplier (LM) tests of hypotheses.
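To make the idea of a smooth transition between regimes concrete, here is a minimal sketch of a depth-one tree with a logistic transition function. The logistic form, the parameter names (`gamma` for smoothness, `c` for the split location), and the constant leaf values are assumptions for illustration; the abstract does not give the model equations, and the LM-test split selection is not shown.

```python
import math


def logistic_transition(x, gamma, c):
    """Smooth weight in (0, 1); approaches a hard split as gamma grows."""
    z = gamma * (x - c)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for negative z
    return ez / (1.0 + ez)


def str_tree_depth1(x, gamma, c, beta_low, beta_high):
    """Prediction of a one-split smooth transition tree.

    beta_low and beta_high are the regime levels below and above c
    (constants here; a fuller model would use regressions in each regime).
    """
    g = logistic_transition(x, gamma, c)
    return (1.0 - g) * beta_low + g * beta_high


# With a small gamma the regimes blend; with a large gamma the split is sharp.
for gamma in (0.5, 5.0, 50.0):
    print(gamma, round(str_tree_depth1(2.5, gamma, 2.0, 1.0, 3.0), 4))
```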
Abstract:
Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species' occurrence-environment relationships using statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g. generalized linear/additive models, tree-based models, maximum entropy, etc.), and the variety of ways that such models can be implemented, permits substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence-environment relationships and the number of parameters used to describe them, and search for insights into whether additional complexity is informative or superfluous. By building 'under fit' models, having insufficient flexibility to describe observed occurrence-environment relationships, we risk misunderstanding the factors shaping species distributions. By building 'over fit' models, with excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on study objective, attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing under fitting with over fitting and consequently how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences in opinions that favor simpler versus more complex models. We conclude that combining insights from both simple and complex SDM building approaches best advances our knowledge of current and future species ranges.
Abstract:
The objective of this study was to evaluate the performance of stacked species distribution models in predicting the alpha and gamma species diversity patterns of two important plant clades along elevation in the Andes. We modelled the distribution of the species in the Anthurium genus (53 species) and the Bromeliaceae family (89 species) using six modelling techniques. We combined all of the predictions for the same species in ensemble models based on two different criteria: the average of the rescaled predictions by all techniques and the average of the best techniques. The rescaled predictions were then reclassified into binary predictions (presence/absence). By stacking either the original predictions or binary predictions for both ensemble procedures, we obtained four different species richness models per taxon. The gamma and alpha diversity per elevation band (500 m) was also computed. To evaluate the prediction abilities for the four predictions of species richness and gamma diversity, the models were compared with the real data along an elevation gradient that was independently compiled by specialists. Finally, we also tested whether our richness models performed better than a null model of altitudinal changes of diversity based on the literature. Stacking of the ensemble prediction of the individual species models generated richness models that proved to be well correlated with the observed alpha diversity richness patterns along elevation and with the gamma diversity derived from the literature. Overall, these models tend to overpredict species richness. The use of the ensemble predictions from the species models built with different techniques seems very promising for modelling of species assemblages. Stacking of the binary models reduced the over-prediction, although more research is needed. The randomisation test proved to be a promising method for testing the performance of the stacked models, but other implementations may still be developed.
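The ensemble-then-stack workflow described above can be sketched as follows. This is an illustrative outline only: min-max rescaling, the 0.5 presence threshold, and all function names are assumptions, since the abstract does not specify how predictions were rescaled or thresholded.

```python
def rescale(values):
    """Min-max rescale one technique's predictions to [0, 1] (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ensemble_mean(predictions_by_technique):
    """Average the rescaled predictions of all techniques, site by site."""
    rescaled = [rescale(p) for p in predictions_by_technique]
    n_sites = len(rescaled[0])
    return [sum(p[i] for p in rescaled) / len(rescaled) for i in range(n_sites)]


def stack_richness(ensembles_by_species, threshold=None):
    """Sum per-species predictions into richness per site.

    With threshold=None the continuous predictions are stacked; otherwise
    each prediction is first reclassified to presence (1) or absence (0).
    """
    n_sites = len(next(iter(ensembles_by_species.values())))
    richness = [0.0] * n_sites
    for preds in ensembles_by_species.values():
        for i, p in enumerate(preds):
            if threshold is None:
                richness[i] += p
            else:
                richness[i] += 1.0 if p >= threshold else 0.0
    return richness


# Hypothetical data: two techniques for species A, a ready ensemble for B.
species_a = ensemble_mean([[0.0, 0.5, 1.0], [10, 30, 20]])
ensembles = {"A": species_a, "B": [1.0, 0.25, 0.0]}
print(stack_richness(ensembles))
print(stack_richness(ensembles, threshold=0.5))
```

Stacking continuous predictions and stacking binary ones generally give different richness values, which is consistent with the abstract's remark that binary stacking changed the degree of over-prediction.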
Abstract:
This paper reviews the forest models developed in Spain in recent years, covering timber and non-timber production as well as forest dynamics (regeneration, mortality). Whole-stand, diameter-class, and individual-tree models are presented. The models developed to date have been built from data from permanent plots, trials, and the National Forest Inventory. The paper presents the different submodels developed to date, as well as the software platforms that allow these models to be used. The main prospects for the development of forest modelling in Spain are included.
Abstract:
The objective of this study was to select allometric models to estimate total and pooled aboveground biomass of 4.5-year-old capixingui trees established in an agrisilvicultural system. Aboveground biomass distribution of capixingui was also evaluated. Single-entry models (diameter at breast height [DBH], crown diameter, or stem diameter as the independent variable) and double-entry models (DBH, crown diameter, or stem diameter together with total height as independent variables) were studied. The estimated total biomass was 17.3 t ha⁻¹, corresponding to 86.6 kg per tree. All models showed a good fit to the data (adjusted R² > 0.85) for bole, branches, and total biomass. DBH-based models presented the best residual distribution. The model ln W = b0 + b1 ln DBH can be recommended for aboveground biomass estimation. Lower coefficients were obtained for leaves (adjusted R² > 82%). Biomass distribution followed the order bole > branches > leaves. Bole biomass percentage decreased with increasing DBH of the trees, whereas branch biomass increased.
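The recommended single-entry model, ln W = b0 + b1 ln DBH, can be fitted by ordinary least squares on log-transformed data. The sketch below is a minimal illustration with synthetic values, not the study's data; the function names are hypothetical, and no back-transformation bias correction is applied.

```python
import math


def fit_log_log(dbh_values, biomass_values):
    """Fit ln W = b0 + b1 * ln DBH by simple linear regression."""
    xs = [math.log(d) for d in dbh_values]
    ys = [math.log(w) for w in biomass_values]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict_biomass(dbh, b0, b1):
    """Back-transform the log-scale prediction (no bias correction applied)."""
    return math.exp(b0 + b1 * math.log(dbh))


# Synthetic trees generated from W = 0.1 * DBH^2.5 (made-up coefficients).
synthetic_dbh = [8.0, 10.0, 12.0, 15.0, 18.0]
synthetic_w = [0.1 * d ** 2.5 for d in synthetic_dbh]
b0_hat, b1_hat = fit_log_log(synthetic_dbh, synthetic_w)
print(b0_hat, b1_hat, predict_biomass(14.0, b0_hat, b1_hat))
```

The log transformation linearizes the power relationship W = exp(b0) * DBH^b1, which is why a straight-line fit suffices.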
Abstract:
As water supply networks (WSNs) age, pipe deterioration has become an increasingly serious problem. Because an urban water supply network is a buried asset, it is difficult for monitoring staff to classify pipe faults directly, even with modern detection technology. In this paper, a decision tree algorithm (C4.5) is applied to basic property data of the water supply network (e.g. diameter, material, pressure, distance to pump, distance to tank, load) to classify the condition of water supply pipelines. Part of the historical data was used to build the decision tree classification model, and the remaining historical data was used to validate it. The model was assessed with statistical methods including basic statistics, Receiver Operating Characteristic (ROC) curves, and Recall-Precision Curves (RPC), which were used successfully to evaluate the accuracy of the classification model. The model assigns each pipe to one of three condition classes: asset unserviceable (AU), near perfect condition (NPC), and serious deterioration (SD), and pipelines should be maintained according to these classification results. Pipe classification of this kind plays a significant role in the future maintenance of water supply networks.
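C4.5 chooses splits by gain ratio, that is, information gain normalized by the entropy of the split itself. The sketch below computes it for a categorical attribute; the pipe records are made up for illustration, and only the class labels (AU, NPC, SD) come from the abstract.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(attribute_values, labels):
    """Gain ratio of splitting `labels` by a categorical attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical pipe records: material and pressure zone versus condition class.
material = ["iron", "iron", "pvc", "pvc"]
zone = ["north", "south", "north", "south"]
condition = ["SD", "SD", "NPC", "NPC"]
print(gain_ratio(material, condition), gain_ratio(zone, condition))
```

In this toy data the material attribute separates the classes perfectly while the zone attribute carries no information, so a C4.5-style learner would split on material first.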
Abstract:
Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), and core to the implementation of RC, in different spatial patterns and diameter distributions, revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands except those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are satisfied.
I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. The effects of the variance estimate, especially in the application phase, justify future efforts to improve its accuracy. The second technique implemented and evaluated is a survival regression model that accounts for the time-dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.
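For readers unfamiliar with regression calibration, the sketch below shows its core step under a classical additive error model (observed = true + error): each error-prone value is shrunk toward the sample mean by the estimated reliability ratio, and the calibrated values then replace the observed ones in the outcome model. The function name and data are hypothetical, and the error variance is simply passed in; the thesis obtains it from the Stage and Wykoff (1998) estimator, which is not reproduced here.

```python
from statistics import mean, variance


def regression_calibration(observed, error_variance):
    """Replace error-prone values W with an estimate of E[X | W].

    Assumes W = X + U with U independent of X and Var(U) = error_variance.
    The reliability ratio is Var(X) / Var(W), estimated from the sample.
    """
    mu_w = mean(observed)
    var_w = variance(observed)
    reliability = max(0.0, (var_w - error_variance) / var_w)
    return [mu_w + reliability * (w - mu_w) for w in observed]


# Hypothetical competition-index measurements with known error variance.
measured = [10.0, 20.0, 30.0]
calibrated = regression_calibration(measured, error_variance=25.0)
print(calibrated)
```

The larger the error variance relative to the observed variance, the stronger the shrinkage, which fits the abstract's observation that RC mattered most where measurement error variance was large.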
Abstract:
Genetic variation and environmental heterogeneity fundamentally shape the interactions between plants of the same species. According to the resource partitioning hypothesis, competition between neighbors intensifies as their similarity increases. Such competition may change in response to increasing supplies of limiting resources. We tested the resource partitioning hypothesis in stands of genetically identical (clone-origin) and genetically diverse (seed-origin) Eucalyptus trees with different water and nutrient supplies, using individual-based tree growth models. We found that genetic variation greatly reduced competitive interactions between neighboring trees, supporting the resource partitioning hypothesis. The importance of genetic variation for Eucalyptus growth patterns depended strongly on local stand structure and focal tree size. This suggests that spatial and temporal variation in the strength of species interactions leads to reversals in the growth rank of seed-origin and clone-origin trees. This study is one of the first to experimentally test the resource partitioning hypothesis for intergenotypic vs. intragenotypic interactions in trees. We provide evidence that variation at the level of genes, and not just species, is functionally important for driving individual and community-level processes in forested ecosystems.
Abstract:
Specific leaf area (SLA; m²(leaf) kg⁻¹(leaf)) is a key ecophysiological parameter influencing leaf physiology, photosynthesis, and whole plant carbon gain. Both individual tree-based models and other forest process-based models are generally highly sensitive to this parameter, but information on its temporal or within-stand variability is still scarce. In a 2-4-year-old Eucalyptus plantation in Congo, prone to seasonal drought, the within-stand and seasonal variability in SLA were investigated by means of destructive sampling carried out at 2-month intervals, over a 2-year period. Within-crown vertical gradients of SLA were small. Highly significant relationships were found between tree-average SLA (SLA_t) and tree size (tree height, H_t, or diameter at breast height, DBH): SLA_t ranged from about 9 m² kg⁻¹ for dominant trees to about 14-15 m² kg⁻¹ for the smallest trees. The decrease in SLA_t with increasing tree size was accurately predicted from DBH using power functions. Stand-average SLA varied by about 20% during the year, with lowest values at the end of the 5-month dry season, and highest values about 2-3 months after the onset of the wet season. Variability in leaf water status according to tree size and season is discussed as a possible determinant of both the within-stand and seasonal variations in SLA. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
Reducing energy consumption is one of the highest priorities and biggest challenges faced worldwide, particularly in the industrial sector. Given the increasing trend in consumption and the current economic crisis, identifying cost reductions in the most energy-intensive sectors has become one of the main concerns among companies and researchers. In industrial environments, energy consumption is affected by several factors, including production factors (e.g. equipment), human factors (e.g. operator experience), and environmental factors (e.g. temperature), which influence how energy is used across the plant. Several approaches for identifying the causes of consumption have therefore been suggested and discussed. However, the existing methods only provide guidelines for energy consumption and have difficulty explaining certain consumption patterns because they lack a structure for incorporating the influence of context; hence they cannot trace the causes of consumption down to the process level, where optimization measures can actually take place. This dissertation proposes a new approach to this issue: on-line estimation of context-based energy consumption models, which map operating context to consumption patterns. Context identification is performed by regression tree algorithms. Energy consumption estimation is achieved by means of a multi-model architecture using multiple RLS algorithms, locally estimated for each operating context. Lastly, the proposed approach is applied to a real cement plant grinding circuit. Experimental results demonstrate the viability of the overall system, regarding both automatic context identification and energy consumption estimation.
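The multi-model idea described above, one recursive least squares (RLS) estimator per operating context, can be sketched as follows. This is an assumption-laden illustration: the class names, the forgetting factor, the initial covariance scale `delta`, and the stand-in `identify_context` function (a fixed threshold instead of the dissertation's regression tree) are all hypothetical.

```python
class RLS:
    """Recursive least squares for a linear model y = theta . x."""

    def __init__(self, n_features, forgetting=1.0, delta=1000.0):
        self.theta = [0.0] * n_features
        # Large initial covariance means weak prior on the parameters.
        self.P = [[delta if i == j else 0.0 for j in range(n_features)]
                  for i in range(n_features)]
        self.lam = forgetting

    def predict(self, x):
        return sum(t * xi for t, xi in zip(self.theta, x))

    def update(self, x, y):
        n = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        error = y - self.predict(x)
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # P <- (P - gain * x^T P) / lambda, using the symmetry of P.
        self.P = [[(self.P[i][j] - gain[i] * Px[j]) / self.lam
                   for j in range(n)] for i in range(n)]


class ContextualEnergyModel:
    """Keeps one RLS estimator per operating context."""

    def __init__(self, n_features, identify_context):
        self.n_features = n_features
        self.identify_context = identify_context
        self.models = {}

    def update(self, x, y):
        ctx = self.identify_context(x)
        if ctx not in self.models:
            self.models[ctx] = RLS(self.n_features)
        self.models[ctx].update(x, y)

    def predict(self, x):
        model = self.models.get(self.identify_context(x))
        return model.predict(x) if model is not None else 0.0


# Hypothetical example: features are [1, load]; context is a load threshold.
energy_model = ContextualEnergyModel(2, lambda x: "low" if x[1] < 5 else "high")
for load in [1, 2, 3, 4, 6, 7, 8, 9]:
    consumption = 2 + 1 * load if load < 5 else 10 + 3 * load
    energy_model.update([1.0, float(load)], float(consumption))
print(energy_model.predict([1.0, 2.5]), energy_model.predict([1.0, 7.5]))
```

Keeping separate estimators lets each context have its own local linear relationship, instead of forcing one global model to fit all operating regimes.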
Abstract:
Rare species have restricted geographic ranges, habitat specialization, and/or small population sizes. Datasets on rare species distribution usually have few observations, limited spatial accuracy and a lack of valid absences; conversely, they provide comprehensive views of species distributions, making it possible to capture realistically most of their realized environmental niche. Rare species are the most in need of predictive distribution modelling but also the most difficult to model. We refer to this contrast as the "rare species modelling paradox" and propose as a solution developing modelling approaches that deal with a sufficiently large set of predictors while ensuring that statistical models are not overfitted. Our novel approach fulfils this condition by fitting a large number of bivariate models and averaging them with a weighted ensemble approach. We further propose that this ensemble forecasting be conducted within a hierarchical multi-scale framework. We present two ensemble models for a test species, one at regional and one at local scale, each based on the combination of 630 models. In both cases, we obtained excellent spatial projections, which is unusual when modelling rare species. Model results highlight, from a statistically sound approach, the effects of multiple drivers in the same modelling framework and at two distinct scales. From this added information, regional models can support accurate forecasts of range dynamics under climate change scenarios, whereas local models allow the assessment of isolated or synergistic impacts of changes in multiple predictors. This novel framework provides a baseline for adaptive conservation, management and monitoring of rare species at distinct spatial and temporal scales.
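The core of the approach, many two-predictor models combined by a weighted average, can be outlined as below. This sketch only covers enumerating predictor pairs and averaging; the fitting of each bivariate model is omitted. The predictor names, predictions, and weights are made up, and using an evaluation score as the weight is an assumption, since the abstract does not state how weights were derived.

```python
from itertools import combinations


def predictor_pairs(predictor_names):
    """All unordered pairs of predictors, one bivariate model per pair."""
    return list(combinations(predictor_names, 2))


def weighted_ensemble(predictions, weights):
    """Weighted average of per-model predictions, site by site.

    predictions: dict mapping predictor pair -> list of per-site suitabilities.
    weights: dict mapping predictor pair -> non-negative weight (assumed to
    come from some evaluation score of each bivariate model).
    """
    total_weight = sum(weights[pair] for pair in predictions)
    n_sites = len(next(iter(predictions.values())))
    return [
        sum(weights[pair] * predictions[pair][i] for pair in predictions) / total_weight
        for i in range(n_sites)
    ]


# Hypothetical predictors and two already-fitted bivariate models.
pairs = predictor_pairs(["temp", "rain", "slope", "soil"])
toy_predictions = {("temp", "rain"): [0.2, 0.8], ("temp", "slope"): [0.6, 0.4]}
toy_weights = {("temp", "rain"): 3.0, ("temp", "slope"): 1.0}
print(len(pairs), weighted_ensemble(toy_predictions, toy_weights))
```

Restricting each model to two predictors keeps the number of parameters small relative to the few available observations, while the ensemble still draws on the full predictor set.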