97 resultados para Boosted regression trees
em Queensland University of Technology - ePrints Archive
Resumo:
This paper presents an approach to predict the operating conditions of machine based on classification and regression trees (CART) and adaptive neuro-fuzzy inference system (ANFIS) in association with direct prediction strategy for multi-step ahead prediction of time series techniques. In this study, the number of available observations and the number of predicted steps are initially determined by using false nearest neighbor method and auto mutual information technique, respectively. These values are subsequently utilized as inputs for prediction models to forecast the future values of the machines’ operating conditions. The performance of the proposed approach is then evaluated by using real trending data of low methane compressor. A comparative study of the predicted results obtained from CART and ANFIS models is also carried out to appraise the prediction capability of these models. The results show that the ANFIS prediction model can track the change in machine conditions and has the potential for using as a tool to machine fault prognosis.
Resumo:
Objectives Demonstrate the application of decision trees – classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs) – to understand structure in missing data. Setting Data taken from employees at three different industry sites in Australia. Participants 7915 observations were included. Materials and Methods The approach was evaluated using an occupational health dataset comprising results of questionnaires, medical tests, and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results CART and BRT models were effective in highlighting a missingness structure in the data, related to the Type of data (medical or environmental), the site in which it was collected, the number of visits and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured compared to structured missingness. Discussion Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusion Researchers are encouraged to use CART and BRT models to explore and understand missing data.
Resumo:
Spatially explicit information on local perceptions of ecosystem services is needed to inform land use planning within rapidly changing landscapes. In this paper we spatially modelled local people's use and perceptions of benefits from forest ecosystem services in Borneo, from interviews of 1837 people in 185 villages. Questions related to provisioning, cultural/spiritual, regulating and supporting ecosystem services derived from forest, and attitudes towards forest conversion. We used boosted regression trees (BRTs) to combine interview data with social and environmental predictors to understand spatial variation of perceptions across Borneo. Our results show that people use a variety of products from intact and highly degraded forests. Perceptions of benefits from forests were strongest: in human-altered forest landscapes for cultural and spiritual benefits; in human-altered and intact forests landscapes for health benefits; intact forest for direct health benefits, such as medicinal plants; and in regions with little forest and extensive plantations, for environmental benefits, such as climatic impacts from deforestation. Forest clearing for small scale agriculture was predicted to be widely supported yet less so for large-scale agriculture. Understanding perceptions of rural communities in dynamic, multi-use landscapes is important where people are often directly affected by the decline in ecosystem services.
Resumo:
Aim Determining how ecological processes vary across space is a major focus in ecology. Current methods that investigate such effects remain constrained by important limiting assumptions. Here we provide an extension to geographically weighted regression in which local regression and spatial weighting are used in combination. This method can be used to investigate non-stationarity and spatial-scale effects using any regression technique that can accommodate uneven weighting of observations, including machine learning. Innovation We extend the use of spatial weights to generalized linear models and boosted regression trees by using simulated data for which the results are known, and compare these local approaches with existing alternatives such as geographically weighted regression (GWR). The spatial weighting procedure (1) explained up to 80% deviance in simulated species richness, (2) optimized the normal distribution of model residuals when applied to generalized linear models versus GWR, and (3) detected nonlinear relationships and interactions between response variables and their predictors when applied to boosted regression trees. Predictor ranking changed with spatial scale, highlighting the scales at which different species–environment relationships need to be considered. Main conclusions GWR is useful for investigating spatially varying species–environment relationships. However, the use of local weights implemented in alternative modelling techniques can help detect nonlinear relationships and high-order interactions that were previously unassessed. Therefore, this method not only informs us how location and scale influence our perception of patterns and processes, it also offers a way to deal with different ecological interpretations that can emerge as different areas of spatial influence are considered during model fitting.
Resumo:
The benefits of applying tree-based methods to the purpose of modelling financial assets as opposed to linear factor analysis are increasingly being understood by market practitioners. Tree-based models such as CART (classification and regression trees) are particularly well suited to analysing stock market data which is noisy and often contains non-linear relationships and high-order interactions. CART was originally developed in the 1980s by medical researchers disheartened by the stringent assumptions applied by traditional regression analysis (Brieman et al. [1984]). In the intervening years, CART has been successfully applied to many areas of finance such as the classification of financial distress of firms (see Frydman, Altman and Kao [1985]), asset allocation (see Sorensen, Mezrich and Miller [1996]), equity style timing (see Kao and Shumaker [1999]) and stock selection (see Sorensen, Miller and Ooi [2000])...
Resumo:
The quality of species distribution models (SDMs) relies to a large degree on the quality of the input data, from bioclimatic indices to environmental and habitat descriptors (Austin, 2002). Recent reviews of SDM techniques, have sought to optimize predictive performance e.g. Elith et al., 2006. In general SDMs employ one of three approaches to variable selection. The simplest approach relies on the expert to select the variables, as in environmental niche models Nix, 1986 or a generalized linear model without variable selection (Miller and Franklin, 2002). A second approach explicitly incorporates variable selection into model fitting, which allows examination of particular combinations of variables. Examples include generalized linear or additive models with variable selection (Hastie et al. 2002); or classification trees with complexity or model based pruning (Breiman et al., 1984, Zeileis, 2008). A third approach uses model averaging, to summarize the overall contribution of a variable, without considering particular combinations. Examples include neural networks, boosted or bagged regression trees and Maximum Entropy as compared in Elith et al. 2006. Typically, users of SDMs will either consider a small number of variable sets, via the first approach, or else supply all of the candidate variables (often numbering more than a hundred) to the second or third approaches. Bayesian SDMs exist, with several methods for eliciting and encoding priors on model parameters (see review in Low Choy et al. 2010). However few methods have been published for informative variable selection; one example is Bayesian trees (O’Leary 2008). Here we report an elicitation protocol that helps makes explicit a priori expert judgements on the quality of candidate variables. This protocol can be flexibly applied to any of the three approaches to variable selection, described above, Bayesian or otherwise. We demonstrate how this information can be obtained then used to guide variable selection in classical or machine learning SDMs, or to define priors within Bayesian SDMs.
Resumo:
The roles of weather variability and sunspots in the occurrence of cyanobacteria blooms, were investigated using cyanobacteria cell data collected from the Fred Haigh Dam, Queensland, Australia. Time series generalized linear model and classification and regression (CART) model were used in the analysis. Data on notified cell numbers of cyanobacteria and weather variables over the periods 2001 and 2005 were provided by the Australian Department of Natural Resources and Water, and Australian Bureau of Meteorology, respectively. The results indicate that monthly minimum temperature (relative risk [RR]: 1.13, 95% confidence interval [CI]: 1.02-1.25) and rainfall (RR: 1.11; 95% CI: 1.03-1.20) had a positive association, but relative humidity (RR: 0.94; 95% CI: 0.91-0.98) and wind speed (RR:0.90; 95% CI: 0.82-0.98) were negatively associated with the cyanobacterial numbers, after adjustment for seasonality and auto-correlation. The CART model showed that the cyanobacteria numbers were best described by an interaction between minimum temperature, relative humidity, and sunspot numbers. When minimum temperature exceeded 18%C and relative humidity was under 66%, the number of cyanobacterial cells rose by 2.15-fold. We conclude that the weather variability and sunspot activity may affect cyanobacterial blooms in dams.
Resumo:
Objectives: The objectives of this study were to specifically investigate the differences in culture, attitudes and social networks between Australian and Taiwanese men and women and identify the factors that predict midlife men and women’s quality of life in both countries. Methods: A stratified random sample strategy based on probability proportional sampling (PPS) was conducted to investigate 278 Australian and 398 Taiwanese midlife men and women’s quality of life. Multiple regression modelling and classification and regression trees (CARTs) were performed to examine the potential differences on culture, attitude, social networks, social demographic factors and religion/spirituality in midlife men and women’s quality of life in both Australia and Taiwan. Results: The results of this study suggest that culture involves multiple functions and interacts with attitudes, social networks and individual factors to influence a person’s quality of life. Significant relationships were found between the interaction between cultural circumstances and a person’s internal and external factors. The research found that good social support networks and a healthy optimistic disposition may significantly enhance midlife men and women’s quality of life. Conclusion: The study indicated that there is a significant relationship between culture, attitude, social networks and quality of life in midlife Australian and Taiwanese men and women. People who had higher levels of horizontal individualism and collectivism, positive attitudes and better social support had better psychological, social, physical and environmental health, while it emerged that vertical individualists with competitive characteristics would experience a lower quality of life. This study has highlighted areas where opportunities exist to further reflect upon contemporary social health policies for Australian and Taiwanese societies and also within the global perspective, in order to provide enhanced quality care for growing midlife populations.
Resumo:
Habitat models are widely used in ecology, however there are relatively few studies of rare species, primarily because of a paucity of survey records and lack of robust means of assessing accuracy of modelled spatial predictions. We investigated the potential of compiled ecological data in developing habitat models for Macadamia integrifolia, a vulnerable mid-stratum tree endemic to lowland subtropical rainforests of southeast Queensland, Australia. We compared performance of two binomial models—Classification and Regression Trees (CART) and Generalised Additive Models (GAM)—with Maximum Entropy (MAXENT) models developed from (i) presence records and available absence data and (ii) developed using presence records and background data. The GAM model was the best performer across the range of evaluation measures employed, however all models were assessed as potentially useful for informing in situ conservation of M. integrifolia, A significant loss in the amount of M. integrifolia habitat has occurred (p < 0.05), with only 37% of former habitat (pre-clearing) remaining in 2003. Remnant patches are significantly smaller, have larger edge-to-area ratios and are more isolated from each other compared to pre-clearing configurations (p < 0.05). Whilst the network of suitable habitat patches is still largely intact, there are numerous smaller patches that are more isolated in the contemporary landscape compared with their connectedness before clearing. These results suggest that in situ conservation of M. integrifolia may be best achieved through a landscape approach that considers the relative contribution of small remnant habitat fragments to the species as a whole, as facilitating connectivity among the entire network of habitat patches.