946 resultados para REGRESSION TREE
Resumo:
The semiarid region of northeastern Brazil, the Caatinga, is extremely important due to its biodiversity and endemism. Measurements of plant physiology are crucial to the calibration of Dynamic Global Vegetation Models (DGVMs) that are currently used to simulate the responses of vegetation in face of global changes. In a field work realized in an area of preserved Caatinga forest located in Petrolina, Pernambuco, measurements of carbon assimilation (in response to light and CO2) were performed on 11 individuals of Poincianella microphylla, a native species that is abundant in this region. These data were used to calibrate the maximum carboxylation velocity (Vcmax) used in the INLAND model. The calibration techniques used were Multiple Linear Regression (MLR), and data mining techniques as the Classification And Regression Tree (CART) and K-MEANS. The results were compared to the UNCALIBRATED model. It was found that simulated Gross Primary Productivity (GPP) reached 72% of observed GPP when using the calibrated Vcmax values, whereas the UNCALIBRATED approach accounted for 42% of observed GPP. Thus, this work shows the benefits of calibrating DGVMs using field ecophysiological measurements, especially in areas where field data is scarce or non-existent, such as in the Caatinga
Resumo:
Focuses on a study which introduced an iterative modeling method that combines properties of ordinary least squares (OLS) with hierarchical tree-based regression (HTBR) in transportation engineering. Information on OLS and HTBR; Comparison and contrasts of OLS and HTBR; Conclusions.
Resumo:
Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), and core to the implementation of RC, in different spatial patterns and diameter distributions, revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands, but those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed. I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. Effects, especially in the application phase, of the variance estimate justify suggested future efforts of improving the accuracy of the variance estimate. The second technique implemented and evaluated is a survival regression model that accounts for the time dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.
Resumo:
Habitat models are widely used in ecology, however there are relatively few studies of rare species, primarily because of a paucity of survey records and lack of robust means of assessing accuracy of modelled spatial predictions. We investigated the potential of compiled ecological data in developing habitat models for Macadamia integrifolia, a vulnerable mid-stratum tree endemic to lowland subtropical rainforests of southeast Queensland, Australia. We compared performance of two binomial models—Classification and Regression Trees (CART) and Generalised Additive Models (GAM)—with Maximum Entropy (MAXENT) models developed from (i) presence records and available absence data and (ii) developed using presence records and background data. The GAM model was the best performer across the range of evaluation measures employed, however all models were assessed as potentially useful for informing in situ conservation of M. integrifolia, A significant loss in the amount of M. integrifolia habitat has occurred (p < 0.05), with only 37% of former habitat (pre-clearing) remaining in 2003. Remnant patches are significantly smaller, have larger edge-to-area ratios and are more isolated from each other compared to pre-clearing configurations (p < 0.05). Whilst the network of suitable habitat patches is still largely intact, there are numerous smaller patches that are more isolated in the contemporary landscape compared with their connectedness before clearing. These results suggest that in situ conservation of M. integrifolia may be best achieved through a landscape approach that considers the relative contribution of small remnant habitat fragments to the species as a whole, as facilitating connectivity among the entire network of habitat patches.
Resumo:
The benefits of applying tree-based methods to the purpose of modelling financial assets as opposed to linear factor analysis are increasingly being understood by market practitioners. Tree-based models such as CART (classification and regression trees) are particularly well suited to analysing stock market data which is noisy and often contains non-linear relationships and high-order interactions. CART was originally developed in the 1980s by medical researchers disheartened by the stringent assumptions applied by traditional regression analysis (Brieman et al. [1984]). In the intervening years, CART has been successfully applied to many areas of finance such as the classification of financial distress of firms (see Frydman, Altman and Kao [1985]), asset allocation (see Sorensen, Mezrich and Miller [1996]), equity style timing (see Kao and Shumaker [1999]) and stock selection (see Sorensen, Miller and Ooi [2000])...
Resumo:
BACKGROUND: Children with atopic diseases in early life are frequently found with positive IgE tests to peanuts/tree nuts without a history of previous ingestion. We aimed to identify risk factors for reactions to nuts at first introduction. METHODS: A retrospective case-note and database analysis was performed. Recruitment criteria were: patients aged 3-16 yr who had a standardized food challenge to peanut and/or tree nuts due to sensitisation to the peanut/tree nut (positive spIgE or SPT) without previous consumption. A detailed assessment was performed of factors relating to food challenge outcome with univariate and multivariate logistic regression analysis. RESULTS: There were 98 food challenges (47 peanut, 51 tree nut) with 29 positive, 67 negative and 2 inconclusive outcomes. A positive maternal history of allergy and a specific IgE >5 kU/l were strongly associated with a significantly increased risk of a positive food challenge (OR 3.73; 95% CI 1.31-10.59; p = 0.013 and OR 3.35; 95% CI 1.23-9.11; p = 0.007, respectively). Adjusting for age, a three year-old with these criteria has a 67% probability of a positive challenge. There was no significant association between types of peanut/tree nut, other food allergies, atopic conditions or severity of previous food reactions and positive challenges. CONCLUSIONS: We have demonstrated an association between the presence of maternal atopic history and a specific IgE >5 kU/l, with a significant increase in the likelihood of a positive food challenge. Although requiring further prospective validation these easily identifiable components should be considered when deciding the need for a challenge.
Resumo:
Parkinson's disease (PD) is a degenerative illness whose cardinal symptoms include rigidity, tremor, and slowness of movement. In addition to its widely recognized effects PD can have a profound effect on speech and voice.The speech symptoms most commonly demonstrated by patients with PD are reduced vocal loudness, monopitch, disruptions of voice quality, and abnormally fast rate of speech. This cluster of speech symptoms is often termed Hypokinetic Dysarthria.The disease can be difficult to diagnose accurately, especially in its early stages, due to this reason, automatic techniques based on Artificial Intelligence should increase the diagnosing accuracy and to help the doctors make better decisions. The aim of the thesis work is to predict the PD based on the audio files collected from various patients.Audio files are preprocessed in order to attain the features.The preprocessed data contains 23 attributes and 195 instances. On an average there are six voice recordings per person, By using data compression technique such as Discrete Cosine Transform (DCT) number of instances can be minimized, after data compression, attribute selection is done using several WEKA build in methods such as ChiSquared, GainRatio, Infogain after identifying the important attributes, we evaluate attributes one by one by using stepwise regression.Based on the selected attributes we process in WEKA by using cost sensitive classifier with various algorithms like MultiPass LVQ, Logistic Model Tree(LMT), K-Star.The classified results shows on an average 80%.By using this features 95% approximate classification of PD is acheived.This shows that using the audio dataset, PD could be predicted with a higher level of accuracy.
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
This paper presents a survey of evolutionary algorithms that are designed for decision-tree induction. In this context, most of the paper focuses on approaches that evolve decision trees as an alternate heuristics to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that make use of evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy, which addresses works that evolve decision trees and works that design decision-tree components by the use of evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for decision-tree induction in different domains. At the end of this paper, we address some important issues and open questions that can be the subject of future research.
Resumo:
This study extends the current knowledge regarding the use of plants for the passive accumulation of anthropogenic PAHs that are present in the atmospheric total suspended particles (TSP) in the tropics and sub-tropics. It is of major relevance because the anthropic emissions of TSP containing PAHs are significant in these regions, but their monitoring is still scarce. We compared the biomonitor efficiency of Lolium multiflorum 'Lema' and tropical tree species (Tibouchina pukka and Psidium guajava 'Paluma') that were growing in an intensely TSP-polluted site in Cubatao (SE Brazil), and established the species with the highest potential for alternative monitoring of PAHs. PAHs present in the TSP indicated that the region is impacted by various emission sources. L. multiflorum showed a greater efficiency for the accumulation of PAH compounds on their leaves than the tropical trees. The linear regression between the logBCF and logKoa revealed that L. multiflorum is an efficient biomonitor of the profile of light and heavy PAHs present in the particulate phase of the atmosphere during dry weather and mild temperatures. The grass should be used only for indicating the PAHs with higher molecular weight in warmer and wetter periods. (C) 2012 Elsevier Inc. All rights reserved.
Resumo:
Smoke spikes occurring during transient engine operation have detrimental health effects and increase fuel consumption by requiring more frequent regeneration of the diesel particulate filter. This paper proposes a decision tree approach to real-time detection of smoke spikes for control and on-board diagnostics purposes. A contemporary, electronically controlled heavy-duty diesel engine was used to investigate the deficiencies of smoke control based on the fuel-to-oxygen-ratio limit. With the aid of transient and steady state data analysis and empirical as well as dimensional modeling, it was shown that the fuel-to-oxygen ratio was not estimated correctly during the turbocharger lag period. This inaccuracy was attributed to the large manifold pressure ratios and low exhaust gas recirculation flows recorded during the turbocharger lag period, which meant that engine control module correlations for the exhaust gas recirculation flow and the volumetric efficiency had to be extrapolated. The engine control module correlations were based on steady state data and it was shown that, unless the turbocharger efficiency is artificially reduced, the large manifold pressure ratios observed during the turbocharger lag period cannot be achieved at steady state. Additionally, the cylinder-to-cylinder variation during this period were shown to be sufficiently significant to make the average fuel-to-oxygen ratio a poor predictor of the transient smoke emissions. The steady state data also showed higher smoke emissions with higher exhaust gas recirculation fractions at constant fuel-to-oxygen-ratio levels. This suggests that, even if the fuel-to-oxygen ratios were to be estimated accurately for each cylinder, they would still be ineffective as smoke limiters. A decision tree trained on snap throttle data and pruned with engineering knowledge was able to use the inaccurate engine control module estimates of the fuel-to-oxygen ratio together with information on the engine control module estimate of the exhaust gas recirculation fraction, the engine speed, and the manifold pressure ratio to predict 94% of all spikes occurring over the Federal Test Procedure cycle. The advantages of this non-parametric approach over other commonly used parametric empirical methods such as regression were described. An application of accurate smoke spike detection in which the injection pressure is increased at points with a high opacity to reduce the cumulative particulate matter emissions substantially with a minimum increase in the cumulative nitrogrn oxide emissions was illustrated with dimensional and empirical modeling.
Resumo:
Species selection for forest restoration is often supported by expert knowledge on local distribution patterns of native tree species. This approach is not applicable to largely deforested regions unless enough data on pre-human tree species distribution is available. In such regions, ecological niche models may provide essential information to support species selection in the framework of forest restoration planning. In this study we used ecological niche models to predict habitat suitability for native tree species in "Tierra de Campos" region, an almost totally deforested area of the Duero Basin (Spain). Previously available models provide habitat suitability predictions for dominant native tree species, but including non-dominant tree species in the forest restoration planning may be desirable to promote biodiversity, specially in largely deforested areas were near seed sources are not expected. We used the Forest Map of Spain as species occurrence data source to maximize the number of modeled tree species. Penalized logistic regression was used to train models using climate and lithological predictors. Using model predictions a set of tools were developed to support species selection in forest restoration planning. Model predictions were used to build ordered lists of suitable species for each cell of the study area. The suitable species lists were summarized drawing maps that showed the two most suitable species for each cell. Additionally, potential distribution maps of the suitable species for the study area were drawn. For a scenario with two dominant species, the models predicted a mixed forest (Quercus ilex and a coniferous tree species) for almost one half of the study area. According to the models, 22 non-dominant native tree species are suitable for the study area, with up to six suitable species per cell. The model predictions pointed to Crataegus monogyna, Juniperus communis, J.oxycedrus and J.phoenicea as the most suitable non-dominant native tree species in the study area. Our results encourage further use of ecological niche models for forest restoration planning in largely deforested regions.
Resumo:
The direct application of existing models for seed germination may often be inadequate in the context of ecology and forestry germination experiments. This is because basic model assumptions are violated and variables available to forest managers are rarely used. In this paper, we present a method which addresses the aforementioned shortcomings. The approach is illustrated through a case study of Pinus pinea L. Our findings will also shed light on the role of germination in the general failure of natural regeneration in managed forests of this species. The presented technique consists of a mixed regression model based on survival analysis. Climate and stand covariates were tested. Data for fitting the model were gathered from a 5-year germination experiment in a mature, managed P. pinea stand in the Northern Plateau of Spain in which two different stand densities can be found. The model predictions proved to be unbiased and highly accurate when compared with the training data. Germination in P. pinea was controlled through thermal variables at stand level. At microsite level, low densities negatively affected the probability of germination. A time-lag in the response was also detected. Overall, the proposed technique provides a reliable alternative to germination modelling in ecology/forestry studies by using accessible/ suitable variables. The P. pinea case study highlights the importance of producing unbiased predictions. In this species, the occurrence and timing of germination suggest a very different regeneration strategy from that understood by forest managers until now, which may explain the high failure rate of natural regeneration in managed stands. In addition, these findings provide valuable information for the management of P. pinea under climate-change conditions.