883 results for Regression Trees
Abstract:
The Highway Safety Manual (HSM) estimates roadway safety performance based on predictive models that were calibrated using national data. Calibration factors are then used to adjust these predictive models to local conditions for local applications. The HSM recommends that local calibration factors be estimated using 30 to 50 randomly selected sites that experienced at least a total of 100 crashes per year. It also recommends that the factors be updated every two to three years, preferably on an annual basis. However, these recommendations are primarily based on expert opinions rather than data-driven research findings. Furthermore, most agencies do not have data for many of the input variables recommended in the HSM. This dissertation is aimed at determining the best way to meet three major data needs affecting the estimation of calibration factors: (1) the required minimum sample sizes for different roadway facilities, (2) the required frequency for calibration factor updates, and (3) the influential variables affecting calibration factors. In this dissertation, statewide segment and intersection data were first collected for most of the HSM recommended calibration variables using a Google Maps application. In addition, eight years (2005-2012) of traffic and crash data were retrieved from existing databases from the Florida Department of Transportation. With these data, the effect of sample size criterion on calibration factor estimates was first studied using a sensitivity analysis. The results showed that the minimum sample sizes not only vary across different roadway facilities, but they are also significantly higher than those recommended in the HSM. In addition, results from paired sample t-tests showed that calibration factors in Florida need to be updated annually. 
To identify influential variables affecting the calibration factors for roadway segments, the variables were prioritized by combining the results from three different methods: negative binomial regression, random forests, and boosted regression trees. Only a few variables were found to explain most of the variation in the crash data. Traffic volume was consistently found to be the most influential. In addition, roadside object density, major and minor commercial driveway densities, and minor residential driveway density were also identified as influential variables.
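The calibration factor discussed in this abstract is commonly computed as the ratio of total observed crashes to total crashes predicted by the uncalibrated HSM model over the sampled sites. A minimal sketch, assuming that definition; the function name and the site counts are hypothetical and are not taken from the dissertation:

```python
def calibration_factor(observed, predicted):
    """Ratio of total observed to total predicted crashes across sampled sites."""
    total_predicted = sum(predicted)
    if total_predicted <= 0:
        raise ValueError("total predicted crashes must be positive")
    return sum(observed) / total_predicted


# Hypothetical site-level crash counts, for illustration only.
observed = [4, 0, 7, 2, 3]
predicted = [3.2, 1.1, 5.0, 2.5, 2.2]
factor = calibration_factor(observed, predicted)
print(f"calibration factor: {factor:.3f}")
```

A factor above 1 would indicate that the sampled sites experienced more crashes than the national model predicts; repeating the calculation on samples of different sizes is one way to see how sensitive the estimate is to sample size.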
Abstract:
This work proposes an alarm prediction system intended to support the implementation of an industrial predictive maintenance policy and to serve as a management tool for decision support. The system acquires readings from several sensors installed in the plant, extracts their features, and assesses the health of the equipment. Diagnosis and prognosis involve classifying the plant's operating conditions. Regression tree and unsupervised classification techniques are used in this paper. A sample of measurements of 73 variables, taken by sensors installed in a hydroelectric power plant, was used to test and validate the proposal. The measurements were sampled over a period of 15 months.
Abstract:
Endogenous and environmental variables are fundamental in explaining variations in fish condition. Based on more than 20 yr of fish weight and length data, relative condition indices were computed for anchovy and sardine caught in the Gulf of Lions. Classification and regression trees (CART) were used to identify endogenous factors affecting fish condition, and to group years of similar condition. Both species showed a similar annual cycle with condition being minimal in February and maximal in July. CART identified 3 groups of years where the fish populations generally showed poor, average and good condition and within which condition differed between age classes but not according to sex. In particular, during the period of poor condition (mostly recent years), sardines older than 1 yr appeared to be more strongly affected than younger individuals. Time-series were analyzed using generalized linear models (GLMs) to examine the effects of oceanographic abiotic (temperature, Western Mediterranean Oscillation [WeMO] and Rhone outflow) and biotic (chlorophyll a and 6 plankton classes) factors on fish condition. The selected models explained 48 and 35% of the variance of anchovy and sardine condition, respectively. Sardine condition was negatively related to temperature but positively related to the WeMO and mesozooplankton and diatom concentrations. A positive effect of mesozooplankton and Rhone runoff on anchovy condition was detected. The importance of increasing temperatures and reduced water mixing in the NW Mediterranean Sea, affecting planktonic productivity and thus fish condition by bottom-up control processes, was highlighted by these results. Changes in plankton quality, quantity and phenology could lead to insufficient or inadequate food supply for both species.
Abstract:
Master's in Actuarial Science
Abstract:
Background: Smear-negative pulmonary tuberculosis (SNPT) accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods: The study enrolled 551 patients with clinical-radiological suspicion of SNPT in Rio de Janeiro, Brazil. The original data were divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used to construct logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operating characteristic curve, sensitivity, and specificity were used to evaluate the models' performance in both the generation and validation samples. Results: It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion: The results suggest that these models might be useful as screening tools for estimating the risk of SNPT, optimizing the use of more expensive tests, and avoiding the costs of unnecessary anti-tuberculosis treatment. These models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.
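The sensitivity and specificity figures reported above follow the standard definitions: the share of true cases the model flags, and the share of non-cases it correctly clears. A minimal sketch of that calculation on a validation sample; the function name and the labels are hypothetical and do not reproduce the study's data:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical validation labels (1 = disease present), for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```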
Abstract:
This paper presents a fault diagnosis method based on an adaptive neuro-fuzzy inference system (ANFIS) in combination with decision trees. Classification and regression tree (CART), one of the decision tree methods, is used as a feature selection procedure to select pertinent features from the data set. The crisp rules obtained from the decision tree are then converted to fuzzy if-then rules that are employed to identify the structure of the ANFIS classifier. A hybrid of the back-propagation and least-squares algorithms is used to tune the parameters of the membership functions. To evaluate the proposed algorithm, data sets obtained from vibration signals and current signals of induction motors are used. The results indicate that the CART–ANFIS model has potential for fault diagnosis of induction motors.
Abstract:
Nitrogen (N) is an essential nutrient in mango, influencing both productivity and fruit quality. In Australian mango orchards, tree N is traditionally assessed once a year at the dormant pre-flowering stage using laboratory analysis of leaf N. This single assessment is insufficient to determine tree N status at all stages of the annual phenological cycle. Development of a field-based rapid N test would allow more frequent monitoring of tree N status and improved fertiliser management. These experiments examined the accuracy and usability of several devices used in other horticultural crops to rapidly assess mango leaf N in the field: the Konica Minolta 'SPAD-502 chlorophyll meter', the Horiba 'Cardy Meter' and the Merck 'RQflex 10'. Regression and correlation analyses were used to determine the relationship between total leaf N and the measurements from the rapid test devices. The relationship between the chlorophyll index measured by the SPAD-502 meter and leaf N was highly significant at the late fruit set (R² = 0.72, n = 40) and post-harvest (R² = 0.81, n = 40) stages and significant at the flowering stage (R² = 0.51, n = 40) in the cultivar 'Kensington Pride', indicating the device can be used to rapidly assess mango leaf N in the field. Correlation analysis indicated that the relationship between petiole sap measured with the Cardy or Merck devices and leaf N was non-significant.
Abstract:
Aim: Species generally become rarer and more patchily distributed as the margins of their ranges are approached. We predicted that in such marginal sites, tree species would tend to occur where some key environmental factors are at particularly favourable levels, compensating in part for the low overall suitability of marginal sites.
Location: The article considers the spatial distributions of trees in Southeast Alaska (the Alaskan 'panhandle').
Methods: We quantified range marginality using spatial distributions of eight tree species across more than one thousand surveyed sites in Southeast Alaska. For each species we derived a site core/margin index using a three-dimensional trend surface generated from logistic regression on site coordinates. For each species, the relationships between the environmental factors slope, aspect and site marginality were then compared for occupied and unoccupied sets of sites.
Results: We found that site slope is important for more Alaskan tree species than aspect. Three out of eight had a significant core/margin by occupied/unoccupied interaction, tending to be present in significantly shallower-sloped (more favourable) sites in the marginal areas than the simple core/margin trend predicted. For site aspect, one species had a significant interaction, selecting potentially more favourable northerly aspects in marginal areas. A finer-scale analysis based on the same data came to the same overall conclusions.
Conclusions: There is evidence that several tree species in Alaska tend to occur in especially favourable sites in marginal areas. In these marginal areas, these species amplify habitat preferences shown in core areas.
Abstract:
Psidium guajava 'Paluma', a tropical tree species, is known to be an efficient ozone indicator in tropical countries. When exposed to ozone, this species displays a characteristic leaf injury identified by inter-veinal red stippling on adaxial leaf surfaces. Following 30 days of three ozone treatments consisting of carbon filtered air (CF, AOT40 = 17 ppb h), ambient non-filtered air (NF, AOT40 = 542 ppb h) and ambient non-filtered air + 40 ppb ozone (NF + O3, AOT40 = 7802 ppb h), the amounts of residual anthocyanins and tannins present in 10 P. guajava 'Paluma' saplings were quantified. Higher amounts of anthocyanins were found in the NF + O3 treatment (1.6%) than in the CF (0.97%) and NF (1.30%) treatments (p < 0.05), and higher amounts of total tannins were found in the NF + O3 treatment (0.16%) than in the CF treatment (0.14%). Condensed tannins showed the same tendency toward enhanced amounts. Regression analyses using amounts of tannins and anthocyanins, AOT40 and the leaf injury index (LII) showed a correlation between the leaf injury index and the quantities of anthocyanins and total tannins. These results are in accordance with the association between the incidence of red-stippled leaves and ozone-polluted environments. © 2009 Elsevier Ltd. All rights reserved.
Abstract:
Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, which helps the end-user gain confidence in the prediction and provides a basis for new insight about the data, confirming or rejecting previously formed hypotheses. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the globally optimal solution. In this paper, we propose a novel algorithm based on the evolutionary algorithms paradigm as an alternative heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model tree induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications. © 2010 Elsevier Inc. All rights reserved.
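The greedy top-down strategy mentioned above chooses, at each node, the single split that most reduces the error at that step, without looking ahead. A minimal sketch of one such step for a plain regression tree with constant-valued leaves on one feature; this is not the E-Motion algorithm and not a model tree with linear leaf models, and the function names and data are hypothetical:

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best split of the form x <= threshold.

    Returns (None, sse(ys)) if no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(list(ys)))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical one-feature data set, for illustration only.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
threshold, error = best_split(xs, ys)
print(threshold, round(error, 4))
```

A full greedy inducer would apply `best_split` recursively to each resulting subset. Because every choice is locally optimal only, the final tree can miss the globally best structure, which is the limitation the evolutionary search is meant to address.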
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The work described was part of the project 'Innovative biological indicators to improve the efficiency of water and nitrogen use and the fruit quality in tree crops', a partnership between ISA and INRA. Field studies were conducted in Portugal on different irrigated plots of nectarine trees: a fully irrigated plot (unstressed) and a plot that was not irrigated for some days (stressed). The aim of this work was to investigate the effects of plant water stress on canopy temperature, to determine the non-water-stressed baseline and to observe diurnal and seasonal variations of the Crop Water Stress Index (CWSI). Canopy temperature, psychrometric and wind speed data were taken every half-hour between 9:30 and 15:30 h. Results showed that canopy temperature was higher during the daytime for both unstressed and stressed plots. A linear regression of the canopy-air temperature differential on the vapor pressure deficit (the non-water-stressed baseline) gave r² = 0.65. During the stress period, the average canopy temperature of the stressed plot was up to 5.4°C higher than that of the unstressed plot. Diurnal and seasonal averages of CWSI values showed differences between unstressed and stressed plots during the stress period.
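In the widely used empirical formulation, CWSI rescales the measured canopy-air temperature differential between a lower limit (the non-water-stressed baseline, a linear function of vapor pressure deficit) and an upper limit for a non-transpiring canopy. A minimal sketch, assuming that formulation; the baseline coefficients, the upper limit and the readings below are hypothetical and are not the values fitted in this study:

```python
def cwsi(canopy_temp, air_temp, vpd, intercept, slope, upper_limit):
    """Empirical Crop Water Stress Index.

    canopy_temp, air_temp: temperatures in degrees C
    vpd: vapor pressure deficit in kPa
    intercept, slope: coefficients of the non-water-stressed baseline,
        (Tc - Ta)_lower = intercept + slope * vpd
    upper_limit: (Tc - Ta) assumed for a fully stressed canopy, in degrees C
    """
    lower = intercept + slope * vpd
    if upper_limit <= lower:
        raise ValueError("upper limit must exceed the baseline value")
    return ((canopy_temp - air_temp) - lower) / (upper_limit - lower)


# Hypothetical baseline and readings, for illustration only.
index = cwsi(canopy_temp=30.0, air_temp=28.0, vpd=2.0,
             intercept=2.0, slope=-2.0, upper_limit=4.0)
print(f"CWSI: {index:.2f}")
```

Values near 0 correspond to a canopy sitting on the non-water-stressed baseline and values near 1 to a canopy at the upper limit.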