938 resultados para predictive model
Resumo:
Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to get more confidence in the prediction and providing the basis for the end-user to have new insight about the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-Complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the global optimal solution. In this paper, we propose a novel algorithm based on the use of the evolutionary algorithms paradigm as an alternate heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model trees induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications. (C) 2010 Elsevier Inc. All rights reserved.
Resumo:
We have considered a Bayesian approach for the nonlinear regression model by replacing the normal distribution on the error term by some skewed distributions, which account for both skewness and heavy tails or skewness alone. The type of data considered in this paper concerns repeated measurements taken in time on a set of individuals. Such multiple observations on the same individual generally produce serially correlated outcomes. Thus, additionally, our model does allow for a correlation between observations made from the same individual. We have illustrated the procedure using a data set to study the growth curves of a clinic measurement of a group of pregnant women from an obstetrics clinic in Santiago, Chile. Parameter estimation and prediction were carried out using appropriate posterior simulation schemes based in Markov Chain Monte Carlo methods. Besides the deviance information criterion (DIC) and the conditional predictive ordinate (CPO), we suggest the use of proper scoring rules based on the posterior predictive distribution for comparing models. For our data set, all these criteria chose the skew-t model as the best model for the errors. These DIC and CPO criteria are also validated, for the model proposed here, through a simulation study. As a conclusion of this study, the DIC criterion is not trustful for this kind of complex model.
Resumo:
Millions of unconscious calculations are made daily by pedestrians walking through the Colby College campus. I used ArcGIS to make a predictive spatial model that chose paths similar to those that are actually used by people on a regular basis. To make a viable model of how most travelers choose their way, I considered both the distance required and the type of traveling surface. I used an iterative process to develop a scheme for weighting travel costs which resulted in accurate least-cost paths to be predicted by ArcMap. The accuracy was confirmed when the calculated routes were compared to satellite photography and were found to overlap well-worn “shortcuts” taken between the paved paths throughout campus.
Resumo:
Climate model projections show that climate change will further increase the risk of flooding in many regions of the world. There is a need for climate adaptation, but building new infrastructure or additional retention basins has its limits, especially in densely populated areas where open spaces are limited. Another solution is the more efficient use of the existing infrastructure. This research investigates a method for real-time flood control by means of existing gated weirs and retention basins. The method was tested for the specific study area of the Demer basin in Belgium but is generally applicable. Today, retention basins along the Demer River are controlled by means of adjustable gated weirs based on fixed logic rules. However, because of the high complexity of the system, only suboptimal results are achieved by these rules. By making use of precipitation forecasts and combined hydrological-hydraulic river models, the state of the river network can be predicted. To fasten the calculation speed, a conceptual river model was used. The conceptual model was combined with a Model Predictive Control (MPC) algorithm and a Genetic Algorithm (GA). The MPC algorithm predicts the state of the river network depending on the positions of the adjustable weirs in the basin. The GA generates these positions in a semi-random way. Cost functions, based on water levels, were introduced to evaluate the efficiency of each generation, based on flood damage minimization. In the final phase of this research the influence of the most important MPC and GA parameters was investigated by means of a sensitivity study. The results show that the MPC-GA algorithm manages to reduce the total flood volume during the historical event of September 1998 by 46% in comparison with the current regulation. Based on the MPC-GA results, some recommendations could be formulated to improve the logic rules.
Resumo:
Climate change has resulted in substantial variations in annual extreme rainfall quantiles in different durations and return periods. Predicting the future changes in extreme rainfall quantiles is essential for various water resources design, assessment, and decision making purposes. Current Predictions of future rainfall extremes, however, exhibit large uncertainties. According to extreme value theory, rainfall extremes are rather random variables, with changing distributions around different return periods; therefore there are uncertainties even under current climate conditions. Regarding future condition, our large-scale knowledge is obtained using global climate models, forced with certain emission scenarios. There are widely known deficiencies with climate models, particularly with respect to precipitation projections. There is also recognition of the limitations of emission scenarios in representing the future global change. Apart from these large-scale uncertainties, the downscaling methods also add uncertainty into estimates of future extreme rainfall when they convert the larger-scale projections into local scale. The aim of this research is to address these uncertainties in future projections of extreme rainfall of different durations and return periods. We plugged 3 emission scenarios with 2 global climate models and used LARS-WG, a well-known weather generator, to stochastically downscale daily climate models’ projections for the city of Saskatoon, Canada, by 2100. The downscaled projections were further disaggregated into hourly resolution using our new stochastic and non-parametric rainfall disaggregator. The extreme rainfall quantiles can be consequently identified for different durations (1-hour, 2-hour, 4-hour, 6-hour, 12-hour, 18-hour and 24-hour) and return periods (2-year, 10-year, 25-year, 50-year, 100-year) using Generalized Extreme Value (GEV) distribution. By providing multiple realizations of future rainfall, we attempt to measure the extent of total predictive uncertainty, which is contributed by climate models, emission scenarios, and downscaling/disaggregation procedures. The results show different proportions of these contributors in different durations and return periods.
Resumo:
In the field of operational water management, Model Predictive Control (MPC) has gained popularity owing to its versatility and flexibility. The MPC controller, which takes predictions, time delay and uncertainties into account, can be designed for multi-objective management problems and for large-scale systems. Nonetheless, a critical obstacle, which needs to be overcome in MPC, is the large computational burden when a large-scale system is considered or a long prediction horizon is involved. In order to solve this problem, we use an adaptive prediction accuracy (APA) approach that can reduce the computational burden almost by half. The proposed MPC scheme with this scheme is tested on the northern Dutch water system, which comprises Lake IJssel, Lake Marker, the River IJssel and the North Sea Canal. The simulation results show that by using the MPC-APA scheme, the computational time can be reduced to a large extent and a flood protection problem over longer prediction horizons can be well solved.
Resumo:
This work aims to compare the forecast efficiency of different types of methodologies applied to Brazilian Consumer inflation (IPCA). We will compare forecasting models using disaggregated and aggregated data over twelve months ahead. The disaggregated models were estimated by SARIMA and will have different levels of disaggregation. Aggregated models will be estimated by time series techniques such as SARIMA, state-space structural models and Markov-switching. The forecasting accuracy comparison will be made by the selection model procedure known as Model Confidence Set and by Diebold-Mariano procedure. We were able to find evidence of forecast accuracy gains in models using more disaggregated data
Resumo:
This study aims to contribute on the forecasting literature in stock return for emerging markets. We use Autometrics to select relevant predictors among macroeconomic, microeconomic and technical variables. We develop predictive models for the Brazilian market premium, measured as the excess return over Selic interest rate, Itaú SA, Itaú-Unibanco and Bradesco stock returns. We nd that for the market premium, an ADL with error correction is able to outperform the benchmarks in terms of economic performance. For individual stock returns, there is a trade o between statistical properties and out-of-sample performance of the model.
Resumo:
This study aims to contribute on the forecasting literature in stock return for emerging markets. We use Autometrics to select relevant predictors among macroeconomic, microeconomic and technical variables. We develop predictive models for the Brazilian market premium, measured as the excess return over Selic interest rate, Itaú SA, Itaú-Unibanco and Bradesco stock returns. We find that for the market premium, an ADL with error correction is able to outperform the benchmarks in terms of economic performance. For individual stock returns, there is a trade o between statistical properties and out-of-sample performance of the model.
Resumo:
The search for better performance in the structural systems has been taken to more refined models, involving the analysis of a growing number of details, which should be correctly formulated aiming at defining a representative model of the real system. Representative models demand a great detailing of the project and search for new techniques of evaluation and analysis. Model updating is one of this technologies, it can be used to improve the predictive capabilities of computer-based models. This paper presents a FRF-based finite element model updating procedure whose the updating variables are physical parameters of the model. It includes the damping effects in the updating procedure assuming proportional and none proportional damping mechanism. The updating parameters are defined at an element level or macro regions of the model. So, the parameters are adjusted locally, facilitating the physical interpretation of the adjusting of the model. Different tests for simulated and experimental data are discussed aiming at defining the characteristics and potentialities of the methodology.
Resumo:
Background and aims: Staphylococcus epidermidis and other coagulase-negative staphylococci (CoNS) are the most common agents of continuous ambulatory peritoneal dialysis (CAPD) peritonitis. Episodes caused by Staphylococcus aureus evolve with a high method failure rate while CoNS peritonitis is generally benign. The purpose of this study was to compare episodes of peritonitis caused by CoNS species and S. aureus to evaluate the microbiological and host factors that affect outcome. Material and methods: Microbiological and clinical data were retrospectively studied from 86 new episodes of peritonitis caused by staphylococci species between January 1996 and December 2000 in a university dialysis center. The influence of microbiological and host factors (age, sex, diabetes, use of vancomycin, exchange system and treatment time on CAPD) was analyzed by logistic regression model. The clinical outcome was classified into two results (resolution and non-resolution). Results: the odds of peritonitis resolution were not influenced by host factors. Oxacillin susceptibility was present in 30 of 35 S. aureus lineages and 22 of 51 CoNS (p = 0.001). There were 32 of 52 (61.5%) episodes caused by oxacillin-susceptible and 20 of 34 (58.8%) by oxacillin-resistant lineages resolved (p = 0.9713). of the 35 cases caused by S. aureus, 17 (48.6%) resolved and among 51 CoNS episodes 40 (78.4%) resolved. Resolution odds were 7.1 times higher for S. epidermidis than S. aureus (p = 0.0278), while other CoNS had 7.6 times higher odds resolution than S. epidermidis cases (p = 0.052). Episodes caused by S. haemolyticus had similar resolution odds to S. epidermidis (p = 0.859). Conclusions: S. aureus etiology is an independent factor associated with peritonitis non-resolution in CAPD, while S. epidermidis and S. haemolyticus have a lower resolution rate than other CoNS. Possibly the aggressive nature of these agents, particularly S. aureus, can be explained by their recognized pathogenic factors, more than antibiotic resistance.
Resumo:
Purpose - The purpose of this paper is to present designs for an accelerated life test (ALT). Design/methodology/approach - Bayesian methods and simulation Monte Carlo Markov Chain (MCMC) methods were used. Findings - In the paper a Bayesian method based on MCMC for ALT under EW distribution (for life time) and Arrhenius models (relating the stress variable and parameters) was proposed. The paper can conclude that it is a reasonable alternative to the classical statistical methods since the implementation of the proposed method is simple, not requiring advanced computational understanding and inferences on the parameters can be made easily. By the predictive density of a future observation, a procedure was developed to plan ALT and also to verify if the conformance fraction of the manufactured process reaches some desired level of quality. This procedure is useful for statistical process control in many industrial applications. Research limitations/implications - The results may be applied in a semiconductor manufacturer. Originality/value - The Exponentiated-Weibull-Arrhenius model has never before been used to plan an ALT. © Emerald Group Publishing Limited.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
A data set of a commercial Nellore beef cattle selection program was used to compare breeding models that assumed or not markers effects to estimate the breeding values, when a reduced number of animals have phenotypic, genotypic and pedigree information available. This herd complete data set was composed of 83,404 animals measured for weaning weight (WW), post-weaning gain (PWG), scrotal circumference (SC) and muscle score (MS), corresponding to 116,652 animals in the relationship matrix. Single trait analyses were performed by MTDFREML software to estimate fixed and random effects solutions using this complete data. The additive effects estimated were assumed as the reference breeding values for those animals. The individual observed phenotype of each trait was adjusted for fixed and random effects solutions, except for direct additive effects. The adjusted phenotype composed of the additive and residual parts of observed phenotype was used as dependent variable for models' comparison. Among all measured animals of this herd, only 3160 animals were genotyped for 106 SNP markers. Three models were compared in terms of changes on animals' rank, global fit and predictive ability. Model 1 included only polygenic effects, model 2 included only markers effects and model 3 included both polygenic and markers effects. Bayesian inference via Markov chain Monte Carlo methods performed by TM software was used to analyze the data for model comparison. Two different priors were adopted for markers effects in models 2 and 3, the first prior assumed was a uniform distribution (U) and, as a second prior, was assumed that markers effects were distributed as normal (N). Higher rank correlation coefficients were observed for models 3_U and 3_N, indicating a greater similarity of these models animals' rank and the rank based on the reference breeding values. Model 3_N presented a better global fit, as demonstrated by its low DIC. The best models in terms of predictive ability were models 1 and 3_N. Differences due prior assumed to markers effects in models 2 and 3 could be attributed to the better ability of normal prior in handle with collinear effects. The models 2_U and 2_N presented the worst performance, indicating that this small set of markers should not be used to genetically evaluate animals with no data, since its predictive ability is restricted. In conclusion, model 3_N presented a slight superiority when a reduce number of animals have phenotypic, genotypic and pedigree information. It could be attributed to the variation retained by markers and polygenic effects assumed together and the normal prior assumed to markers effects, that deals better with the collinearity between markers. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
Background: Lynch syndrome (LS) is the most common form of inherited predisposition to colorectal cancer (CRC), accounting for 2-5% of all CRC. LS is an autosomal dominant disease characterized by mutations in the mismatch repair genes mutL homolog 1 (MLH1), mutS homolog 2 (MSH2), postmeiotic segregation increased 1 (PMS1), post-meiotic segregation increased 2 (PMS2) and mutS homolog 6 (MSH6). Mutation risk prediction models can be incorporated into clinical practice, facilitating the decision-making process and identifying individuals for molecular investigation. This is extremely important in countries with limited economic resources. This study aims to evaluate sensitivity and specificity of five predictive models for germline mutations in repair genes in a sample of individuals with suspected Lynch syndrome. Methods: Blood samples from 88 patients were analyzed through sequencing MLH1, MSH2 and MSH6 genes. The probability of detecting a mutation was calculated using the PREMM, Barnetson, MMRpro, Wijnen and Myriad models. To evaluate the sensitivity and specificity of the models, receiver operating characteristic curves were constructed. Results: Of the 88 patients included in this analysis, 31 mutations were identified: 16 were found in the MSH2 gene, 15 in the MLH1 gene and no pathogenic mutations were identified in the MSH6 gene. It was observed that the AUC for the PREMM (0.846), Barnetson (0.850), MMRpro (0.821) and Wijnen (0.807) models did not present significant statistical difference. The Myriad model presented lower AUC (0.704) than the four other models evaluated. Considering thresholds of >= 5%, the models sensitivity varied between 1 (Myriad) and 0.87 (Wijnen) and specificity ranged from 0 (Myriad) to 0.38 (Barnetson). Conclusions: The Barnetson, PREMM, MMRpro and Wijnen models present similar AUC. The AUC of the Myriad model is statistically inferior to the four other models.