888 results for predictive regression model
Abstract:
Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to get more confidence in the prediction and providing the basis for the end-user to have new insight about the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-Complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the global optimal solution. In this paper, we propose a novel algorithm based on the use of the evolutionary algorithms paradigm as an alternate heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model trees induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications. (C) 2010 Elsevier Inc. All rights reserved.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
The objective of this study was to estimate (co)variance components using random regression on B-spline functions to weight records obtained from birth to adulthood. A total of 82 064 weight records of 8145 females obtained from the data bank of the Nellore Breeding Program (PMGRN/Nellore Brazil) which started in 1987, were used. The models included direct additive and maternal genetic effects and animal and maternal permanent environmental effects as random. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of age (cubic regression) were considered as random covariate. The random effects were modeled using B-spline functions considering linear, quadratic and cubic polynomials for each individual segment. Residual variances were grouped in five age classes. Direct additive genetic and animal permanent environmental effects were modeled using up to seven knots (six segments). A single segment with two knots at the end points of the curve was used for the estimation of maternal genetic and maternal permanent environmental effects. A total of 15 models were studied, with the number of parameters ranging from 17 to 81. The models that used B-splines were compared with multi-trait analyses with nine weight traits and to a random regression model that used orthogonal Legendre polynomials. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most appropriate and parsimonious model to describe the covariance structure of the data. Selection for higher weight, such as at young ages, should be performed taking into account an increase in mature cow weight. Particularly, this is important in most of Nellore beef cattle production systems, where the cow herd is maintained on range conditions. 
There is limited modification of the growth curve of Nellore cattle with respect to the aim of selecting them for rapid growth at young ages while maintaining constant adult weight.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Background and aims: Staphylococcus epidermidis and other coagulase-negative staphylococci (CoNS) are the most common agents of continuous ambulatory peritoneal dialysis (CAPD) peritonitis. Episodes caused by Staphylococcus aureus evolve with a high method failure rate, while CoNS peritonitis is generally benign. The purpose of this study was to compare episodes of peritonitis caused by CoNS species and S. aureus to evaluate the microbiological and host factors that affect outcome. Material and methods: Microbiological and clinical data were retrospectively studied from 86 new episodes of peritonitis caused by staphylococci species between January 1996 and December 2000 in a university dialysis center. The influence of microbiological and host factors (age, sex, diabetes, use of vancomycin, exchange system and treatment time on CAPD) was analyzed with a logistic regression model. The clinical outcome was classified into two results (resolution and non-resolution). Results: The odds of peritonitis resolution were not influenced by host factors. Oxacillin susceptibility was present in 30 of 35 S. aureus lineages and 22 of 51 CoNS (p = 0.001). Of the episodes caused by oxacillin-susceptible lineages, 32 of 52 (61.5%) resolved, as did 20 of 34 (58.8%) of those caused by oxacillin-resistant lineages (p = 0.9713). Of the 35 cases caused by S. aureus, 17 (48.6%) resolved, and among 51 CoNS episodes 40 (78.4%) resolved. Resolution odds were 7.1 times higher for S. epidermidis than for S. aureus (p = 0.0278), while other CoNS had 7.6 times higher odds of resolution than S. epidermidis cases (p = 0.052). Episodes caused by S. haemolyticus had similar resolution odds to S. epidermidis (p = 0.859). Conclusions: S. aureus etiology is an independent factor associated with peritonitis non-resolution in CAPD, while S. epidermidis and S. haemolyticus have a lower resolution rate than other CoNS. Possibly the aggressive nature of these agents, particularly S. aureus, can be explained by their recognized pathogenic factors more than by antibiotic resistance.
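The resolution counts quoted in this abstract are enough to work out a crude (unadjusted) odds ratio by hand. The sketch below does that arithmetic for all CoNS episodes against S. aureus episodes. It is only an illustration: the function names are hypothetical, and the result is not comparable to the species-level, model-based odds ratios (7.1 and 7.6) reported by the authors, which come from their logistic regression.

```python
def odds(events: int, total: int) -> float:
    """Odds of an outcome observed `events` times in `total` episodes."""
    return events / (total - events)


def crude_odds_ratio(events_a: int, total_a: int,
                     events_b: int, total_b: int) -> float:
    """Unadjusted odds ratio of group A relative to group B."""
    return odds(events_a, total_a) / odds(events_b, total_b)


# Resolved episodes and total episodes, as quoted in the abstract.
CONS_RESOLVED, CONS_TOTAL = 40, 51
AUREUS_RESOLVED, AUREUS_TOTAL = 17, 35

cons_rate = CONS_RESOLVED / CONS_TOTAL
aureus_rate = AUREUS_RESOLVED / AUREUS_TOTAL
or_cons_vs_aureus = crude_odds_ratio(
    CONS_RESOLVED, CONS_TOTAL, AUREUS_RESOLVED, AUREUS_TOTAL
)

print(f"CoNS resolution rate:      {cons_rate:.1%}")
print(f"S. aureus resolution rate: {aureus_rate:.1%}")
print(f"Crude odds ratio (CoNS vs S. aureus): {or_cons_vs_aureus:.2f}")
```

The crude ratio pools every CoNS species together and ignores the covariates in the fitted model, so it should be read only as a sanity check on the direction of the effect.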
Abstract:
Considering the importance of spatial issues in transport planning, the main objective of this study was to analyze the results obtained from different spatial regression modeling approaches. Where spatial autocorrelation is present, spatial dependence patterns should be incorporated in the models, since that dependence may affect their predictive power. The results obtained with the spatial regression models were also compared with the results of a multiple linear regression model of the kind typically used in trip generation estimation. The findings support the hypothesis that the inclusion of spatial effects in regression models is important, since the best results were obtained with alternative models (spatial regression models or the ones with spatial variables included). This was observed in a case study carried out in the city of Porto Alegre, in the state of Rio Grande do Sul, Brazil, in the stages of specification and calibration of the models, with two distinct datasets.
Abstract:
A data set from a commercial Nellore beef cattle selection program was used to compare breeding models that did or did not include marker effects for estimating breeding values when only a reduced number of animals have phenotypic, genotypic and pedigree information available. The complete herd data set comprised 83,404 animals measured for weaning weight (WW), post-weaning gain (PWG), scrotal circumference (SC) and muscle score (MS), corresponding to 116,652 animals in the relationship matrix. Single-trait analyses were performed with MTDFREML software to estimate fixed and random effects solutions from the complete data. The estimated additive effects were taken as the reference breeding values for those animals. The individual observed phenotype for each trait was adjusted for the fixed and random effects solutions, except for direct additive effects. The adjusted phenotype, composed of the additive and residual parts of the observed phenotype, was used as the dependent variable for model comparison. Among all measured animals of this herd, only 3160 were genotyped for 106 SNP markers. Three models were compared in terms of changes in animal ranking, global fit and predictive ability. Model 1 included only polygenic effects, model 2 included only marker effects and model 3 included both polygenic and marker effects. Bayesian inference via Markov chain Monte Carlo methods, performed with TM software, was used to analyze the data for model comparison. Two different priors were adopted for marker effects in models 2 and 3: a uniform distribution (U) and a normal distribution (N). Higher rank correlation coefficients were observed for models 3_U and 3_N, indicating greater similarity between the animal rankings of these models and the ranking based on the reference breeding values. Model 3_N presented a better global fit, as demonstrated by its low DIC. The best models in terms of predictive ability were models 1 and 3_N. Differences due to the prior assumed for marker effects in models 2 and 3 could be attributed to the better ability of the normal prior to handle collinear effects. Models 2_U and 2_N presented the worst performance, indicating that this small set of markers should not be used to genetically evaluate animals with no data, since its predictive ability is restricted. In conclusion, model 3_N presented a slight superiority when a reduced number of animals have phenotypic, genotypic and pedigree information. This could be attributed to the variation retained by marker and polygenic effects fitted together and to the normal prior assumed for marker effects, which deals better with the collinearity between markers. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
Purpose: Refractory frontal lobe epilepsy (FLE) remains one of the most challenging surgically remediable epilepsy syndromes. Nevertheless, the definition of independent predictors and predictive models of postsurgical seizure outcome remains poorly explored in FLE. Methods: We retrospectively analyzed data from 70 consecutive patients with refractory FLE who underwent surgical treatment at our center from July 1994 to December 2006. Univariate results were submitted to logistic regression models and Cox proportional hazards regression to identify isolated risk factors for poor surgical results and to construct predictive models for surgical outcome in FLE. Results: Of the 70 patients who underwent surgery, 45 patients (64%) had favorable outcome and 37 (47%) became seizure free. Isolated risk factors for poor surgical outcome are expressed as hazard ratios (H.R.) and were duration of epilepsy (H.R. = 4.2; 95% C.I. = 1.5-11.7; p = 0.006), ictal EEG recruiting rhythm (H.R. = 2.9; 95% C.I. = 1.1-7.7; p = 0.033), normal MRI (H.R. = 4.8; 95% C.I. = 1.4-16.6; p = 0.012), and MRI with lesion involving eloquent cortex (H.R. = 3.8; 95% C.I. = 1.2-12.0; p = 0.021). Based on these variables and using a logistic regression model, we constructed a model that correctly predicted long-term surgical outcome in up to 80% of patients. Conclusion: Among independent risk factors for postsurgical seizure outcome, epilepsy duration is a potentially modifiable factor that could impact surgical outcome in FLE. Early diagnosis, presence of an MRI lesion not involving eloquent cortex, and ictal EEG without recruiting rhythm independently predicted favorable outcome in this series. (C) 2011 Elsevier B.V. All rights reserved.
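The hazard ratios, confidence intervals and p-values quoted in this abstract can be cross-checked against one another. The sketch below assumes the intervals are Wald intervals computed on the log scale, which the abstract does not state, and it reads the first interval as 1.5 to 11.7. The function name is hypothetical, and small discrepancies are expected because the published figures are rounded.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_p_value(ratio: float, lower: float, upper: float) -> float:
    """Two-sided p-value implied by a ratio and its 95% Wald interval.

    Assumes the interval was built as exp(log(ratio) +/- Z_95 * se).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(ratio) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# (label, hazard ratio, CI lower, CI upper, reported p) from the abstract.
REPORTED = [
    ("duration of epilepsy", 4.2, 1.5, 11.7, 0.006),
    ("ictal EEG recruiting rhythm", 2.9, 1.1, 7.7, 0.033),
    ("normal MRI", 4.8, 1.4, 16.6, 0.012),
    ("lesion in eloquent cortex", 3.8, 1.2, 12.0, 0.021),
]

implied = {
    label: wald_p_value(hr, lo, hi) for label, hr, lo, hi, _ in REPORTED
}

for label, _, _, _, p_reported in REPORTED:
    print(f"{label}: reported p = {p_reported}, implied p = {implied[label]:.4f}")
```

Agreement to within rounding suggests the printed intervals and p-values are mutually consistent; it says nothing about whether the underlying model is appropriate.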
Abstract:
Background: Patients under haemodialysis are considered at high risk of acquiring hepatitis B virus (HBV) infection. Since few data are reported from Brazil, our aim was to assess the frequency of and risk factors for HBV infection in haemodialysis patients from 22 Dialysis Centres in Santa Catarina State, south of Brazil. Methods: This study includes 813 patients, 149 haemodialysis workers and 772 healthy controls matched by sex and age. Serum samples were assayed for HBV markers and viraemia was detected by nested PCR. HBV was genotyped by partial S gene sequencing. Univariate and multivariate statistical analyses with stepwise logistic regression analysis were carried out to analyse the relationship between HBV infection and the characteristics of patients and their Dialysis Units. Results: Frequency of HBV infection was 10.0%, 2.7% and 2.7% among patients, haemodialysis workers and controls, respectively. Among patients, the most frequent HBV genotypes were A (30.6%), D (57.1%) and F (12.2%). Univariate analysis showed association between HBV infection and total time in haemodialysis, type of dialysis equipment, hygiene and sterilization of equipment, number of times reusing the dialysis lines and filters, number of patients per care-worker and current HCV infection. The logistic regression model showed that total time in haemodialysis, number of times of reusing the dialysis lines and filters, and number of patients per worker were significantly related to HBV infection. Conclusions: Frequency of HBV infection among haemodialysis patients in Santa Catarina state is very high. The most frequent HBV genotypes were A, D and F. The risk for a patient to become HBV positive increases 1.47 times with each month of haemodialysis; 1.96 times if the dialysis unit reuses the lines and filters ≥ 10 times compared with haemodialysis units which reuse them < 10 times; and 3.42 times if the number of patients per worker is more than five. Sequence similarity among the HBV S gene from isolates of different patients pointed to nosocomial transmission.
Abstract:
The ectoparasitic mite Varroa destructor acting as a virus vector constitutes a central mechanism for losses of managed honey bee, Apis mellifera, colonies. This creates demand for an easy, accurate and cheap diagnostic tool to estimate the impact of viruliferous mites in the field. Here we evaluated whether the clinical signs of the ubiquitous and mite-transmitted deformed wing virus (DWV) can be predictive markers of winter losses. In fall and winter 2007/2008, A.m. carnica workers with apparent wing deformities were counted daily in traps installed on 29 queenright colonies. The data show that colonies which later died had a significantly higher proportion of workers with wing deformities than did those which survived. There was a significant positive correlation between V. destructor infestation levels and the number of workers displaying DWV clinical signs, further supporting the mite's impact on virus infections at the colony level. A logistic regression model suggests that colony size, the number of workers with wing deformities and V. destructor infestation levels constitute predictive markers for winter colony losses in this order of importance and ease of evaluation.
Abstract:
A historical prospective study was designed to assess the mean weight status of subjects who participated in a behavioral weight reduction program in 1983 and to determine whether there was an association between the dependent variable weight change and any of 31 independent variables after a 2 year follow-up period. Data were obtained by abstracting the subjects' records and from a follow-up questionnaire administered 2 years following program participation. Five hundred nine subjects (386 females and 123 males) of the 1460 subjects who participated in the program completed and returned the questionnaire. Results showed that mean weight was significantly different (p < 0.001) between the measurement at baseline and after a 2 year follow-up period. The mean weight loss of the group was 5.8 pounds, 10.7 pounds for males and 4.2 pounds for females after a 2 year follow-up period. A total of 63.9% of the group, 69.9% of males and 61.9% of females were still below their initial weight after the 2 year follow-up period. Sixteen of the 31 variables assessed utilizing bivariate analyses were found to be significantly (p ≤ 0.05) associated with weight change after a 2 year follow-up period. These variables were then entered into a multivariate linear regression model. A total of 37.9% of the variance of the dependent variable, weight change, was accounted for by all 16 variables. Eight of these variables were found to be significantly (p ≤ 0.05) predictive of weight change in the stepwise multivariate process, accounting for 37.1% of the variance. These variables included two baseline variables (percent over ideal body weight at enrollment and occupation) and six follow-up variables (feeling in control of eating habits, percent of body weight lost during treatment, frequency of weight measurement, physical activity, eating in response to emotions, and number of pounds of weight gain needed to resume a diet). It was concluded that a greater amount of emphasis should be placed on the six follow-up variables by clinicians involved in the treatment of obesity, and by the subjects themselves, to enhance their chances of success at long-term weight loss.
Abstract:
Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, $C_p$ and $S_p$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric ($\mathrm{MSEP}_m$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures. The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches, but no differences are detected between the performances of $C_p$ and $S_p$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample. Only the random split estimator is conditionally (on $\beta$) unbiased; however, $\mathrm{MSEP}_m$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, $\mathrm{MSEP}_m$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables. To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment.
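The leave-one-out statistic recommended at the end of this abstract can be computed without refitting the model n times. The sketch below shows the standard leverage shortcut for PRESS, restricted for brevity to a single predictor with an intercept (the abstract concerns multiple regression). The toy data and all function names are assumptions made for illustration, and a brute-force version is included only to confirm the shortcut.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def press(x: list[float], y: list[float]) -> float:
    """PRESS via the identity e_(i) = e_i / (1 - h_ii), using one fit."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    intercept, slope = fit_line(x, y)
    total = 0.0
    for xi, yi in zip(x, y):
        leverage = 1 / n + (xi - x_bar) ** 2 / s_xx  # h_ii for one predictor
        residual = yi - (intercept + slope * xi)
        total += (residual / (1 - leverage)) ** 2
    return total


def press_brute_force(x: list[float], y: list[float]) -> float:
    """PRESS by literally refitting with each observation left out."""
    total = 0.0
    for i in range(len(x)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total


# Hypothetical toy data, not taken from the study.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]

print(f"PRESS (leverage shortcut): {press(X, Y):.6f}")
print(f"PRESS (refit n times):     {press_brute_force(X, Y):.6f}")
```

Because every observation is used both for fitting and, in turn, for assessment, no data are set aside, which is the property the abstract contrasts with half-sample splitting.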
Abstract:
Background: Intravenous (IV) fluid administration is an integral component of clinical care. Errors in administration can cause detrimental patient outcomes and increase healthcare costs, although little is known about medication administration errors associated with continuous IV infusions. Objectives: (1) To ascertain the prevalence of medication administration errors for continuous IV infusions and identify the variables that caused them. (2) To quantify the probability of errors by fitting a logistic regression model to the data. Methods: A prospective study was conducted on three surgical wards at a teaching hospital in Australia. All study participants received continuous infusions of IV fluids. Parenteral nutrition and non-electrolyte containing intermittent drug infusions (such as antibiotics) were excluded. Medication administration errors and contributing variables were documented using a direct observational approach. Results: Six hundred and eighty seven observations were made, with 124 (18.0%) having at least one medication administration error. The most common error observed was wrong administration rate. The median deviation from the prescribed rate was −47 ml/h (interquartile range −75 to +33.8 ml/h). Errors were more likely to occur if an IV infusion control device was not used and as the duration of the infusion increased. Conclusions: Administration errors involving continuous IV infusions occur frequently. They could be reduced by more common use of IV infusion control devices and regular checking of administration rates.
Abstract:
Aim – To develop and assess the predictive capabilities of a statistical model that relates routinely collected Trauma Injury Severity Score (TRISS) variables to length of hospital stay (LOS) in survivors of traumatic injury. Method – Retrospective cohort study of adults who sustained a serious traumatic injury, and who survived until discharge from Auckland City, Middlemore, Waikato, or North Shore Hospitals between 2002 and 2006. Cubic-root transformed LOS was analysed using two-level mixed-effects regression models. Results – 1498 eligible patients were identified, 1446 (97%) injured from a blunt mechanism and 52 (3%) from a penetrating mechanism. For blunt mechanism trauma, 1096 (76%) were male, average age was 37 years (range: 15-94 years), and LOS and TRISS score information was available for 1362 patients. Spearman’s correlation and the median absolute prediction error between LOS and the original TRISS model was ρ=0.31 and 10.8 days, respectively, and between LOS and the final multivariable two-level mixed-effects regression model was ρ=0.38 and 6.0 days, respectively. Insufficient data were available for the analysis of penetrating mechanism models. Conclusions – Neither the original TRISS model nor the refined model has sufficient ability to accurately or reliably predict LOS. Additional predictor variables for LOS and other indicators for morbidity need to be considered.
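The two summary measures used in this abstract, Spearman's correlation and the median absolute prediction error, can be sketched in a few lines. The example assumes that predictions made on the cube-root scale are cubed back to days before the error is computed, which the abstract does not spell out. The length-of-stay values and function names are hypothetical and are not data from the study.

```python
from statistics import mean, median


def average_ranks(values: list[float]) -> list[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a: list[float], b: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def median_absolute_error(observed: list[float], predicted: list[float]) -> float:
    """Median of |observed - predicted|, in the units of the inputs."""
    return median(abs(o - p) for o, p in zip(observed, predicted))


# Hypothetical values: observed LOS in days, predictions on the cube-root scale.
observed_los = [2.0, 5.0, 9.0, 14.0, 30.0]
predicted_cube_root = [1.3, 1.6, 2.2, 2.0, 2.9]
predicted_los = [p ** 3 for p in predicted_cube_root]  # back-transform to days

rho = spearman(observed_los, predicted_los)
mae_days = median_absolute_error(observed_los, predicted_los)
print(f"Spearman rho = {rho:.2f}, median absolute error = {mae_days:.2f} days")
```

Spearman's rho depends only on ranks, so it is the same whether predictions are compared on the cube-root scale or in days; the median absolute error, by contrast, is only interpretable in days after back-transformation.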