44 results for REGRESSION TREES
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
Background: Development of three classification trees (CTs) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies to calculate the probability of hospital mortality, and comparison of the results with the APACHE II, SAPS II and MPM II-24 scores and with a model based on multiple logistic regression (LR). Methods: Retrospective study of 2864 patients, randomly partitioned (70:30) into a Development Set (DS, n = 1808) and a Validation Set (VS, n = 808). Discrimination was compared using the area under the ROC curve (AUC, 95% CI) and the percentage of correct classification (PCC, 95% CI); calibration was assessed with the calibration curve and the standardized mortality ratio (SMR, 95% CI). Results: The CTs differ in the variables selected and in their decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The variables common to all trees were inotropic therapy, Glasgow score, age, (A-a)O2 gradient and history of chronic illness. In the VS, all models achieved acceptable discrimination, with AUC above 0.7. AUC for the CTs: CART 0.75 (0.71-0.81), CHAID 0.76 (0.72-0.79) and C4.5 0.76 (0.73-0.80). PCC: CART 72 (69-75), CHAID 72 (69-75) and C4.5 76 (73-79). Calibration (SMR) was better in the CTs: CART 1.04 (0.95-1.31), CHAID 1.06 (0.97-1.15) and C4.5 1.08 (0.98-1.16). Conclusion: Different CT methodologies generate trees with different selections of variables and decision rules. The CTs are easy to interpret and stratify the risk of hospital mortality, and they should be taken into account when classifying the prognosis of critically ill patients.
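As a minimal sketch of the validation metrics named above, the Python below computes the AUC (as a Mann-Whitney rank statistic), the PCC and the SMR from a list of predicted death probabilities. The toy data, the function names and the 0.5 cut-off for PCC are assumptions for illustration, not taken from the study, and the confidence intervals are omitted.

```python
# Hypothetical helpers: the abstract does not say how its AUC, PCC or SMR were computed.

def auc(probs, labels):
    """Probability that a randomly chosen death has a higher predicted risk than a
    randomly chosen survivor (ties count 1/2); this equals the ROC AUC."""
    deaths = [p for p, y in zip(probs, labels) if y == 1]
    survivors = [p for p, y in zip(probs, labels) if y == 0]
    wins = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))


def pcc(probs, labels, cutoff=0.5):
    """Percent of correct classification; the 0.5 cut-off is an assumption."""
    correct = sum((p >= cutoff) == (y == 1) for p, y in zip(probs, labels))
    return 100.0 * correct / len(labels)


def smr(probs, labels):
    """Standardized mortality ratio: observed deaths / expected deaths,
    where expected deaths is the sum of the predicted probabilities."""
    return sum(labels) / sum(probs)


# Toy validation set: predicted mortality (e.g. the death rate of each patient's leaf) and outcome.
probs = [0.1, 0.2, 0.3, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
print(auc(probs, labels), pcc(probs, labels), smr(probs, labels))
```

An SMR close to 1 means the predicted deaths match the observed ones, which is the sense in which the trees above are reported as well calibrated.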
Abstract:
Objective: We used demographic and clinical data to design practical classification models for predicting neurocognitive impairment (NCI) in people with HIV infection. Methods: The study population comprised 331 HIV-infected patients with available demographic, clinical, and neurocognitive data, the latter collected using a comprehensive battery of neuropsychological tests. Classification and regression trees (CART) were developed to obtain detailed and reliable models to predict NCI. Following a practical clinical approach, NCI was the main outcome variable, and analyses were performed separately in treatment-naïve and treatment-experienced patients. Results: The study sample comprised 52 treatment-naïve and 279 treatment-experienced patients. In the first group, the best predictors of NCI were CD4 cell count and age (correct classification [CC]: 79.6%, 3 final nodes). In treatment-experienced patients, the variables most closely related to NCI were years of education, nadir CD4 cell count, central nervous system penetration-effectiveness score, age, employment status, and confounding comorbidities (CC: 82.1%, 7 final nodes). In patients with an undetectable viral load and no comorbidities, we obtained a fairly accurate model in which the main variables were nadir CD4 cell count, current CD4 cell count, time on current treatment, and past highest viral load (CC: 88%, 6 final nodes). Conclusion: Practical classification models to predict NCI in HIV infection can be obtained using demographic and clinical variables. An approach based on CART analyses may facilitate screening for HIV-associated neurocognitive disorders and complement clinical information about risk and protective factors for NCI in HIV-infected patients.
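CART grows a tree by repeatedly choosing the split that most reduces node impurity. The sketch below, which assumes the Gini criterion and uses made-up CD4 counts (not the study's data), finds the best single threshold for one predictor; a full CART implementation would also recurse on the child nodes and prune the tree.

```python
def gini(labels):
    """Binary Gini impurity 2p(1 - p), where p is the share of NCI cases (label 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(x, y):
    """Best single CART-style split 'x <= threshold' for one numeric predictor.

    Returns (threshold, weighted child impurity); threshold is None if no split
    lowers the impurity of the parent node.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_thr, best_imp = None, gini(y)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal predictor values cannot be separated
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_thr, best_imp = (pairs[i - 1][0] + pairs[i][0]) / 2, imp
    return best_thr, best_imp


# Hypothetical CD4 counts (cells/uL) and NCI status; not data from the study.
cd4 = [150, 200, 250, 400, 500, 600]
nci = [1, 1, 1, 0, 0, 0]
print(best_split(cd4, nci))  # (325.0, 0.0)
```

Each "final node" reported above is a leaf reached after a sequence of such splits.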
Abstract:
Let T be the Cayley graph of a finitely generated free group F. Given two vertices in T, consider all the walks of a given length between these vertices that, at a certain time, must follow a number of predetermined steps. We give formulas for the number of such walks by expressing the problem in terms of equations in F and solving the corresponding equations.
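The Cayley graph of a free group of rank r with respect to a free basis is the 2r-regular tree, so the number of walks of length n between two vertices depends only on their distance. The sketch below counts these unconstrained walks by dynamic programming on that distance. It does not implement the paper's constraint of predetermined steps or its method via equations in F, and the function name is hypothetical.

```python
def count_walks(rank, length, distance):
    """Number of walks with `length` steps between two vertices at `distance`
    in the Cayley graph of the free group of `rank` (a 2*rank-regular tree)."""
    deg = 2 * rank
    size = length + distance + 2  # distances beyond this never reach the target in time
    # f[k]: walks with the current number of steps from a vertex at distance k to the target
    f = [0] * size
    f[0] = 1
    for _ in range(length):
        g = [0] * size
        g[0] = deg * f[1]  # at the target, every neighbour is at distance 1
        for k in range(1, size - 1):
            # one neighbour is closer to the target, the other deg - 1 are farther away
            g[k] = f[k - 1] + (deg - 1) * f[k + 1]
        g[size - 1] = f[size - 2]  # f[size] would be 0
        f = g
    return f[distance]


print(count_walks(2, 4, 0))  # 28 closed walks of length 4 in the 4-regular tree
```

For rank 1 the tree is the integer line, and the counts reduce to binomial coefficients, which gives an easy sanity check.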
Abstract:
We construct generating trees with one, two, and three labels for some classes of permutations avoiding generalized patterns of length 3 and 4. These trees are built by adding at each level an entry to the right end of the permutation, which allows us to incorporate the adjacency condition on some entries in an occurrence of a generalized pattern. We use these trees to find functional equations for the generating functions enumerating these classes of permutations with respect to different parameters. In several cases we solve them using the kernel method and some ideas of Bousquet-Mélou [2]. We obtain refinements of known enumerative results and find new ones.
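To illustrate how appending an entry at the right end keeps the adjacency condition local, the sketch below builds the generating tree for permutations avoiding the generalized pattern 1-23 (in an occurrence, the entries playing 2 and 3 must be adjacent). The choice of pattern is an assumption for illustration, not necessarily one treated in the paper, and the labels used in the paper's trees are not modelled. Because each child's first n entries are order-isomorphic to its parent, only occurrences ending at the new last entry need to be checked. A brute-force count is included as a cross-check.

```python
from itertools import permutations


def children(perm):
    """Children in the generating tree: append a new last entry v in 1..n+1,
    shifting old entries >= v up by one (the prefix keeps its relative order)."""
    n = len(perm)
    for v in range(1, n + 2):
        yield tuple(a + 1 if a >= v else a for a in perm) + (v,)


def creates_1_23(child):
    """A new occurrence must end at the last entry: child[-2] < child[-1]
    (the adjacent '23') and some earlier entry smaller than child[-2] (the '1')."""
    if len(child) < 3 or child[-2] > child[-1]:
        return False
    return min(child[:-2]) < child[-2]


def count_by_tree(n):
    """Number of 1-23 avoiders of length n, generated level by level."""
    level = [(1,)]
    for _ in range(n - 1):
        level = [c for p in level for c in children(p) if not creates_1_23(c)]
    return len(level)


def contains_1_23(perm):
    return any(perm[i] < perm[j] < perm[j + 1]
               for j in range(1, len(perm) - 1) for i in range(j))


def count_brute(n):
    return sum(not contains_1_23(p) for p in permutations(range(1, n + 1)))


print([count_by_tree(n) for n in range(1, 7)])
```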
Abstract:
"See the abstract at the beginning of the document in the attached file."
Abstract:
This paper explores the effects of two main sources of innovation (intramural and external R&D) on the productivity level in a sample of 3,267 Catalonian firms. The data set is based on the official innovation survey of Catalonia, which was part of the Spanish sample of CIS4 and covers the years 2002-2004. We compare empirical results obtained with standard OLS and quantile regression techniques in both manufacturing and services industries. In the quantile regressions, the results suggest different patterns for the two innovation sources as we move across conditional quantiles. The elasticity of productivity with respect to intramural R&D decreases as we move up to higher productivity levels in both the manufacturing and services sectors, while the effects of external R&D rise in high-technology industries but are more ambiguous in low-technology and knowledge-intensive services. JEL codes: O300, C100, O140. Keywords: Innovation sources, R&D, Productivity, Quantile regression
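Quantile regression replaces the squared error of OLS by the asymmetric check loss, so each quantile tau gets its own coefficients. The brute-force sketch below fits y = a + b*x exactly for a small hypothetical data set whose spread widens with x; the data and function names are illustrative assumptions, and a real analysis would use a dedicated solver (for example statsmodels' QuantReg).

```python
from itertools import combinations


def check_loss(u, tau):
    """Check (pinball) loss: tau*u for u >= 0 and (tau - 1)*u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_line(x, y, tau):
    """Exact quantile regression y = a + b*x for small samples by brute force.

    The fit is a linear program, and some optimal line passes through at least
    two observations, so it suffices to try every line through two points.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(check_loss(yk - a - b * xk, tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


# Hypothetical data whose spread grows with x (think: log R&D vs log productivity).
x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [9, 10, 11, 8, 10, 12, 7, 10, 13]
for tau in (0.1, 0.9):
    print(tau, quantile_line(x, y, tau))  # 0.1 (10.0, -1.0), then 0.9 (10.0, 1.0)
```

The slope changing sign between the 0.1 and 0.9 quantiles is the kind of pattern that a single OLS slope averages away.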
Abstract:
In automobile insurance, it is useful to achieve a priori ratemaking by resorting to generalized linear models, and here the Poisson regression model constitutes the most widely accepted basis. However, insurance companies distinguish between claims with or without bodily injuries, or claims with full or partial liability of the insured driver. This paper examines an a priori ratemaking procedure that includes two different types of claim. When independence between claim types is assumed, the premium can be obtained by summing the premiums for each type of guarantee and depends on the rating factors chosen. If the independence assumption is relaxed, it is unclear how the tariff system might be affected. To answer this question, bivariate Poisson regression models, suitable for paired count data exhibiting correlation, are introduced. It is shown that the usual independence assumption is unrealistic here. These models are applied to an automobile insurance claims database containing 80,994 contracts belonging to a Spanish insurance company. Finally, the consequences for pure and loaded premiums when the independence assumption is relaxed by using a bivariate Poisson regression model are analysed.
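Under the trivariate-reduction construction of the bivariate Poisson distribution, which we assume here because the abstract does not give the parametrization, each claim type is the sum of its own Poisson count and a shared Poisson count with mean lam3, so lam3 is the covariance between the two claim types. The sketch gives the joint probability, a log-link for the three rates, and the moments of the total number of claims; all names are hypothetical.

```python
from math import exp, factorial


def bivariate_poisson_pmf(y1, y2, lam1, lam2, lam3):
    """P(Y1 = y1, Y2 = y2) when Y1 = X1 + X3 and Y2 = X2 + X3 with independent
    Xj ~ Poisson(lamj); lam3 is the covariance between the two claim counts."""
    total = 0.0
    for k in range(min(y1, y2) + 1):  # k = number of shared events X3
        total += (lam1 ** (y1 - k) * lam2 ** (y2 - k) * lam3 ** k
                  / (factorial(y1 - k) * factorial(y2 - k) * factorial(k)))
    return exp(-(lam1 + lam2 + lam3)) * total


def rates(x, beta1, beta2, beta3):
    """Log-link regression for each rate: lamj = exp(x . betaj)."""
    return tuple(exp(sum(xi * bi for xi, bi in zip(x, beta)))
                 for beta in (beta1, beta2, beta3))


def total_claims_moments(lam1, lam2, lam3):
    """Mean and variance of Y1 + Y2; the covariance contributes 2*lam3 to the variance."""
    mean = lam1 + lam2 + 2 * lam3
    var = lam1 + lam2 + 4 * lam3
    return mean, var
```

Compared with an independent model that has the same marginal means (lam1 + lam3 and lam2 + lam3), the expected total is the same but the variance is lower by 2*lam3. This suggests why a variance-based loading is sensitive to the independence assumption.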
Abstract:
This paper explores the effects of two main sources of innovation (intramural and external R&D) on the productivity level in a sample of 3,267 Catalan firms. The data set is based on the official innovation survey of Catalonia, which was part of the Spanish sample of CIS4 and covers the years 2002-2004. We compare empirical results obtained with standard OLS and quantile regression techniques in both manufacturing and services industries. In the quantile regressions, the results suggest different patterns for the two innovation sources as we move across conditional quantiles. The elasticity of productivity with respect to intramural R&D decreases as we move up to higher productivity levels in both the manufacturing and services sectors, while the effects of external R&D rise in high-technology industries but are more ambiguous in low-technology and services industries.
Abstract:
Privatization of local public services has been implemented worldwide in recent decades. Why local governments privatize has been the subject of much discussion, and many empirical works have been devoted to analyzing the factors that explain local privatization. These works have found a great diversity of motivations, and the variation among reported empirical results is large. To investigate this diversity, we undertake a meta-regression analysis of the factors explaining the decision to privatize local services. Overall, our results indicate that the significant relationships found depend strongly on the characteristics of the studies. Indeed, fiscal stress and political considerations have been found to contribute to local privatization, especially in studies of US cases published in the 1980s that consider a broad range of services. Studies that focus on one service capture the influence of scale economies on privatization more accurately. Finally, governments of small towns are more affected by fiscal stress, political considerations and economic efficiency, while ideology seems to play a major role for large cities.
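A meta-regression explains variation in reported effects with study characteristics. The sketch below assumes the simplest fixed-effect version: a weighted least-squares fit of hypothetical effect sizes on one moderator (for example, whether a study concerns US cases), with inverse-variance weights. The paper's actual effect measure and estimator are not stated in the abstract and may differ, for instance a random-effects model.

```python
def meta_regression(effects, ses, moderator):
    """Fixed-effect meta-regression of study effect sizes on one study-level moderator.

    Each study is weighted by 1 / se^2 (inverse variance).
    Returns (intercept, slope, standard error of the slope).
    """
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, moderator, effects))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, moderator))
    slope = sxy / sxx
    # With known study variances, Var(slope) = 1 / sxx.
    return ybar - slope * xbar, slope, (1.0 / sxx) ** 0.5


# Hypothetical studies: estimated effect of fiscal stress, its standard error, US-study flag.
effects = [0.30, 0.10, 0.35, 0.05, 0.12]
ses = [0.10, 0.05, 0.20, 0.10, 0.08]
us_study = [1, 0, 1, 0, 0]
print(meta_regression(effects, ses, us_study))
```

A positive slope in such a fit would say that studies of US cases tend to report larger effects, which is the kind of study-dependence the abstract describes.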
Abstract:
We explore the relationship between polynomial functors and trees. In the first part we characterise trees as certain polynomial functors and obtain a completely formal but at the same time conceptual and explicit construction of two categories of rooted trees, whose main properties we describe in terms of some factorisation systems. The second category is the category Ω of Moerdijk and Weiss. Although the constructions are motivated and explained in terms of polynomial functors, they all amount to elementary manipulations with finite sets. Part 1 also includes an explicit construction of the free monad on a polynomial endofunctor, given in terms of trees. In the second part we describe polynomial endofunctors and monads as structures built from trees, characterising the images of several nerve functors from polynomial endofunctors and monads into presheaves on categories of trees. Polynomial endofunctors and monads over a base are characterised by a sheaf condition on categories of decorated trees. In the absolute case, one further condition is needed, a projectivity condition, which also serves to characterise polynomial endofunctors and monads among (coloured) collections and operads.
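One concrete reading of "trees as certain polynomial functors" is a diagram of finite sets A ← M → N → A (edges, node-input incidences, nodes) with maps s: M → A, p: M → N and t: N → A. The sketch assumes the conditions that s and t are injective, that exactly one edge is not in the image of s (the root), and that repeatedly passing from an input edge to the output edge of its node always reaches the root. These axioms are our assumption of the setup and should be checked against Part 1 of the paper; the function name is hypothetical.

```python
def is_tree(A, M, N, s, p, t):
    """Check whether finite sets A (edges), M (incidences), N (nodes) with maps
    s: M -> A, p: M -> N, t: N -> A (given as dicts) form a tree."""
    # The maps must be total, with values in the right sets.
    if set(s) != M or set(p) != M or set(t) != N:
        return False
    if not (set(s.values()) <= A and set(p.values()) <= N and set(t.values()) <= A):
        return False
    # s and t must be injective.
    if len(set(s.values())) != len(M) or len(set(t.values())) != len(N):
        return False
    # Exactly one edge is not an input of any node: the root.
    roots = A - set(s.values())
    if len(roots) != 1:
        return False
    (root,) = roots
    s_inv = {e: m for m, e in s.items()}
    # Walk-to-root: input edge -> its node -> that node's output edge, until the root.
    for start in A:
        edge, seen = start, set()
        while edge != root:
            if edge in seen:
                return False  # a cycle never reaches the root
            seen.add(edge)
            edge = t[p[s_inv[edge]]]
    return True


# A corolla: one node v with input edges l1, l2 and output (root) edge r.
M = {("v", "l1"), ("v", "l2")}
print(is_tree({"r", "l1", "l2"}, M, {"v"}, {m: m[1] for m in M}, {m: m[0] for m in M}, {"v": "r"}))
```

The tree with no nodes (a single edge) also passes, matching the intuition that the unit tree is the identity polynomial functor's shape.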
Abstract:
Lean meat percentage (LMP) is an important carcass quality parameter. The aim of this work is to obtain a calibration equation for Computed Tomography (CT) scans with the Partial Least Squares Regression (PLS) technique in order to predict the LMP of the carcass and of the different cuts, and to study and compare two methodologies for selecting the variables to be included in the prediction equation: Variable Importance in Projection (VIP) and stepwise selection. The root mean squared error of prediction with cross-validation (RMSEPCV) of the LMP obtained with PLS and VIP-based selection was 0.82%, and with stepwise selection it was 0.83%. Predicting the LMP by scanning only the ham gave an RMSEPCV of 0.97%, and scanning both the ham and the loin gave an RMSEPCV of 0.90%. The results indicate that for CT data both VIP and stepwise selection are good methods. Moreover, scanning only the ham allowed us to obtain a good prediction of the LMP of the whole carcass.
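The PLS calibration and the cross-validated error can be sketched in plain Python. The code below is a textbook NIPALS PLS1 (one response) with leave-one-out RMSEP. The leave-one-out scheme, the toy data and the function names are assumptions, since the abstract does not describe the cross-validation, and neither VIP nor stepwise variable selection is implemented.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_comp):
    """PLS1 regression (single response) by NIPALS with deflation."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred predictors
    f = [yi - y_mean for yi in y]                                # centred response
    comps = []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]  # weights X'y
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # response fully explained; no further components
        w = [wj / norm for wj in w]
        t = [_dot(row, w) for row in E]                                 # scores
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = _dot(f, t) / tt                                             # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by applying the same centring and deflation steps."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = _dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


def rmsep_cv(X, y, n_comp):
    """Leave-one-out root mean squared error of prediction."""
    sq = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp)
        sq += (pls1_predict(model, X[i]) - y[i]) ** 2
    return (sq / len(X)) ** 0.5
```

With as many components as predictors, PLS1 reproduces ordinary least squares; the point of PLS for CT data is to use far fewer components than the many correlated image variables.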
Abstract:
This paper explores the effects of two main sources of innovation (intramural and external R&D) on the productivity level in a sample of 3,267 Catalonian firms. The data set is based on the official innovation survey of Catalonia, which was part of the Spanish sample of CIS4 and covers the years 2002-2004. We compare empirical results obtained with standard OLS and quantile regression techniques in both manufacturing and services industries. In the quantile regressions, the results suggest different patterns for the two innovation sources as we move across conditional quantiles. The elasticity of productivity with respect to intramural R&D decreases as we move up to higher productivity levels in both the manufacturing and services sectors, while the effects of external R&D rise in high-technology industries but are more ambiguous in low-technology and knowledge-intensive services. JEL codes: O300, C100, O140. Keywords: Innovation sources, R&D, Productivity, Quantile Regression
Abstract:
When actuaries face the problem of pricing an insurance contract that contains different types of coverage, such as a motor insurance or homeowner's insurance policy, they usually assume that types of claim are independent. However, this assumption may not be realistic: several studies have shown that there is a positive correlation between types of claim. Here we introduce different regression models in order to relax the independence assumption, including zero-inflated models to account for excess zeros and overdispersion. These models have largely been ignored for multivariate Poisson data, mainly because of their computational difficulties. Bayesian inference based on MCMC helps to solve this problem (and also lets us derive, for several quantities of interest, posterior summaries to account for uncertainty). Finally, these models are applied to an automobile insurance claims database with three different types of claims. We analyse the consequences for pure and loaded premiums when the independence assumption is relaxed by using different multivariate Poisson regression models and their zero-inflated versions.
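The zero-inflated idea can be seen in the univariate case: a zero-inflated Poisson adds a point mass at zero, which both raises P(0) and makes the variance exceed the mean. The sketch assumes the standard parametrization with inflation probability pi; the paper's multivariate versions and their MCMC fitting are not shown, and the names are hypothetical.

```python
from math import exp, factorial


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) count."""
    poisson = exp(-lam) * lam ** k / factorial(k)
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson


def zip_mean_var(lam, pi):
    """Closed-form moments; var / mean = 1 + pi * lam > 1, i.e. overdispersion."""
    mean = (1.0 - pi) * lam
    return mean, mean * (1.0 + pi * lam)


print(zip_pmf(0, 1.5, 0.3), zip_mean_var(1.5, 0.3))
```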
Abstract:
"See the abstract at the beginning of the document in the attached file."
Abstract:
In a recent paper, Bermúdez [2009] used bivariate Poisson regression models for ratemaking in car insurance and included zero-inflated models to account for the excess of zeros and the overdispersion in the data set. In the present paper, we revisit this model in order to consider alternatives. We propose a two-component finite mixture of bivariate Poisson regression models to demonstrate that the overdispersion in the data requires more structure if it is to be taken into account, and that a simple zero-inflated bivariate Poisson model does not suffice. At the same time, we show that a finite mixture of bivariate Poisson regression models embraces zero-inflated bivariate Poisson regression models as a special case. Additionally, we describe a model in which the mixing proportions depend on covariates, to model the way in which each individual belongs to a separate cluster. Finally, an EM algorithm is provided to make the models easy to fit. These models are applied to the same automobile insurance claims data set as used in Bermúdez [2009], and it is shown that the modelling of the data set can be improved considerably.
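The EM algorithm for a finite Poisson mixture alternates posterior cluster probabilities (E-step) with weighted means (M-step). The sketch below is a univariate, two-component simplification with a constant mixing weight; the paper's model is bivariate, with covariates in the rates and in the mixing proportions. A component with mean 0 would be a point mass at zero, which is how a zero-inflated model arises as a special case of the mixture. The data and names are hypothetical.

```python
from math import exp, lgamma, log


def poisson_logpmf(k, lam):
    return k * log(lam) - lam - lgamma(k + 1)


def em_poisson_mixture(data, n_iter=200):
    """EM for a two-component Poisson mixture.

    Returns (weight of component 1, lam1, lam2, log-likelihood per iteration).
    """
    mean = sum(data) / len(data)
    w, lam1, lam2 = 0.5, 0.5 * mean, 1.5 * mean  # crude starting values
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability that each observation comes from component 1
        resp, loglik = [], 0.0
        for k in data:
            a = w * exp(poisson_logpmf(k, lam1))
            b = (1.0 - w) * exp(poisson_logpmf(k, lam2))
            resp.append(a / (a + b))
            loglik += log(a + b)
        history.append(loglik)  # log-likelihood at the current parameters
        # M-step: closed-form updates of the mixing weight and the two means
        r1 = sum(resp)
        w = r1 / len(data)
        lam1 = sum(r * k for r, k in zip(resp, data)) / r1
        lam2 = sum((1 - r) * k for r, k in zip(resp, data)) / (len(data) - r1)
    return w, lam1, lam2, history


# Hypothetical claim counts: many low-risk policies plus a high-risk group.
data = [0] * 50 + [1] * 10 + [4] * 15 + [5] * 15 + [6] * 10
w, lam1, lam2, history = em_poisson_mixture(data)
print(w, lam1, lam2)
```

EM never decreases the log-likelihood from one iteration to the next, which is a convenient check when extending the sketch to the bivariate case.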