936 results for Multiple Additive Regression Trees (MART)
Abstract:
Background: Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for calculating the probability of hospital mortality; comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods: Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Discrimination was compared using the area under the ROC curve (AUC, 95% CI) and the percentage of correct classification (PCC, 95% CI); calibration was assessed with the calibration curve and the Standardized Mortality Ratio (SMR, 95% CI). Results: CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69-75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) was better in the CTs: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15)) and C4.5 (1.08(0.98-1.16)). Conclusion: Different CT methodologies generate trees with different selections of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients.
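The three evaluation measures named in this abstract (AUC, PCC and SMR) can be illustrated with a minimal Python sketch. It assumes the usual definitions: SMR as observed deaths divided by the sum of predicted death probabilities, PCC as the percentage of patients classified correctly at a cut-off, and AUC as the probability that a non-survivor receives a higher predicted risk than a survivor. The function names, the 0.5 cut-off and the eight-patient toy data are hypothetical and are not taken from the study.

```python
def smr(outcomes, predicted):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    return sum(outcomes) / sum(predicted)


def pcc(outcomes, predicted, threshold=0.5):
    """Percentage of patients whose predicted class matches the outcome."""
    correct = sum((p >= threshold) == bool(o) for o, p in zip(outcomes, predicted))
    return 100.0 * correct / len(outcomes)


def auc(outcomes, predicted):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [p for o, p in zip(outcomes, predicted) if o]
    neg = [p for o, p in zip(outcomes, predicted) if not o]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died in hospital, 0 = survived.
outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
predicted = [0.8, 0.2, 0.6, 0.4, 0.1, 0.3, 0.5, 0.1]

smr_value = smr(outcomes, predicted)
pcc_value = pcc(outcomes, predicted)
auc_value = auc(outcomes, predicted)
```

An SMR near 1 indicates that the predicted probabilities add up to roughly the observed number of deaths, which is how the abstract's calibration figures can be read.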
Abstract:
The aim of this thesis is to explain the prolific offending of certain offenders. We argue that prolific offending is explained by the more frequent formation of criminogenic situations. These situations refer to the moment when an offender interacts with a criminal opportunity in a context favourable to crime. More precisely, it is the moment when the offender faces this opportunity but the crime has not yet been committed. The formation of criminogenic situations is facilitated by the interaction and interdependence of three elements: the person's propensity for offending, their criminalized entourage, and their lifestyle. Prolific offending therefore cannot be adequately explained without taking into account the interaction between individual risk and contextual risk. The general objective of this thesis is to demonstrate the importance of modelling the interaction between individual risk and contextual risk in order to explain the more prolific offending of certain offenders. To this end, 155 offenders under the responsibility of two establishments of the Services correctionnels du Québec and four Quebec youth centres (centres jeunesse) completed an assessment protocol consisting of self-administered questionnaires. First (chapter three), we described and compared the nature of the self-reported offending of the offenders in our sample. This first results chapter showed that this pool of offenders is similar to other offender samples with respect to the nature of their offending, in particular the volume, variety, and seriousness of their crimes.
Indeed, the majority of participants report a low volume of crimes against persons and against property, whereas a small group stands out with a very high lambda (13.1% of the offenders in the sample are responsible for 60.3% of all reported crimes). About four out of five offenders report having committed at least one crime against a person and one crime against property. Moreover, more than 50% of these report crimes in at least four subcategories. Finally, although the offenders in our sample have a relatively low average IGC (indice de gravité de la criminalité, crime severity index; median = 77), nearly 40% of them report having committed at least one of the two most serious crimes recorded in this study (discharging a firearm and robbery). The second specific objective, addressed in chapter four, was to explore the interaction between offenders' personal characteristics, entourage, and lifestyle in the formation of criminogenic situations. People with a higher propensity for offending appear to be more often surrounded by criminalized people and to have a more idle lifestyle. The criminalized entourage also appears to influence the lifestyle of these offenders. Thus, the interdependence among these three elements facilitates the more frequent formation of criminogenic situations and creates conditions conducive to the emergence of prolific offending. The last specific objective of the thesis, covered in chapter five, was to analyse the impact of the formation of criminogenic situations on the nature of offending. Multiple linear regression analyses and regression trees highlighted the contribution of personal characteristics, entourage, and lifestyle to explaining the nature of offending.
On the one hand, the regression analyses (additive models) suggest that all of the elements favouring the formation of criminogenic situations make a unique contribution to explaining offending. On the other hand, the regression trees allowed us to better understand the interaction among the elements in explaining prolific offending. Indeed, a lower standing on some elements can be offset by a higher standing on others. Moreover, the accumulation of elements favouring the formation of criminogenic situations does not proceed linearly. These conclusions are supported by proportions of explained variance higher than those of the multiple linear regressions. In conclusion, focusing on a single element (the person and their propensity for offending, or the context and its opportunities), or on their simply additive combination, does not do justice to the complexity of the emergence of prolific offending. By empirically testing this generally accepted idea, this thesis underlines the importance of considering the interaction between individual risk and contextual risk in explaining prolific offending.
Abstract:
Multiple regression analysis is a statistical technique that allows a dependent variable to be predicted from more than one independent variable and the influential independent variables to be identified. In this study, multiple regression analysis is applied to experimental data to predict the room mean velocity and to determine the parameters that most influence the velocity. More than 120 experiments for four different heat source locations were carried out in a test chamber with a high-level wall-mounted air supply terminal at air change rates of 3-6 ach. The influence of environmental parameters such as supply air momentum, room heat load, Archimedes number and local temperature ratio was examined by two methods: a simple regression analysis incorporated into scatter matrix plots and multiple stepwise regression analysis. It is concluded that, when a heat source is located along the jet centre line, the supply momentum mainly influences the room mean velocity regardless of the plume strength. However, when the heat source is located outside the jet region, the local temperature ratio (the inverse of the local heat removal effectiveness) is a major influencing parameter.
Abstract:
In previous statnotes, the application of correlation and regression methods to the analysis of two variables (X,Y) was described. These methods can be used to determine whether there is a linear relationship between the two variables, whether the relationship is positive or negative, to test the degree of significance of the linear relationship, and to obtain an equation relating Y to X. This Statnote extends the methods of linear correlation and regression to situations where there are two or more X variables, i.e., 'multiple linear regression'.
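The extension to two or more X variables can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Statnote's own procedure: it fits Y = b0 + b1*X1 + ... + bk*Xk by ordinary least squares, solving the normal equations with Gauss-Jordan elimination. The function name and the small data set are hypothetical.

```python
def fit_multiple_linear_regression(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk.

    X is a list of rows of predictor values; returns [b0, b1, ..., bk].
    """
    # Prepend an intercept column of ones to every row.
    rows = [[1.0] + [float(v) for v in row] for row in X]
    p = len(rows[0])

    # Build the normal equations (A^T A) b = A^T y as an augmented matrix.
    aug = []
    for i in range(p):
        line = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        line.append(sum(r[i] * yi for r, yi in zip(rows, y)))
        aug.append(line)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

    return [aug[i][p] for i in range(p)]


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * a - 3 * b for a, b in X]
coef = fit_multiple_linear_regression(X, y)
```

Because the toy data contain no noise, the fitted coefficients recover the generating equation up to floating-point rounding. With real data, a dedicated statistics package would also be needed for the significance tests the Statnote mentions.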
Abstract:
To test whether plant species influence greenhouse gas production in diverse ecosystems, we measured wet season soil CO2 and N2O fluxes close to ~300 large (>35 cm in diameter at breast height (DBH)) trees of 15 species at three clay-rich forest sites in central Amazonia. We found that soil CO2 fluxes were 38% higher near large trees than at control sites >10 m away from any tree (P < 0.0001). After adjusting for large tree presence, a multiple linear regression of soil temperature, bulk density, and liana DBH explained 19% of remaining CO2 flux variability. Soil N2O fluxes adjacent to Caryocar villosum, Lecythis lurida, Schefflera morototoni, and Manilkara huberi were 84%-196% greater than Erisma uncinatum and Vochysia maxima, both Vochysiaceae. Tree species identity was the most important explanatory factor for N2O fluxes, accounting for more than twice the N2O flux variability as all other factors combined. Two observations suggest a mechanism for this finding: (1) sugar addition increased N2O fluxes near C. villosum twice as much (P < 0.05) as near Vochysiaceae and (2) species mean N2O fluxes were strongly negatively correlated with tree growth rate (P = 0.002). These observations imply that through enhanced belowground carbon allocation liana and tree species can stimulate soil CO2 and N2O fluxes (by enhancing denitrification when carbon limits microbial metabolism). Alternatively, low N2O fluxes potentially result from strong competition of tree species with microbes for nutrients. Species-specific patterns in CO2 and N2O fluxes demonstrate that plant species can influence soil biogeochemical processes in a diverse tropical forest.
Abstract:
The ecotoxicological response of the living organisms in an aquatic system depends on the physical, chemical and bacteriological variables, as well as the interactions between them. An important challenge to scientists is to understand the interaction and behaviour of factors involved in a multidimensional process such as the ecotoxicological response. With this aim, multiple linear regression (MLR) and principal component regression were applied to the ecotoxicity bioassay response of Chlorella vulgaris and Vibrio fischeri in water collected at seven sites of the Leça River during five monitoring campaigns (February, May, June, August and September of 2006). The river water characterization included the analysis of 22 physicochemical and 3 microbiological parameters. The model that best fitted the data was MLR, which shows: (i) a negative correlation with dissolved organic carbon, zinc and manganese, and a positive one with turbidity and arsenic, regarding C. vulgaris toxic response; (ii) a negative correlation with conductivity and turbidity and a positive one with phosphorus, hardness, iron, mercury, arsenic and faecal coliforms, concerning V. fischeri toxic response. This integrated assessment may allow the evaluation of the effect of future pollution abatement measures on the water quality of the Leça River.
Abstract:
Few studies have been conducted to verify how the structure of the forest affects the occurrence and abundance of neotropical birds. Our research was undertaken between January 2002 and July 2004 at the Reserva Ducke, near Manaus (02º55',03º01'S; 59º53',59º59'W) in central Amazonia, to verify how the forest structure affects the occurrence and abundance of two bird species: the Plain-brown Woodcreeper Dendrocincla fuliginosa and the White-chinned Woodcreeper Dendrocincla merula. Bird species occurrence was recorded using lines of 20 mist-nets (one sample unit), along 51 1-km transects distributed along 9 parallel 8-km trails covering an area of 6400 ha. Along these transects, we placed 50 x 50 m plots where we recorded forest structure components (tree abundance, canopy openness, leaf litter, standing dead trees, logs, proximity to streams, and altitude). We then related these variables to bird occurrence and abundance using multiple logistic and multiple linear regression models, respectively. We found that D. fuliginosa frequently used plateau areas, being more abundant in areas with more trees. On the other hand, D. merula occurred more frequently and was more abundant in areas with low tree abundance. Our results suggest that although both species overlap in the reserve (both were recorded in at least 68% of the sampled sites), they differ in the way they use the forest microhabitats. Therefore, local variation in the forest structure may contribute to the coexistence of congeneric species and may help to maintain local alpha diversity.
Abstract:
BACKGROUND: We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. METHODS: Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees (iii and iv are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. RESULTS: Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60-80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. CONCLUSIONS: There were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv), or the regression (i, iii) vs. classification (ii, iv) modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.
Abstract:
Background: Prolificacy is the most important trait influencing the reproductive efficiency of pig production systems. The low heritability and sex-limited expression of prolificacy have hindered to some extent the improvement of this trait through artificial selection. Moreover, the relative contributions of additive, dominant and epistatic QTL to the genetic variance of pig prolificacy remain to be defined. In this work, we have addressed this issue by performing one-dimensional and bi-dimensional genome scans for number of piglets born alive (NBA) and total number of piglets born (TNB) in a three-generation Iberian by Meishan F2 intercross. Results: The one-dimensional genome scan for NBA and TNB revealed the existence of two genome-wide highly significant QTL located on SSC13 (P < 0.001) and SSC17 (P < 0.01) with effects on both traits. This relative paucity of significant results contrasted very strongly with the wide array of highly significant epistatic QTL that emerged in the bi-dimensional genome-wide scan analysis. As many as 18 epistatic QTL were found for NBA (four at P < 0.01 and five at P < 0.05) and TNB (three at P < 0.01 and six at P < 0.05), respectively. These epistatic QTL were distributed in multiple genomic regions, which covered 13 of the 18 pig autosomes, and they had small individual effects that ranged from 3 to 4% of the phenotypic variance. Different patterns of interactions (a × a, a × d, d × a and d × d) were found amongst the epistatic QTL pairs identified in the current work. Conclusions: The complex inheritance of prolificacy traits in pigs has been evidenced by identifying multiple additive (SSC13 and SSC17), dominant and epistatic QTL in an Iberian × Meishan F2 intercross. 
Our results demonstrate that a significant fraction of the phenotypic variance of swine prolificacy traits can be attributed to first-order gene-by-gene interactions, emphasizing that the phenotypic effects of alleles might be strongly modulated by the genetic background where they segregate.
Abstract:
Is it possible to build predictive models (PMs) of soil particle-size distribution (psd) in a region with complex geology and a young and unstable land surface? The main objective of this study was to answer this question. A set of 339 soil samples from a small slope catchment in Southern Brazil was used to build PMs of psd in the surface soil layer. Multiple linear regression models were constructed using terrain attributes (elevation, slope, catchment area, convergence index, and topographic wetness index). The PMs explained more than half of the data variance. This performance is similar to (or even better than) that of the conventional soil mapping approach. For some size fractions, the PM performance can reach 70%. The largest uncertainties were observed in geologically more complex areas. Therefore, significant improvements in the predictions can only be achieved if accurate geological data are made available. Meanwhile, PMs built on terrain attributes are efficient in predicting the psd of soils in regions of complex geology.
Abstract:
Logistic regression is among the analysis techniques that are valid for observational methodology. However, its presence at the heart of this methodology, and more specifically in physical activity and sports studies, is scarce. With a view to highlighting the possibilities this technique offers within the scope of observational methodology applied to physical activity and sports, an application of the logistic regression model is presented. The model is applied in the context of an observational design which aims to determine, from the analysis of use of the playing area, which football discipline (7-a-side football, 9-a-side football or 11-a-side football) is best adapted to the child's possibilities. A multiple logistic regression model can provide an effective prognosis regarding the probability of a move being successful (reaching the opposing goal area) depending on the sector in which the move commenced and the football discipline which is being played.
Abstract:
OBJECTIVE: To provide information on the effects of alcohol and tobacco on laryngeal cancer and its subsites. METHODS: This was a case-control study conducted between 1992 and 2000 in northern Italy and Switzerland. A total of 527 cases of incident squamous-cell carcinoma of the larynx and 1297 hospital controls frequency-matched with cases on age, sex, and area of residence were included. Odds ratios (ORs) and corresponding 95% confidence intervals (CIs) were estimated using multiple logistic regression. RESULTS: In comparison with never smokers, ORs were 19.8 for current smokers and 7.0 for ex-smokers. The risk increased in relation to the number of cigarettes (OR = 42.9 for ≥ 25 cigarettes/day) and for duration of smoking (OR = 37.2 for ≥ 40 years). For alcohol, the risk increased in relation to number of drinks (OR = 5.9 for ≥ 56 drinks per week). Combined alcohol and tobacco consumption showed a multiplicative (OR = 177) rather than an additive risk. For current smokers and current drinkers the risk was higher for supraglottis (ORs 54.9 and 2.6, respectively) than for glottis (ORs 7.4 and 1.8) and other subsites (ORs 10.9 and 1.9). CONCLUSIONS: Our study shows that both cigarette smoking and alcohol drinking are independent risk factors for laryngeal cancer. Heavy consumption of alcohol and cigarettes determined a multiplicative risk increase, possibly suggesting biological synergy.
Abstract:
The increasing demand of consumer markets for the welfare of birds in poultry houses has motivated much scientific research to monitor and classify welfare according to the production environment. Given the complexity of the relationship between the birds and the environment of the aviary, the correct interpretation of behavior becomes an important way to estimate the welfare of these birds. This study obtained multiple logistic regression models capable of estimating the welfare of broiler breeders in relation to the environment of the aviaries and the behaviors expressed by the birds. In the experiment, several behaviors expressed by breeders housed in a climatic chamber under controlled temperatures and three different ammonia concentrations in the air, monitored daily, were observed. From the analysis of the data, two logistic regression models were obtained: the first model uses a value of ammonia concentration measured by unit, and the second model uses a binary value to classify the ammonia concentration, assigned by a person through olfactory perception. The analysis showed that both models classified the broiler breeders' welfare successfully.
Abstract:
Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) are some of the mathematical preliminaries that are discussed prior to explaining PLS and PCR models. Both PLS and PCR are applied to real spectral data and their differences and similarities are discussed in this thesis. The challenge lies in establishing the optimum number of components to be included in either of the models but this has been overcome by using various diagnostic tools suggested in this thesis. Correspondence analysis (CA) and PLS were applied to ecological data. The idea of CA was to correlate the macrophyte species and lakes. The differences between the PLS model for ecological data and PLS for spectral data are noted and explained in this thesis.
Abstract:
The goal of this paper is to introduce a class of tree-structured models that combines aspects of regression trees and smooth transition regression models. The model is called the Smooth Transition Regression Tree (STR-Tree). The main idea relies on specifying a multiple-regime parametric model through a tree-growing procedure with smooth transitions among different regimes. Decisions about splits are entirely based on a sequence of Lagrange Multiplier (LM) tests of hypotheses.
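The idea of a smooth transition between regimes at a tree split can be illustrated with a short Python sketch. It rests on assumptions the abstract does not state: a single split, constant predictions in each regime, and a logistic transition function with slope `gamma` and location `c`. All names are hypothetical, and the sketch covers neither the LM-test-based split selection nor the estimation of the parameters.

```python
import math


def logistic_transition(x, gamma, c):
    """Weight in [0, 1] given to the right-hand regime.

    Written with tanh, which equals the logistic function
    1 / (1 + exp(-gamma * (x - c))) but cannot overflow.
    """
    return 0.5 * (1.0 + math.tanh(0.5 * gamma * (x - c)))


def two_regime_prediction(x, left_value, right_value, gamma, c):
    """Blend two regime predictions instead of switching abruptly at c."""
    g = logistic_transition(x, gamma, c)
    return (1.0 - g) * left_value + g * right_value


# At the split location the two regimes receive equal weight.
at_split = two_regime_prediction(2.0, 10.0, 20.0, gamma=5.0, c=2.0)

# A very large gamma approaches the hard split of an ordinary regression tree.
hard_left = two_regime_prediction(1.9, 10.0, 20.0, gamma=1000.0, c=2.0)
hard_right = two_regime_prediction(2.1, 10.0, 20.0, gamma=1000.0, c=2.0)
```

A small `gamma` gives a gradual change between regimes, while a large `gamma` reproduces the step function of a conventional regression tree, which is one way to read the abstract's combination of the two model classes.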