946 results for REGRESSION TREE
Abstract:
This thesis takes a new data mining approach to analyzing road and crash data by developing models for the whole road network and generating a crash risk profile. Roads with elevated crash risk due to a road surface friction deficit are identified. The regression tree model, which predicts road segment crash rate, is applied in a novel deployment, termed regression tree extrapolation, that produces a skid resistance/crash rate curve. Extrapolation allows the method to be applied across the network and to cope with the high proportion of missing road surface friction values. This risk profiling method can be applied in other domains.
Abstract:
Road surface skid resistance has been shown to be strongly related to road crash risk; however, the current method of using investigatory levels to identify crash-prone roads is problematic because it may fail to identify risky roads outside the norm. The proposed method uses data mining to analyze a complex and previously impenetrable volume of road and crash data. It rapidly identifies roads with an elevated crash rate, potentially due to a skid resistance deficit, for investigation. A hypothetical skid resistance/crash risk curve is developed for each road segment, driven by the model deployed in a novel regression tree extrapolation method. The method potentially solves the problem of missing skid resistance values that arises in network-wide crash analysis, and allows the large proportion of roads without skid resistance values to be risk-assessed.
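The abstracts do not describe how regression tree extrapolation works internally. As a purely illustrative sketch, assuming a fitted tree can be represented as nested threshold rules, a skid resistance/crash rate curve for one segment can be produced by holding that segment's other attributes fixed and sweeping skid resistance across a range of values. The tree, its thresholds, the leaf values and the attribute names (`skid`, `traffic`) are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A node in a binary regression tree; leaves carry a prediction."""
    feature: Optional[str] = None    # split feature (None marks a leaf)
    threshold: float = 0.0           # go left if x[feature] < threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0               # predicted crash rate at a leaf


def predict(node: Node, x: Dict[str, float]) -> float:
    """Walk from the root to a leaf and return the leaf's prediction."""
    while node.feature is not None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.value


def sweep_curve(tree: Node, segment: Dict[str, float], feature: str,
                values: List[float]) -> List[Tuple[float, float]]:
    """Vary one feature for a fixed segment, keeping its other attributes."""
    curve = []
    for v in values:
        x = dict(segment)      # copy so the original segment is untouched
        x[feature] = v
        curve.append((v, predict(tree, x)))
    return curve


# Hypothetical tree: low skid resistance raises the predicted crash rate,
# more strongly on high-traffic segments.
tree = Node("skid", 0.40,
            left=Node("traffic", 10000.0,
                      left=Node(value=2.0), right=Node(value=3.5)),
            right=Node(value=1.0))

segment = {"skid": 0.55, "traffic": 12000.0}  # hypothetical segment
for skid, rate in sweep_curve(tree, segment, "skid", [0.30, 0.35, 0.45, 0.55]):
    print(f"skid={skid:.2f} predicted crash rate={rate}")
```

Because the swept feature is supplied rather than measured, such a curve can be generated even for a segment whose skid resistance value is missing; this is one possible reading of the abstracts' claim about missing values, not a description of the thesis's actual procedure.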
Abstract:
Dengue virus (DENV) transmission in Australia is driven by weather factors and imported dengue fever (DF) cases. However, uncertainty remains about the threshold effects of high-order interactions among weather factors and imported DF cases, and about the impact of these factors on autochthonous DF. A time-series regression tree model was used to assess the threshold effects of natural temporal variations in weekly weather factors and weekly imported DF cases on the weekly incidence of autochthonous DF from 1 January 2000 to 31 December 2009 in Townsville and Cairns, Australia. In Cairns, mean weekly autochthonous DF incidence increased 16.3-fold when the 3-week lagged moving average maximum temperature was <32 °C, the 4-week lagged moving average minimum temperature was ≥24 °C, and the sum of imported DF cases in the previous 2 weeks was >0. When the 3-week lagged moving average maximum temperature was ≥32 °C and the other two conditions remained the same, mean weekly autochthonous DF incidence increased only 4.6-fold. In Townsville, mean weekly autochthonous DF incidence increased 10-fold when the 3-week lagged moving average rainfall was ≥27 mm, but only 1.8-fold when rainfall was <27 mm during January to June. Thus, autochthonous DF incidence responded differently to weather factors and imported DF cases in Townsville and Cairns. Imported DF cases may also trigger and amplify local outbreaks under favorable climate conditions.
Abstract:
Bioassessment protocols often relate changes in summary metrics that describe aspects of biotic assemblage structure and function to environmental stress. Biotic assessment using multimetric indices now forms the basis for setting regulatory standards for stream quality and for a range of other water resource management goals in the USA and elsewhere. Biotic metrics are typically interpreted with reference to the expected natural state to evaluate whether a site is degraded. To quantify change induced by human disturbance, natural variation in biotic metrics along environmental gradients must be adequately accounted for. A common approach used in the Index of Biotic Integrity (IBI) is to examine scatter plots of a given metric against a single stream size surrogate and fit a line (drawn by eye) to form the upper bound, thereby defining the maximum likely value of the metric at a site with a given environmental characteristic (termed the 'maximum species richness line', MSRL). In this paper we examine whether using a single environmental descriptor and the MSRL is appropriate for defining the reference condition for a biotic metric (fish species richness) and for detecting human disturbance gradients in rivers of south-eastern Queensland, Australia. We compare the accuracy and precision of the MSRL approach based on single environmental predictors with those of three regression-based prediction methods (Simple Linear Regression, Generalised Linear Modelling and Regression Tree modelling) that use a set of landscape and local scale environmental variables, singly or in combination, as predictors of species richness. We compare the frequency of classification errors from each method against set biocriteria and contrast the ability of each method to accurately reflect human disturbance gradients at a large set of test sites.
The results of this study suggest that the MSRL based on variation in a single environmental descriptor could not accurately predict species richness at minimally disturbed sites when compared with simple linear regressions (SLRs) based on equivalent environmental variables. Regression-based models incorporating multiple environmental variables as predictors explained natural variation in species richness more accurately than simple models using single environmental predictors. Prediction error from the MSRL was substantially higher than for the regression methods and led to a higher frequency of Type I errors (incorrectly classifying a site as disturbed). We suggest that the problems with the MSRL arise from its inherent scoring procedure and that it is limited to predicting variation in the dependent variable along a single environmental gradient.
Abstract:
Irregular atrial pressure and defective folate and cholesterol metabolism contribute to the pathogenesis of hypertension. However, little is known about the combined roles of the methylenetetrahydrofolate reductase (MTHFR), apolipoprotein E (ApoE) and angiotensin-converting enzyme (ACE) genes, which are involved in metabolism and homeostasis. The objective of this study was to investigate the association of the MTHFR 677C>T and 1298A>C, ACE insertion/deletion (I/D) and ApoE polymorphisms with hypertension, and to further explore the epistatic interactions involved in these mechanisms. A total of 594 subjects were recruited: 348 normotensive and 246 hypertensive ischemic stroke subjects. The MTHFR 677C>T and 1298A>C, ACE I/D and ApoE polymorphisms were genotyped and their epistatic interactions were analyzed. The MTHFR 677C>T and ApoE polymorphisms were significantly associated with susceptibility to hypertension in multiple logistic regression models, multifactor dimensionality reduction, and a classification and regression tree. In addition, the logistic regression model showed significant interactions between the ApoE E3E3, E2E4 and E2E2 genotypes and the MTHFR 677C>T polymorphism. In conclusion, this epistasis study indicated a significant association of the ApoE and MTHFR polymorphisms with hypertension.
Abstract:
This study investigated diarrhoea seasonality, its potential drivers, and potential opportunities for future diarrhoea control and prevention in China. Data on weekly infectious diarrhoea cases in 31 provinces of China from 2005 to 2012 were compiled, together with data on demographic and geographic characteristics and climatic factors. A cosinor function combined with Poisson regression was used to calculate three seasonal parameters of diarrhoea in different provinces. Regression tree analysis was used to identify predictors of diarrhoea seasonality. Diarrhoea cases in China showed a bimodal distribution. Diarrhoea in children <5 years was more likely to peak in fall-winter, whereas diarrhoea in persons ≥5 years peaked in summer. Latitude was significantly associated with the spatial pattern of diarrhoea seasonality, with peak and trough times occurring earlier at high latitudes (northern areas) and later at low latitudes (southern areas). The annual amplitude of diarrhoea in persons ≥5 years increased with latitude (r = 0.62, P < 0.001). Latitudes 27.8° N and 38.65° N were the latitudinal thresholds for diarrhoea seasonality in China. Region-specific diarrhoea control and prevention strategies may be optimal for China. More attention should be paid to diarrhoea in children <5 years during the fall-winter season.
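For readers unfamiliar with cosinor models, a common single-harmonic form writes the log of the expected weekly count as log(mu_t) = b0 + b_cos·cos(2πt/P) + b_sin·sin(2πt/P), from which an amplitude and the timing of the peak and trough follow. The sketch below assumes that form, a 52-week period, and coefficients already estimated by a Poisson regression; the study's exact parameterization and its three seasonal parameters may differ, and the coefficient values used here are hypothetical.

```python
import math


def cosinor_parameters(beta_cos: float, beta_sin: float, period: float = 52.0):
    """Seasonal summaries of log(mu_t) = b0 + b_cos*cos(w t) + b_sin*sin(w t).

    Since b_cos*cos(w t) + b_sin*sin(w t) = A*cos(w t - phi), with
    A = sqrt(b_cos^2 + b_sin^2) and phi = atan2(b_sin, b_cos), the curve
    peaks when w t = phi and reaches its trough half a period later.
    """
    omega = 2.0 * math.pi / period
    amplitude = math.hypot(beta_cos, beta_sin)   # on the log scale
    phase = math.atan2(beta_sin, beta_cos)       # radians in (-pi, pi]
    peak = (phase / omega) % period              # time of maximum
    trough = (peak + period / 2.0) % period      # time of minimum
    return amplitude, peak, trough


# Hypothetical coefficients (not from the study)
amp, peak, trough = cosinor_parameters(0.0, 0.5)
print(round(amp, 3), round(peak, 2), round(trough, 2))  # 0.5 13.0 39.0
```

On the count scale, the ratio of expected counts at the peak to those at the trough under this form is exp(2A).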
Abstract:
Most Australian weeds are exotic plant species that were intentionally introduced for a variety of horticultural and agricultural purposes. A border weed risk assessment system (WRA) was implemented in 1997 to reduce the high economic costs and massive environmental damage associated with introducing serious weeds. We review the behaviour of this system using eight years of data from the assessment of species proposed for importation or held within genetic resource centres in Australia. Taxonomically, species from the Chenopodiaceae and Poaceae were most likely to be rejected, and those from the Arecaceae and Flacourtiaceae were most likely to be accepted. Dendrogram analysis and classification and regression tree (TREE) models were also used to analyse the data. The latter revealed that a small subset of the 35 variables assessed was highly associated with the outcome of the original assessment. The TREE model examining all of the data contained just five variables: unintentional human dispersal; congeneric weed; weed elsewhere; tolerates or benefits from mutilation, cultivation or fire; and reproduction by vegetative propagation. It gave the same outcome as the full WRA model for 71% of species. Weed elsewhere was not the first splitting variable in this model, indicating that the WRA can capture species that have no history of weediness. A reduced TREE model (from which human-mediated variables had been removed) contained four variables: broad climate suitability; reproduction in ≤1 year; self-fertilisation; and tolerates and benefits from mutilation, cultivation or fire. It gave the same outcome as the full WRA model for 65% of species. Data inconsistencies and the relative importance of questions are discussed, and some recommendations are made for improving the use of the system.
Abstract:
Background: Consensus development techniques were used in the late 1980s to create explicit criteria for the appropriateness of cataract extraction. Following the RAND method, we developed a new tool for assessing the appropriateness of indications for cataract extraction, and we tested the validity of our panel results. Methods: Criteria were developed using a modified Delphi panel judgment process. A panel of 12 ophthalmologists was assembled. Ratings were analyzed for the level of agreement among panelists. We studied the influence of all variables on the final panel score using linear and logistic regression models. The explicit criteria developed were summarized by classification and regression tree analysis. Results: Of the 765 indications evaluated by the main panel in the second round, 32.9% were found appropriate, 30.1% uncertain, and 37% inappropriate. Agreement was found for 53% of the indications and disagreement for 0.9%. Seven variables were used to construct the indications, which were divided into three groups: simple cataract, cataract with diabetic retinopathy, and cataract with other ocular pathologies. Preoperative visual acuity in the cataractous eye and visual function were the variables that best explained the panel scores. The panel results were synthesized and presented in three decision trees. Misclassification error in the decision trees, compared with the panel's original criteria, was 5.3%. Conclusion: The parameters tested showed acceptable validity for an evaluation tool. These results support the use of this indication algorithm as a screening tool for assessing the appropriateness of cataract extraction in field studies and for developing practice guidelines.
Abstract:
Using data collected simultaneously from a trawl and a hydrophone, we found that temporal and spatial trends in densities of juvenile Atlantic croaker (Micropogonias undulatus) in the Neuse River estuary in North Carolina can be identified by monitoring their sound production. Multivariate analysis of covariance (MANCOVA) revealed that catch per unit effort (CPUE) of Atlantic croaker had a significant relationship with the dependent variables of sound level and peak frequency of Atlantic croaker calls. Tests of between-subject correspondence failed to detect relationships between CPUE and either call parameter, but statistical power was low. Williamson's index of spatial overlap indicated that call detection rate (expressed as a 0–3 calling index) was correlated in time and space with Atlantic croaker CPUE. The correspondence between acoustic parameters and trawl catch rates varied by month and by habitat. In general, the calling index overlapped more with this species' density than did the received sound level of its calls. Classification and regression tree analysis identified the calling index as the strongest correlate of CPUE. Passive acoustics could be an inexpensive means of identifying spatial and temporal trends in abundance of soniferous fish species.
Abstract:
Background: Intratumor heterogeneity may be responsible for the unpredictable, aggressive clinical behavior that some clear cell renal cell carcinomas display. This clinical uncertainty may be caused by insufficient sampling, which leaves foci of high-grade tumor out of the histological analysis. Although molecular approaches are providing important information on renal intratumor heterogeneity, a focus on this topic from the practicing pathologist's perspective is still pending. Methods: Four distant tumor areas from each of 40 organ-confined clear cell renal cell carcinomas were selected for histopathological and immunohistochemical evaluation. Tumor size, cell type (clear/granular), Fuhrman grade, stage, and immunostaining for Snail, ZEB1, Twist, Vimentin, E-cadherin, beta-catenin, PTEN, p-Akt, p110 alpha, and SETD2 were analyzed for intratumor heterogeneity using a classification and regression tree algorithm. Results: Cell type and Fuhrman grade were heterogeneous in 12.5% and 60% of the tumors, respectively. If cell type was homogeneous (clear cell), the tumors were low grade in 88.57% of cases. Immunostaining heterogeneity was significant in the series and ranged from 15% for p110 alpha to 80% for Snail. When Snail immunostaining was homogeneous, the tumor was histologically homogeneous in 100% of cases. If Snail was heterogeneous, the tumor was heterogeneous in 75% of cases. Average tumor diameter was 4.3 cm. Tumors larger than 3.7 cm were heterogeneous for Vimentin immunostaining in 72.5% of cases. Tumors displaying negative immunostaining for both ZEB1 and Twist were low grade in 100% of cases. Conclusions: Intratumor heterogeneity is a common event in clear cell renal cell carcinoma and can be monitored by immunohistochemistry in routine practice. Snail seems to be particularly useful for identifying intratumor heterogeneity. The suitability of current sampling protocols in renal cancer is discussed.
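CART grows classification trees by choosing, at each node, the split that most reduces an impurity measure; the Gini index is the usual default for classification. The sketch below searches for the best threshold on a single numeric predictor. The toy data (tumor diameter with a homogeneous/heterogeneous label) are invented for illustration and are not the study's data; the real analysis used many predictors, and stopping rules and pruning are omitted here.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_split(x: Sequence[float], y: Sequence[str]
                    ) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted child impurity) minimizing impurity.

    Candidate thresholds are midpoints between consecutive distinct x values;
    observations with x < threshold go to the left child. If no split
    improves on the parent, the threshold is None.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_t, best_imp = None, gini(y)   # start from the unsplit parent
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                   # cannot split between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp


# Invented toy data: diameter (cm) and heterogeneity label
diam = [2.1, 2.8, 3.3, 3.9, 4.4, 5.0, 6.2]
label = ["hom", "hom", "hom", "het", "het", "hom", "het"]
print(best_gini_split(diam, label))
```

On these toy values the best split falls at about 3.6 cm, between the 3.3 cm and 3.9 cm tumors, leaving a pure homogeneous group on the left.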
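Abstract:
The relationship between vegetation, species distribution and the environment has long been a central topic in ecology. In research on global change and terrestrial ecosystems, the main focus has long been the simulation and prediction of large-scale vegetation distribution, and many climate-vegetation distribution models have been built. By contrast, relatively few studies in China or elsewhere have simulated and predicted the potential distributions of species. In recent years, with advances in statistical techniques and geographic information systems (GIS), statistical models for predicting species distributions have developed rapidly, and they have been widely applied to biogeographic distribution, plant communities, biodiversity, and climate-change impact assessment. This thesis uses three statistical modeling techniques that are widely used in species distribution research (generalized linear models, generalized additive models, and classification and regression trees) to simulate the geographic distributions of common tree species in China, compares the simulation accuracy of the models, and applies the more accurate models to predict the potential future distributions of several major Chinese tree species under future climate scenarios. Four models, namely a generalized linear model (GLM), a stepwise GLM with quadratic terms (SGLM), a generalized additive model (GAM) and a classification and regression tree (CART), were used to simulate the geographic distributions of 20 common tree species in China; all four achieved high simulation accuracy. GAM was the most accurate; adding quadratic terms and stepwise selection effectively improved the accuracy of the GLM; and CART, a rule-based technique, performed slightly better than the GLM and slightly worse than GAM. Across species, all four models performed relatively poorly for Chinese pine (Pinus tabuliformis) and Liaodong oak (Quercus liaotungensis), which occur mainly in the warm-temperate deciduous broadleaf forest region; the GLM gave unsatisfactory results for Korean pine (Pinus koraiensis), Mongolian oak (Quercus mongolica), Manchurian walnut (Juglans mandshurica) and Tilia mandshurica, which occur in temperate mixed coniferous and broadleaf forest; and all four models achieved high accuracy for species of China's subtropical evergreen broadleaf forest region, as well as for widespread species. Using GIS, the simulated distributions of Cyclobalanopsis glauca and Chinese pine were mapped. The maps clearly displayed the differences between model results: all four models simulated the distribution of C. glauca well and gave similar results, whereas none simulated Chinese pine satisfactorily, with the GLM performing worst. These results agree with the model accuracy evaluation. Under future climate-change scenarios, and taking the relative performance of the four models into account, future trends were analyzed for three major afforestation species in China (Masson pine, Pinus massoniana; Chinese pine; and Korean pine) and two common species (C. glauca and Mongolian oak). For Masson pine, all four models predicted that it will largely retain its current range while its potential range expands, with a tendency to spread west and north. For Chinese pine, the GLM, SGLM and GAM predicted a northward shift together with expansion to the northeast and southwest. For Korean pine, the SGLM, GAM and CART gave similar predictions, namely a reduction in its potential range. For Mongolian oak, all four models predicted westward expansion. For C. glauca, all four models predicted that it will largely retain its current range and expand west and north, although CART also predicted that it will disappear from southern Guangdong and southern Guangxi.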
Abstract:
Artificial neural network (ANN) methods are used to predict forest characteristics. The data source is the Southeast Alaska (SEAK) Grid Inventory, a ground survey compiled by the USDA Forest Service at several thousand sites. The main objective of this article is to predict characteristics at unsurveyed locations between grid sites. A secondary objective is to evaluate the relative performance of different ANNs. Data from the grid sites are used to train six ANNs: multilayer perceptron, fuzzy ARTMAP, probabilistic, generalized regression, radial basis function, and learning vector quantization. A classification and regression tree method is used for comparison. Topographic variables are used to construct models: latitude and longitude coordinates, elevation, slope, and aspect. The models classify three forest characteristics: crown closure, species land cover, and tree size/structure. Models are constructed using n-fold cross-validation. Predictive accuracy is calculated using a method that accounts for the influence of misclassification as well as measuring correct classifications. The probabilistic and generalized regression networks are found to be the most accurate. The predictions of the ANN models are compared with a classification of the Tongass national forest in southeast Alaska based on the interpretation of satellite imagery and are found to be of similar accuracy.
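The abstract names n-fold cross-validation without further detail. In its standard form, the data are partitioned into n roughly equal folds, and each fold in turn serves as the test set while the model is trained on the remaining folds. The sketch below generates such train/test index splits; the function name, the shuffling seed and the fold count are illustrative choices, not the study's settings.

```python
import random
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, n_folds: int, seed: int = 0
                  ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for n-fold cross-validation."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


for train, test in kfold_indices(10, 3):
    print(len(train), len(test))
```

Each sample lands in exactly one test fold, so accuracy averaged over the folds uses every grid site once for testing and never tests a model on sites it was trained on.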
Abstract:
AIMS: Survival and response rates in metastatic colorectal cancer remain poor, despite advances in drug development. There is increasing evidence to suggest that gender-specific differences may contribute to poor clinical outcome. We tested the hypothesis that genomic profiling of metastatic colorectal cancer is dependent on gender.
MATERIALS & METHODS: A total of 152 patients with metastatic colorectal cancer who were treated with oxaliplatin and continuous infusion 5-fluorouracil were genotyped for 21 polymorphisms in 13 cancer-related genes by PCR. Classification and regression tree analysis tested for gender-related association of polymorphisms with overall survival, progression-free survival and tumor response.
RESULTS: Classification and regression tree analysis of all polymorphisms, age and race resulted in gender-specific predictors of overall survival, progression-free survival and tumor response. Polymorphisms in the following genes were associated with gender-specific clinical outcome: estrogen receptor β, EGF receptor, xeroderma pigmentosum group D, voltage-gated sodium channel and phospholipase A2.
CONCLUSION: Genetic profiling to predict the clinical outcome of patients with metastatic colorectal cancer may depend on gender.
Abstract:
An organism's home range dictates the spatial scale on which important processes (e.g. competition and predation) occur and directly affects the relationship between individual fitness and local habitat quality. Many reef fish species have very restricted home ranges after settlement; here, we quantify home-range size in juveniles of a widespread and abundant New Zealand reef fish, the common triplefin (Forsterygion lapillum). We conducted visual observations of 49 juveniles (mean size = 35 mm total length) in Wellington Harbour, New Zealand. Home ranges were extremely small (0.053 ± 0.029 m², mean ± s.d.) and were unaffected by adult density, body size or substrate composition. A regression tree indicated that home-range size decreased sharply at ~4.5 juveniles m⁻², and a linear mixed model confirmed that home ranges in high-density areas (>4.5 juveniles m⁻²) were significantly smaller (by 34%) than those in low-density areas (after accounting for a significant effect of fish movement on our home-range estimates). Our results suggest that conspecific density may have negative and non-linear effects on home-range size, which could shape the spatial distribution of juveniles within a population and influence individual fitness across local density gradients.
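A single regression-tree split of the kind reported here is found by trying every candidate threshold on the predictor and keeping the one that minimizes the summed squared error of the two resulting groups, each predicted by its own mean. The sketch below shows that search for one predictor (density) and one response (home-range size). The numbers are invented to illustrate a drop above a density breakpoint and are not the study's data.

```python
from typing import Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float]
               ) -> Tuple[Optional[float], float]:
    """Return (threshold, total SSE) of the best binary split on x."""
    pairs = sorted(zip(x, y))
    best_t, best_err = None, sse([v for _, v in pairs])  # unsplit baseline
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                  # no threshold between equal x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        err = sse(left) + sse(right)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


# Invented data: density (juveniles per m^2) vs home-range size (m^2)
density = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
home_range = [0.070, 0.068, 0.066, 0.064, 0.044, 0.042, 0.040]
threshold, err = best_split(density, home_range)
print(threshold)  # 4.5
```

A full regression tree applies this search recursively to each child group; the study then checked the breakpoint with a linear mixed model rather than relying on the tree alone.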
Abstract:
Falls among older people are a major problem. It is therefore not surprising that identifying the factors that increase their risk has attracted so much attention. Frailer older adults who need support to live in the community have nevertheless remained neglected by research, although the Quebec authorities have more recently made them a priority target for intervention. Prospective observational studies are particularly suited to studying risk factors for falls in older people. Identifying these factors optimally is complicated, however, by the fact that exposure to risk factors can vary during follow-up and that the same individual can experience more than one event. Twenty years ago, researchers tried to alert their colleagues to these issues, but their efforts were in vain. These considerations are still largely ignored today, with analyses focusing on the proportion of people who have fallen or on the time to first fall, thereby discarding a substantial amount of relevant information. In this thesis, we review the methods in use and propose an extension of the Cox hazards model. We illustrate this method with a study of risk factors potentially associated with falls in a group of 959 older adults who used public home-support services. We compare the results obtained with the method of Wei, Lin and Weissfeld with those obtained with other methods, including conventional logistic regression, grouped logistic regression, negative binomial regression and Andersen-Gill regression. The investigation is characterized by repeated measurements of risk factors in participants' homes and by monthly telephone follow-ups to document the occurrence of falls.
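The Wei, Lin and Weissfeld (WLW) approach treats the k-th recurrent event as its own stratum and models the time from study entry to the k-th event with a marginal Cox model, using a robust variance to account for correlation within subjects. The sketch below only builds the stratified data layout such a model would be fitted to; fitting the model itself, and attaching covariates (including time-varying ones) to each row, is left out. The row fields, the function name and the choice of K (`max_events`) are assumptions for illustration.

```python
from typing import List, NamedTuple


class WLWRow(NamedTuple):
    subject: str
    stratum: int    # event number k (1..K)
    time: float     # time from entry to k-th fall, or end of follow-up
    status: int     # 1 if the k-th fall was observed, 0 if censored


def wlw_layout(subject: str, fall_times: List[float],
               follow_up_end: float, max_events: int) -> List[WLWRow]:
    """Expand one subject into K rows, one per event stratum.

    In the WLW marginal approach every subject is at risk for each of the
    K events from time zero; if fewer than k falls occur, the k-th row is
    censored at the end of follow-up. Falls beyond the K-th are ignored.
    """
    falls = sorted(t for t in fall_times if t <= follow_up_end)
    rows = []
    for k in range(1, max_events + 1):
        if k <= len(falls):
            rows.append(WLWRow(subject, k, falls[k - 1], 1))
        else:
            rows.append(WLWRow(subject, k, follow_up_end, 0))
    return rows


# Hypothetical subject: falls at months 2 and 5, followed for 12 months
for row in wlw_layout("A", [5.0, 2.0], 12.0, max_events=3):
    print(row)
```

By contrast, the Andersen-Gill formulation, also compared in the thesis, places each subject's consecutive at-risk intervals (start, stop] in a single stratum instead of restarting the clock at time zero for every event number.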
The exposure factors studied, whether fixed or time-varying, include sociodemographic characteristics, body mass index, nutritional risk, alcohol consumption, home environmental hazards, gait and balance, and medication use. Nearly all users (99.6%) had at least one high-risk factor. Exposure to multiple risks was widespread, with an average of 2.7 distinct high-risk factors per participant. Factors statistically associated with fall risk include male sex, younger age groups, a history of previous falls, a low score on the Berg Balance Scale, a low body mass index, use of benzodiazepine-type medications, the number of hazards present in the home, and living in a private seniors' residence. Our results show, however, that the usual methods for analyzing risk factors for falls (and, in some cases, for falls requiring medical care) produce appreciable bias. The biases in the measures of association considered stem from how exposure and outcome are measured and defined, and from how the statistical methods of analysis account for them. A final part, as innovative as it is distinct in the nature of the statistical tools used, completes the work. In it, we identify profiles of older adults at risk of becoming recurrent fallers, that is, those who fell at least twice in the six months following their initial assessment. A classification and regression tree analysis combined with survival analysis revealed five distinct profiles, with relative risks ranging from 0.7 to 5.1.
Living in a seniors' residence, having a history of multiple falls or balance problems, and consuming alcohol are the main factors associated with an increased probability of falling early and becoming a recurrent faller. Whether in terms of screening for fall risk factors or of the population targeted, this thesis contributes to knowledge on a highly topical public health issue. We encourage researchers interested in identifying risk factors for falls in older people to use the statistical method of Wei, Lin and Weissfeld, because it accounts for time-varying exposures and recurrent events. Further research will also be needed to determine the best screening test for a given risk factor in this population.