962 resultados para Classification Tree Pruning


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The problem of recognition on finite set of events is considered. The generalization ability of classifiers for this problem is studied within the Bayesian approach. The method for non-uniform prior distribution specification on recognition tasks is suggested. It takes into account the assumed degree of intersection between classes. The results of the analysis are applied for pruning of classification trees.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Strategies for cancer reduction and management are targeted at both individual and area levels. Area-level strategies require careful understanding of geographic differences in cancer incidence, in particular the association with factors such as socioeconomic status, ethnicity and accessibility. This study aimed to identify the complex interplay of area-level factors associated with high area-specific incidence of Australian priority cancers using a classification and regression tree (CART) approach. Methods: Area-specific smoothed standardised incidence ratios were estimated for priority-area cancers across 478 statistical local areas in Queensland, Australia (1998-2007, n=186,075). For those cancers with significant spatial variation, CART models were used to identify whether area-level accessibility, socioeconomic status and ethnicity were associated with high area-specific incidence. Results: The accessibility of a person’s residence had the most consistent association with the risk of cancer diagnosis across the specific cancers. Many cancers were likely to have high incidence in more urban areas, although male lung cancer and cervical cancer tended to have high incidence in more remote areas. The impact of socioeconomic status and ethnicity on these associations differed by type of cancer. Conclusions: These results highlight the complex interactions between accessibility, socioeconomic status and ethnicity in determining cancer incidence risk.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Agglomerative cluster analyses encompass many techniques, which have been widely used in various fields of science. In biology, and specifically ecology, datasets are generally highly variable and may contain outliers, which increase the difficulty to identify the number of clusters. Here we present a new criterion to determine statistically the optimal level of partition in a classification tree. The criterion robustness is tested against perturbated data (outliers) using an observation or variable with values randomly generated. The technique, called Random Simulation Test (RST), is tested on (1) the well-known Iris dataset [Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Ann. Eugenic. 7, 179–188], (2) simulated data with predetermined numbers of clusters following Milligan and Cooper [Milligan, G.W., Cooper, M.C., 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50, 159–179] and finally (3) is applied on real copepod communities data previously analyzed in Beaugrand et al. [Beaugrand, G., Ibanez, F., Lindley, J.A., Reid, P.C., 2002. Diversity of calanoid copepods in the North Atlantic and adjacent seas: species associations and biogeography. Mar. Ecol. Prog. Ser. 232, 179–195]. The technique is compared to several standard techniques. RST performed generally better than existing algorithms on simulated data and proved to be especially efficient with highly variable datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The purpose of this study was to evaluate the physical and mechanical properties of particleboard made with pruning wastes from Ipê (Tabebuia serratifolia) and Chapéu-de-Sol (Terminalia catappa) trees. Particleboards were prepared with both wood species, using all the material produced by grinding the pruning wastes. The particleboards had dimensions of 45×45 cm, a thickness of approximately 11.5 mm and an average density of 664 kg/m3. A urea-formaldehyde adhesive was used in the proportion of 12% of the dry particle mass. The particleboards were pressed at a temperature of 130 C for 10 mins. The physical and mechanical properties analyzed were density, moisture content, thickness swelling, percentage of lignin and cellulose, modulus of resilience, modulus of elasticity and tensile strength parallel to the grain, accordingly to the standards NBR 14810 and CS 236-66 (1968). The particleboards were considered to be of medium density. The particle size significantly affected the static bending strength and tensile strength parallel to the grain. Ipê presented better results, demonstrating a potential for the production and use of particleboard made from this species. © The Author(s) 2013.

Relevância:

90.00% 90.00%

Publicador:

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Discrete Conditional Phase-type (DC-Ph) models consist of a process component (survival distribution) preceded by a set of related conditional discrete variables. This paper introduces a DC-Ph model where the conditional component is a classification tree. The approach is utilised for modelling health service capacities by better predicting service times, as captured by Coxian Phase-type distributions, interfaced with results from a classification tree algorithm. To illustrate the approach, a case-study within the healthcare delivery domain is given, namely that of maternity services. The classification analysis is shown to give good predictors for complications during childbirth. Based on the classification tree predictions, the duration of childbirth on the labour ward is then modelled as either a two or three-phase Coxian distribution. The resulting DC-Ph model is used to calculate the number of patients and associated bed occupancies, patient turnover, and to model the consequences of changes to risk status.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Les milieux humides remplissent plusieurs fonctions écologiques d’importance et contribuent à la biodiversité de la faune et de la flore. Même s’il existe une reconnaissance croissante sur l’importante de protéger ces milieux, il n’en demeure pas moins que leur intégrité est encore menacée par la pression des activités humaines. L’inventaire et le suivi systématique des milieux humides constituent une nécessité et la télédétection est le seul moyen réaliste d’atteindre ce but. L’objectif de cette thèse consiste à contribuer et à améliorer la caractérisation des milieux humides en utilisant des données satellites acquises par des radars polarimétriques en bande L (ALOS-PALSAR) et C (RADARSAT-2). Cette thèse se fonde sur deux hypothèses (chap. 1). La première hypothèse stipule que les classes de physionomies végétales, basées sur la structure des végétaux, sont plus appropriées que les classes d’espèces végétales car mieux adaptées au contenu informationnel des images radar polarimétriques. La seconde hypothèse stipule que les algorithmes de décompositions polarimétriques permettent une extraction optimale de l’information polarimétrique comparativement à une approche multipolarisée basée sur les canaux de polarisation HH, HV et VV (chap. 3). En particulier, l’apport de la décomposition incohérente de Touzi pour l’inventaire et le suivi de milieux humides est examiné en détail. Cette décomposition permet de caractériser le type de diffusion, la phase, l’orientation, la symétrie, le degré de polarisation et la puissance rétrodiffusée d’une cible à l’aide d’une série de paramètres extraits d’une analyse des vecteurs et des valeurs propres de la matrice de cohérence. La région du lac Saint-Pierre a été sélectionnée comme site d’étude étant donné la grande diversité de ses milieux humides qui y couvrent plus de 20 000 ha. L’un des défis posés par cette thèse consiste au fait qu’il n’existe pas de système standard énumérant l’ensemble possible des classes physionomiques ni d’indications précises quant à leurs caractéristiques et dimensions. Une grande attention a donc été portée à la création de ces classes par recoupement de sources de données diverses et plus de 50 espèces végétales ont été regroupées en 9 classes physionomiques (chap. 7, 8 et 9). Plusieurs analyses sont proposées pour valider les hypothèses de cette thèse (chap. 9). Des analyses de sensibilité par diffusiogramme sont utilisées pour étudier les caractéristiques et la dispersion des physionomies végétales dans différents espaces constitués de paramètres polarimétriques ou canaux de polarisation (chap. 10 et 12). Des séries temporelles d’images RADARSAT-2 sont utilisées pour approfondir la compréhension de l’évolution saisonnière des physionomies végétales (chap. 12). L’algorithme de la divergence transformée est utilisé pour quantifier la séparabilité entre les classes physionomiques et pour identifier le ou les paramètres ayant le plus contribué(s) à leur séparabilité (chap. 11 et 13). Des classifications sont aussi proposées et les résultats comparés à une carte existante des milieux humide du lac Saint-Pierre (14). Finalement, une analyse du potentiel des paramètres polarimétrique en bande C et L est proposé pour le suivi de l’hydrologie des tourbières (chap. 15 et 16). Les analyses de sensibilité montrent que les paramètres de la 1re composante, relatifs à la portion dominante (polarisée) du signal, sont suffisants pour une caractérisation générale des physionomies végétales. Les paramètres des 2e et 3e composantes sont cependant nécessaires pour obtenir de meilleures séparabilités entre les classes (chap. 11 et 13) et une meilleure discrimination entre milieux humides et milieux secs (chap. 14). Cette thèse montre qu’il est préférable de considérer individuellement les paramètres des 1re, 2e et 3e composantes plutôt que leur somme pondérée par leurs valeurs propres respectives (chap. 10 et 12). Cette thèse examine également la complémentarité entre les paramètres de structure et ceux relatifs à la puissance rétrodiffusée, souvent ignorée et normalisée par la plupart des décompositions polarimétriques. La dimension temporelle (saisonnière) est essentielle pour la caractérisation et la classification des physionomies végétales (chap. 12, 13 et 14). Des images acquises au printemps (avril et mai) sont nécessaires pour discriminer les milieux secs des milieux humides alors que des images acquises en été (juillet et août) sont nécessaires pour raffiner la classification des physionomies végétales. Un arbre hiérarchique de classification développé dans cette thèse constitue une synthèse des connaissances acquises (chap. 14). À l’aide d’un nombre relativement réduit de paramètres polarimétriques et de règles de décisions simples, il est possible d’identifier, entre autres, trois classes de bas marais et de discriminer avec succès les hauts marais herbacés des autres classes physionomiques sans avoir recours à des sources de données auxiliaires. Les résultats obtenus sont comparables à ceux provenant d’une classification supervisée utilisant deux images Landsat-5 avec une exactitude globale de 77.3% et 79.0% respectivement. Diverses classifications utilisant la machine à vecteurs de support (SVM) permettent de reproduire les résultats obtenus avec l’arbre hiérarchique de classification. L’exploitation d’une plus forte dimensionalitée par le SVM, avec une précision globale maximale de 79.1%, ne permet cependant pas d’obtenir des résultats significativement meilleurs. Finalement, la phase de la décomposition de Touzi apparaît être le seul paramètre (en bande L) sensible aux variations du niveau d’eau sous la surface des tourbières ouvertes (chap. 16). Ce paramètre offre donc un grand potentiel pour le suivi de l’hydrologie des tourbières comparativement à la différence de phase entre les canaux HH et VV. Cette thèse démontre que les paramètres de la décomposition de Touzi permettent une meilleure caractérisation, de meilleures séparabilités et de meilleures classifications des physionomies végétales des milieux humides que les canaux de polarisation HH, HV et VV. Le regroupement des espèces végétales en classes physionomiques est un concept valable. Mais certaines espèces végétales partageant une physionomie similaire, mais occupant un milieu différent (haut vs bas marais), ont cependant présenté des différences significatives quant aux propriétés de leur rétrodiffusion.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The usefulness of motor subtypes of delirium is unclear due to inconsistency in subtyping methods and a lack of validation with objective measures of activity. The activity of 40 patients was measured over 24 h with a discrete accelerometer-based activity monitor. The continuous wavelet transform (CWT) with various mother wavelets were applied to accelerometry data from three randomly selected patients with DSM-IV delirium that were readily divided into hyperactive, hypoactive, and mixed motor subtypes. A classification tree used the periods of overall movement as measured by the discrete accelerometer-based monitor as determining factors for which to classify these delirious patients. This data used to create the classification tree were based upon the minimum, maximum, standard deviation, and number of coefficient values, generated over a range of scales by the CWT. The classification tree was subsequently used to define the remaining motoric subtypes. The use of a classification system shows how delirium subtypes can be categorized in relation to overall motoric behavior. The classification system was also implemented to successfully define other patient motoric subtypes. Motor subtypes of delirium defined by observed ward behavior differ in electronically measured activity levels.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Credit scoring modelling comprises one of the leading formal tools for supporting the granting of credit. Its core objective consists of the generation of a score by means of which potential clients can be listed in the order of the probability of default. A critical factor is whether a credit scoring model is accurate enough in order to provide correct classification of the client as a good or bad payer. In this context the concept of bootstraping aggregating (bagging) arises. The basic idea is to generate multiple classifiers by obtaining the predicted values from the fitted models to several replicated datasets and then combining them into a single predictive classification in order to improve the classification accuracy. In this paper we propose a new bagging-type variant procedure, which we call poly-bagging, consisting of combining predictors over a succession of resamplings. The study is derived by credit scoring modelling. The proposed poly-bagging procedure was applied to some different artificial datasets and to a real granting of credit dataset up to three successions of resamplings. We observed better classification accuracy for the two-bagged and the three-bagged models for all considered setups. These results lead to a strong indication that the poly-bagging approach may promote improvement on the modelling performance measures, while keeping a flexible and straightforward bagging-type structure easy to implement. (C) 2011 Elsevier Ltd. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Most of the modem developments with classification trees are aimed at improving their predictive capacity. This article considers a curiously neglected aspect of classification trees, namely the reliability of predictions that come from a given classification tree. In the sense that a node of a tree represents a point in the predictor space in the limit, the aim of this article is the development of localized assessment of the reliability of prediction rules. A classification tree may be used either to provide a probability forecast, where for each node the membership probabilities for each class constitutes the prediction, or a true classification where each new observation is predictively assigned to a unique class. Correspondingly, two types of reliability measure will be derived-namely, prediction reliability and classification reliability. We use bootstrapping methods as the main tool to construct these measures. We also provide a suite of graphical displays by which they may be easily appreciated. In addition to providing some estimate of the reliability of specific forecasts of each type, these measures can also be used to guide future data collection to improve the effectiveness of the tree model. The motivating example we give has a binary response, namely the presence or absence of a species of Eucalypt, Eucalyptus cloeziana, at a given sampling location in response to a suite of environmental covariates, (although the methods are not restricted to binary response data).

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Risk assessment systems for introduced species are being developed and applied globally, but methods for rigorously evaluating them are still in their infancy. We explore classification and regression tree models as an alternative to the current Australian Weed Risk Assessment system, and demonstrate how the performance of screening tests for unwanted alien species may be quantitatively compared using receiver operating characteristic (ROC) curve analysis. The optimal classification tree model for predicting weediness included just four out of a possible 44 attributes of introduced plants examined, namely: (i) intentional human dispersal of propagules; (ii) evidence of naturalization beyond native range; (iii) evidence of being a weed elsewhere; and (iv) a high level of domestication. Intentional human dispersal of propagules in combination with evidence of naturalization beyond a plants native range led to the strongest prediction of weediness. A high level of domestication in combination with no evidence of naturalization mitigated the likelihood of an introduced plant becoming a weed resulting from intentional human dispersal of propagules. Unlikely intentional human dispersal of propagules combined with no evidence of being a weed elsewhere led to the lowest predicted probability of weediness. The failure to include intrinsic plant attributes in the model suggests that either these attributes are not useful general predictors of weediness, or data and analysis were inadequate to elucidate the underlying relationship(s). This concurs with the historical pessimism that we will ever be able to accurately predict invasive plants. Given the apparent importance of propagule pressure (the number of individuals of an species released), future attempts at evaluating screening model performance for identifying unwanted plants need to account for propagule pressure when collating and/or analysing datasets. The classification tree had a cross-validated sensitivity of 93.6% and specificity of 36.7%. Based on the area under the ROC curve, the performance of the classification tree in correctly classifying plants as weeds or non-weeds was slightly inferior (Area under ROC curve = 0.83 +/- 0.021 (+/- SE)) to that of the current risk assessment system in use (Area under ROC curve = 0.89 +/- 0.018 (+/- SE)), although requires many fewer questions to be answered.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The degree to which pruning helps reestablish balance in agroforestry was assessed in a system established in São Carlos, São Paulo, Brazil, in 2008. Seven native tree species were planted at a density of 600 trees/ha in five strips of three rows each, and annual crops were cultivated in the 17-m crop strips between the tree strips. Competition was established after 35 months, decreasing the aboveground biomass production of corn planted close to the trees. An assessment of black oats in the dry season following tree pruning showed that the proximity of trees caused reductions in plant and panicle density, aboveground biomass production, number of grains per panicle and grain weight. Because pruning was not sufficient to maintain crop yields, tree thinning is recommended in order to minimize competition and restore conditions for adequate crop production.