977 results for Classification tree
Abstract:
The problem of recognition on a finite set of events is considered. The generalization ability of classifiers for this problem is studied within a Bayesian framework. A method is proposed for specifying a non-uniform prior distribution over recognition tasks; it takes into account the assumed degree of overlap between classes. The results of the analysis are applied to the pruning of classification trees.
Abstract:
The vascular and bryophyte floras of subantarctic Heard Island were classified by cluster analysis into six vegetation communities: Open Cushion Carpet, Mossy Feldmark, Wet Mixed Herbfield, Coastal Biotic Vegetation, Saltspray Vegetation, and Closed Cushion Carpet. Multidimensional scaling indicated that the vegetation communities were not well delineated but formed continua. Discriminant analysis and a classification tree identified altitude, wind, peat depth, bryophyte cover, extent of bare ground, and particle size as discriminating variables. The combination of small area, glaciation, and harsh climate has resulted in less varied vegetation than on the subantarctic islands north of the Antarctic Polar Front Zone. Some of the functional groups and vegetation communities found on warmer subantarctic islands are absent from Heard Island, notably ferns and sedges (functional groups) and fernbrakes and extensive mires (vegetation communities).
Abstract:
Land use/cover classification is one of the most important applications of remote sensing. However, accurately mapping the spatial distribution of land use/cover is challenging, particularly in moist tropical regions, because of the complex biophysical environment and the limitations of remote sensing data per se. This paper reviews a decade of experiments on land use/cover classification in the Brazilian Amazon. A comprehensive analysis of the classification results shows that the spatial information inherent in remote sensing data plays an essential role in improving land use/cover classification. Incorporating suitable textural images into the multispectral bands and using segmentation-based methods are valuable ways to improve classification, especially for high spatial resolution images. Fusion of multi-resolution optical sensor images is vital for visual interpretation but may not improve classification performance. In contrast, integrating optical and radar data did improve classification performance when an appropriate data fusion method was used. Among the available classification algorithms, the maximum likelihood classifier remains an important method that provides reasonably good accuracy, but nonparametric algorithms such as classification tree analysis have the potential to provide better results, although they often require more time for parameter optimization. Proper use of hierarchical methods is fundamental for developing accurate land use/cover classifications, particularly from historical remotely sensed data.
Abstract:
Wetlands perform several important ecological functions and contribute to the biodiversity of fauna and flora. Although there is growing recognition of the importance of protecting these environments, their integrity is still threatened by pressure from human activities. Systematic inventory and monitoring of wetlands are a necessity, and remote sensing is the only realistic means of achieving this goal. The objective of this thesis is to contribute to and improve the characterization of wetlands using satellite data acquired by L-band (ALOS-PALSAR) and C-band (RADARSAT-2) polarimetric radars. The thesis rests on two hypotheses (chap. 1). The first states that classes of vegetation physiognomy, based on plant structure, are more appropriate than plant-species classes because they are better suited to the information content of polarimetric radar images. The second states that polarimetric decomposition algorithms extract polarimetric information better than a multi-polarization approach based on the HH, HV and VV polarization channels (chap. 3). In particular, the contribution of the Touzi incoherent decomposition to wetland inventory and monitoring is examined in detail. This decomposition characterizes the scattering type, phase, orientation, symmetry, degree of polarization and backscattered power of a target through a series of parameters extracted from an eigenvector and eigenvalue analysis of the coherency matrix. The Lac Saint-Pierre region was selected as the study site because of the great diversity of its wetlands, which cover more than 20,000 ha there.
One of the challenges of this thesis is that there is no standard system listing the full set of possible physiognomic classes, nor precise guidance on their characteristics and dimensions. Considerable care was therefore taken in constructing these classes by cross-referencing diverse data sources, and more than 50 plant species were grouped into 9 physiognomic classes (chap. 7, 8 and 9). Several analyses are proposed to test the hypotheses of the thesis (chap. 9). Scatterplot-based sensitivity analyses are used to study the characteristics and dispersion of the vegetation physiognomies in different spaces formed by polarimetric parameters or polarization channels (chap. 10 and 12). Time series of RADARSAT-2 images are used to better understand the seasonal evolution of the vegetation physiognomies (chap. 12). The transformed divergence algorithm is used to quantify the separability between physiognomic classes and to identify the parameter(s) contributing most to that separability (chap. 11 and 13). Classifications are also proposed, and the results are compared with an existing wetland map of Lac Saint-Pierre (chap. 14). Finally, the potential of C- and L-band polarimetric parameters for monitoring peatland hydrology is analyzed (chap. 15 and 16). The sensitivity analyses show that the parameters of the 1st component, which relate to the dominant (polarized) portion of the signal, are sufficient for a general characterization of the vegetation physiognomies. However, the parameters of the 2nd and 3rd components are needed to obtain better separability between classes (chap. 11 and 13) and better discrimination between wetlands and dry land (chap. 14).
This thesis shows that it is preferable to consider the parameters of the 1st, 2nd and 3rd components individually rather than their sum weighted by their respective eigenvalues (chap. 10 and 12). It also examines the complementarity between structural parameters and those relating to backscattered power, which most polarimetric decompositions ignore or normalize away. The temporal (seasonal) dimension is essential for characterizing and classifying the vegetation physiognomies (chap. 12, 13 and 14). Images acquired in spring (April and May) are needed to discriminate dry land from wetlands, whereas images acquired in summer (July and August) are needed to refine the classification of the vegetation physiognomies. A hierarchical classification tree developed in this thesis synthesizes the knowledge acquired (chap. 14). Using a relatively small number of polarimetric parameters and simple decision rules, it is possible, among other things, to identify three classes of low marsh and to discriminate herbaceous high marshes from the other physiognomic classes without auxiliary data sources. The results are comparable to those of a supervised classification using two Landsat-5 images, with overall accuracies of 77.3% and 79.0%, respectively. Various support vector machine (SVM) classifications reproduce the results obtained with the hierarchical classification tree. However, exploiting a higher dimensionality with the SVM, with a maximum overall accuracy of 79.1%, does not yield significantly better results. Finally, the phase of the Touzi decomposition appears to be the only parameter (in L-band) sensitive to variations in the water level beneath the surface of open peatlands (chap. 16).
This parameter therefore offers great potential for monitoring peatland hydrology compared with the phase difference between the HH and VV channels. This thesis demonstrates that the parameters of the Touzi decomposition allow better characterization, better separability and better classification of wetland vegetation physiognomies than the HH, HV and VV polarization channels. Grouping plant species into physiognomic classes is a valid concept. However, some plant species that share a similar physiognomy but occupy a different environment (high vs. low marsh) showed significant differences in their backscattering properties.
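The transformed divergence ("divergence transformée") used in the wetland thesis above is a standard class-separability measure in remote sensing. A minimal sketch follows, assuming each physiognomic class is modelled as a multivariate Gaussian (mean vector and covariance matrix) and using the common scaling in which TD saturates at 2000; the helper names and the parameter names in the example are hypothetical, and the thesis may have used a different normalization.

```python
import numpy as np


def divergence(mu_i, cov_i, mu_j, cov_j):
    """Divergence between two Gaussian classes (mean vector, covariance matrix)."""
    mu_i = np.atleast_1d(np.asarray(mu_i, dtype=float))
    mu_j = np.atleast_1d(np.asarray(mu_j, dtype=float))
    cov_i = np.atleast_2d(np.asarray(cov_i, dtype=float))
    cov_j = np.atleast_2d(np.asarray(cov_j, dtype=float))
    inv_i = np.linalg.inv(cov_i)
    inv_j = np.linalg.inv(cov_j)
    dmu = (mu_i - mu_j).reshape(-1, 1)
    # Covariance term: zero when both classes share the same covariance.
    cov_term = 0.5 * np.trace((cov_i - cov_j) @ (inv_j - inv_i))
    # Mean term: grows with the (covariance-weighted) squared distance between means.
    mean_term = 0.5 * np.trace((inv_i + inv_j) @ dmu @ dmu.T)
    return cov_term + mean_term


def transformed_divergence(mu_i, cov_i, mu_j, cov_j, scale=2000.0):
    """TD = scale * (1 - exp(-D / 8)); saturates at `scale` for well-separated classes."""
    d = divergence(mu_i, cov_i, mu_j, cov_j)
    return scale * (1.0 - np.exp(-d / 8.0))


def rank_single_parameters(samples_i, samples_j, names):
    """Rank parameters (columns) by the TD each one achieves on its own."""
    scores = {}
    for k, name in enumerate(names):
        a, b = samples_i[:, k], samples_j[:, k]
        scores[name] = transformed_divergence(
            a.mean(), np.var(a, ddof=1), b.mean(), np.var(b, ddof=1)
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Ranking parameters one at a time, as `rank_single_parameters` does, is only one simple way to ask which parameter contributes most to separability; it ignores interactions between parameters, which the full multivariate `transformed_divergence` call does capture.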
Abstract:
The usefulness of motor subtypes of delirium is unclear because of inconsistent subtyping methods and a lack of validation with objective measures of activity. The activity of 40 patients was measured over 24 h with a discrete accelerometer-based activity monitor. The continuous wavelet transform (CWT), with various mother wavelets, was applied to accelerometry data from three randomly selected patients with DSM-IV delirium who were readily divided into hyperactive, hypoactive, and mixed motor subtypes. A classification tree used the periods of overall movement measured by the monitor as the determining factors for classifying these delirious patients. The data used to create the classification tree were the minimum, maximum, standard deviation, and number of coefficient values generated by the CWT over a range of scales. The classification tree was then used to define the motor subtypes of the remaining patients. The use of a classification system shows how delirium subtypes can be categorized in relation to overall motor behavior, and the system also successfully defined the motor subtypes of other patients. Motor subtypes of delirium defined by observed ward behavior differ in electronically measured activity levels.
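The delirium study builds its tree inputs from CWT coefficients summarized per scale. A minimal numpy-only sketch, assuming a Ricker (Mexican hat) mother wavelet, a hypothetical set of scales, and reading "number of coefficient values" as the count of coefficients whose magnitude exceeds a relative threshold (our assumption; the abstract does not define it). The resulting vector is what one would feed to a classification tree.

```python
import numpy as np


def ricker(points, a):
    """Ricker (Mexican hat) wavelet sampled at `points` positions, width parameter `a`."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def cwt_ricker(signal, scales):
    """CWT by direct convolution; row i holds the coefficients at scales[i]."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty((len(scales), signal.size))
    for i, a in enumerate(scales):
        n = max(1, int(min(10 * a, signal.size)))  # kernel no longer than the signal
        out[i] = np.convolve(signal, ricker(n, a), mode="same")
    return out


def cwt_features(signal, scales, rel_threshold=0.5):
    """Per-scale min, max, std and count of large coefficients, flattened scale by scale."""
    coefs = cwt_ricker(signal, scales)
    # Hypothetical definition of "number of coefficient values": |c| above a
    # fraction of the largest coefficient magnitude anywhere in the transform.
    limit = rel_threshold * np.abs(coefs).max()
    feats = np.column_stack([
        coefs.min(axis=1),
        coefs.max(axis=1),
        coefs.std(axis=1),
        (np.abs(coefs) > limit).sum(axis=1),
    ])
    return feats.ravel()
```

For a 24 h activity trace, `cwt_features(trace, [1, 2, 4, 8, 16])` would give 20 numbers per patient; the choice of wavelet, scales and threshold all change those numbers and would need to match whatever the study actually used.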
Abstract:
Credit scoring modelling is one of the leading formal tools for supporting credit-granting decisions. Its core objective is to generate a score by which potential clients can be ranked in order of their probability of default. A critical factor is whether a credit scoring model is accurate enough to classify clients correctly as good or bad payers. In this context the concept of bootstrap aggregating (bagging) arises. The basic idea is to generate multiple classifiers by fitting models to several replicated datasets and then combining their predicted values into a single predictive classification, in order to improve classification accuracy. In this paper we propose a new bagging-type variant, which we call poly-bagging, consisting of combining predictors over a succession of resamplings. The study is motivated by credit scoring modelling. The proposed poly-bagging procedure was applied to several artificial datasets and to a real credit-granting dataset, with up to three successions of resamplings. We observed better classification accuracy for the two-bagged and three-bagged models in all the setups considered. These results strongly suggest that the poly-bagging approach may improve modelling performance measures while keeping a flexible, straightforward bagging-type structure that is easy to implement. (C) 2011 Elsevier Ltd. All rights reserved.
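Bagging, as described in the credit-scoring abstract, fits a base classifier to bootstrap replicates and combines the predictions by majority vote. The sketch below uses a depth-1 tree (a decision stump) as the base learner for binary labels. Its `poly_bagging` function is only one hypothetical reading of "combining predictors over a succession of resamplings", in which each level bags models that were themselves bagged on a resample; the paper's exact procedure may differ.

```python
import numpy as np


def fit_stump(X, y):
    """Depth-1 tree: the single-feature threshold with the fewest training errors (y in {0, 1})."""
    best = (np.inf, 0, np.inf, 0)  # (errors, feature, threshold, label for x <= threshold)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            for left_label in (0, 1):
                errors = np.sum(np.where(left, left_label, 1 - left_label) != y)
                if errors < best[0]:
                    best = (errors, f, t, left_label)
    _, f, t, left_label = best
    return lambda Z: np.where(Z[:, f] <= t, left_label, 1 - left_label)


def bagging(X, y, fit, n_models, rng):
    """Fit `fit` on bootstrap replicates and combine predictions by majority vote."""
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # n rows drawn with replacement
        models.append(fit(X[idx], y[idx]))
    return lambda Z: (np.mean([m(Z) for m in models], axis=0) >= 0.5).astype(int)


def poly_bagging(X, y, levels, n_models, rng):
    """Hypothetical nesting: level k bags level-(k-1) ensembles; levels=1 is plain bagging."""
    def bagged(base):
        def fit(Xs, ys):
            return bagging(Xs, ys, base, n_models, rng)
        return fit

    fit = fit_stump
    for _ in range(levels):
        fit = bagged(fit)
    return fit(X, y)
```

In this reading the number of fitted stumps grows as `n_models ** levels`, so three levels with even modest ensembles become expensive; that cost is the main design trade-off of nesting the resampling.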
Abstract:
Most modern developments in classification trees aim to improve their predictive capacity. This article considers a curiously neglected aspect of classification trees, namely the reliability of the predictions that come from a given tree. Since, in the limit, a node of a tree represents a point in the predictor space, the aim of this article is to develop a localized assessment of the reliability of prediction rules. A classification tree may be used either to provide a probability forecast, where for each node the class membership probabilities constitute the prediction, or a true classification, where each new observation is predictively assigned to a unique class. Correspondingly, two types of reliability measure are derived, namely prediction reliability and classification reliability. We use bootstrapping methods as the main tool to construct these measures, and we provide a suite of graphical displays through which they may be easily appreciated. In addition to providing an estimate of the reliability of specific forecasts of each type, these measures can be used to guide future data collection to improve the effectiveness of the tree model. The motivating example has a binary response, namely the presence or absence of a eucalypt species, Eucalyptus cloeziana, at a given sampling location in response to a suite of environmental covariates, although the methods are not restricted to binary response data.
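The reliability abstract builds its measures by bootstrapping. Below is a minimal sketch of one plausible construction, not the article's exact definitions: refit a small tree (here a Gini-split stump, with hypothetical helper names) on bootstrap resamples, report a percentile interval of the predicted class-1 probability at a query point as a stand-in for prediction reliability, and report the fraction of resamples that reproduce the original class assignment as a stand-in for classification reliability.

```python
import numpy as np


def fit_prob_stump(X, y):
    """Depth-1 classification tree chosen by Gini impurity; predicts P(y = 1) per leaf."""
    n = len(y)
    best_f, best_t, best_imp = 0, np.inf, np.inf
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:  # keep both leaves non-empty
            left = X[:, f] <= t
            imp = 0.0
            for mask in (left, ~left):
                p = y[mask].mean()
                imp += mask.sum() / n * 2.0 * p * (1.0 - p)
            if imp < best_imp:
                best_f, best_t, best_imp = f, t, imp
    left = X[:, best_f] <= best_t
    p_left = y[left].mean() if left.any() else y.mean()
    p_right = y[~left].mean() if (~left).any() else y.mean()
    return lambda Z: np.where(Z[:, best_f] <= best_t, p_left, p_right)


def bootstrap_reliability(X, y, x_new, n_boot=200, seed=0):
    """Bootstrap spread of the leaf probability and agreement of the class label at x_new."""
    rng = np.random.default_rng(seed)
    x_new = np.atleast_2d(np.asarray(x_new, dtype=float))
    p_orig = fit_prob_stump(X, y)(x_new)[0]
    cls_orig = int(p_orig >= 0.5)
    probs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))
        probs[b] = fit_prob_stump(X[idx], y[idx])(x_new)[0]
    lo, hi = np.percentile(probs, [2.5, 97.5])
    agreement = np.mean((probs >= 0.5).astype(int) == cls_orig)
    return {"p": p_orig, "interval": (lo, hi), "classification_reliability": agreement}
```

A query point deep inside one class region should show a narrow interval and agreement near 1; points near a split threshold are where both measures deteriorate, which is the kind of local information the article argues a single tree fit hides.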
Abstract:
Risk assessment systems for introduced species are being developed and applied globally, but methods for rigorously evaluating them are still in their infancy. We explore classification and regression tree models as an alternative to the current Australian Weed Risk Assessment system, and demonstrate how the performance of screening tests for unwanted alien species may be compared quantitatively using receiver operating characteristic (ROC) curve analysis. The optimal classification tree model for predicting weediness included just four of the 44 attributes of introduced plants examined, namely: (i) intentional human dispersal of propagules; (ii) evidence of naturalization beyond the native range; (iii) evidence of being a weed elsewhere; and (iv) a high level of domestication. Intentional human dispersal of propagules combined with evidence of naturalization beyond a plant's native range led to the strongest prediction of weediness. A high level of domestication combined with no evidence of naturalization reduced the likelihood that intentional human dispersal of propagules would result in an introduced plant becoming a weed. Unlikely intentional human dispersal of propagules combined with no evidence of being a weed elsewhere led to the lowest predicted probability of weediness. The failure to include intrinsic plant attributes in the model suggests either that these attributes are not useful general predictors of weediness, or that the data and analysis were inadequate to elucidate the underlying relationship(s). This is consistent with the longstanding pessimism about whether we will ever be able to predict invasive plants accurately. Given the apparent importance of propagule pressure (the number of individuals of a species released), future attempts to evaluate the performance of screening models for identifying unwanted plants need to account for propagule pressure when collating and/or analysing datasets.
The classification tree had a cross-validated sensitivity of 93.6% and specificity of 36.7%. Based on the area under the ROC curve, the classification tree was slightly inferior at correctly classifying plants as weeds or non-weeds (area under the ROC curve = 0.83 ± 0.021 SE) to the risk assessment system currently in use (area under the ROC curve = 0.89 ± 0.018 SE), although it requires far fewer questions to be answered.
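The weed-risk and glaucoma abstracts compare screening tools by sensitivity, specificity and area under the ROC curve. A minimal pure-Python sketch of these quantities, with hypothetical function names, computing AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties counted as one half):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP). Labels are 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` orders three of the four positive/negative pairs correctly and returns 0.75. The quadratic pairwise loop is fine for illustration; for large samples a rank-based formula gives the same value more efficiently. Sensitivity and specificity depend on the chosen decision threshold, whereas AUC summarizes all thresholds at once, which is why a tree with 93.6% sensitivity but 36.7% specificity can still have an AUC of 0.83.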
Abstract:
PURPOSE: To evaluate the sensitivity and specificity of machine learning classifiers (MLCs) for glaucoma diagnosis using spectral-domain OCT (SD-OCT) and standard automated perimetry (SAP). METHODS: Observational cross-sectional study. Sixty-two glaucoma patients and 48 healthy individuals were included. All participants underwent a complete ophthalmologic examination, achromatic SAP, and retinal nerve fiber layer (RNFL) imaging with SD-OCT (Cirrus HD-OCT; Carl Zeiss Meditec Inc., Dublin, California). Receiver operating characteristic (ROC) curves were obtained for all SD-OCT parameters and global SAP indices. The following MLCs were then tested using SD-OCT and SAP parameters: Bagging (BAG), Naive Bayes (NB), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Random Forest (RAN), Ensemble Selection (ENS), Classification Tree (CTREE), AdaBoost M1 (ADA), linear Support Vector Machine (SVML) and Gaussian Support Vector Machine (SVMG). Areas under the ROC curves (aROC) obtained for isolated SAP and OCT parameters were compared with those of MLCs using combined OCT+SAP data. RESULTS: With combined OCT and SAP data, the MLCs' aROCs ranged from 0.777 (CTREE) to 0.946 (RAN). The best OCT+SAP aROC, obtained with RAN (0.946), was significantly larger than that of the best single OCT parameter (p < 0.05), but was not significantly different from the aROC of the best single SAP parameter (p = 0.19). CONCLUSION: Machine learning classifiers trained on OCT and SAP data can successfully discriminate between healthy and glaucomatous eyes. Combining OCT and SAP measurements improved diagnostic accuracy compared with OCT data alone.
Frequency of Cardiovascular Involvement in Familial Amyloidotic Polyneuropathy in Brazilian Patients
Abstract:
Background: Familial amyloidotic polyneuropathy (FAP) is a rare disease diagnosed in Brazil and worldwide. The frequency of cardiovascular involvement in Brazilian FAP patients is unknown. Objective: To determine the frequency of cardiovascular involvement and to correlate the cardiovascular findings with the modified polyneuropathy disability (PND) score. Methods: In a national reference center, 51 patients were evaluated with clinical examination, electrocardiography (ECG), echocardiography (ECHO), and 24-hour Holter monitoring. Patients were classified according to the modified PND score and divided into groups: PND 0, PND I, PND II, and PND > II (which included PND IIIa, IIIb, and IV). We chose a classification tree as the statistical method to analyze the association between cardiac test findings and the neurological classification (PND). Results: ECG abnormalities were present in almost two-thirds of the FAP patients, whereas ECHO abnormalities occurred in about one-third. All patients with an abnormal ECHO also had an abnormal ECG, but not vice versa. The classification tree identified ECG and ECHO as relevant variables (p < 0.001 and p = 0.08, respectively). The probability of a patient with a normal ECG being allocated to the PND 0 group was over 80%; when both ECG and ECHO were abnormal, this probability was zero. Conclusions: Brazilian patients with FAP have frequent ECG abnormalities. ECG is an appropriate test to discriminate asymptomatic carriers of the mutation from those who develop the disease, while ECHO also contributes to this discrimination.
Abstract:
BACKGROUND: Adequate pain assessment is critical for evaluating the efficacy of analgesic treatment in clinical practice and during the development of new therapies. Yet the currently used scores of global pain intensity fail to reflect the diversity of pain manifestations and the complexity of the underlying biological mechanisms. We have developed a tool for the standardized assessment of pain-related symptoms and signs that differentiates pain phenotypes independently of etiology. METHODS AND FINDINGS: Using a structured interview (16 questions) and a standardized bedside examination (23 tests), we prospectively assessed symptoms and signs in 130 patients with peripheral neuropathic pain caused by diabetic polyneuropathy, postherpetic neuralgia, or radicular low back pain (LBP), and in 57 patients with non-neuropathic (axial) LBP. A hierarchical cluster analysis revealed distinct association patterns of symptoms and signs (pain subtypes) that characterized six subgroups of patients with neuropathic pain and two subgroups of patients with non-neuropathic pain. Using classification tree analysis, we identified the most discriminatory assessment items for the identification of pain subtypes. We combined the resulting six interview questions and ten physical tests into a pain assessment tool that we named Standardized Evaluation of Pain (StEP). We validated StEP for the distinction between radicular and axial LBP in an independent group of 137 patients. StEP identified patients with radicular pain with high sensitivity (92%; 95% confidence interval [CI] 83%-97%) and specificity (97%; 95% CI 89%-100%). The diagnostic accuracy of StEP exceeded that of a dedicated screening tool for neuropathic pain and of spinal magnetic resonance imaging. In addition, we were able to reproduce subtypes of radicular and axial LBP, underscoring the utility of StEP for discerning distinct constellations of symptoms and signs.
CONCLUSIONS: We present a novel method of identifying pain subtypes that we believe reflect underlying pain mechanisms. We demonstrate that this new approach to pain assessment helps separate radicular from axial back pain. Beyond diagnostic utility, a standardized differentiation of pain subtypes that is independent of disease etiology may offer a unique opportunity to improve targeted analgesic treatment.
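The StEP abstract uses classification tree analysis to pick the most discriminatory assessment items. One common tree-growing criterion, CART-style Gini impurity decrease (the abstract does not say which criterion was used), can be sketched for yes/no items as follows; the function and item names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(item_values, labels):
    """Drop in Gini impurity when patients are split by a yes/no item."""
    n = len(labels)
    yes = [lab for v, lab in zip(item_values, labels) if v]
    no = [lab for v, lab in zip(item_values, labels) if not v]
    weighted = len(yes) / n * gini(yes) + len(no) / n * gini(no)
    return gini(labels) - weighted


def rank_items(items, labels):
    """items maps an item name to one 0/1 answer per patient; best splitter first."""
    scores = {name: impurity_decrease(vals, labels) for name, vals in items.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With `labels = ["radicular", "radicular", "axial", "axial"]`, an item answered `[1, 1, 0, 0]` separates the groups perfectly (decrease 0.5), while `[1, 0, 1, 0]` carries no information (decrease 0.0). A full tree repeats this choice recursively within each branch, which is how a small subset of items can end up in the final tool.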
Abstract:
BACKGROUND: A simple prognostic model could help identify patients with pulmonary embolism who are at low risk of death and are candidates for outpatient treatment. METHODS: We randomly allocated 15,531 retrospectively identified inpatients who had a discharge diagnosis of pulmonary embolism from 186 Pennsylvania hospitals to derivation (67%) and internal validation (33%) samples. We derived our rule to predict 30-day mortality using classification tree analysis, with patient data routinely available at initial examination as potential predictor variables. We used data from a European prospective study to externally validate the rule among 221 inpatients with pulmonary embolism. We determined mortality and nonfatal adverse medical outcomes across derivation and validation samples. RESULTS: Our final model consisted of 10 patient factors (age ≥ 70 years; history of cancer, heart failure, chronic lung disease, chronic renal disease, and cerebrovascular disease; and clinical variables of pulse rate ≥ 110 beats/min, systolic blood pressure < 100 mm Hg, altered mental status, and arterial oxygen saturation < 90%). Patients with none of these factors were defined as low risk. The 30-day mortality rates for low-risk patients were 0.6%, 1.5%, and 0% in the derivation, internal validation, and external validation samples, respectively. The rates of nonfatal adverse medical outcomes were less than 1% among low-risk patients across all study samples. CONCLUSIONS: This simple prediction rule accurately identifies patients with pulmonary embolism who are at low risk of short-term mortality and other adverse medical outcomes. Prospective validation of this rule is important before its implementation as a decision aid for outpatient treatment.
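The pulmonary embolism rule above is simple enough to state as code. This sketch encodes the ten factors exactly as listed in the abstract, with inclusive or exclusive thresholds following its wording (≥ 70 years, ≥ 110 beats/min, < 100 mm Hg, < 90%); the class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatientAtPresentation:
    age_years: float
    cancer: bool
    heart_failure: bool
    chronic_lung_disease: bool
    chronic_renal_disease: bool
    cerebrovascular_disease: bool
    pulse_bpm: float
    systolic_bp_mmhg: float
    altered_mental_status: bool
    arterial_o2_saturation_pct: float


def risk_factors(p: PatientAtPresentation) -> list:
    """Names of the rule's ten factors that are present for this patient."""
    factors = {
        "age >= 70 years": p.age_years >= 70,
        "history of cancer": p.cancer,
        "heart failure": p.heart_failure,
        "chronic lung disease": p.chronic_lung_disease,
        "chronic renal disease": p.chronic_renal_disease,
        "cerebrovascular disease": p.cerebrovascular_disease,
        "pulse >= 110 beats/min": p.pulse_bpm >= 110,
        "systolic BP < 100 mm Hg": p.systolic_bp_mmhg < 100,
        "altered mental status": p.altered_mental_status,
        "arterial O2 saturation < 90%": p.arterial_o2_saturation_pct < 90,
    }
    return [name for name, present in factors.items() if present]


def is_low_risk(p: PatientAtPresentation) -> bool:
    """Low risk means none of the ten factors is present."""
    return not risk_factors(p)
```

Returning the list of present factors rather than only a yes/no flag keeps the reason for a "not low risk" result visible, which matters for a rule meant to be used as a decision aid.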
Abstract:
Background: Age is frequently discussed as a negative host factor for achieving a sustained virological response (SVR) to antiviral hepatitis C therapy. However, elderly patients often have relevant fibrosis or cirrhosis, which is a known negative predictive factor, making it difficult to interpret age as an independent predictive factor. Methods: Within the framework of the Swiss hepatitis C cohort (SCCS), we collected data from 545 antiviral hepatitis C therapies, including data from 67 hepatitis C patients ≥ 60 y who had been treated with PEG-interferon and ribavirin. We analyzed host factors (age, gender, fibrosis, haemoglobin, depression, earlier hepatitis C treatment), viral factors (genotype, viral load) and treatment course (early virological response, end-of-treatment response, SVR). Generalised estimating equations (GEE) regression modelling was used for the primary end point (SVR), with age (≥ 60 y vs. < 60 y) as the independent variable and gender, presence of cirrhosis, genotype, earlier treatment and viral load as confounders. SVR was analysed in young and elderly patients after matching for these confounders. Additionally, classification tree analysis was performed in elderly patients using these confounders. Results: Overall SVR in the 545 patients was 55%. In genotype 1/4, SVR was 42.9% in 259 patients < 60 y and 26.1% in 46 patients ≥ 60 y. In genotype 2/3, SVR was 74.4% in 215 patients < 60 y and 84% in 25 patients ≥ 60 y. However, the GEE model showed that age had no influence on achieving SVR (odds ratio 0.91). Confounders influenced SVR as known from previous studies (cirrhosis, genotype 1/4, previous treatment and viral load > 600,000 IU/ml as negative predictive factors). When young and elderly patients were matched (analysis of 59 elderly patients), SVR did not differ between the groups (54.2% and 55.9%, respectively; p = 0.795, binomial test).
The best criterion for SVR in elderly patients derived from the classification tree was genotype, with no further criteria relevant for predicting SVR in genotype 2/3. In patients with genotype 1/4, further criteria were presence of cirrhosis and, in non-cirrhotic patients, low viral load (< 600,000 IU/ml). Conclusions: Age is not a relevant predictive factor for achieving SVR once confounders are taken into account. In terms of the effectiveness of antiviral therapy, age does not play a major role and should not be regarded as a relevant negative predictive factor. Since life expectancy in Switzerland at age 60 is more than 22 y, hepatitis C therapy is reasonable in elderly patients with known relevant fibrosis or cirrhosis, because interferon-based hepatitis C therapy improves survival and reduces carcinogenesis.
Abstract:
The association between adiposity measures and dyslipidemia has seldom been assessed in a multipopulational setting. Data came from 27 populations in Europe, Australia, New Zealand and Canada (WHO MONICA project), using health surveys conducted between 1990 and 1997 in adults aged 35-64 years (n = 40,480). Dyslipidemia was defined as a total/HDL cholesterol ratio > 6 in men and > 5 in women. The overall prevalence of dyslipidemia was 25% in men and 23% in women. Logistic regression showed that dyslipidemia was strongly associated with body mass index (BMI) in men and with waist circumference (WC) in women, after adjusting for region, age and smoking. Among normal-weight men and women (BMI < 25 kg/m²), the odds of being dyslipidemic increased between the lowest and highest WC quartiles (OR = 3.6, p < 0.001). Among obese men (BMI ≥ 30 kg/m²), the corresponding increase was smaller (OR = 1.2, p = 0.036). A similar weakening was observed among women. Classification tree analysis was performed to assign subjects to classes of risk for dyslipidemia. BMI thresholds (25.4 and 29.2 kg/m²) in men and WC thresholds (81.7 and 92.6 cm) in women emerged at the first stages. High WC (> 84.8 cm) in normal-weight men, menopause in women and regular smoking further defined subgroups at increased risk. Standard categories of BMI and WC, or their combinations, do not lead to optimal risk stratification for dyslipidemia in middle-aged adults. Sex-specific adaptations are necessary, in particular taking into account abdominal obesity in normal-weight men, post-menopausal age in women and regular smoking in both sexes.
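The MONICA analysis combines a fixed dyslipidemia definition with sex-specific first-stage tree thresholds. A minimal sketch, assuming cholesterol values in consistent units and assigning values exactly at a cut-off to the lower band (the abstract does not say which side they fall on); the function names are hypothetical, and `first_stage_band` returns only the band index, since the abstract does not report risk per band.

```python
def is_dyslipidemic(total_chol: float, hdl_chol: float, sex: str) -> bool:
    """Total/HDL cholesterol ratio > 6 in men or > 5 in women (same units for both)."""
    ratio = total_chol / hdl_chol
    if sex == "male":
        return ratio > 6
    if sex == "female":
        return ratio > 5
    raise ValueError("sex must be 'male' or 'female'")


def first_stage_band(sex: str, bmi: float = None, waist_cm: float = None) -> int:
    """Band 0, 1 or 2 from the first-stage cut-offs: BMI in men, waist circumference in women."""
    if sex == "male":
        cuts, value = (25.4, 29.2), bmi
    elif sex == "female":
        cuts, value = (81.7, 92.6), waist_cm
    else:
        raise ValueError("sex must be 'male' or 'female'")
    if value is None:
        raise ValueError("BMI is required for men and waist circumference for women")
    # Count how many cut-offs the value strictly exceeds (assumption: ties go lower).
    return sum(value > c for c in cuts)
```

Because the ratio is unitless, mmol/L and mg/dL inputs give the same answer as long as both values use the same unit.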