963 resultados para Classification Trees


Relevância:

60.00% 60.00%

Publicador:

Resumo:

O trabalho que a seguir se apresenta tem como objectivo descrever a criação de um modelo que sirva de suporte a um sistema de apoio à decisão sobre o risco inerente à execução de projectos na área das Tecnologias de Informação (TI) recorrendo a técnicas de mineração de dados. Durante o ciclo de vida de um projecto, existem inúmeros factores que contribuem para o seu sucesso ou insucesso. A responsabilidade de monitorizar, antever e mitigar esses factores recai sobre o Gestor de Projecto. A gestão de projectos é uma tarefa difícil e dispendiosa, consome muitos recursos, depende de numerosas variáveis e, muitas vezes, até da própria experiência do Gestor de Projecto. Ao ser confrontado com as previsões de duração e de esforço para a execução de uma determinada tarefa, o Gestor de Projecto, exceptuando a sua percepção e intuição pessoal, não tem um modo objectivo de medir a plausibilidade dos valores que lhe são apresentados pelo eventual executor da tarefa. As referidas previsões são fundamentais para a organização, pois sobre elas são tomadas as decisões de planeamento global estratégico corporativo, de execução, de adiamento, de cancelamento, de adjudicação, de renegociação de âmbito, de adjudicação externa, entre outros. Esta propensão para o desvio, quando detectada numa fase inicial, pode ajudar a gerir melhor o risco associado à Gestão de Projectos. O sucesso de cada projecto terminado foi qualificado tendo em conta a ponderação de três factores: o desvio ao orçamentado, o desvio ao planeado e o desvio ao especificado. Analisando os projectos decorridos, e correlacionando alguns dos seus atributos com o seu grau de sucesso o modelo classifica, qualitativamente, um novo projecto quanto ao seu risco. Neste contexto o risco representa o grau de afastamento do projecto ao sucesso. Recorrendo a algoritmos de mineração de dados, tais como, árvores de classificação e redes neuronais, descreve-se o desenvolvimento de um modelo que suporta um sistema de apoio à decisão baseado na classificação de novos projectos. Os modelos são o resultado de um extensivo conjunto de testes de validação onde se procuram e refinam os indicadores que melhor caracterizam os atributos de um projecto e que mais influenciam o risco. Como suporte tecnológico para o desenvolvimento e teste foi utilizada a ferramenta Weka 3. Uma boa utilização do modelo proposto possibilitará a criação de planos de contingência mais detalhados e uma gestão mais próxima para projectos que apresentem uma maior propensão para o risco. Assim, o resultado final pretende constituir mais uma ferramenta à disposição do Gestor de Projecto.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We present a heuristic method for learning error correcting output codes matrices based on a hierarchical partition of the class space that maximizes a discriminative criterion. To achieve this goal, the optimal codeword separation is sacrificed in favor of a maximum class discrimination in the partitions. The creation of the hierarchical partition set is performed using a binary tree. As a result, a compact matrix with high discrimination power is obtained. Our method is validated using the UCI database and applied to a real problem, the classification of traffic sign images.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The aim of the study was to establish and verify a predictive vegetation model for plant community distribution in the alti-Mediterranean zone of the Lefka Ori massif, western Crete. Based on previous work three variables were identified as significant determinants of plant community distribution, namely altitude, slope angle and geomorphic landform. The response of four community types against these variables was tested using classification trees analysis in order to model community type occurrence. V-fold cross-validation plots were used to determine the length of the best fitting tree. The final 9node tree selected, classified correctly 92.5% of the samples. The results were used to provide decision rules for the construction of a spatial model for each community type. The model was implemented within a Geographical Information System (GIS) to predict the distribution of each community type in the study site. The evaluation of the model in the field using an error matrix gave an overall accuracy of 71%. The user's accuracy was higher for the Crepis-Cirsium (100%) and Telephium-Herniaria community type (66.7%) and relatively lower for the Peucedanum-Alyssum and Dianthus-Lomelosia community types (63.2% and 62.5%, respectively). Misclassification and field validation points to the need for improved geomorphological mapping and suggests the presence of transitional communities between existing community types.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The aim of the study was to establish and verify a predictive vegetation model for plant community distribution in the alti-Mediterranean zone of the Lefka Ori massif, western Crete. Based on previous work three variables were identified as significant determinants of plant community distribution, namely altitude, slope angle and geomorphic landform. The response of four community types against these variables was tested using classification trees analysis in order to model community type occurrence. V-fold cross-validation plots were used to determine the length of the best fitting tree. The final 9node tree selected, classified correctly 92.5% of the samples. The results were used to provide decision rules for the construction of a spatial model for each community type. The model was implemented within a Geographical Information System (GIS) to predict the distribution of each community type in the study site. The evaluation of the model in the field using an error matrix gave an overall accuracy of 71%. The user's accuracy was higher for the Crepis-Cirsium (100%) and Telephium-Herniaria community type (66.7%) and relatively lower for the Peucedanum-Alyssum and Dianthus-Lomelosia community types (63.2% and 62.5%, respectively). Misclassification and field validation points to the need for improved geomorphological mapping and suggests the presence of transitional communities between existing community types.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Accurate seasonal to interannual streamflow forecasts based on climate information are critical for optimal management and operation of water resources systems. Considering most water supply systems are multipurpose, operating these systems to meet increasing demand under the growing stresses of climate variability and climate change, population and economic growth, and environmental concerns could be very challenging. This study was to investigate improvement in water resources systems management through the use of seasonal climate forecasts. Hydrological persistence (streamflow and precipitation) and large-scale recurrent oceanic-atmospheric patterns such as the El Niño/Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), North Atlantic Oscillation (NAO), the Atlantic Multidecadal Oscillation (AMO), the Pacific North American (PNA), and customized sea surface temperature (SST) indices were investigated for their potential to improve streamflow forecast accuracy and increase forecast lead-time in a river basin in central Texas. First, an ordinal polytomous logistic regression approach is proposed as a means of incorporating multiple predictor variables into a probabilistic forecast model. Forecast performance is assessed through a cross-validation procedure, using distributions-oriented metrics, and implications for decision making are discussed. Results indicate that, of the predictors evaluated, only hydrologic persistence and Pacific Ocean sea surface temperature patterns associated with ENSO and PDO provide forecasts which are statistically better than climatology. Secondly, a class of data mining techniques, known as tree-structured models, is investigated to address the nonlinear dynamics of climate teleconnections and screen promising probabilistic streamflow forecast models for river-reservoir systems. Results show that the tree-structured models can effectively capture the nonlinear features hidden in the data. Skill scores of probabilistic forecasts generated by both classification trees and logistic regression trees indicate that seasonal inflows throughout the system can be predicted with sufficient accuracy to improve water management, especially in the winter and spring seasons in central Texas. Lastly, a simplified two-stage stochastic economic-optimization model was proposed to investigate improvement in water use efficiency and the potential value of using seasonal forecasts, under the assumption of optimal decision making under uncertainty. Model results demonstrate that incorporating the probabilistic inflow forecasts into the optimization model can provide a significant improvement in seasonal water contract benefits over climatology, with lower average deficits (increased reliability) for a given average contract amount, or improved mean contract benefits for a given level of reliability compared to climatology. The results also illustrate the trade-off between the expected contract amount and reliability, i.e., larger contracts can be signed at greater risk.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background. The factors behind the reemergence of severe, invasive group A streptococcal (GAS) diseases are unclear, but it could be caused by altered genetic endowment in these organisms. However, data from previous studies assessing the association between single genetic factors and invasive disease are often conflicting, suggesting that other, as-yet unidentified factors are necessary for the development of this class of disease. Methods. In this study, we used a targeted GAS virulence microarray containing 226 GAS genes to determine the virulence gene repertoires of 68 GAS isolates (42 associated with invasive disease and 28 associated with noninvasive disease) collected in a defined geographic location during a contiguous time period. We then employed 3 advanced machine learning methods (genetic algorithm neural network, support vector machines, and classification trees) to identify genes with an increased association with invasive disease. Results. Virulence gene profiles of individual GAS isolates varied extensively among these geographically and temporally related strains. Using genetic algorithm neural network analysis, we identified 3 genes with a marginal overrepresentation in invasive disease isolates. Significantly, 2 of these genes, ssa and mf4, encoded superantigens but were only present in a restricted set of GAS M-types. The third gene, spa, was found in variable distributions in all M-types in the study. Conclusions. Our comprehensive analysis of GAS virulence profiles provides strong evidence for the incongruent relationships among any of the 226 genes represented on the array and the overall propensity of GAS to cause invasive disease, underscoring the pathogenic complexity of these diseases, as well as the importance of multiple bacteria and/ or host factors.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

An expanding human population and associated demands for goods and services continues to exert an increasing pressure on ecological systems. Although the rate of expansion of agricultural lands has slowed since 1960, rapid deforestation still occurs in many tropical countries, including Colombia. However, the location and extent of deforestation and associated ecological impacts within tropical countries is often not well known. The primary aim of this study was to obtain an understanding of the spatial patterns of forest conversion for agricultural land uses in Colombia. We modeled native forest conversion in Colombia at regional and national-levels using logistic regression and classification trees. We investigated the impact of ignoring the regional variability of model parameters, and identified biophysical and socioeconomic factors that best explain the current spatial pattern and inter-regional variation in forest cover. We validated our predictions for the Amazon region using MODIS satellite imagery. The regional-level classification tree that accounted for regional heterogeneity had the greatest discrimination ability. Factors related to accessibility (distance to roads and towns) were related to the presence of forest cover, although this relationship varied regionally. In order to identify areas with a high risk of deforestation, we used predictions from the best model, refined by areas with rural population growth rates of > 2%. We ranked forest ecosystem types in terms of levels of threat of conversion. Our results provide useful inputs to planning for biodiversity conservation in Colombia, by identifying areas and ecosystem types that are vulnerable to deforestation. Several of the predicted deforestation hotspots coincide with areas that are outstanding in terms of biodiversity value.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, models integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation and model prediction's scale assessments. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (0.83r(2)) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of. vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The use of quantitative methods has become increasingly important in the study of neuropathology and especially in neurodegenerative disease. Disorders such as Alzheimer's disease (AD) and the frontotemporal dementias (FTD) are characterized by the formation of discrete, microscopic, pathological lesions which play an important role in pathological diagnosis. This chapter reviews the advantages and limitations of the different methods of quantifying pathological lesions in histological sections including estimates of density, frequency, coverage, and the use of semi-quantitative scores. The sampling strategies by which these quantitative measures can be obtained from histological sections, including plot or quadrat sampling, transect sampling, and point-quarter sampling, are described. In addition, data analysis methods commonly used to analysis quantitative data in neuropathology, including analysis of variance (ANOVA), polynomial curve fitting, multiple regression, classification trees, and principal components analysis (PCA), are discussed. These methods are illustrated with reference to quantitative studies of a variety of neurodegenerative disorders.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Effective conservation and management of top predators requires a comprehensive understanding of their distributions and of the underlying biological and physical processes that affect these distributions. The Mid-Atlantic Bight shelf break system is a dynamic and productive region where at least 32 species of cetaceans have been recorded through various systematic and opportunistic marine mammal surveys from the 1970s through 2012. My dissertation characterizes the spatial distribution and habitat of cetaceans in the Mid-Atlantic Bight shelf break system by utilizing marine mammal line-transect survey data, synoptic multi-frequency active acoustic data, and fine-scale hydrographic data collected during the 2011 summer Atlantic Marine Assessment Program for Protected Species (AMAPPS) survey. Although studies describing cetacean habitat and distributions have been previously conducted in the Mid-Atlantic Bight, my research specifically focuses on the shelf break region to elucidate both the physical and biological processes that influence cetacean distribution patterns within this cetacean hotspot.

In Chapter One I review biologically important areas for cetaceans in the Atlantic waters of the United States. I describe the study area, the shelf break region of the Mid-Atlantic Bight, in terms of the general oceanography, productivity and biodiversity. According to recent habitat-based cetacean density models, the shelf break region is an area of high cetacean abundance and density, yet little research is directed at understanding the mechanisms that establish this region as a cetacean hotspot.

In Chapter Two I present the basic physical principles of sound in water and describe the methodology used to categorize opportunistically collected multi-frequency active acoustic data using frequency responses techniques. Frequency response classification methods are usually employed in conjunction with net-tow data, but the logistics of the 2011 AMAPPS survey did not allow for appropriate net-tow data to be collected. Biologically meaningful information can be extracted from acoustic scattering regions by comparing the frequency response curves of acoustic regions to theoretical curves of known scattering models. Using the five frequencies on the EK60 system (18, 38, 70, 120, and 200 kHz), three categories of scatterers were defined: fish-like (with swim bladder), nekton-like (e.g., euphausiids), and plankton-like (e.g., copepods). I also employed a multi-frequency acoustic categorization method using three frequencies (18, 38, and 120 kHz) that has been used in the Gulf of Maine and Georges Bank which is based the presence or absence of volume backscatter above a threshold. This method is more objective than the comparison of frequency response curves because it uses an established backscatter value for the threshold. By removing all data below the threshold, only strong scattering information is retained.

In Chapter Three I analyze the distribution of the categorized acoustic regions of interest during the daytime cross shelf transects. Over all transects, plankton-like acoustic regions of interest were detected most frequently, followed by fish-like acoustic regions and then nekton-like acoustic regions. Plankton-like detections were the only significantly different acoustic detections per kilometer, although nekton-like detections were only slightly not significant. Using the threshold categorization method by Jech and Michaels (2006) provides a more conservative and discrete detection of acoustic scatterers and allows me to retrieve backscatter values along transects in areas that have been categorized. This provides continuous data values that can be integrated at discrete spatial increments for wavelet analysis. Wavelet analysis indicates significant spatial scales of interest for fish-like and nekton-like acoustic backscatter range from one to four kilometers and vary among transects.

In Chapter Four I analyze the fine scale distribution of cetaceans in the shelf break system of the Mid-Atlantic Bight using corrected sightings per trackline region, classification trees, multidimensional scaling, and random forest analysis. I describe habitat for common dolphins, Risso’s dolphins and sperm whales. From the distribution of cetacean sightings, patterns of habitat start to emerge: within the shelf break region of the Mid-Atlantic Bight, common dolphins were sighted more prevalently over the shelf while sperm whales were more frequently found in the deep waters offshore and Risso’s dolphins were most prevalent at the shelf break. Multidimensional scaling presents clear environmental separation among common dolphins and Risso’s dolphins and sperm whales. The sperm whale random forest habitat model had the lowest misclassification error (0.30) and the Risso’s dolphin random forest habitat model had the greatest misclassification error (0.37). Shallow water depth (less than 148 meters) was the primary variable selected in the classification model for common dolphin habitat. Distance to surface density fronts and surface temperature fronts were the primary variables selected in the classification models to describe Risso’s dolphin habitat and sperm whale habitat respectively. When mapped back into geographic space, these three cetacean species occupy different fine-scale habitats within the dynamic Mid-Atlantic Bight shelf break system.

In Chapter Five I present a summary of the previous chapters and present potential analytical steps to address ecological questions pertaining the dynamic shelf break region. Taken together, the results of my dissertation demonstrate the use of opportunistically collected data in ecosystem studies; emphasize the need to incorporate middle trophic level data and oceanographic features into cetacean habitat models; and emphasize the importance of developing more mechanistic understanding of dynamic ecosystems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

BACKGROUND: The purpose of the present study was to investigate the diagnostic value of T2-mapping in acute myocarditis (ACM) and to define cut-off values for edema detection. METHODS: Cardiovascular magnetic resonance (CMR) data of 31 patients with ACM were retrospectively analyzed. 30 healthy volunteers (HV) served as a control. Additionally to the routine CMR protocol, T2-mapping data were acquired at 1.5 T using a breathhold Gradient-Spin-Echo T2-mapping sequence in six short axis slices. T2-maps were segmented according to the 16-segments AHA-model and segmental T2 values as well as the segmental pixel-standard deviation (SD) were analyzed. RESULTS: Mean differences of global myocardial T2 or pixel-SD between HV and ACM patients were only small, lying in the normal range of HV. In contrast, variation of segmental T2 values and pixel-SD was much larger in ACM patients compared to HV. In random forests and multiple logistic regression analyses, the combination of the highest segmental T2 value within each patient (maxT2) and the mean absolute deviation (MAD) of log-transformed pixel-SD (madSD) over all 16 segments within each patient proved to be the best discriminators between HV and ACM patients with an AUC of 0.85 in ROC-analysis. In classification trees, a combined cut-off of 0.22 for madSD and of 68 ms for maxT2 resulted in 83% specificity and 81% sensitivity for detection of ACM. CONCLUSIONS: The proposed cut-off values for maxT2 and madSD in the setting of ACM allow edema detection with high sensitivity and specificity and therefore have the potential to overcome the hurdles of T2-mapping for its integration into clinical routine.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Méthodologie: Simulation; Analyse discriminante linéaire et logistique; Arbres de classification; Réseaux de neurones en base radiale

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et at. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford-Fulkerson algorithm in polynomial time on that number. We apply the test to 10 randomly chosen protein domain families from the seed of Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Forest cover of the Maringá municipality, located in northern Parana State, was mapped in this study. Mapping was carried out by using high-resolution HRC sensor imagery and medium resolution CCD sensor imagery from the CBERS satellite. Images were georeferenced and forest vegetation patches (TOFs - trees outside forests) were classified using two methods of digital classification: reflectance-based or the digital number of each pixel, and object-oriented. The areas of each polygon were calculated, which allowed each polygon to be segregated into size classes. Thematic maps were built from the resulting polygon size classes and summary statistics generated from each size class for each area. It was found that most forest fragments in Maringá were smaller than 500 m². There was also a difference of 58.44% in the amount of vegetation between the high-resolution imagery and medium resolution imagery due to the distinct spatial resolution of the sensors. It was concluded that high-resolution geotechnology is essential to provide reliable information on urban greens and forest cover under highly human-perturbed landscapes.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Background Individual signs and symptoms are of limited value for the diagnosis of influenza. Objective To develop a decision tree for the diagnosis of influenza based on a classification and regression tree (CART) analysis. Methods Data from two previous similar cohort studies were assembled into a single dataset. The data were randomly divided into a development set (70%) and a validation set (30%). We used CART analysis to develop three models that maximize the number of patients who do not require diagnostic testing prior to treatment decisions. The validation set was used to evaluate overfitting of the model to the training set. Results Model 1 has seven terminal nodes based on temperature, the onset of symptoms and the presence of chills, cough and myalgia. Model 2 was a simpler tree with only two splits based on temperature and the presence of chills. Model 3 was developed with temperature as a dichotomous variable (≥38°C) and had only two splits based on the presence of fever and myalgia. The area under the receiver operating characteristic curves (AUROCC) for the development and validation sets, respectively, were 0.82 and 0.80 for Model 1, 0.75 and 0.76 for Model 2 and 0.76 and 0.77 for Model 3. Model 2 classified 67% of patients in the validation group into a high- or low-risk group compared with only 38% for Model 1 and 54% for Model 3. Conclusions A simple decision tree (Model 2) classified two-thirds of patients as low or high risk and had an AUROCC of 0.76. After further validation in an independent population, this CART model could support clinical decision making regarding influenza, with low-risk patients requiring no further evaluation for influenza and high-risk patients being candidates for empiric symptomatic or drug therapy.