23 resultados para decision tree
em Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"
Resumo:
Background: Leptospirosis is an important zoonotic disease associated with poor areas of urban settings of developing countries and early diagnosis and prompt treatment may prevent disease. Although rodents are reportedly considered the main reservoirs of leptospirosis, dogs may develop the disease, may become asymptomatic carriers and may be used as sentinels for disease epidemiology. The use of Geographical Information Systems (GIS) combined with spatial analysis techniques allows the mapping of the disease and the identification and assessment of health risk factors. Besides the use of GIS and spatial analysis, the technique of data mining, decision tree, can provide a great potential to find a pattern in the behavior of the variables that determine the occurrence of leptospirosis. The objective of the present study was to apply Geographical Information Systems and data prospection (decision tree) to evaluate the risk factors for canine leptospirosis in an area of Curitiba, PR.Materials, Methods & Results: The present study was performed on the Vila Pantanal, a urban poor community in the city of Curitiba. A total of 287 dog blood samples were randomly obtained house-by-house in a two-day sampling on January 2010. In addition, a questionnaire was applied to owners at the time of sampling. Geographical coordinates related to each household of tested dog were obtained using a Global Positioning System (GPS) for mapping the spatial distribution of reagent and non-reagent dogs to leptospirosis. For the decision tree, risk factors included results of microagglutination test (MAT) from the serum of dogs, previous disease on the household, contact with rats or other dogs, dog breed, outdoors access, feeding, trash around house or backyard, open sewer proximity and flooding. A total of 189 samples (about 2/3 of overall samples) were randomly selected for the training file and consequent decision rules. The remained 98 samples were used for the testing file. The seroprevalence showed a pattern of spatial distribution that involved all the Pantanal area, without agglomeration of reagent animals. In relation to data mining, from 189 samples used in decision tree, a total of 165 (87.3%) animal samples were correctly classified, generating a Kappa index of 0.413. A total of 154 out of 159 (96.8%) samples were considered non-reagent and were correctly classified and only 5/159 (3.2%) were wrongly identified. on the other hand, only 11 (36.7%) reagent samples were correctly classified, with 19 (63.3%) samples failing diagnosis.Discussion: The spatial distribution that involved all the Pantanal area showed that all the animals in the area are at risk of contamination by Leptospira spp. Although most samples had been classified correctly by the decision tree, a degree of difficulty of separability related to seropositive animals was observed, with only 36.7% of the samples classified correctly. This can occur due to the fact of seronegative animals number is superior to the number of seropositive ones, taking the differences in the pattern of variable behavior. The data mining helped to evaluate the most important risk factors for leptospirosis in an urban poor community of Curitiba. The variables selected by decision tree reflected the important factors about the existence of the disease (default of sewer, presence of rats and rubbish and dogs with free access to street). The analyses showed the multifactorial character of the epidemiology of canine leptospirosis.
Resumo:
We are investigating the combination of wavelets and decision trees to detect ships and other maritime surveillance targets from medium resolution SAR images. Wavelets have inherent advantages to extract image descriptors while decision trees are able to handle different data sources. In addition, our work aims to consider oceanic features such as ship wakes and ocean spills. In this incipient work, Haar and Cohen-Daubechies-Feauveau 9/7 wavelets obtain detailed descriptors from targets and ocean features and are inserted with other statistical parameters and wavelets into an oblique decision tree. © 2011 Springer-Verlag.
Resumo:
The identification of tree species is a key step for sustainable management plans of forest resources, as well as for several other applications that are based on such surveys. However, the present available techniques are dependent on the presence of tree structures, such as flowers, fruits, and leaves, limiting the identification process to certain periods of the year Therefore, this article introduces a study on the application of statistical parameters for texture classification of tree trunk images. For that, 540 samples from five Brazilian native deciduous species were acquired and measures of entropy, uniformity, smoothness, asymmetry (third moment), mean, and standard deviation were obtained from the presented textures. Using a decision tree, a biometric species identification system was constructed and resulted to a 0.84 average precision rate for species classification with 0.83accuracy and 0.79 agreement. Thus, it can be considered that the use of texture presented in trunk images can represent an important advance in tree identification, since the limitations of the current techniques can be overcome.
Resumo:
Layer mortality due to heat stress is an important economic loss for the producer. The aim of this study was to determine the mortality pattern of layers reared in the region of Bastos, SP, Brazil, according to external environment and bird age. Data mining technique were used based on monthly mortality records of hens in production, 135 poultry houses, from January 2004 to August 2008. The external environment was characterized according maximum and minimum temperatures, obtained monthly at the meteorological station CATI in the city of Tupa, SP, Brazil. Mortality was classified as normal (<= 1.2%) or high (> 1.2%), considering the mortality limits mentioned in literature. Data mining technique produced a decision tree with nine levels and 23 leaves, with 62.6% of overall accuracy. The hit rate for the High class was 64.1% and 59.9% for Normal class. The decision tree allowed finding a pattern in the mortality data, generating a model for estimating mortality based on the thermal environment and bird age.
Resumo:
Foliar diagnosis is a method for assessing the nutritional status of agricultural crops, which helps in the understanding of soil fertility and rationalized application of fertilizers taking into account economic and environmental criteria. The study aimed to use the landrelief as criteria to assist in interpreting the spatial variability of nutrient content of the citrus leaf. The leaves were collected at regular intervals of 50 m, totaling 332 sampling points. Data were analyzed by descriptive statistics, geostatistics and induction of decision tree. With the aid of digital elevation model (MDE) and the profile planaltimetric, the area was divided into three different landrelief and sub-strands. The highest values for nutrients from the leaves of citrus were observed at the top (concave area) segments on a half-slope and lower slope. The nutrients from the citrus leaves showed high values of correlation (above 0.5) with the altitude of the study area. The technique of geostatistics and the induction of decision tree show that the relief is the variable with the greatest potential to interpret the maps of spatial variability of nutrients from the citrus leaves.
Resumo:
Nitroaromatic compounds such as nifuroxazide are used in many human enteropathogenic bacteria infections without causing an increase in the plasmidial antibiotic resistance of the aerobic Gram-negative intestinal Enterobacteriaceae. For these reasons, these compounds have been synthesized using the rational approach of Topliss' decision tree. Generally. this approach allows us to obtain the most active derivative from the series in a few steps. These compounds were tested against Mycobacterium tuberculosis in vitro and the most active of the series identified. A new lead for potential tuberculostatic activity has been predicted and will be used in further QSAR studies. (C) 2002 Elsevier B.V. Ltd. All rights reserved.
Resumo:
CONTEXTO E OBJETIVO: Gestações complicadas pelo diabetes estão associadas com aumento de complicações maternas e neonatais. Os custos hospitalares aumentam de acordo com a assistência prestada. O objetivo foi calcular o custo-benefício e a taxa de rentabilidade social da hospitalização comparada ao atendimento ambulatorial em gestantes com diabetes ou com hiperglicemia leve. DESENHO do ESTUDO: Estudo prospectivo, observacional, quantitativo, realizado em hospital universitário, sendo incluídas todas as gestantes com diabetes pregestacional e gestacional ou com hiperglicemia leve que não desenvolveram intercorrências clínicas na gestação e que tiveram parto no Hospital das Clínicas, Faculdade de Medicina de Botucatu, Universidade Estadual Paulista (HC-FMB-Unesp). MÉTODOS: Trinta gestantes tratadas com dieta foram acompanhadas em ambulatório e 20 tratadas com dieta e insulina foram abordadas com hospitalizações curtas e frequentes. Foram obtidos custos diretos (pessoal, material e exames) e indiretos (despesas gerais) a partir de dados contidos no prontuário e no sistema de custo por absorção do hospital e posteriormente calculado o custo-benefício. RESULTADOS: O sucesso do tratamento das gestantes diabéticas evitou o gasto de US$ 1.517,97 e US$ 1.127,43 para pacientes hospitalizadas e ambulatoriais, respectivamente. O custo-benefício da atenção hospitalizada foi US$ 143.719,16 e ambulatorial, US$ 253.267,22, com rentabilidade social 1,87 e 5,35 respectivamente. CONCLUSÃO: A análise árvore de decisão confirma que o sucesso dos tratamentos elimina custos no hospital. A relação custo-benefício indicou que o tratamento ambulatorial é economicamente mais vantajoso do que a hospitalização. A rentabilidade social de ambos os tratamentos foi maior que 1, indicando que ambos os tipos de atendimento à gestante diabética têm benefício positivo.
Resumo:
The identification of genes essential for survival is important for the understanding of the minimal requirements for cellular life and for drug design. As experimental studies with the purpose of building a catalog of essential genes for a given organism are time-consuming and laborious, a computational approach which could predict gene essentiality with high accuracy would be of great value. We present here a novel computational approach, called NTPGE (Network Topology-based Prediction of Gene Essentiality), that relies on the network topology features of a gene to estimate its essentiality. The first step of NTPGE is to construct the integrated molecular network for a given organism comprising protein physical, metabolic and transcriptional regulation interactions. The second step consists in training a decision-tree-based machine-learning algorithm on known essential and non-essential genes of the organism of interest, considering as learning attributes the network topology information for each of these genes. Finally, the decision-tree classifier generated is applied to the set of genes of this organism to estimate essentiality for each gene. We applied the NTPGE approach for discovering the essential genes in Escherichia coli and then assessed its performance. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Background: The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products.Results: In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was than used to assign morbidity and druggability scores to genes not known to be morbid and druggable and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors to morbidity and druggability, respectively.Conclusions: We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.
Resumo:
Oil spills cause great damage to coastal habitats, especially when rapid and suitable response measures are not taken. Establishing high priority areas is fundamental for the operation of response teams. Under this context and considering the need for keeping all geographical information up-to-date for emergencial use, the present study proposes employing a decision tree coupled with a knowledge-based approach using GIS to assign oil sensitivity indices to Brazilian coastal habitats. The modelled system works based on rules set by the official standards of Brazilian Federal Environment Organ. We tested it on one of the littoral regions of Brazil where transportation of petroleum is most intense: the coast of the municipalities of Sao Sebastiao and Caraguatatuba in the northern littoral of São Paulo state, Brazil. The system automatically ranked the littoral sensitivity index of the study area habitats according to geographical conditions during summer and winter; since index ranks of some habitats varied between these seasons because of sediment alterations. The obtained results illustrate the great potential of the proposed system in generating ESI maps and in aiding response teams during emergency operations. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
This paper describes an investigation of the hybrid PSO/ACO algorithm to classify automatically the well drilling operation stages. The method feasibility is demonstrated by its application to real mud-logging dataset. The results are compared with bio-inspired methods, and rule induction and decision tree algorithms for data mining. © 2009 Springer Berlin Heidelberg.
Resumo:
Protein-protein interactions (PPIs) are essential for understanding the function of biological systems and have been characterized using a vast array of experimental techniques. These techniques detect only a small proportion of all PPIs and are labor intensive and time consuming. Therefore, the development of computational methods capable of predicting PPIs accelerates the pace of discovery of new interactions. This paper reports a machine learning-based prediction model, the Universal In Silico Predictor of Protein-Protein Interactions (UNISPPI), which is a decision tree model that can reliably predict PPIs for all species (including proteins from parasite-host associations) using only 20 combinations of amino acids frequencies from interacting and non-interacting proteins as learning features. UNISPPI was able to correctly classify 79.4% and 72.6% of experimentally supported interactions and non-interacting protein pairs, respectively, from an independent test set. Moreover, UNISPPI suggests that the frequencies of the amino acids asparagine, cysteine and isoleucine are important features for distinguishing between interacting and non-interacting protein pairs. We envisage that UNISPPI can be a useful tool for prioritizing interactions for experimental validation. © 2013 Valente et al.
Resumo:
Breast cancer is the most common cancer among women. In CAD systems, several studies have investigated the use of wavelet transform as a multiresolution analysis tool for texture analysis and could be interpreted as inputs to a classifier. In classification, polynomial classifier has been used due to the advantages of providing only one model for optimal separation of classes and to consider this as the solution of the problem. In this paper, a system is proposed for texture analysis and classification of lesions in mammographic images. Multiresolution analysis features were extracted from the region of interest of a given image. These features were computed based on three different wavelet functions, Daubechies 8, Symlet 8 and bi-orthogonal 3.7. For classification, we used the polynomial classification algorithm to define the mammogram images as normal or abnormal. We also made a comparison with other artificial intelligence algorithms (Decision Tree, SVM, K-NN). A Receiver Operating Characteristics (ROC) curve is used to evaluate the performance of the proposed system. Our system is evaluated using 360 digitized mammograms from DDSM database and the result shows that the algorithm has an area under the ROC curve Az of 0.98 ± 0.03. The performance of the polynomial classifier has proved to be better in comparison to other classification algorithms. © 2013 Elsevier Ltd. All rights reserved.
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)