992 results for C4.5


Relevance:

100.00%

Publisher:

Abstract:

This work compares the C4.5 and MLP (Multilayer Perceptron) algorithms applied to dynamic security assessment (DSA) and to preventive control design, with a focus on the transient stability of electric power systems (EPSs). C4.5 is a decision tree (DT) algorithm, and the MLP is a member of the family of artificial neural networks (ANNs). Both algorithms provide solutions to the real-time DSA problem, quickly identifying when an EPS is subject to a critical disturbance (a short circuit, for example) that may lead to transient instability. In addition, the knowledge obtained from both techniques, in the form of rules, can be used in preventive control design to restore the security of the EPS against critical disturbances. Based on a database built from exhaustive time-domain simulations, some specific critical disturbances are taken as examples to compare the C4.5 and MLP algorithms applied to DSA and to the support of preventive actions. The comparative study is tested on the "New England" power system. In the case studies, the database is generated with the PSTv3 ("Power System Toolbox") program. The DTs and ANNs are trained and tested using the RapidMiner program. The results show that the C4.5 and MLP algorithms are promising for DSA applications and for preventive control design.

Relevance:

70.00%

Publisher:

Abstract:

This work assessed the suitability of data mining techniques for classification problems in the automated identification of areas cultivated with sugarcane in Landsat 5/TM satellite images. For this assessment, images of areas cultivated with sugarcane in three different phenological phases were studied. The pixels were converted to surface reflectance values, in the vicinity of the cities of Araras, São Carlos and Araraquara, in the State of São Paulo. Five binary decision tree models were generated, induced by the C4.5 algorithm, all of which produced accuracy rates above 90%. The introduction of texture attributes brought significant gains in the accuracy of the classification model and helped improve the distinction of areas cultivated with sugarcane among various types of land cover, such as bare soil, urban areas, lakes and rivers. Vegetation indices proved relevant for distinguishing the phase and phenological state of the crops. The results reinforce the strong potential of decision trees in the process of classifying and identifying areas cultivated with sugarcane, in different producing cities in the State of São Paulo.

Relevance:

70.00%

Publisher:

Abstract:

Decision trees are very powerful tools for classification in data mining tasks that involve different types of attributes. When handling numeric data sets, the values are usually first converted to categorical types and then classified using information gain concepts. Information gain is a very popular and useful concept that tells you whether any benefit, as far as information content is concerned, occurs after splitting on a given attribute. But this process is computationally intensive for large data sets. Also, popular decision tree algorithms like ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean, to split attributes in completely numerical data sets. The new algorithm has been shown to be competitive with its information gain counterpart C4.5 and with many existing decision tree algorithms on the standard UCI benchmarking datasets, using the ANOVA test. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids converting huge numeric data sets to categorical data, which is also a time-consuming task. In summary, huge numeric datasets can be submitted directly to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields of statistics and data mining.
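The contrast the abstract draws can be sketched roughly as follows. The `entropy` and `information_gain` functions follow the standard textbook definitions; `pick_attribute_by_variance` is only a hypothetical reading of "variance to choose, mean to split", since the abstract does not state the paper's exact rule. All function names are illustrative.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting numeric `values` at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty gains nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def pick_attribute_by_variance(columns):
    """Assumed rule (not taken from the paper): choose the numeric attribute
    with the largest population variance and split it at its mean."""
    name = max(columns, key=lambda k: pvariance(columns[k]))
    return name, mean(columns[name])
```

The variance-based choice needs only one pass over each column and no class-label counting, which is the kind of saving the abstract refers to. Whether it selects splits as good as information gain depends on the data.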

Relevance:

60.00%

Publisher:

Abstract:

Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can be text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that is large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rule mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. So far, a number of classification algorithms have been put into practice. According to (Sebastiani, 2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probability methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973), and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).

Relevance:

60.00%

Publisher:

Abstract:

Dissertation submitted in fulfilment of the requirements for the degree of Master in Remote Sensing and Geographic Information Systems

Relevance:

60.00%

Publisher:

Abstract:

Background: Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of the probability of hospital mortality; comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods: Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Discrimination is compared using the ROC curve (AUC CI 95%) and the percentage of correct classification (PCC CI 95%); calibration is compared using the Calibration Curve and the Standardized Mortality Ratio (SMR CI 95%). Results: CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69-75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) was better in the CTs: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15)) and C4.5 (1.08(0.98-1.16)). Conclusion: With different CT methodologies, trees are generated with different selections of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients.
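Two of the quantities named in the Methods can be illustrated with a minimal sketch. It assumes the usual definition of the Standardized Mortality Ratio (observed deaths divided by expected deaths, where expected deaths are the sum of the model's predicted probabilities) and a simple shuffled 70:30 split. The function names and the seed are illustrative and are not taken from the study.

```python
import random


def split_70_30(records, seed=0):
    """Randomly partition records into a development set (about 70%)
    and a validation set (the remainder)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def standardized_mortality_ratio(died, predicted_risk):
    """Observed deaths divided by expected deaths.

    `died` holds 0/1 outcomes and `predicted_risk` the model's predicted
    probability of death for the same patients, in the same order.
    """
    expected = sum(predicted_risk)
    return sum(died) / expected
```

An SMR close to 1 means the model predicts about as many deaths as were observed. Values above 1 indicate that the model underestimates mortality.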

Relevance:

60.00%

Publisher:

Abstract:

Locomotor problems prevent birds from moving freely, jeopardizing welfare and productivity, besides causing injuries to the legs of chickens. The objective of this study was to evaluate, using data mining, the influence of age, use of vitamin D, limb asymmetry and gait score on the degree of leg injuries in broilers. The analysis was performed on a data set obtained from a field experiment in which two groups of 30 birds each were used: a control group and a group treated with vitamin D. The gait score, the asymmetry between the right and left toes, and the degree of leg injuries were evaluated. The Weka® software was used for data mining. In particular, the C4.5 algorithm (known as J48 in the Weka environment) was used to generate a decision tree. The results showed that age is the factor that most influences the degree of leg injuries and that the gait score assessments were not reliable for estimating leg weakness in broilers.

Relevance:

60.00%

Publisher:

Abstract:

This study aimed to identify differences in swine vocalization patterns according to animal gender and different stress conditions. A total of 150 barrows and 150 females (Dalland® genetic strain), aged 100 days, were used in the experiment. Pigs were exposed to different stressful situations: thirst (no access to water), hunger (no access to food), and thermal stress (THI exceeding 74). For the control treatment, animals were kept under comfortable conditions (full access to food and water, with environmental THI lower than 70). Acoustic signals were recorded every 30 minutes, totaling six samples for each stress situation. Afterwards, the recordings were analyzed with Praat® 5.1.19 software, generating a sound spectrum. To determine stress conditions, data were processed with WEKA® 3.5 software, using the decision tree algorithm C4.5, known as J48 in the software environment, with 10-fold cross-validation (each fold holding out 10% of the samples). According to the decision tree, the most important acoustic attribute for the classification of stress conditions was sound intensity (root node). It was not possible to identify the animal gender from the vocal register using the tested attributes. A decision tree was generated for recognizing swine hunger, thirst, and heat stress from records of sound intensity, pitch frequency, and Formant 1.
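The 10-fold cross-validation mentioned above can be sketched as a plain index partition. This is an illustrative, unshuffled and unstratified version. It does not reproduce Weka's own implementation, and `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists so that every sample is held out exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

In each round a tree is trained on the `train` indices and scored on the `test` indices, and the ten scores are then averaged. In practice the samples would usually be shuffled first, and often stratified by class.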

Relevance:

60.00%

Publisher:

Abstract:

As a result of advances in storage technology and the upheaval brought by the internet, data volumes have grown dramatically. As data volumes keep growing, various methods have been developed to retrieve relevant information from such a mass of data; this process is called data mining. Among the various data mining algorithms, this thesis deals with decision tree algorithms. Decision trees have numerous advantages over other data mining algorithms: data generally needs very little preprocessing before being fed to the algorithm, and decision trees can also be used to examine nonlinear dependencies between variables. Perhaps the most important argument in favour of decision trees, however, is the clear tree-like representation they return, from which drawing conclusions is relatively easy. The thesis first explains, at a high level of abstraction, the basic operation and problems of decision tree algorithms, and then goes through the statistical concepts essential to how the algorithms work. After this, the decision tree algorithms deemed relevant are analysed at a lower level of abstraction, and finally the similarities and differences of the algorithms are compared in terms of, for example, computational efficiency, accuracy and the size of the trees produced. The thesis answers the question of which type of decision tree algorithm is advisable for which kind of problem. In addition to the literature of the field, the author's own practical experiments with the Weka data mining tool are used as support. The thesis concludes that the CHAID algorithm is recommended mainly for analysing the features of data, whereas the other algorithms presented in the thesis are used mainly for classification tasks. ID3 is an outdated algorithm that should now be used mainly for teaching or demonstration purposes.
Based on the results, it can also be said that, as a rule, the J48 algorithm developed from C4.5 should be used when execution speed is desired, and the SimpleCart algorithm developed from CART is recommended when smaller models are desired.

Relevance:

60.00%

Publisher:

Abstract:

Human butyrylcholinesterase (BChE; EC 3.1.1.8) is a polymorphic enzyme synthesized in the liver and in adipose tissue, widely distributed in the body and responsible for hydrolysing some choline esters such as procaine, aliphatic esters such as acetylsalicylic acid, drugs such as methylprednisolone, mivacurium and succinylcholine, and drugs of use and/or abuse such as heroin and cocaine. It is encoded by the BCHE gene (OMIM 177400), for which more than 100 variants have been identified, some not fully studied, in addition to the most frequent form, called usual or wild type. Different polymorphisms of the BCHE gene have been associated with the synthesis of enzymes with varying levels of catalytic activity. The molecular bases of some of these genetic variants have been reported, among them the Atypical (A), fluoride-resistant type 1 and 2 (F-1 and F-2), silent (S), Kalow (K), James (J) and Hammersmith (H) variants. In this study, the validated instrument Lifetime Severity Index for Cocaine Use Disorder (LSI-C) was applied to a group of patients to assess the severity of lifetime "cocaine" use. In addition, single nucleotide polymorphisms (SNPs) in the BCHE gene known to be responsible for adverse reactions in "cocaine"-using patients were determined by gene sequencing, and the effect of the SNPs on the function and structure of the protein was predicted using bioinformatics tools. The LSI-C instrument provided results in four dimensions: lifetime use, recent use, psychological dependence and attempts to quit.
The molecular analysis studies revealed two non-synonymous coding SNPs (cSNPs) in 27.3% of the sample, c.293A>G (p.Asp98Gly) and c.1699G>A (p.Ala567Thr), located in exons 2 and 4, which correspond, from a functional point of view, to the Atypical (A) variant [dbSNP: rs1799807] and the Kalow (K) variant [dbSNP: rs1803274] of the BChE enzyme, respectively. The in silico prediction studies established a pathogenic character for the SNP p.Asp98Gly, whereas for the SNP p.Ala567Thr they showed neutral behaviour. Analysis of the results makes it possible to propose a relationship between polymorphisms or genetic variants responsible for low catalytic activity and/or low plasma concentration of the BChE enzyme and some of the adverse reactions that occurred in cocaine-using patients.

Relevance:

60.00%

Publisher:

Abstract:

Unpublished material

Relevance:

60.00%

Publisher:

Abstract:

Scene classification based on latent Dirichlet allocation (LDA) is a more general modeling method known as a bag of visual words, in which the construction of a visual vocabulary is a crucial quantization process to ensure success of the classification. A framework is developed using the following new aspects: Gaussian mixture clustering for the quantization process, the use of an integrated visual vocabulary (IVV), which is built as the union of all centroids obtained from the separate quantization process of each class, and the usage of some features, including edge orientation histogram, CIELab color moments, and gray-level co-occurrence matrix (GLCM). The experiments are conducted on IKONOS images with six semantic classes (tree, grassland, residential, commercial/industrial, road, and water). The results show that the use of an IVV increases the overall accuracy (OA) by 11 to 12% and 6% when it is implemented on the selected and all features, respectively. The selected features of CIELab color moments and GLCM provide a better OA than the implementation over CIELab color moment or GLCM as individuals. The latter increases the OA by only ∼2 to 3%. Moreover, the results show that the OA of LDA outperforms the OA of C4.5 and naive Bayes tree by ∼20%. © 2014 Society of Photo-Optical Instrumentation Engineers (SPIE) [DOI: 10.1117/1.JRS.8.083690]

Relevance:

60.00%

Publisher:

Abstract:

Parkinson’s disease is a clinical syndrome manifesting with slowness and instability. As it is a progressive disease with varying symptoms, repeated assessments are necessary to determine the outcome of treatment changes in the patient. In the recent past, a computer-based method was developed to rate impairment in spiral drawings. The downside of this method is that it cannot separate bradykinetic from dyskinetic spiral drawings. This work aims to construct a computer method that can overcome this weakness by using the Hilbert-Huang Transform (HHT) of tangential velocity. The work is done under supervised learning, so a target class is used, which is acquired from a neurologist through a web interface. After the dimension of the HHT features is reduced with PCA, classification is performed with a C4.5 classifier. The classification results are close to random guessing, which shows that the computer method is unsuccessful in assessing the cause of drawing impairment in spirals when evaluated against human ratings. One plausible reason is that there is no difference between the two classes of spiral drawings. Displaying the patients' self-ratings along with the spirals in the web application is another possible reason, as the neurologist may have relied too much on them in his own ratings.

Relevance:

60.00%

Publisher:

Abstract:

As the service life of water supply networks (WSNs) increases, the aging of pipe networks has become an exceedingly serious problem. Because an urban water supply network is a hidden underground asset, it is difficult for monitoring staff to classify pipe network faults directly using modern detection technology. In this paper, based on the basic property data of the water supply network (e.g. diameter, material, pressure, distance to pump, distance to tank, load, etc.), the decision tree algorithm C4.5 was applied to classify the specific condition of water supply pipelines. Part of the historical data was used to build a decision tree classification model, and the remaining historical data was used to validate the model. Statistical methods were used to assess the decision tree model, including basic statistical methods, Receiver Operating Characteristic (ROC) and Recall-Precision Curves (RPC). These methods were successfully used to assess the accuracy of the classification model. The purpose of the classification model was to classify the specific condition of the water pipe network. It is important to maintain the pipeline according to the classification results, which include asset unserviceable (AU), near perfect condition (NPC) and serious deterioration (SD). Finally, this research focused on pipe classification, which plays a significant role in maintaining water supply networks in the future.
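Each point on a recall-precision curve is a (recall, precision) pair computed for one class. The sketch below computes a single such pair for a multi-class labelling like AU / NPC / SD, using the standard definitions. The function name is illustrative, and the labels in the usage note are invented, not drawn from the paper's data.

```python
def precision_recall(actual, predicted, positive):
    """Precision and recall for one class, treating every other class as negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # correctly flagged
    fp = sum(a != positive and p == positive for a, p in pairs)  # wrongly flagged
    fn = sum(a == positive and p != positive for a, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, `precision_recall(["SD", "SD", "NPC"], ["SD", "NPC", "NPC"], "SD")` evaluates how well seriously deteriorated pipes are picked out. Sweeping a decision threshold on the classifier's confidence and recomputing the pair at each threshold would trace out the full curve.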

Relevance:

60.00%

Publisher:

Abstract:

Approximately 15% of disc diseases in dogs affect the cervical region, with pain being the main clinical sign. The occurrence of cervical disc protrusion in 17 dogs is described, grouped by breed, sex, weight and age, along with the distribution by duration of symptoms, intervertebral discs (IVDs) affected, recovery time and success rate, relative to the neurological condition present before surgery. Dachshunds accounted for 29.5% (n=5); mixed-breed dogs, Poodles and English Cocker Spaniels, 17.6% each (n=9); Pinschers, 11.8% (n=2); and Dalmatians, 5.9% (n=1). Of these, 58.8% were male (n=10) and 41.2% female (n=7), weighing between 2 and 29 kg, with a mean age of 5.8 years. The neurological presentation of these animals was pain and ataxia, except for one 11-year-old Dalmatian, which presented with tetraparesis. The duration of signs ranged from 2 to 90 days. The most affected IVDs were C2/3 (40%), C3/4 (25%), C4/5 (15%), C5/6 (10%) and C6/7 (10%), with some animals having multiple lesions. The procedure was the same for all animals: fenestration and curettage of all IVDs reached by the ventral approach, that is, from C2/3 to C6/7, using instruments for dental tartar removal (Gracey curettes, McCall curettes, S.S.White and McCall tartar extractors). The mean recovery time was 9 to 38 days, and 100% of the animals fully recovered neurological function. It is concluded that ventral fenestration gives excellent results in the treatment of cervical disc disease, provided that patients are well selected, including with respect to differential diagnoses.