39 resultados para machine learning
Resumo:
Concept drift, which refers to non stationary learning problems over time, has increasing importance in machine learning and data mining. Many concept drift applications require fast response, which means an algorithm must always be (re)trained with the latest available data. But the process of data labeling is usually expensive and/or time consuming when compared to acquisition of unlabeled data, thus usually only a small fraction of the incoming data may be effectively labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. However, most of them are based on assumptions that the data is static. Therefore, semi-supervised learning with concept drifts is still an open challenging task in machine learning. Recently, a particle competition and cooperation approach has been developed to realize graph-based semi-supervised learning from static data. We have extend that approach to handle data streams and concept drift. The result is a passive algorithm which uses a single classifier approach, naturally adapted to concept changes without any explicit drift detection mechanism. It has built-in mechanisms that provide a natural way of learning from new data, gradually "forgetting" older knowledge as older data items are no longer useful for the classification of newer data items. The proposed algorithm is applied to the KDD Cup 1999 Data of network intrusion, showing its effectiveness.
Resumo:
In the pattern recognition research field, Support Vector Machines (SVM) have been an effectiveness tool for classification purposes, being successively employed in many applications. The SVM input data is transformed into a high dimensional space using some kernel functions where linear separation is more likely. However, there are some computational drawbacks associated to SVM. One of them is the computational burden required to find out the more adequate parameters for the kernel mapping considering each non-linearly separable input data space, which reflects the performance of SVM. This paper introduces the Polynomial Powers of Sigmoid for SVM kernel mapping, and it shows their advantages over well-known kernel functions using real and synthetic datasets.
Resumo:
O presente trabalho teve como objetivo determinar quais variáveis dimensionais da folha são mais adequadas para utilização na estimativa da área foliar do antúrio (Anthurium andraeanum), cv. Apalai, por meio de equação de regressão linear, e comparar o desempenho de diferentes funções de regressão obtidas com o uso de aprendizado de máquina (AM). A variável que melhor estimou a área foliar foi o produto das dimensões lineares (comprimento e largura), CxL, sendo a equação proposta Af = 0.9672 *C x L, com coeficiente de determinação (R²) de 0,99. Verificou-se, também, com o uso de AM, que as funções lineares são mais adequadas para a estimação da área foliar dessa espécie vegetal.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
The identification of genes essential for survival is important for the understanding of the minimal requirements for cellular life and for drug design. As experimental studies with the purpose of building a catalog of essential genes for a given organism are time-consuming and laborious, a computational approach which could predict gene essentiality with high accuracy would be of great value. We present here a novel computational approach, called NTPGE (Network Topology-based Prediction of Gene Essentiality), that relies on the network topology features of a gene to estimate its essentiality. The first step of NTPGE is to construct the integrated molecular network for a given organism comprising protein physical, metabolic and transcriptional regulation interactions. The second step consists in training a decision-tree-based machine-learning algorithm on known essential and non-essential genes of the organism of interest, considering as learning attributes the network topology information for each of these genes. Finally, the decision-tree classifier generated is applied to the set of genes of this organism to estimate essentiality for each gene. We applied the NTPGE approach for discovering the essential genes in Escherichia coli and then assessed its performance. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Redes neurais pulsadas - redes que utilizam uma codificação temporal da informação - têm despontado como uma promissora abordagem dentro do paradigma conexionista, emergente da ciência cognitiva. Um desses novos modelos é a rede neural pulsada com função de base radial, que é capaz de armazenar informação nos tempos de atraso axonais dos neurônios. Um algoritmo de aprendizado foi aplicado com sucesso nesta rede pulsada, que se mostrou capaz de mapear uma seqüência de pulsos de entrada em uma seqüência de pulsos de saída. Mais recentemente, um método baseado no uso de campos receptivos gaussianos foi proposto para codificar dados constantes em uma seqüência de pulsos temporais. Este método tornou possível a essa rede lidar com dados computacionais. O processo de aprendizado desta nova rede não se encontra plenamente compreendido e investigações mais profundas são necessárias para situar este modelo dentro do contexto do aprendizado de máquinas e também para estabelecer as habilidades e limitações desta rede. Este trabalho apresenta uma investigação desse novo classificador e um estudo de sua capacidade de agrupar dados em três dimensões, particularmente procurando estabelecer seus domínios de aplicação e horizontes no campo da visão computacional.
Resumo:
Interactive visual representations complement traditional statistical and machine learning techniques for data analysis, allowing users to play a more active role in a knowledge discovery process and making the whole process more understandable. Though visual representations are applicable to several stages of the knowledge discovery process, a common use of visualization is in the initial stages to explore and organize a sometimes unknown and complex data set. In this context, the integrated and coordinated - that is, user actions should be capable of affecting multiple visualizations when desired - use of multiple graphical representations allows data to be observed from several perspectives and offers richer information than isolated representations. In this paper we propose an underlying model for an extensible and adaptable environment that allows independently developed visualization components to be gradually integrated into a user configured knowledge discovery application. Because a major requirement when using multiple visual techniques is the ability to link amongst them, so that user actions executed on a representation propagate to others if desired, the model also allows runtime configuration of coordinated user actions over different visual representations. We illustrate how this environment is being used to assist data exploration and organization in a climate classification problem.
Resumo:
Most of the tasks in genome annotation can be at least partially automated. Since this annotation is time-consuming, facilitating some parts of the process - thus freeing the specialist to carry out more valuable tasks - has been the motivation of many tools and annotation environments. In particular, annotation of protein function can benefit from knowledge about enzymatic processes. The use of sequence homology alone is not a good approach to derive this knowledge when there are only a few homologues of the sequence to be annotated. The alternative is to use motifs. This paper uses a symbolic machine learning approach to derive rules for the classification of enzymes according to the Enzyme Commission (EC). Our results show that, for the top class, the average global classification error is 3.13%. Our technique also produces a set of rules relating structural to functional information, which is important to understand the protein tridimensional structure and determine its biological function. © 2009 Springer Berlin Heidelberg.
Resumo:
The applications of Automatic Vowel Recognition (AVR), which is a sub-part of fundamental importance in most of the speech processing systems, vary from automatic interpretation of spoken language to biometrics. State-of-the-art systems for AVR are based on traditional machine learning models such as Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), however, such classifiers can not deal with efficiency and effectiveness at the same time, existing a gap to be explored when real-time processing is required. In this work, we present an algorithm for AVR based on the Optimum-Path Forest (OPF), which is an emergent pattern recognition technique recently introduced in literature. Adopting a supervised training procedure and using speech tags from two public datasets, we observed that OPF has outperformed ANNs, SVMs, plus other classifiers, in terms of training time and accuracy. ©2010 IEEE.
Resumo:
Musical genre classification has been paramount in the last years, mainly in large multimedia datasets, in which new songs and genres can be added at every moment by anyone. In this context, we have seen the growing of musical recommendation systems, which can improve the benefits for several applications, such as social networks and collective musical libraries. In this work, we have introduced a recent machine learning technique named Optimum-Path Forest (OPF) for musical genre classification, which has been demonstrated to be similar to the state-of-the-art pattern recognition techniques, but much faster for some applications. Experiments in two public datasets were conducted against Support Vector Machines and a Bayesian classifier to show the validity of our work. In addition, we have executed an experiment using very recent hybrid feature selection techniques based on OPF to speed up feature extraction process. © 2011 International Society for Music Information Retrieval.
Resumo:
The spermatogenesis is crucial to the species reproduction, and its monitoring may shed light over some important information of such process. Thus, the germ cells quantification can provide useful tools to improve the reproduction cycle. In this paper, we present the first work that address this problem in fishes with machine learning techniques. We show here how to obtain high recognition accuracies in order to identify fish germ cells with several state-of-the-art supervised pattern recognition techniques. © 2011 IEEE.
Resumo:
Duplex and superduplex stainless steels are class of materials of a high importance for engineering purposes, since they have good mechanical properties combination and also are very resistant to corrosion. It is known as well that the chemical composition of such steels is very important to maintain some desired properties. In the past years, some works have reported that γ 2 precipitation improves the toughness of such steels, and its quantification may reveals some important information about steel quality. Thus, we propose in this work the automatic segmentation of γ 2 precipitation using two pattern recognition techniques: Optimum-Path Forest (OPF) and a Bayesian classifier. To the best of our knowledge, this if the first time that machine learning techniques are applied into this area. The experimental results showed that both techniques achieved similar and good recognition rates. © 2012 Taylor & Francis Group.
Resumo:
Voice-based user interfaces have been actively pursued aiming to help individuals with motor impairments, providing natural interfaces to communicate with machines. In this work, we have introduced a recent machine learning technique named Optimum-Path Forest (OPF) for voice-based robot interface, which has been demonstrated to be similar to the state-of-the-art pattern recognition techniques, but much faster. Experiments were conducted against Support Vector Machines, Neural Networks and a Bayesian classifier to show the OPF robustness. The proposed architecture provides high accuracy rates allied with low computational times. © 2012 IEEE.