813 results for MACHINE LEARNING CLASSIFIERS
Abstract:
This thesis presents a low-cost, non-intrusive home energy monitor built on Non-Intrusive Load Monitoring (NILM) concepts and techniques. NILM solutions are already considered low-cost alternatives to the large majority of existing commercial energy monitors, but the goal here is to lower the cost further by using a mini netbook as an all-in-one solution. The mini netbook is installed at the home's main circuit breaker and computes power consumption by reading current and voltage through the built-in sound card. At the same time, feedback is provided to users on the 11-inch LCD screen as well as through other built-in I/O modules. The meter is also capable of detecting changes in power and attempts to determine which appliance led to each change. It is being used as part of an eco-feedback platform that was built to study the long-term effects of energy eco-feedback on individuals. This thesis presents the steps taken to arrive at such a system, from the basics of AC power measurement to the implementation of an event detector and classifier used to disaggregate the power load. The last chapter presents the results of tests performed to validate the system. It is believed that such a system will be important not only as an energy monitor but also as an open system that can be easily changed to accommodate and test new or existing non-intrusive load monitoring techniques.
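As a rough illustration of the measurement and event-detection steps this abstract mentions, the sketch below computes real power, RMS values, and power factor from paired voltage and current samples, then flags large changes in a power series. It is not the thesis's implementation: the function names `ac_power` and `detect_events` are hypothetical, the samples are assumed to be already calibrated to volts and amperes (raw sound-card readings would need scaling first), and the fixed-threshold rule is an assumed stand-in for whatever event detector the thesis actually uses.

```python
import math


def ac_power(voltage, current):
    """Return (real_power, v_rms, i_rms, power_factor) for paired samples.

    Assumes both sequences are calibrated (volts, amperes) and cover a
    whole number of mains cycles.
    """
    n = len(voltage)
    if n == 0 or n != len(current):
        raise ValueError("voltage and current must be non-empty and equal in length")
    # Real power is the mean of the instantaneous power v(t) * i(t).
    real = sum(v * i for v, i in zip(voltage, current)) / n
    v_rms = math.sqrt(sum(v * v for v in voltage) / n)
    i_rms = math.sqrt(sum(i * i for i in current) / n)
    apparent = v_rms * i_rms
    power_factor = real / apparent if apparent else 0.0
    return real, v_rms, i_rms, power_factor


def detect_events(power_series, threshold):
    """Return indices where power differs from the previous value by more
    than `threshold` watts (assumed, simplistic detection rule)."""
    return [
        k
        for k in range(1, len(power_series))
        if abs(power_series[k] - power_series[k - 1]) > threshold
    ]
```

A classifier would then take the size and shape of each detected change as input to guess which appliance caused it; that step is not sketched here.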
Abstract:
The objective of this work was to determine which leaf dimensional variables are most suitable for estimating the leaf area of anthurium (Anthurium andraeanum), cv. Apalai, by means of a linear regression equation, and to compare the performance of different regression functions obtained using machine learning (ML). The variable that best estimated leaf area was the product of the linear dimensions, length (C) and width (L), and the proposed equation is Af = 0.9672 × C × L, with a coefficient of determination (R²) of 0.99. The use of ML also showed that linear functions are the most suitable for estimating the leaf area of this plant species.
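The proposed equation can be applied directly, and a coefficient of the same form can be re-estimated from new measurements. The sketch below is illustrative only: `leaf_area` and `fit_through_origin` are hypothetical names, the units are whatever the length and width were measured in (the abstract does not state them), and fitting a regression line through the origin is an assumption inferred from the equation having no intercept term.

```python
def leaf_area(length, width, coefficient=0.9672):
    """Estimate leaf area as Af = coefficient * C * L (equation from the abstract)."""
    return coefficient * length * width


def fit_through_origin(dimensions, areas):
    """Least-squares slope k for area = k * (C * L), with no intercept.

    `dimensions` is a sequence of (length, width) pairs; `areas` holds the
    measured leaf areas in matching order.
    """
    products = [c * l for c, l in dimensions]
    denominator = sum(x * x for x in products)
    if denominator == 0:
        raise ValueError("at least one non-zero length * width product is required")
    return sum(x * y for x, y in zip(products, areas)) / denominator
```

For example, `fit_through_origin([(10, 5), (12, 6)], measured_areas)` returns the single coefficient that minimizes the squared error for those leaves.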
Abstract:
Industrial automation is directly linked to the development of information technology. Better hardware solutions, as well as improvements in software development methodologies, make possible the rapid growth of production process control. In this thesis, we propose an architecture that joins two technologies, one from the hardware field (industrial networks) and one from the software field (multiagent systems). The objective of this proposal is to combine those technologies in a multiagent architecture that allows control strategies to be implemented in field devices. With this, we intend to develop an agent architecture to detect and solve problems that may occur in the industrial network environment. Our work combines machine learning with the industrial context, making the proposed multiagent architecture adaptable to unfamiliar or unexpected production environments. We used neural networks and present a strategy for allocating these networks to industrial network field devices. With this, we intend to improve decision support at the plant level and allow operation independent of human intervention.
Abstract:
Support Vector Machines (SVMs) have attracted increasing attention in the machine learning area, particularly in classification and pattern recognition. However, in some cases it is not easy to determine accurately the class to which a given pattern belongs. This thesis involves the construction of an interval pattern classifier using SVMs in association with interval theory, in order to model the separation of a pattern set into distinct classes with precision, aiming to obtain an optimized separation capable of handling the imprecision contained in the initial data and generated during computational processing. The SVM is a linear machine. To allow it to solve real-world problems (which are usually nonlinear), it is necessary to treat the pattern set, known as the input set, transforming a nonlinear problem into a linear one. Kernel machines are responsible for this mapping. To create the interval extension of the SVM, for both linear and nonlinear problems, it was necessary to define an interval kernel and to extend Mercer's theorem (which characterizes a kernel function) to interval functions.
Abstract:
One of the most important goals of bioinformatics is the ability to identify genes in uncharacterized DNA sequences in worldwide databases. Gene expression in prokaryotes begins when the RNA polymerase enzyme interacts with DNA regions called promoters, where the main regulatory elements of the transcription process are located. Despite improvements in in vitro techniques for molecular biology analysis, characterizing and identifying a large number of promoters in a genome is a complex task. The main drawback is the absence of a large set of promoters from which to identify patterns conserved among species. Hence, an in silico method to predict promoters in any species is a challenge. Improved promoter prediction methods can be one step towards developing more reliable ab initio gene prediction methods. In this work, we present an empirical comparison of Machine Learning (ML) techniques, namely Naïve Bayes, Decision Trees, Support Vector Machines, Neural Networks, Voted Perceptron, PART, k-NN, and ensemble approaches (Bagging and Boosting), on the task of predicting Bacillus subtilis promoters. To do so, we first built two data sets of promoter and non-promoter sequences for B. subtilis and a hybrid one. A cross-validation procedure was applied to evaluate the ML methods. Good results were obtained with ML methods such as SVM and Naïve Bayes on the B. subtilis data. However, we did not reach good results on the hybrid database.
Abstract:
This paper presents an evaluative study of the effects of using a machine learning technique on the main features of a self-organizing, multiobjective genetic algorithm (GA). A typical GA can be seen as a search technique that is usually applied to problems of non-polynomial complexity. Originally, these algorithms were designed to seek acceptable solutions to problems where the global optimum is inaccessible or difficult to obtain. At first, GAs considered only one evaluation function and single-objective optimization. Today, however, implementations that consider several optimization objectives simultaneously (multiobjective algorithms) are common, as are implementations that allow many components of the algorithm to change dynamically (self-organizing algorithms). Combinations of GAs with machine learning techniques, intended to improve performance and usability, are also common. In this work, a GA combined with a machine learning technique was analyzed and applied to an antenna design. We used a variant of the bicubic interpolation technique, called 2D Spline, as the machine learning technique to estimate the behavior of a dynamic fitness function, based on knowledge obtained from a set of laboratory experiments. This fitness function, also called the evaluation function, is responsible for determining the fitness of a candidate solution (individual) relative to others in the same population. The algorithm can be applied in many areas, including telecommunications, for example in the design of antennas and frequency selective surfaces. In this particular work, the algorithm was developed to optimize the design of a microstrip antenna of the kind commonly used in wireless communication systems for Ultra-Wideband (UWB) applications.
The algorithm allowed the optimization of two variables of the antenna geometry, the length (Ls) and width (Ws) of a slit in the ground plane, with respect to three objectives: radiated signal bandwidth, return loss, and central frequency deviation. These two dimensions (Ws and Ls) are used as variables in three different interpolation functions, one Spline for each optimization objective, to compose an aggregate multiobjective fitness function. The final result proposed by the algorithm was compared with the result of the simulation program and with the measured result of a physical prototype of the antenna built in the laboratory. In the present study, the algorithm was analyzed with respect to its degree of success on four important characteristics of a self-organizing multiobjective GA: performance, flexibility, scalability, and accuracy. At the end of the study, an increase in execution time compared with a common GA was observed, due to the time required by the machine learning process. On the plus side, we observed a noticeable gain in the flexibility and accuracy of the results, and a promising path toward extending the algorithm to optimization problems with "η" variables.
Abstract:
Reinforcement learning is a machine learning technique that, although it has found a large number of applications, may not yet have reached its full potential. One of the inadequately tested possibilities is the use of reinforcement learning in combination with other methods to solve pattern classification problems. The problems that support vector machine ensembles face in terms of generalization capacity are well documented in the literature. Algorithms such as Adaboost do not deal appropriately with the imbalances that arise in those situations. Several alternatives have been proposed, with varying degrees of success. This dissertation presents a new approach to building committees of support vector machines. The presented algorithm combines the Adaboost algorithm with a layer of reinforcement learning that adjusts committee parameters so that imbalances among the committee components do not affect the generalization performance of the final hypothesis. Ensembles with and without the reinforcement learning layer were compared on benchmark data sets widely known in the area of pattern classification.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Machine learning techniques are applied in classification tasks to acquire knowledge from a set of data or information. Some learning methods proposed in the literature are based on semi-supervised learning, in which a small percentage of labeled data (as in supervised learning) is combined with a quantity of unlabeled examples (as in unsupervised learning) during the training phase. This reduces the need for a large quantity of labeled instances when only a small set of labeled instances is available for training. A common problem in semi-supervised learning is the random selection of instances: most papers use a random selection technique, which can have a negative impact. Most machine learning methods treat single-label problems, that is, problems in which each instance is associated with a single class. However, in many domains data must be classified into more than one class; this is called multi-label classification. This work presents an experimental analysis of the results obtained using semi-supervised learning in multi-label classification problems, using a reliability parameter as an aid in classifying the data. The combined use of semi-supervised learning techniques and multi-label classification methods was essential to obtaining the results.
Abstract:
The identification of genes essential for survival is important for the understanding of the minimal requirements for cellular life and for drug design. As experimental studies with the purpose of building a catalog of essential genes for a given organism are time-consuming and laborious, a computational approach which could predict gene essentiality with high accuracy would be of great value. We present here a novel computational approach, called NTPGE (Network Topology-based Prediction of Gene Essentiality), that relies on the network topology features of a gene to estimate its essentiality. The first step of NTPGE is to construct the integrated molecular network for a given organism comprising protein physical, metabolic and transcriptional regulation interactions. The second step consists in training a decision-tree-based machine-learning algorithm on known essential and non-essential genes of the organism of interest, considering as learning attributes the network topology information for each of these genes. Finally, the decision-tree classifier generated is applied to the set of genes of this organism to estimate essentiality for each gene. We applied the NTPGE approach for discovering the essential genes in Escherichia coli and then assessed its performance. (C) 2007 Elsevier B.V. All rights reserved.
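The three NTPGE steps described above (build the network, train on known genes, apply the classifier to every gene) can be sketched in a heavily simplified form. Everything below is an assumption made for illustration: the abstract does not say which topology features or which decision-tree algorithm are used, so this sketch uses node degree as the only feature and a single-threshold decision stump in place of a full decision tree, and it assumes higher degree points toward essentiality. The names `degrees`, `train_stump`, and `predict` are hypothetical.

```python
from collections import defaultdict


def degrees(edges):
    """Step 1: derive one topology feature (node degree) from an edge list
    of the integrated interaction network."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def train_stump(features, labels):
    """Step 2: pick the degree threshold that best separates known essential
    (True) from non-essential (False) genes. A one-level stand-in for a
    decision tree."""
    if not labels:
        raise ValueError("at least one labeled gene is required")
    genes = list(labels)
    candidates = sorted({features.get(g, 0) for g in genes})
    best_threshold, best_correct = candidates[0], -1
    for threshold in candidates:
        correct = sum((features.get(g, 0) >= threshold) == labels[g] for g in genes)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(features, threshold):
    """Step 3: apply the trained rule to every gene in the network."""
    return {gene: degree >= threshold for gene, degree in features.items()}
```

A real decision tree over several topology measures would replace `train_stump`; the three-step structure stays the same.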
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Spiking neural networks, which use a temporal coding of information, have emerged as a promising approach within the connectionist paradigm arising from cognitive science. One of these new models is the spiking neural network with radial basis functions, which is able to store information in the axonal delay times of its neurons. A learning algorithm was successfully applied to this spiking network, which proved capable of mapping a sequence of input spikes to a sequence of output spikes. More recently, a method based on Gaussian receptive fields was proposed to encode constant data into a sequence of temporal spikes. This method made it possible for the network to handle computational data. The learning process of this new network is not fully understood, and deeper investigation is needed to situate the model within the context of machine learning and to establish the abilities and limitations of the network. This work presents an investigation of this new classifier and a study of its ability to cluster data in three dimensions, seeking in particular to establish its application domains and horizons in the field of computer vision.
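To make the Gaussian receptive field encoding more concrete, here is a minimal sketch of one common way to turn a constant value into spike times: a population of neurons with overlapping Gaussian tuning curves responds to the value, and stronger responses produce earlier spikes. The details are assumptions, not taken from the work summarized above: the function name `encode_value`, evenly spaced centers, a width equal to the center spacing, and the linear mapping from activation to firing time are all illustrative choices.

```python
import math


def encode_value(x, lo, hi, n_neurons=5, t_max=10.0):
    """Encode a constant x in [lo, hi] as one spike time per input neuron.

    Each neuron has a Gaussian receptive field; an activation of 1 fires
    at time 0 and an activation near 0 fires close to t_max.
    """
    if n_neurons < 2 or hi <= lo:
        raise ValueError("need at least two neurons and hi > lo")
    spacing = (hi - lo) / (n_neurons - 1)
    sigma = spacing  # assumed field width; a tunable parameter in practice
    spike_times = []
    for k in range(n_neurons):
        center = lo + k * spacing
        activation = math.exp(-((x - center) ** 2) / (2 * sigma ** 2))
        spike_times.append(t_max * (1.0 - activation))
    return spike_times
```

Calling `encode_value(0.5, 0.0, 1.0)` yields five spike times in which the neuron centered on 0.5 fires first and its neighbors fire progressively later, so the value is represented by a temporal pattern rather than a single number.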
Abstract:
Interactive visual representations complement traditional statistical and machine learning techniques for data analysis, allowing users to play a more active role in a knowledge discovery process and making the whole process more understandable. Though visual representations are applicable to several stages of the knowledge discovery process, a common use of visualization is in the initial stages to explore and organize a sometimes unknown and complex data set. In this context, the integrated and coordinated - that is, user actions should be capable of affecting multiple visualizations when desired - use of multiple graphical representations allows data to be observed from several perspectives and offers richer information than isolated representations. In this paper we propose an underlying model for an extensible and adaptable environment that allows independently developed visualization components to be gradually integrated into a user configured knowledge discovery application. Because a major requirement when using multiple visual techniques is the ability to link amongst them, so that user actions executed on a representation propagate to others if desired, the model also allows runtime configuration of coordinated user actions over different visual representations. We illustrate how this environment is being used to assist data exploration and organization in a climate classification problem.
Abstract:
Most of the tasks in genome annotation can be at least partially automated. Since this annotation is time-consuming, facilitating some parts of the process - thus freeing the specialist to carry out more valuable tasks - has been the motivation of many tools and annotation environments. In particular, annotation of protein function can benefit from knowledge about enzymatic processes. The use of sequence homology alone is not a good approach to derive this knowledge when there are only a few homologues of the sequence to be annotated. The alternative is to use motifs. This paper uses a symbolic machine learning approach to derive rules for the classification of enzymes according to the Enzyme Commission (EC). Our results show that, for the top class, the average global classification error is 3.13%. Our technique also produces a set of rules relating structural to functional information, which is important for understanding a protein's three-dimensional structure and determining its biological function. © 2009 Springer Berlin Heidelberg.
Abstract:
This paper presents a novel, fast, and accurate appearance-based method for infrared face recognition. By introducing the Optimum-Path Forest classifier, our objective is to obtain good recognition rates while effectively reducing the computational effort. Feature extraction is carried out by PCA, and the results are compared with those of two other well-known supervised learning classifiers: Artificial Neural Networks and Support Vector Machines. The achieved performance supports the promise of the proposed framework. ©2009 IEEE.