928 resultados para Aprendizado de máquina. Aprendizado semissupervisionado.Classificação multirrótulo. Parâmetro de confiabilidade


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The techniques of Machine Learning are applied in classification tasks to acquire knowledge through a set of data or information. Some learning methods proposed in literature are methods based on semissupervised learning; this is represented by small percentage of labeled data (supervised learning) combined with a quantity of label and non-labeled examples (unsupervised learning) during the training phase, which reduces, therefore, the need for a large quantity of labeled instances when only small dataset of labeled instances is available for training. A commom problem in semi-supervised learning is as random selection of instances, since most of paper use a random selection technique which can cause a negative impact. Much of machine learning methods treat single-label problems, in other words, problems where a given set of data are associated with a single class; however, through the requirement existent to classify data in a lot of domain, or more than one class, this classification as called multi-label classification. This work presents an experimental analysis of the results obtained using semissupervised learning in troubles of multi-label classification using reliability parameter as an aid in the classification data. Thus, the use of techniques of semissupervised learning and besides methods of multi-label classification, were essential to show the results

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data classification is a task with high applicability in a lot of areas. Most methods for treating classification problems found in the literature dealing with single-label or traditional problems. In recent years has been identified a series of classification tasks in which the samples can be labeled at more than one class simultaneously (multi-label classification). Additionally, these classes can be hierarchically organized (hierarchical classification and hierarchical multi-label classification). On the other hand, we have also studied a new category of learning, called semi-supervised learning, combining labeled data (supervised learning) and non-labeled data (unsupervised learning) during the training phase, thus reducing the need for a large amount of labeled data when only a small set of labeled samples is available. Thus, since both the techniques of multi-label and hierarchical multi-label classification as semi-supervised learning has shown favorable results with its use, this work is proposed and used to apply semi-supervised learning in hierarchical multi-label classication tasks, so eciently take advantage of the main advantages of the two areas. An experimental analysis of the proposed methods found that the use of semi-supervised learning in hierarchical multi-label methods presented satisfactory results, since the two approaches were statistically similar results

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Reinforcement learning is a machine learning technique that, although finding a large number of applications, maybe is yet to reach its full potential. One of the inadequately tested possibilities is the use of reinforcement learning in combination with other methods for the solution of pattern classification problems. It is well documented in the literature the problems that support vector machine ensembles face in terms of generalization capacity. Algorithms such as Adaboost do not deal appropriately with the imbalances that arise in those situations. Several alternatives have been proposed, with varying degrees of success. This dissertation presents a new approach to building committees of support vector machines. The presented algorithm combines Adaboost algorithm with a layer of reinforcement learning to adjust committee parameters in order to avoid that imbalances on the committee components affect the generalization performance of the final hypothesis. Comparisons were made with ensembles using and not using the reinforcement learning layer, testing benchmark data sets widely known in area of pattern classification

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Significant advances have emerged in research related to the topic of Classifier Committees. The models that receive the most attention in the literature are those of the static nature, also known as ensembles. The algorithms that are part of this class, we highlight the methods that using techniques of resampling of the training data: Bagging, Boosting and Multiboosting. The choice of the architecture and base components to be recruited is not a trivial task and has motivated new proposals in an attempt to build such models automatically, and many of them are based on optimization methods. Many of these contributions have not shown satisfactory results when applied to more complex problems with different nature. In contrast, the thesis presented here, proposes three new hybrid approaches for automatic construction for ensembles: Increment of Diversity, Adaptive-fitness Function and Meta-learning for the development of systems for automatic configuration of parameters for models of ensemble. In the first one approach, we propose a solution that combines different diversity techniques in a single conceptual framework, in attempt to achieve higher levels of diversity in ensembles, and with it, the better the performance of such systems. In the second one approach, using a genetic algorithm for automatic design of ensembles. The contribution is to combine the techniques of filter and wrapper adaptively to evolve a better distribution of the feature space to be presented for the components of ensemble. Finally, the last one approach, which proposes new techniques for recommendation of architecture and based components on ensemble, by techniques of traditional meta-learning and multi-label meta-learning. In general, the results are encouraging and corroborate with the thesis that hybrid tools are a powerful solution in building effective ensembles for pattern classification problems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Pós-graduação em Agronomia (Energia na Agricultura) - FCA

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Nowadays, classifying proteins in structural classes, which concerns the inference of patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason for this is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to be obtained experimentally in laboratory. Thus, this problem has drawn the attention of many researchers in Bioinformatics. Considering the great difference between the number of protein sequences already known and the number of three-dimensional structures determined experimentally, the demand of automated techniques for structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential to deal with this problem. In this work, ML techniques are used in the recognition of protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods have been chosen because they represent different paradigms of learning and have been widely used in the Bioinfornmatics literature. Aiming to obtain an improvment in the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multiclassification systems are used. Moreover, since the protein database used in this work presents the problem of imbalanced classes, artificial techniques for class balance (Undersampling Random, Tomek Links, CNN, NCL and OSS) are used to minimize such a problem. In order to evaluate the ML methods, a cross-validation procedure is applied, where the accuracy of the classifiers is measured using the mean of classification error rate, on independent test sets. These means are compared, two by two, by the hypothesis test aiming to evaluate if there is, statistically, a significant difference between them. With respect to the results obtained with the individual classifiers, Support Vector Machine presented the best accuracy. In terms of the multi-classification systems (homogeneous and heterogeneous), they showed, in general, a superior or similar performance when compared to the one achieved by the individual classifiers used - especially Boosting with Decision Tree and the StackingC with Linear Regression as meta classifier. The Voting method, despite of its simplicity, has shown to be adequate for solving the problem presented in this work. The techniques for class balance, on the other hand, have not produced a significant improvement in the global classification error. Nevertheless, the use of such techniques did improve the classification error for the minority class. In this context, the NCL technique has shown to be more appropriated

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Esta tese apresenta contribuições ao processo de Descoberta de Conhecimento em Bases de Dados (DCBD). DCBD pode ser entendido como um conjunto de técnicas automatizadas – ou semi-automatizadas – otimizadas para extrair conhecimento a partir de grandes bases de dados. Assim, o já, de longa data, praticado processo de descoberta de conhecimento passa a contar com aprimoramentos que o tornam mais fácil de ser realizado. A partir dessa visão, bem conhecidos algoritmos de Estatística e de Aprendizado de Máquina passam a funcionar com desempenho aceitável sobre bases de dados cada vez maiores. Da mesma forma, tarefas como coleta, limpeza e transformação de dados e seleção de atributos, parâmetros e modelos recebem um suporte que facilita cada vez mais a sua execução. A contribuição principal desta tese consiste na aplicação dessa visão para a otimização da descoberta de conhecimento a partir de dados não-classificados. Adicionalmente, são apresentadas algumas contribuições sobre o Modelo Neural Combinatório (MNC), um sistema híbrido neurossimbólico para classificação que elegemos como foco de trabalho. Quanto à principal contribuição, percebeu-se que a descoberta de conhecimento a partir de dados não-classificados, em geral, é dividida em dois subprocessos: identificação de agrupamentos (aprendizado não-supervisionado) seguida de classificação (aprendizado supervisionado). Esses subprocessos correspondem às tarefas de rotulagem dos itens de dados e obtenção das correlações entre os atributos da entrada e os rótulos. Não encontramos outra razão para que haja essa separação que as limitações inerentes aos algoritmos específicos. Uma dessas limitações, por exemplo, é a necessidade de iteração de muitos deles buscando a convergência para um determinado modelo. Isto obriga a que o algoritmo realize várias leituras da base de dados, o que, para Mineração de Dados, é proibitivo. A partir dos avanços em DCBD, particularmente com o desenvolvimento de algoritmos de aprendizado que realizam sua tarefa em apenas uma leitura dos dados, fica evidente a possibilidade de se reduzir o número de acessos na realização do processo completo. Nossa contribuição, nesse caso, se materializa na proposta de uma estrutura de trabalho para integração dos dois paradigmas e a implementação de um protótipo dessa estrutura utilizando-se os algoritmos de aprendizado ART1, para identificação de agrupamentos, e MNC, para a tarefa de classificação. É também apresentada uma aplicação no mapeamento de áreas homogêneas de plantio de trigo no Brasil, de 1975 a 1999. Com relação às contribuições sobre o MNC são apresentados: (a) uma variante do algoritmo de treinamento que permite uma redução significativa do tamanho do modelo após o aprendizado; (b) um estudo sobre a redução da complexidade do modelo com o uso de máquinas de comitê; (c) uma técnica, usando o método do envoltório, para poda controlada do modelo final e (d) uma abordagem para tratamento de inconsistências e perda de conhecimento que podem ocorrer na construção do modelo.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

O presente trabalho investiga a relação entre aprendizado e dinâmica em sistemas complexos multiagentes. Fazemos isso através de estudos experimentais em um cenário de racionalidade limitada que situa-se na interesecção entre Inteligência Artificial, Economia e Física Estatística, conhecido como “Minority Game”. Apresentamos resultados experimentais sobre o jogo focando o estudo do cenário sob uma perspectiva de Aprendizado de Máquina. Introduzimos um novo algoritmo de aprendizado para os agentes no jogo, que chamamos de aprendizado criativo, e mostramos que este algoritmo induz uma distribuição mais eficiente de recursos entre os agentes. Este aumento de eficiência mostra-se resultante de uma busca irrestrita no espaço de estratégias que permitem uma maximização mais eficiente das distâncias entre estratégias. Analisamos então os efeitos dos parâmetros deste algoritmo no desempenho de um agente, comparando os resultados com o algoritmo tradicional de aprendizado e mostramos que o algoritmo proposto é mais eficiente que o tradicional na maioria das situações. Finalmente, investigamos como o tamanho de memória afeta o desempenho de agentes utilizando ambos algoritmos e concluímos que agentes individuais com tamanhos de memória maiores apenas obtém um aumento no desempenho se o sistema se encontrar em uma região ineficiente, enquanto que nas demais fases tais aumentos são irrelevantes - e mesmo danosos - à performance desses agentes.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The skin cancer is the most common of all cancers and the increase of its incidence must, in part, caused by the behavior of the people in relation to the exposition to the sun. In Brazil, the non-melanoma skin cancer is the most incident in the majority of the regions. The dermatoscopy and videodermatoscopy are the main types of examinations for the diagnosis of dermatological illnesses of the skin. The field that involves the use of computational tools to help or follow medical diagnosis in dermatological injuries is seen as very recent. Some methods had been proposed for automatic classification of pathology of the skin using images. The present work has the objective to present a new intelligent methodology for analysis and classification of skin cancer images, based on the techniques of digital processing of images for extraction of color characteristics, forms and texture, using Wavelet Packet Transform (WPT) and learning techniques called Support Vector Machine (SVM). The Wavelet Packet Transform is applied for extraction of texture characteristics in the images. The WPT consists of a set of base functions that represents the image in different bands of frequency, each one with distinct resolutions corresponding to each scale. Moreover, the characteristics of color of the injury are also computed that are dependants of a visual context, influenced for the existing colors in its surround, and the attributes of form through the Fourier describers. The Support Vector Machine is used for the classification task, which is based on the minimization principles of the structural risk, coming from the statistical learning theory. The SVM has the objective to construct optimum hyperplanes that represent the separation between classes. The generated hyperplane is determined by a subset of the classes, called support vectors. For the used database in this work, the results had revealed a good performance getting a global rightness of 92,73% for melanoma, and 86% for non-melanoma and benign injuries. The extracted describers and the SVM classifier became a method capable to recognize and to classify the analyzed skin injuries

Relevância:

60.00% 60.00%

Publicador:

Resumo:

One of the most important goals of bioinformatics is the ability to identify genes in uncharacterized DNA sequences on world wide database. Gene expression on prokaryotes initiates when the RNA-polymerase enzyme interacts with DNA regions called promoters. In these regions are located the main regulatory elements of the transcription process. Despite the improvement of in vitro techniques for molecular biology analysis, characterizing and identifying a great number of promoters on a genome is a complex task. Nevertheless, the main drawback is the absence of a large set of promoters to identify conserved patterns among the species. Hence, a in silico method to predict them on any species is a challenge. Improved promoter prediction methods can be one step towards developing more reliable ab initio gene prediction methods. In this work, we present an empirical comparison of Machine Learning (ML) techniques such as Na¨ýve Bayes, Decision Trees, Support Vector Machines and Neural Networks, Voted Perceptron, PART, k-NN and and ensemble approaches (Bagging and Boosting) to the task of predicting Bacillus subtilis. In order to do so, we first built two data set of promoter and nonpromoter sequences for B. subtilis and a hybrid one. In order to evaluate of ML methods a cross-validation procedure is applied. Good results were obtained with methods of ML like SVM and Naïve Bayes using B. subtilis. However, we have not reached good results on hybrid database

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents an evaluative study about the effects of using a machine learning technique on the main features of a self-organizing and multiobjective genetic algorithm (GA). A typical GA can be seen as a search technique which is usually applied in problems involving no polynomial complexity. Originally, these algorithms were designed to create methods that seek acceptable solutions to problems where the global optimum is inaccessible or difficult to obtain. At first, the GAs considered only one evaluation function and a single objective optimization. Today, however, implementations that consider several optimization objectives simultaneously (multiobjective algorithms) are common, besides allowing the change of many components of the algorithm dynamically (self-organizing algorithms). At the same time, they are also common combinations of GAs with machine learning techniques to improve some of its characteristics of performance and use. In this work, a GA with a machine learning technique was analyzed and applied in a antenna design. We used a variant of bicubic interpolation technique, called 2D Spline, as machine learning technique to estimate the behavior of a dynamic fitness function, based on the knowledge obtained from a set of laboratory experiments. This fitness function is also called evaluation function and, it is responsible for determining the fitness degree of a candidate solution (individual), in relation to others in the same population. The algorithm can be applied in many areas, including in the field of telecommunications, as projects of antennas and frequency selective surfaces. In this particular work, the presented algorithm was developed to optimize the design of a microstrip antenna, usually used in wireless communication systems for application in Ultra-Wideband (UWB). The algorithm allowed the optimization of two variables of geometry antenna - the length (Ls) and width (Ws) a slit in the ground plane with respect to three objectives: radiated signal bandwidth, return loss and central frequency deviation. These two dimensions (Ws and Ls) are used as variables in three different interpolation functions, one Spline for each optimization objective, to compose a multiobjective and aggregate fitness function. The final result proposed by the algorithm was compared with the simulation program result and the measured result of a physical prototype of the antenna built in the laboratory. In the present study, the algorithm was analyzed with respect to their success degree in relation to four important characteristics of a self-organizing multiobjective GA: performance, flexibility, scalability and accuracy. At the end of the study, it was observed a time increase in algorithm execution in comparison to a common GA, due to the time required for the machine learning process. On the plus side, we notice a sensitive gain with respect to flexibility and accuracy of results, and a prosperous path that indicates directions to the algorithm to allow the optimization problems with "η" variables

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Redes neurais pulsadas - redes que utilizam uma codificação temporal da informação - têm despontado como uma promissora abordagem dentro do paradigma conexionista, emergente da ciência cognitiva. Um desses novos modelos é a rede neural pulsada com função de base radial, que é capaz de armazenar informação nos tempos de atraso axonais dos neurônios. Um algoritmo de aprendizado foi aplicado com sucesso nesta rede pulsada, que se mostrou capaz de mapear uma seqüência de pulsos de entrada em uma seqüência de pulsos de saída. Mais recentemente, um método baseado no uso de campos receptivos gaussianos foi proposto para codificar dados constantes em uma seqüência de pulsos temporais. Este método tornou possível a essa rede lidar com dados computacionais. O processo de aprendizado desta nova rede não se encontra plenamente compreendido e investigações mais profundas são necessárias para situar este modelo dentro do contexto do aprendizado de máquinas e também para estabelecer as habilidades e limitações desta rede. Este trabalho apresenta uma investigação desse novo classificador e um estudo de sua capacidade de agrupar dados em três dimensões, particularmente procurando estabelecer seus domínios de aplicação e horizontes no campo da visão computacional.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This work combines symbolic machine learning and multiscale fractal techniques to generate models that characterize cellular rejection in myocardial biopsies and that can base a diagnosis support system. The models express the knowledge by the features threshold, fractal dimension, lacunarity, number of clusters, spatial percolation and percolation probability, all obtained with myocardial biopsies processing. Models were evaluated and the most significant was the one generated by the C4.5 algorithm for the features spatial percolation and number of clusters. The result is relevant and contributes to the specialized literature since it determines a standard diagnosis protocol. © 2013 Springer.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Pós-graduação em Agronomia (Energia na Agricultura) - FCA

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A predição da resposta do tumor a radioterapia e a questão mais importante durante o tratamento de pacientes com câncer. Como consequência, a predição de genes que sejam responsivos a radiação ionizante e uma possibilidade para a melhoria dos resultados clínicos e a otimização das doses as quais os pacientes são submetidos ao longo do tratamento. Juntamente com esses dados, é possível obter respostas sobre os mecanismos de resistência a radiação dos tumores e até mesmo a identificação de biomarcadores responsáveis pela resistência a radiação ionizante que podem ser potenciais para o desenvolvimento de novas drogas visando a proteção de tecidos saudáveis. A determinação experimental dos genes que sejam responsivos à radiação ionizante é algo caro e que demanda muito tempo e trabalho; porém, se utilizarmos uma forma computacional de direcionar os estudos experimentais diretamente aos genes que têm mais potencial para serem responsivos à radiação ionizante, as pesquisas podem ser mais direcionadas e específicas. Para determinar essa característica, construímos, analisamos e determinamos os dados da topologia da rede integrada de interações moleculares entre genes humanos, contendo interações físicas entre proteínas, interações metabólicas e interações de regulação transcricional. Os dados topológicos foram utilizados como atributos de treinamento para o aprendizado de máquina, no qual os genes conhecidamente responsivos à radiação ionizante foram apresentados a um algoritmo de árvore de decisão que gerou modelos de predição com índices de sensibilidade e precisão de 5% e 72%, respectivamente. Os índices de acerto obtidos para os conjuntos de teste foram satisfatórios, retornando 91% dos genes conhecidos como responsiveis à radiação ionizante utilizados para o treinamento da árvore de decisão. Nós aplicamos o modelo de predição na rede integrada e atribuímos probabilidades ...