813 results for Supervised machine learning


Relevance:

80.00% 80.00%

Publisher:

Abstract:

A challenge in the clinical management of advanced Parkinson’s disease (PD) patients is the emergence of fluctuations in motor performance, which represent a significant source of disability during patients' activities of daily living. There is a lack of objective measures of treatment effects, for in-clinic and at-home use, that can provide an overview of the treatment response. The objective of this paper was to develop a method for objective quantification of advanced PD motor symptoms related to off episodes and peak dose dyskinesia, using spiral data gathered by a touch screen telemetry device. More specifically, the aim was to objectively characterize motor symptoms (bradykinesia and dyskinesia), to help automate the visual interpretation of movement anomalies in spirals as rated by movement disorder specialists. Digitized upper limb movement data of 65 advanced PD patients and 10 healthy (HE) subjects were recorded as they performed spiral drawing tasks on a touch screen device in their home environment. Several spatiotemporal features were extracted from the time series and used as inputs to machine learning methods. The methods were validated against ratings of animated spirals scored by four movement disorder specialists, who visually assessed a set of kinematic features and the motor symptom. The ability of the method to discriminate between PD patients and HE subjects and the test-retest reliability of the computed scores were also evaluated. Computed scores correlated well with mean visual ratings of individual kinematic features. The best performing classifier (Multilayer Perceptron) classified the motor symptom (bradykinesia or dyskinesia) with an accuracy of 84% and an area under the receiver operating characteristic curve of 0.86 in relation to the visual classifications of the raters.
In addition, the method showed high discriminating power between PD patients and HE subjects and good test-retest reliability. This study demonstrated the potential of digital spiral analysis for objective quantification of PD-specific and/or treatment-induced motor symptoms.
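
The two figures quoted in the abstract above, accuracy and area under the ROC curve, can be computed from reference labels and classifier scores. The following is a minimal sketch in plain Python, not the authors' pipeline; the function names and the 0/1 label coding are assumptions made for illustration.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of cases where the predicted class equals the reference class."""
    if len(labels) != len(predictions) or len(labels) == 0:
        raise ValueError("labels and predictions must be non-empty and equal in length")
    hits = sum(1 for y, p in zip(labels, predictions) if y == p)
    return hits / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve in its pairwise (Mann-Whitney) form.

    Counts the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With labels coded as 1 for one symptom and 0 for the other (an arbitrary choice here), `roc_auc` takes the classifier's continuous scores, while `accuracy` takes the thresholded class decisions.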

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The objective of this work is to test the application of a probabilistic graphical model, generically known as Bayesian networks, to develop computational models that can help in understanding problems and/or forecasting variables of an economic nature. To this end, a problem widely addressed in the literature was chosen, and the established theoretical and experimental results were compared with those obtained using the proposed technique. A model was built to classify the trend of Brazil's "country risk" from a database of macroeconomic and financial variables. The EMBI+ (Emerging Markets Bond Index Plus) was adopted as the risk measure because it is an indicator widely used by the market.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Fraud detection models are used to identify whether a transaction is legitimate or fraudulent based on registration and transaction information. The technique proposed in the study presented in this dissertation is Bayesian networks (BN); its results were compared with logistic regression (LR), a technique widely used in the market. The Bayesian networks evaluated were Bayesian classifiers with the Naive Bayes structure. The network structures were obtained from real data provided by a financial institution. The database was split into development and validation samples by cross-validation with ten partitions. Naive Bayes classifiers were chosen for their simplicity and efficiency. Model performance was evaluated using the confusion matrix and the area under the ROC curve. The analyses showed slightly better performance for logistic regression than for the Bayesian classifiers. Logistic regression was chosen as the most suitable model because it predicted fraudulent operations better, according to the confusion matrix. Based on the area under the ROC curve, logistic regression showed a greater ability to discriminate operations that are being classified correctly from those that are not.
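
The ten-partition cross-validation mentioned in the abstract above can be sketched as an index splitter. This is only an illustration under assumptions: the abstract does not say whether the split was shuffled or stratified, so the shuffling, the `seed` parameter, and the function name are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (development, validation) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffling is an assumption; a fixed seed keeps the split reproducible.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        development = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield development, validation
```

Each record is used for validation exactly once and for development in the other nine rounds, so both classifiers can be compared on the same ten splits.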

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Multidrug resistance is a major problem in anticancer therapy, and P-glycoprotein (P-gp) is one of the proteins responsible for it. This work focused mainly on developing mathematical/statistical and "chemical" models. For the mathematical/statistical models we used machine learning methods such as Support Vector Machine (SVM) and Random Forest (RF); for the chemical models, pharmacophores were used. These methods were applied to several proteins (P-gp, p53 and the p53-MDM2 complex) using two families, pifithrins for p53 and flavonoids for P-gp, and, to a lesser extent, a diverse group of molecules from several chemical families. For the SVM models applied to P-gp and the flavonoid family, good values were obtained with the Radial Basis Function (RBF) kernel, with a training-set precision of 94% and specificity of 96%. The test set gave a prediction rate of 70% and a specificity of 67%, with the lowest number of false negatives among the kernels tested. Applying RF to the flavonoid family, the training set showed 86% precision and 90% specificity, while the test set gave a prediction rate of 70% and a specificity of 60%, again with the lowest number of false negatives. Repeating the RF procedure with a total of 63 descriptors gave lower values: 79% precision and 82% specificity for the training set, and a prediction rate of 70% and a specificity of 60% for the test set. Comparing the two methods, we chose SVM with the RBF kernel as the model that gives the best classification results.
We applied the SVM method to P-gp and a set of non-flavonoid molecules transported by P-gp; good values were obtained with the RBF kernel, with a training-set precision of 95% and specificity of 93%. For the test set, we obtained a prediction rate of 70% and a specificity of 69%, again with the lowest number of false negatives. The pharmacophore method was applied to three targets: a set of flavonoid inhibitors and non-flavonoid substrates for P-gp, a group of pifithrins for p53, and a diverse set of structures for p53-MDM2 binding. Three features were identified in each of the four pharmacophore models obtained, and the aromatic ring and hydrogen bond donor features are present in all of them. Screening several databases with the models yielded hits with great structural diversity.
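
The abstract above reports precision, specificity and false-negative counts. As a reminder of how these quantities relate to a binary confusion matrix, here is a small sketch in plain Python; the helper names and the 0/1 coding (1 for the positive class) are assumptions, and the abstract does not state which class the authors treated as positive.

```python
from typing import Dict, Sequence


def confusion_counts(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for 0/1 labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(labels, predictions):
        if y == 1:
            counts["tp" if p == 1 else "fn"] += 1
        else:
            counts["fp" if p == 1 else "tn"] += 1
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def specificity(c: Dict[str, int]) -> float:
    """Share of true negatives among all actual negatives."""
    return _ratio(c["tn"], c["tn"] + c["fp"])


def sensitivity(c: Dict[str, int]) -> float:
    """Share of true positives among all actual positives."""
    return _ratio(c["tp"], c["tp"] + c["fn"])


def precision(c: Dict[str, int]) -> float:
    """Share of true positives among all predicted positives."""
    return _ratio(c["tp"], c["tp"] + c["fp"])
```

A low false-negative count, which the abstract highlights, corresponds to a high `sensitivity` in this notation.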

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This thesis presents a low-cost non-intrusive home energy monitor built on top of Non-Intrusive Load Monitoring (NILM) concepts and techniques. NILM solutions are already considered low-cost alternatives to the vast majority of existing commercial energy monitors, but the goal here is to lower the cost even further by using a mini netbook as an all-in-one solution. The mini netbook is installed at the home's main circuit breaker and computes power consumption by reading current and voltage through the built-in sound card. At the same time, feedback is provided to users through the 11’’ LCD screen and other built-in I/O modules. Our meter also detects changes in power and tries to identify which appliance led to each change, and it is being used as part of an eco-feedback platform that was built to study the long-term effects of energy eco-feedback on individuals. This thesis presents the steps taken to arrive at such a system, from the basics of AC power measurement to the implementation of an event detector and classifier used to disaggregate the power load. The last chapter presents results from validation tests of the system. Such a system is expected to be important not only as an energy monitor but also as an open system that can easily be changed to accommodate and test new or existing non-intrusive load monitoring techniques.
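
Two steps named in the abstract above, computing power from sampled voltage and current and detecting changes in power, can be sketched as follows. This is a simplified illustration and not the thesis implementation: it assumes the samples are already calibrated to volts and amperes, and the simple step-change rule and its `threshold` parameter are hypothetical.

```python
from typing import List, Sequence


def real_power(voltage: Sequence[float], current: Sequence[float]) -> float:
    """Real (active) power in watts: the mean of v[n] * i[n] over the window.

    The window should cover a whole number of mains cycles.
    """
    if len(voltage) != len(current) or len(voltage) == 0:
        raise ValueError("voltage and current must be non-empty and equal in length")
    return sum(v * i for v, i in zip(voltage, current)) / len(voltage)


def detect_events(power: Sequence[float], threshold: float) -> List[int]:
    """Indices where power differs from the previous reading by more than threshold."""
    return [n for n in range(1, len(power)) if abs(power[n] - power[n - 1]) > threshold]
```

For example, a series of per-window power readings such as `[60, 61, 60, 1560, 1561, 1559, 61]` with a threshold of 100 W would flag the readings where a large appliance switches on and off; a classifier would then try to attribute each flagged change to an appliance.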

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The objective of this work was to determine which leaf dimensions are most suitable for estimating the leaf area of anthurium (Anthurium andraeanum), cv. Apalai, by means of a linear regression equation, and to compare the performance of different regression functions obtained with machine learning (ML). The variable that best estimated leaf area was the product of the linear dimensions (length C and width L), C x L; the proposed equation is Af = 0.9672 * C x L, with a coefficient of determination (R²) of 0.99. The use of ML also showed that linear functions are the most suitable for estimating the leaf area of this plant species.
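
The equation in the abstract above is a regression through the origin on the product C x L. The sketch below applies the reported coefficient and shows how such a slope and its R² could be obtained from measurements. The function names are illustrative, the measurement units are not given in the abstract, and the R² here uses the mean-centered total sum of squares, which is one common convention and may not be the one the authors used.

```python
from typing import Sequence

COEFFICIENT = 0.9672  # slope reported in the abstract


def leaf_area(length: float, width: float, coefficient: float = COEFFICIENT) -> float:
    """Estimate leaf area as coefficient * C * L (in squared length units)."""
    if length < 0 or width < 0:
        raise ValueError("length and width must be non-negative")
    return coefficient * length * width


def fit_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope b for the model y = b * x (no intercept)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain a non-zero value")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def r_squared(x: Sequence[float], y: Sequence[float], slope: float) -> float:
    """1 - SS_res / SS_tot, with SS_tot centered on the mean of y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y must not be constant")
    return 1.0 - ss_res / ss_tot
```

Here `x` would hold the C x L products of the sampled leaves and `y` the measured leaf areas.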

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Industrial automation is directly linked to the development of information technology. Better hardware solutions, as well as improvements in software development methodologies, make possible the rapid growth of production process control. In this thesis, we propose an architecture that joins two technologies, one from the hardware field (industrial networks) and one from the software field (multiagent systems). The objective of this proposal is to combine those technologies in a multiagent architecture that allows control strategies to be implemented in field devices. With this, we intend to develop an agent architecture to detect and solve problems that may occur in the industrial network environment. Our work combines machine learning with the industrial context, making the proposed multiagent architecture adaptable to unfamiliar or unexpected production environments. We used neural networks and presented strategies for allocating these networks in industrial network field devices. With this, we intend to improve decision support at plant level and allow operation independent of human intervention.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Support Vector Machines (SVM) have attracted increasing attention in machine learning, particularly in classification and pattern recognition. However, in some cases it is not easy to determine accurately the class to which a given pattern belongs. This thesis involves the construction of an interval pattern classifier using SVM in association with interval theory, in order to model precisely the separation of a pattern set into distinct classes, aiming to obtain an optimized separation capable of handling imprecision contained in the initial data and generated during computational processing. The SVM is a linear machine. To allow it to solve real-world problems (usually nonlinear), the pattern set, known as the input set, must be transformed so that the nonlinear problem becomes a linear one. Kernel machines are responsible for this mapping. To create the interval extension of SVM, for both linear and nonlinear problems, it was necessary to define interval kernels and to extend Mercer's theorem (which characterizes a kernel function) to interval functions.
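
The mapping role of kernels described in the abstract above can be illustrated with ordinary real-valued kernels. The sketch below is not the interval extension developed in the thesis; it only shows, for 2-D inputs, that a degree-2 polynomial kernel equals a dot product in an explicitly mapped feature space, and it includes the standard RBF kernel for comparison. All function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Ordinary dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def poly2_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Homogeneous polynomial kernel of degree 2: K(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def poly2_feature_map(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit map for 2-D inputs: phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

Because `poly2_kernel(x, y)` equals `dot(poly2_feature_map(x), poly2_feature_map(y))`, a linear machine trained with the kernel behaves as if it worked in the mapped space without ever computing the map.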

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Equipment maintenance is the major cost factor in industrial plants, so the development of fault prediction techniques is very important. Three-phase induction motors are key electrical equipment in industrial applications, mainly because of their low cost and high robustness; however, they are not immune to faults such as shorted windings and broken bars. Several methods of acquisition, processing and signal analysis are applied to improve their diagnosis. The more efficient techniques use current sensors and current signature analysis. In this dissertation, starting from these sensors, signal analysis is performed through Park's vector, which provides good visualization capability. Fault data acquisition is an arduous task; therefore, a methodology for database construction is developed. Park's transformation is applied in the stationary reference frame for machine modeling and the solution of the machine's differential equations. Fault detection needs a detailed analysis of the variables and their influences, which makes the diagnosis more complex. Pattern recognition tasks allow systems to be generated automatically from patterns and concepts in the data, in most cases undetectable by specialists, supporting decision tasks. Classifier algorithms with diverse learning paradigms (k-Nearest Neighbors, Neural Networks, Decision Trees and Naïve Bayes) are used for pattern recognition of machine faults. Multi-classifier systems are used to reduce classification errors. Homogeneous algorithms (Bagging and Boosting) and heterogeneous algorithms (Vote, Stacking and Stacking C) were examined. The results show the effectiveness of the constructed model for fault modeling, as well as the possibility of using multi-classifier algorithms for fault classification.
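
The visualization property of Park's vector mentioned in the abstract above can be sketched numerically. The abstract does not give its exact formulation, so the code assumes the form commonly used in the Park's vector approach for stator currents; the function names are illustrative. For ideal balanced currents of amplitude I, the (i_d, i_q) points lie on a circle of radius (sqrt(6)/2) * I, and faults tend to distort that circle.

```python
import math
from typing import List, Tuple


def park_vector(ia: float, ib: float, ic: float) -> Tuple[float, float]:
    """Map three phase currents to the two Park's vector components (i_d, i_q)."""
    i_d = math.sqrt(2.0 / 3.0) * ia - ib / math.sqrt(6.0) - ic / math.sqrt(6.0)
    i_q = (ib - ic) / math.sqrt(2.0)
    return i_d, i_q


def balanced_currents(amplitude: float, n_samples: int) -> List[Tuple[float, float, float]]:
    """One electrical cycle of ideal balanced three-phase currents."""
    samples = []
    for n in range(n_samples):
        theta = 2.0 * math.pi * n / n_samples
        samples.append((
            amplitude * math.cos(theta),
            amplitude * math.cos(theta - 2.0 * math.pi / 3.0),
            amplitude * math.cos(theta + 2.0 * math.pi / 3.0),
        ))
    return samples
```

Plotting i_q against i_d for measured currents and comparing the shape with the ideal circle is the kind of visual check the approach relies on.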

Relevance:

80.00% 80.00%

Publisher:

Abstract:

One of the most important goals of bioinformatics is the ability to identify genes in uncharacterized DNA sequences in worldwide databases. Gene expression in prokaryotes initiates when the RNA polymerase enzyme interacts with DNA regions called promoters, where the main regulatory elements of the transcription process are located. Despite improvements in in vitro techniques for molecular biology analysis, characterizing and identifying a great number of promoters in a genome is a complex task. Moreover, the main drawback is the absence of a large set of promoters from which to identify conserved patterns among species. Hence, an in silico method to predict them in any species is a challenge. Improved promoter prediction methods can be one step towards developing more reliable ab initio gene prediction methods. In this work, we present an empirical comparison of Machine Learning (ML) techniques, such as Naïve Bayes, Decision Trees, Support Vector Machines, Neural Networks, Voted Perceptron, PART and k-NN, and ensemble approaches (Bagging and Boosting) on the task of predicting Bacillus subtilis promoters. To do so, we first built two data sets of promoter and nonpromoter sequences: one for B. subtilis and a hybrid one. A cross-validation procedure is applied to evaluate the ML methods. Good results were obtained with ML methods such as SVM and Naïve Bayes on the B. subtilis data set. However, we did not reach good results on the hybrid database.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Nowadays, classifying proteins into structural classes, which concerns the inference of patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason for this is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to obtain experimentally in the laboratory. Thus, this problem has drawn the attention of many researchers in Bioinformatics. Considering the great difference between the number of protein sequences already known and the number of three-dimensional structures determined experimentally, the demand for automated techniques for structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential to deal with this problem. In this work, ML techniques are used in the recognition of protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods were chosen because they represent different learning paradigms and have been widely used in the Bioinformatics literature. To improve the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multi-classification systems are used. Moreover, since the protein database used in this work presents the problem of imbalanced classes, artificial class-balancing techniques (Random Undersampling, Tomek Links, CNN, NCL and OSS) are used to minimize this problem. To evaluate the ML methods, a cross-validation procedure is applied, in which the accuracy of the classifiers is measured using the mean classification error rate on independent test sets. These means are compared pairwise with a hypothesis test to evaluate whether there is a statistically significant difference between them.
With respect to the results obtained with the individual classifiers, Support Vector Machine presented the best accuracy. The multi-classification systems (homogeneous and heterogeneous) showed, in general, superior or similar performance compared with the individual classifiers, especially Boosting with Decision Tree and StackingC with Linear Regression as meta classifier. The Voting method, despite its simplicity, proved adequate for solving the problem presented in this work. The class-balancing techniques, on the other hand, did not produce a significant improvement in the global classification error. Nevertheless, the use of such techniques did improve the classification error for the minority class. In this context, the NCL technique proved to be the most appropriate.
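
The Voting method that the abstract above describes as simple yet adequate can be sketched as plurality voting over the predictions of several classifiers. This is a generic illustration with hypothetical names; the tie-breaking rule (first label seen wins) is an assumption and may differ from the toolkit the authors used.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label that appears first."""
    if len(predictions) == 0:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: some label always has the best count")


def vote_ensemble(per_classifier: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-classifier prediction lists sample by sample.

    per_classifier[k][n] is the label that classifier k assigns to sample n.
    """
    return [majority_vote(sample) for sample in zip(*per_classifier)]
```

Each inner list would come from one of the individual classifiers (for example a decision tree, k-NN and an SVM) applied to the same test samples.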

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This paper presents an evaluative study of the effects of using a machine learning technique on the main features of a self-organizing, multiobjective genetic algorithm (GA). A typical GA can be seen as a search technique that is usually applied to problems of non-polynomial complexity. Originally, these algorithms were designed to seek acceptable solutions to problems where the global optimum is inaccessible or difficult to obtain. At first, GAs considered only one evaluation function and single-objective optimization. Today, however, implementations that consider several optimization objectives simultaneously (multiobjective algorithms) are common, as are implementations that allow many components of the algorithm to change dynamically (self-organizing algorithms). Combinations of GAs with machine learning techniques, used to improve some of their performance and usability characteristics, are also common. In this work, a GA combined with a machine learning technique was analyzed and applied to an antenna design. We used a variant of the bicubic interpolation technique, called 2D Spline, as the machine learning technique to estimate the behavior of a dynamic fitness function, based on the knowledge obtained from a set of laboratory experiments. This fitness function, also called the evaluation function, is responsible for determining the fitness of a candidate solution (individual) relative to others in the same population. The algorithm can be applied in many areas, including telecommunications, for example in the design of antennas and frequency selective surfaces. In this particular work, the algorithm was developed to optimize the design of a microstrip antenna of the kind commonly used in wireless communication systems for Ultra-Wideband (UWB) applications.
The algorithm allowed the optimization of two variables of the antenna geometry, the length (Ls) and width (Ws) of a slit in the ground plane, with respect to three objectives: radiated signal bandwidth, return loss and central frequency deviation. These two dimensions (Ws and Ls) are used as variables in three different interpolation functions, one Spline for each optimization objective, to compose a multiobjective, aggregate fitness function. The final result proposed by the algorithm was compared with the result of the simulation program and with the measured result of a physical prototype of the antenna built in the laboratory. In the present study, the algorithm was analyzed with respect to its degree of success on four important characteristics of a self-organizing multiobjective GA: performance, flexibility, scalability and accuracy. At the end of the study, an increase in execution time was observed in comparison with a common GA, due to the time required for the machine learning process. On the plus side, we noticed a significant gain in the flexibility and accuracy of results, and a promising path towards extending the algorithm to optimization problems with "η" variables.
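
The idea of composing one aggregate fitness from three per-objective functions of (Ls, Ws) can be sketched as below. This is only an illustration under strong assumptions: the abstract does not say how the three splines are combined, so the weighted sum is hypothetical, and the three score functions are made-up stand-ins for the interpolated surfaces, not the authors' models or data.

```python
from typing import Callable, Sequence, Tuple

Objective = Callable[[float, float], float]


def aggregate_fitness(ls: float, ws: float,
                      objectives: Sequence[Objective],
                      weights: Sequence[float]) -> float:
    """Weighted sum of per-objective scores for one candidate (Ls, Ws)."""
    if len(objectives) != len(weights):
        raise ValueError("one weight is required per objective")
    return sum(w * f(ls, ws) for f, w in zip(objectives, weights))


def best_candidate(population: Sequence[Tuple[float, float]],
                   objectives: Sequence[Objective],
                   weights: Sequence[float]) -> Tuple[float, float]:
    """Return the (Ls, Ws) pair with the highest aggregate fitness."""
    return max(population,
               key=lambda c: aggregate_fitness(c[0], c[1], objectives, weights))


# Hypothetical stand-ins for the three interpolated surfaces (higher is better).
def bandwidth_score(ls: float, ws: float) -> float:
    return -((ls - 10.0) ** 2) - (ws - 2.0) ** 2


def return_loss_score(ls: float, ws: float) -> float:
    return -abs(ls - 12.0)


def frequency_deviation_score(ls: float, ws: float) -> float:
    return -abs(ws - 1.0)
```

In a GA, `aggregate_fitness` would be evaluated for every individual in each generation, and selection would favor candidates such as the one returned by `best_candidate`.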