797 resultados para Machine Learning Algorithms
Resumo:
Dysfunction of Autonomic Nervous System (ANS) is a typical feature of chronic heart failure and other cardiovascular disease. As a simple non-invasive technology, heart rate variability (HRV) analysis provides reliable information on autonomic modulation of heart rate. The aim of this thesis was to research and develop automatic methods based on ANS assessment for evaluation of risk in cardiac patients. Several features selection and machine learning algorithms have been combined to achieve the goals. Automatic assessment of disease severity in Congestive Heart Failure (CHF) patients: a completely automatic method, based on long-term HRV was proposed in order to automatically assess the severity of CHF, achieving a sensitivity rate of 93% and a specificity rate of 64% in discriminating severe versus mild patients. Automatic identification of hypertensive patients at high risk of vascular events: a completely automatic system was proposed in order to identify hypertensive patients at higher risk to develop vascular events in the 12 months following the electrocardiographic recordings, achieving a sensitivity rate of 71% and a specificity rate of 86% in identifying high-risk subjects among hypertensive patients. Automatic identification of hypertensive patients with history of fall: it was explored whether an automatic identification of fallers among hypertensive patients based on HRV was feasible. The results obtained in this thesis could have implications both in clinical practice and in clinical research. The system has been designed and developed in order to be clinically feasible. Moreover, since 5-minute ECG recording is inexpensive, easy to assess, and non-invasive, future research will focus on the clinical applicability of the system as a screening tool in non-specialized ambulatories, in order to identify high-risk patients to be shortlisted for more complex investigations.
Resumo:
Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of some metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). Up to now, all the algorithms, except for GA, have been first applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers.
Resumo:
This paper addresses the question of maximizing classifier accuracy for classifying task-related mental activity from Magnetoencelophalography (MEG) data. We propose the use of different sources of information and introduce an automatic channel selection procedure. To determine an informative set of channels, our approach combines a variety of machine learning algorithms: feature subset selection methods, classifiers based on regularized logistic regression, information fusion, and multiobjective optimization based on probabilistic modeling of the search space. The experimental results show that our proposal is able to improve classification accuracy compared to approaches whose classifiers use only one type of MEG information or for which the set of channels is fixed a priori.
Resumo:
Automatic blood glucose classification may help specialists to provide a better interpretation of blood glucose data, downloaded directly from patients glucose meter and will contribute in the development of decision support systems for gestational diabetes. This paper presents an automatic blood glucose classifier for gestational diabetes that compares 6 different feature selection methods for two machine learning algorithms: neural networks and decision trees. Three searching algorithms, Greedy, Best First and Genetic, were combined with two different evaluators, CSF and Wrapper, for the feature selection. The study has been made with 6080 blood glucose measurements from 25 patients. Decision trees with a feature set selected with the Wrapper evaluator and the Best first search algorithm obtained the best accuracy: 95.92%.
Resumo:
Durante los últimos años ha aumentado la presencia de personas pertenecientes al mundo de la política en la red debido a la proliferación de las redes sociales, siendo Twitter la que mayor repercusión mediática tiene en este ámbito. El estudio del comportamiento de los políticos en Twitter y de la acogida que tienen entre los ciudadanos proporciona información muy valiosa a la hora de analizar las campañas electorales. De esta forma, se puede estudiar la repercusión real que tienen sus mensajes en los resultados electorales, así como distinguir aquellos comportamientos que tienen una mayor aceptación por parte de la la ciudadaná. Gracias a los avances desarrollados en el campo de la minería de textos, se poseen las herramientas necesarias para analizar un gran volumen de textos y extraer de ellos información de utilidad. Este proyecto tiene como finalidad recopilar una muestra significativa de mensajes de Twitter pertenecientes a los candidatos de los principales partidos políticos que se presentan a las elecciones autonómicas de Madrid en 2015. Estos mensajes, junto con las respuestas de otros usuarios, se han analizado usando algoritmos de aprendizaje automático y aplicando las técnicas de minería de textos más oportunas. Los resultados obtenidos para cada político se han examinado en profundidad y se han presentado mediante tablas y gráficas para facilitar su comprensión.---ABSTRACT---During the past few years the presence on the Internet of people related with politics has increased, due to the proliferation of social networks. Among all existing social networks, Twitter is the one which has the greatest media impact in this field. Therefore, an analysis of the behaviour of politicians in this social network, along with the response from the citizens, gives us very valuable information when analysing electoral campaigns. This way it is possible to know their messages impact in the election results. Moreover, it can be inferred which behaviours have better acceptance among the citizenship. Thanks to the advances achieved in the text mining field, its tools can be used to analyse a great amount of texts and extract from them useful information. The present project aims to collect a significant sample of Twitter messages from the candidates of the principal political parties for the 2015 autonomic elections in Madrid. These messages, as well as the answers received by the other users, have been analysed using machine learning algorithms and applying the most suitable data mining techniques. The results obtained for each politician have been examined in depth and have been presented using tables and graphs to make its understanding easier.
Resumo:
The extension to new languages is a well known bottleneck for rule-based systems. Considerable human effort, which typically consists in re-writing from scratch huge amounts of rules, is in fact required to transfer the knowledge available to the system from one language to a new one. Provided sufficient annotated data, machine learning algorithms allow to minimize the costs of such knowledge transfer but, up to date, proved to be ineffective for some specific tasks. Among these, the recognition and normalization of temporal expressions still remains out of their reach. Focusing on this task, and still adhering to the rule-based framework, this paper presents a bunch of experiments on the automatic porting to Italian of a system originally developed for Spanish. Different automatic rule translation strategies are evaluated and discussed, providing a comprehensive overview of the challenge.
Resumo:
Paper submitted to MML 2013, 6th International Workshop on Machine Learning and Music, Prague, September 23, 2013.
Resumo:
We present an application of Mathematical Morphology (MM) for the classification of astronomical objects, both for star/galaxy differentiation and galaxy morphology classification. We demonstrate that, for CCD images, 99.3 +/- 3.8% of galaxies can be separated from stars using MM, with 19.4 +/- 7.9% of the stars being misclassified. We demonstrate that, for photographic plate images, the number of galaxies correctly separated from the stars can be increased using our MM diffraction spike tool, which allows 51.0 +/- 6.0% of the high-brightness galaxies that are inseparable in current techniques to be correctly classified, with only 1.4 +/- 0.5% of the high-brightness stars contaminating the population. We demonstrate that elliptical (E) and late-type spiral (Sc-Sd) galaxies can be classified using MM with an accuracy of 91.4 +/- 7.8%. It is a method involving fewer 'free parameters' than current techniques, especially automated machine learning algorithms. The limitation of MM galaxy morphology classification based on seeing and distance is also presented. We examine various star/galaxy differentiation and galaxy morphology classification techniques commonly used today, and show that our MM techniques compare very favourably.
Resumo:
Prediction of peroxisomal matrix proteins generally depends on the presence of one of two distinct motifs at the end of the amino acid sequence. PTS1 peroxisomal proteins have a well conserved tripeptide at the C-terminal end. However, the preceding residues in the sequence arguably play a crucial role in targeting the protein to the peroxisome. Previous work in applying machine learning to the prediction of peroxisomal matrix proteins has failed W capitalize on the full extent of these dependencies. We benchmark a range of machine learning algorithms, and show that a classifier - based on the Support Vector Machine - produces more accurate results when dependencies between the conserved motif and the preceding section are exploited. We publish an updated and rigorously curated data set that results in increased prediction accuracy of most tested models.
Resumo:
Traditionally, machine learning algorithms have been evaluated in applications where assumptions can be reliably made about class priors and/or misclassification costs. In this paper, we consider the case of imprecise environments, where little may be known about these factors and they may well vary significantly when the system is applied. Specifically, the use of precision-recall analysis is investigated and compared to the more well known performance measures such as error-rate and the receiver operating characteristic (ROC). We argue that while ROC analysis is invariant to variations in class priors, this invariance in fact hides an important factor of the evaluation in imprecise environments. Therefore, we develop a generalised precision-recall analysis methodology in which variation due to prior class probabilities is incorporated into a multi-way analysis of variance (ANOVA). The increased sensitivity and reliability of this approach is demonstrated in a remote sensing application.
Resumo:
The blood types determination is essential to perform safe blood transfusions. In emergency situations isadministrated the “universal donor” blood type. However, sometimes, this blood type can cause incom-patibilities in the transfusion receptor. A mechatronic prototype was developed to solve this problem.The prototype was built to meet specific goals, incorporating all the necessary components. The obtainedsolution is close to the final system that will be produced later, at industrial scale, as a medical device.The prototype is a portable and low cost device, and can be used in remote locations. A computer appli-cation, previously developed is used to operate with the developed mechatronic prototype, and obtainautomatically test results. It allows image acquisition, processing and analysis, based on Computer Visionalgorithms, Machine Learning algorithms and deterministic algorithms. The Machine Learning algorithmsenable the classification of occurrence, or alack of agglutination in the mixture (blood/reagents), and amore reliable and a safer methodology as test data are stored in a database. The work developed allowsthe administration of a compatible blood type in emergency situations, avoiding the discontinuity of the“universal donor” blood type stocks, and reducing the occurrence of human errors in the transfusion practice.
Resumo:
This thesis introduces a flexible visual data exploration framework which combines advanced projection algorithms from the machine learning domain with visual representation techniques developed in the information visualisation domain to help a user to explore and understand effectively large multi-dimensional datasets. The advantage of such a framework to other techniques currently available to the domain experts is that the user is directly involved in the data mining process and advanced machine learning algorithms are employed for better projection. A hierarchical visualisation model guided by a domain expert allows them to obtain an informed segmentation of the input space. Two other components of this thesis exploit properties of these principled probabilistic projection algorithms to develop a guided mixture of local experts algorithm which provides robust prediction and a model to estimate feature saliency simultaneously with the training of a projection algorithm.Local models are useful since a single global model cannot capture the full variability of a heterogeneous data space such as the chemical space. Probabilistic hierarchical visualisation techniques provide an effective soft segmentation of an input space by a visualisation hierarchy whose leaf nodes represent different regions of the input space. We use this soft segmentation to develop a guided mixture of local experts (GME) algorithm which is appropriate for the heterogeneous datasets found in chemoinformatics problems. Moreover, in this approach the domain experts are more involved in the model development process which is suitable for an intuition and domain knowledge driven task such as drug discovery. We also derive a generative topographic mapping (GTM) based data visualisation approach which estimates feature saliency simultaneously with the training of a visualisation model.
Resumo:
GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs on a single PC efficiently. GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs. With the novel technique of parallel sliding windows (PSW) to load subgraph from disk to memory for vertices and edges updating, it can achieve data processing performance close to and even better than those of mainstream distributed graph engines. GraphChi mentioned that its memory is not effectively utilized with large dataset, which leads to suboptimal computation performances. In this paper we are motivated by the concepts of 'pin ' from TurboGraph and 'ghost' from GraphLab to propose a new memory utilization mode for GraphChi, which is called Part-in-memory mode, to improve the GraphChi algorithm performance. The main idea is to pin a fixed part of data inside the memory during the whole computing process. Part-in-memory mode is successfully implemented with only about 40 additional lines of code to the original GraphChi engine. Extensive experiments are performed with large real datasets (including Twitter graph with 1.4 billion edges). The preliminary results show that Part-in-memory mode memory management approach effectively reduces the GraphChi running time by up to 60% in PageRank algorithm. Interestingly it is found that a larger portion of data pinned in memory does not always lead to better performance in the case that the whole dataset cannot be fitted in memory. There exists an optimal portion of data which should be kept in the memory to achieve the best computational performance.
Resumo:
Most machine-learning algorithms are designed for datasets with features of a single type whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model to handle mixed types with a probabilistic latent variable formalism. This proposed model describes the data by type-specific distributions that are conditionally independent given the latent space and is called generalised generative topographic mapping (GGTM). It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated both for synthetic and real datasets.