Abstract:
Electroencephalographic (EEG) signals of the human brain represent electrical activity recorded over a number of channels on the scalp. The main purpose of this thesis is to investigate the interactions and causality between different parts of the brain using EEG signals recorded while subjects performed verbal fluency tasks. Subjects who have Parkinson's Disease (PD) have difficulties with mental tasks, such as switching from one behavioral task to another. The behavioral tasks include phonemic fluency, semantic fluency, category semantic fluency and reading fluency. These tasks draw on verbal generation skills and activate Broca's area (Brodmann areas BA44 and BA45). Advanced signal processing techniques are used to determine the frequency bands activated in the Granger causality analysis of the verbal fluency tasks. A graph learning technique for channel strength is used to characterize the complex graph of Granger causality, and the support vector machine (SVM) method is used to train a classifier between two subjects with PD and two healthy controls. Neural data for the study were recorded at the Colorado Neurological Institute (CNI). The study reveals significant differences between PD subjects and healthy controls in brain connectivity in Broca's area (BA44 and BA45) at the corresponding EEG electrodes. The results in this thesis also demonstrate the possibility of classification based on the flow of information and causality in the brain during verbal fluency tasks. These methods have the potential to be applied in the future to identify pathological information flow and causality in neurological diseases.
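The thesis itself provides no code, but the pipeline it describes (pairwise Granger causality between EEG channels, with the resulting connectivity strengths fed to an SVM) can be sketched as follows. This is a minimal illustration assuming statsmodels and scikit-learn; the channel count, lag order, and placeholder data are not from the study.

```python
# Minimal sketch: pairwise Granger causality between EEG channels,
# then an SVM on the flattened connectivity matrix. Data shapes and
# the lag order are illustrative assumptions, not the study's values.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests
from sklearn.svm import SVC

def granger_matrix(eeg, maxlag=4):
    """eeg: (n_samples, n_channels) array; returns an (n_channels, n_channels)
    matrix of F-statistics for 'channel j Granger-causes channel i'."""
    n_ch = eeg.shape[1]
    G = np.zeros((n_ch, n_ch))
    for i in range(n_ch):
        for j in range(n_ch):
            if i == j:
                continue
            # grangercausalitytests expects [effect, cause] columns
            res = grangercausalitytests(eeg[:, [i, j]], maxlag=maxlag, verbose=False)
            G[i, j] = res[maxlag][0]["ssr_ftest"][0]  # F-statistic at maxlag
    return G

# One connectivity matrix per recording, flattened into a feature vector.
rng = np.random.default_rng(0)
recordings = [rng.standard_normal((500, 8)) for _ in range(4)]  # placeholder EEG
labels = [1, 1, 0, 0]                                           # 1 = PD, 0 = control
X = np.array([granger_matrix(r).ravel() for r in recordings])
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(X[:1]))
```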
Abstract:
Preliminary research demonstrated the relevance of the EmotiBlog annotated corpus as a Machine Learning resource for detecting subjective data. In this paper we compare EmotiBlog with the JRC Quotes corpus in order to check the robustness of its annotation. We concentrate on its coarse-grained labels and carry out extensive Machine Learning experimentation, also including lexical resources. The results obtained are similar to those obtained with the JRC Quotes corpus, demonstrating the validity of EmotiBlog as a resource for the SA task.
Abstract:
The extension to new languages is a well-known bottleneck for rule-based systems. Considerable human effort, which typically consists of re-writing huge numbers of rules from scratch, is in fact required to transfer the knowledge available to the system from one language to a new one. Provided sufficient annotated data, machine learning algorithms make it possible to minimize the costs of such knowledge transfer but, to date, they have proved ineffective for some specific tasks. Among these, the recognition and normalization of temporal expressions still remains out of their reach. Focusing on this task, and still adhering to the rule-based framework, this paper presents a set of experiments on the automatic porting to Italian of a system originally developed for Spanish. Different automatic rule translation strategies are evaluated and discussed, providing a comprehensive overview of the challenge.
Abstract:
EmotiBlog is a corpus labelled with the homonymous annotation schema, designed for detecting subjectivity in new textual genres. Preliminary research demonstrated its relevance as a Machine Learning resource for detecting opinionated data. In this paper we compare EmotiBlog with the JRC corpus in order to check the robustness of the EmotiBlog annotation. For this research we concentrate on its coarse-grained labels. We carry out extensive ML experimentation, also including lexical resources. The results obtained are similar to those obtained with the JRC corpus, demonstrating the validity of EmotiBlog as a resource for the SA task.
Abstract:
The field of natural language processing (NLP) has grown considerably in recent years; its research areas include information retrieval and extraction, data mining, machine translation, question answering systems, automatic summarization and sentiment analysis, among others. This article presents concepts and some tools that contribute to the understanding of text processing with NLP techniques, with the aim of extracting relevant information that can be used in a wide range of applications. Automatic classifiers can be developed to categorize documents and recommend tags; these classifiers should be platform-independent, easily customizable so that they can be integrated into different projects, and able to learn from examples. This article introduces these classification algorithms, reviews some open-source tools currently available for carrying out these tasks, and compares several implementations using the F-measure to evaluate the classifiers.
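As a hedged illustration of the kind of comparison the article describes (training document classifiers and scoring them with the F-measure), the following minimal scikit-learn sketch trains two classifiers on a toy corpus; the documents, labels, and model choices are placeholders, not material from the article.

```python
# Minimal sketch: train two document classifiers and compare them
# with the F-measure. The toy corpus and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

docs = ["great product", "awful service", "loved it", "terrible quality",
        "works fine", "broke instantly", "very happy", "quite disappointed"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

X = TfidfVectorizer().fit_transform(docs)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.5, random_state=0)

for clf in (MultinomialNB(), LinearSVC()):
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(type(clf).__name__, "F1 =", f1_score(y_te, pred))
```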
Abstract:
Paper submitted to MML 2013, 6th International Workshop on Machine Learning and Music, Prague, September 23, 2013.
Abstract:
This article presents the application and results of research on natural language processing techniques and semantic technology at Brand Rain and Anpro21. All projects related to these topics are described, and the application and advantages of transferring the research and newly developed technologies to the Brand Rain reputation monitoring and scoring tool are presented.
Abstract:
The introduction of the EHEA (European Higher Education Area) posed countless challenges to universities, many of which are still being resolved today. It has also brought new opportunities for the education of students, but also for the universities themselves, among them inter-university programmes between EU member states. Through the ECTS system, the EHEA makes it possible to standardize student workload, facilitating the proposal of inter-university study plans. However, challenges arise when putting them into practice. Beyond the challenges of proposing the study plans themselves, it is necessary to implement teaching-learning processes that bridge the physical distance between students and teachers. This article presents the teaching experience of the e-home course of the Machine Learning and Data Mining Master's programme of the Universidad de Alicante and Jean Monnet University (France). In this case, face-to-face classroom teaching is combined with a virtual classroom via videoconference. The evaluation of the proposed teaching-learning method draws on the teaching experience itself and on student surveys, and shows that spatial barriers were broken down and that the course was a success from a teaching standpoint.
Abstract:
Unit 6. Text Mining with Topic Modeling.
Abstract:
Inspired by the early-detection strategies applied in medicine, we propose the design and construction of a prediction system to detect students' learning problems at an early stage. We start from a gamified system for learning Computational Logic, from which usage data and, above all, students' learning outcomes in problem solving are massively collected. All these data are analyzed using Machine Learning techniques that produce, as a result, a prediction of each student's performance. The information is presented weekly as a progression chart that is easy to interpret yet highly informative. The resulting system is highly automated; it is progressive, offering results from the beginning of the course with increasingly accurate predictions; it uses learning outcomes and not only usage data; it makes it possible to assess and predict the competences and skills acquired; and it contributes to truly formative assessment. In short, it allows teachers to guide students towards better performance from very early stages, redirecting potential failures in time and motivating students.
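The article does not publish its model, but the core idea (weekly usage and learning-outcome features feeding a classifier that flags at-risk students) can be sketched as follows; the feature set, placeholder data, and choice of a Random Forest are assumptions for illustration only.

```python
# Minimal sketch: weekly features (usage plus learning outcomes) feed
# a classifier that predicts whether a student is at risk. Feature
# names, data, and the model choice are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Rows: students. Columns: [exercises_attempted, exercises_solved,
# sessions_this_week, avg_attempts_per_exercise] -- placeholder data.
X_week = np.array([
    [20, 18, 5, 1.3],
    [ 4,  1, 1, 3.5],
    [15, 12, 4, 1.8],
    [ 2,  0, 1, 4.0],
])
at_risk = np.array([0, 1, 0, 1])  # labels from previous cohorts

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_week, at_risk)

# The probability of being at risk could feed the weekly progression chart.
print(model.predict_proba(X_week)[:, 1])
```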
Abstract:
Background and objective: In this paper, we have tested the suitability of different artificial intelligence-based algorithms for decision support when classifying the risk of congenital heart surgery. Classifying these surgical risks provides enormous benefits, such as the a priori estimation of surgical outcomes depending on the type of disease, the type of repair, and other elements that influence the final result. This preventive estimation may help to avoid future complications, or even death. Methods: We have evaluated four machine learning algorithms: multilayer perceptron, self-organizing map, radial basis function networks and decision trees. The implemented architectures aim to classify among three types of surgical risk: low complexity, medium complexity and high complexity. Results: Accuracy outcomes achieved range between 80% and 99%, with the multilayer perceptron offering the highest hit ratio. Conclusions: According to the results, it is feasible to develop a clinical decision support system using the evaluated algorithms. Such a system would help cardiology specialists, paediatricians and surgeons to forecast the level of risk related to congenital heart disease surgery.
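A minimal sketch of the best-performing setup named above (a multilayer perceptron classifying three risk levels) might look like the following, assuming scikit-learn; the network architecture and the placeholder features and labels are illustrative, not the study's actual configuration.

```python
# Minimal sketch: a multilayer perceptron classifying three surgical
# risk levels (0 = low, 1 = medium, 2 = high complexity). The
# architecture and the random placeholder data are assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 12))   # placeholder clinical features
y = rng.integers(0, 3, size=300)     # placeholder risk labels

clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),
)
clf.fit(X, y)
print(clf.predict(X[:5]))
```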
Abstract:
With the rapid advance of web service technologies, end-users can conduct various on-line tasks, such as shopping on-line. Usually, end-users compose a set of services to accomplish a task, and need to enter values into those services to invoke the composite service. Quite often, users re-visit websites and use services to perform re-occurring tasks, and are required to enter the same information into various web services each time. Repetitively typing the same information into services is a tedious job that negatively impacts the user experience. Recent studies have proposed several approaches to help users fill in values for services automatically. However, prior studies mainly suffer from the following drawbacks: (1) limited support for collecting and analyzing user inputs; (2) poor accuracy when filling values into services; (3) no design for service composition. To overcome these drawbacks, we need to maximize the reuse of previous user inputs across services and end-users. In this thesis, we introduce approaches that spare end-users from entering the same information into re-occurring on-line tasks. More specifically, we improve the process of filling out services in the following four aspects. First, we investigate the characteristics of input parameters and propose an ontology-based approach to automatically categorize parameters and fill in values for the categorized input parameters. Second, we propose a comprehensive framework that incorporates user contexts and usage patterns into the process of filling values into services. Third, we propose an approach for maximizing value propagation among services and end-users by linking sets of semantically related parameters and similar end-users. Last, we propose a ranking-based framework that ranks a list of previous user inputs for an input parameter to save the user from unnecessary data entries. The framework learns and analyzes interactions between user inputs and input parameters to rank user inputs under different contexts.
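The thesis describes its ranking-based framework only at a high level; the toy sketch below ranks a user's previous inputs for a parameter by frequency, recency and context match. The scoring weights and sample data are invented for illustration and are not the thesis's actual model.

```python
# Toy stand-in for the ranking idea above: rank a user's previous
# inputs for an input parameter by how often, how recently, and in
# which context they were used. All scoring weights are invented.
from collections import defaultdict

def rank_inputs(history, parameter, context):
    """history: list of (parameter, value, context, timestamp) tuples."""
    scores = defaultdict(float)
    for param, value, ctx, ts in history:
        if param != parameter:
            continue
        scores[value] += 1.0            # frequency
        scores[value] += 0.001 * ts     # mild recency bonus
        if ctx == context:
            scores[value] += 2.0        # same-context bonus
    return sorted(scores, key=scores.get, reverse=True)

history = [
    ("shipping_address", "12 Oak St", "shopping", 100),
    ("shipping_address", "12 Oak St", "shopping", 180),
    ("shipping_address", "3 Elm Rd",  "work",     200),
]
print(rank_inputs(history, "shipping_address", "shopping"))
# -> ['12 Oak St', '3 Elm Rd']
```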
Abstract:
Hypertrophic cardiomyopathy (HCM) is a cardiovascular disease in which the heart muscle is partially thickened and blood flow is obstructed, potentially fatally. It is one of the leading causes of sudden cardiac death in young people. Electrocardiography (ECG) and echocardiography (Echo) are the standard tests for identifying HCM and other cardiac abnormalities. The American Heart Association has recommended using a pre-participation questionnaire for young athletes instead of ECG or Echo tests, given the cost and time involved in having an expert cardiologist interpret the results of these tests. Initially we set out to develop a classifier for automated prediction of young athletes' heart conditions based on the answers to the questionnaire. Classification results and further in-depth analysis using computational and statistical methods indicated significant shortcomings of the questionnaire in predicting cardiac abnormalities. Automated methods for analyzing ECG signals can help reduce cost and save time in the pre-participation screening process by detecting HCM and other cardiac abnormalities. Therefore, the main goal of this dissertation is to identify HCM through computational analysis of 12-lead ECG. ECG signals recorded on one or two leads have been analyzed in the past for classifying individual heartbeats into different types of arrhythmia, as annotated primarily in the MIT-BIH database. In contrast, we classify complete sequences of 12-lead ECGs to assign patients to two groups: HCM vs. non-HCM. The challenges we address include missing ECG waves in one or more leads and the dimensionality of a large feature set; we address these by proposing imputation and feature-selection methods. We develop heartbeat classifiers employing Random Forests and Support Vector Machines, and propose a method to classify full 12-lead ECGs based on the proportion of heartbeats classified as HCM. The results from our experiments show that the classifiers developed using our methods perform well in identifying HCM. Thus the two contributions of this thesis are the use of computational and statistical methods to discover shortcomings in a current screening procedure, and the development of methods to identify HCM through computational analysis of 12-lead ECG signals.
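The dissertation's two-stage scheme (heartbeat-level classification, then a patient-level decision based on the proportion of beats labelled HCM) can be sketched roughly as follows; the features, placeholder data, and threshold are assumptions, not the dissertation's actual values.

```python
# Minimal sketch of the two-stage scheme above: a Random Forest labels
# individual heartbeats, and a full recording is called HCM when the
# proportion of HCM-labelled beats exceeds a threshold. The features
# and threshold value are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
beat_features = rng.standard_normal((1000, 24))  # per-beat feature vectors
beat_labels = rng.integers(0, 2, size=1000)      # 1 = HCM-like beat

beat_clf = RandomForestClassifier(n_estimators=200, random_state=0)
beat_clf.fit(beat_features, beat_labels)

def classify_recording(beats, threshold=0.5):
    """beats: (n_beats, n_features) array for one 12-lead recording."""
    hcm_fraction = beat_clf.predict(beats).mean()
    return "HCM" if hcm_fraction > threshold else "non-HCM"

print(classify_recording(rng.standard_normal((80, 24))))
```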
Abstract:
Master's thesis, Bioinformatics and Computational Biology (Bioinformatics), Universidade de Lisboa, Faculdade de Ciências, 2016.
Abstract:
The continuous evolution of learning needs towards greater effectiveness and personalization has favoured the emergence of new tools and dimensions aimed at making learning accessible to everyone and adapted to technological and social contexts. This evolution has given rise to what is known as online social learning, which emphasizes interaction between learners. Taking interaction into account has brought many benefits to the learner, namely establishing connections, exchanging personal experiences, and receiving assistance that improves learning. However, the amount of personal information that learners sometimes disclose during these interactions leads to consequences that are often disastrous for privacy, such as cyberbullying, identity theft, etc. Despite the concerns raised, privacy as an individual right represents an ideal that is difficult to recognize in today's social context. Indeed, we have moved from a conceptualization of privacy as a core of sensitive data to be protected from outside intrusion to a new vision centred on negotiating the disclosure of those data. The challenge for social learning environments is therefore to guarantee a maximum level of interaction for learners while preserving their privacy. To the best of our knowledge, most innovations in these environments have focused on developing interaction techniques, without any consideration for privacy, an element that is nevertheless necessary to create an environment conducive to learning. In this work, we propose a privacy framework that we call a "privacy manager". More precisely, this manager is responsible for protecting the learner's personal data and privacy during interactions with co-learners. Building on the idea that interaction provides access to online help, we analyze interaction as a cognitive activity involving contextual factors, other learners, and socio-emotional aspects. The main objective of this thesis is therefore to rethink the processes of mutual assistance between learners by implementing the tools needed to find a compromise between interaction and privacy protection. This is done at three levels: the first considers contextual and social aspects of interaction, such as trust between learners and the emotions that triggered the need to interact. The second level of protection estimates the risks of disclosure and facilitates privacy-protection decisions. The third level of protection detects any disclosure of personal data using machine learning and semantic analysis techniques.
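The thesis applies machine learning and semantic analysis at this third level; as a deliberately simplified stand-in, the sketch below flags obvious personal-data items with regular expressions. The patterns are illustrative only and are not the thesis's method.

```python
# Deliberately simplified stand-in for the disclosure-detection step:
# flag obvious personal-data items (e-mail addresses, phone numbers)
# in a learner's message with regular expressions. The thesis itself
# uses machine learning and semantic analysis; these patterns are
# illustrative only.
import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b(?:\+?\d{1,3}[ .-]?)?(?:\d[ .-]?){7,12}\d\b"),
}

def detect_disclosures(message):
    """Return a list of (kind, matched_text) pairs found in the message."""
    hits = []
    for kind, pattern in PATTERNS.items():
        hits.extend((kind, m.group()) for m in pattern.finditer(message))
    return hits

print(detect_disclosures("Call me at +1 303 555 0100 or mail ana@example.com"))
```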