911 results for Machine Learning, Natural Language Processing, Descriptive Text Mining, POIROT, Transformer


Abstract:

In recent years, the online presence of politicians has grown with the proliferation of social networks, and Twitter is the network with the greatest media impact in this field. Studying how politicians behave on Twitter, and how citizens respond to them, provides very valuable information for analysing electoral campaigns. It makes it possible to study the real impact of their messages on election results and to identify which behaviours are best received by citizens. Thanks to advances in text mining, tools are now available to analyse large volumes of text and extract useful information from them. This project aims to collect a significant sample of Twitter messages from the candidates of the main political parties standing in the 2015 Madrid regional elections. These messages, together with the replies from other users, have been analysed using machine learning algorithms and the most suitable text mining techniques. The results obtained for each politician have been examined in depth and presented in tables and graphs to make them easier to understand.

Abstract:

The way content is consumed on the Internet has changed in recent years. Initially, websites were static and visually poor. As communication networks evolved, this trend changed: today we expect pages that are pleasant, accessible and varied in subject matter. This has changed how web pages are built, always with the goal of attracting users. The boom in smartphones and mobile applications has revolutionised language study by making it possible to combine cutting-edge resources with traditional learning. The popularity of mobile devices and applications is the main motivation for this project. It analyses the existing technologies and selects the option that best fits our needs in order to develop a system implementing Mobile Assisted Language Learning (MALL), an innovative approach to language learning with the help of a mobile device. This document gives an overview of application development for mobile devices in the e-learning environment. The technical characteristics of different platforms are studied, and the best option is selected for implementing a system that provides the basic content for learning a language, in this case English, in an intuitive and fun way. The system allows users to improve their level of English through a dynamic, approachable web interface that uses the resources offered by mobile devices and relies on adaptive design.
The project is intended for users who have little free time to attend a classroom course, or who want to reinforce or review content already learned by other means, traditional or otherwise. The application can be used easily from any available device, such as a smartphone, a tablet or a personal computer, with users competing against others or against themselves and thereby improving on their starting level through the proposed activities. During the project, several solutions were compared, most of them open source and freely distributed, that allow storage services accessible over the Internet to be deployed. The work concludes with a case study that analyses the technical requirements and carries out the analysis, design, database creation, implementation and testing phases of the software life cycle. Finally, the application, with all its information, is migrated to a server in the cloud.

Abstract:

This article reviews attempts to characterize the mental operations mediated by left inferior prefrontal cortex, especially the anterior and inferior portion of the gyrus, with the functional neuroimaging techniques of positron emission tomography and functional magnetic resonance imaging. Activations in this region occur during semantic, relative to nonsemantic, tasks for the generation of words to semantic cues or the classification of words or pictures into semantic categories. This activation appears in the right prefrontal cortex of people known to be atypically right-hemisphere dominant for language. In this region, activations are associated with meaningful encoding that leads to superior explicit memory for stimuli and deactivations with implicit semantic memory (repetition priming) for words and pictures. New findings are reported showing that patients with global amnesia show deactivations in the same region associated with repetition priming, that activation in this region reflects selection of a response from among numerous relative to few alternatives, and that activations in a portion of this region are associated specifically with semantic relative to phonological processing. It is hypothesized that activations in left inferior prefrontal cortex reflect a domain-specific semantic working memory capacity that is invoked more for semantic than nonsemantic analyses regardless of stimulus modality, more for initial than for repeated semantic analysis of a word or picture, more when a response must be selected from among many than few legitimate alternatives, and that yields superior later explicit memory for experiences.

Abstract:

Because of high life expectancy worldwide, the likelihood of natural accidents and physical trauma in everyday life is growing, which increases the demand for rehabilitation. Physical therapy under the paradigm of robotic rehabilitation with serious games offers greater patient motivation and engagement with treatment; its use has been recommended by the American Heart Association (AHA), which gave it the highest rating (Level A) for inpatients and outpatients. However, the potential for analysing the data collected by the robotic devices involved is little explored, so information that could be of great value for treatment is not being extracted. This work focuses on applying knowledge discovery techniques to classify the performance of patients diagnosed with chronic hemiparesis. The patients were placed in a robotic rehabilitation environment using the InMotion ARM, a robotic device for upper-limb rehabilitation and performance data collection. A knowledge discovery in databases workflow was applied to the data, comprising preprocessing, transformation (feature extraction) and then data mining with machine learning algorithms. The strategy resulted in a pattern classifier able to distinguish hemiparetic sides with 94% precision, using eight attributes as input. Interpreting this set of attributes, force data were found to be the most significant, accounting for half of the attributes in a sample.

Abstract:

Geographic annotation of documents consists of adopting metadata to identify place names and the positions of their occurrences in the text. This information is useful, for example, for search engines. From the toponyms mentioned in a text it is possible to identify the spatial context of its subject, which allows documents referring to the same context to be grouped by assigning each document a geographic scope. This Master's dissertation presents a new method, named Geofier, for determining the geographic scope of documents. The novelty of Geofier is that it identifies the geographic scope of a document using machine learning classifiers trained without a gazetteer and without assumptions about the language of the texts analysed. Wikipedia was used as the source of a set of geographically annotated documents for training a hierarchy of Naive Bayes and Support Vector Machine (SVM) classifiers. Geofier was compared with a reimplementation of the Web-a-Where system on the task of determining the geographic scope of Wikipedia texts. The Geofier hierarchy was trained and evaluated in two ways: using toponyms from the same gazetteer as Web-a-Where, and using n-grams extracted from the training documents. As a result, Geofier maintained better performance than the Web-a-Where reimplementation.
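To make the idea of gazetteer-free, language-independent classification more concrete, the following is a minimal sketch of a single Naive Bayes classifier over character n-grams. It is not the Geofier implementation: the class name `NgramNaiveBayes`, the choice of character trigrams, the add-one smoothing and the four toy training sentences are all assumptions made for illustration, and the hierarchy of classifiers and the SVMs mentioned in the abstract are not reproduced.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lower-cased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.class_docs = Counter()               # documents seen per class
        self.ngram_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            grams = char_ngrams(doc, self.n)
            self.class_docs[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for gram in char_ngrams(doc, self.n):
                # Smoothed log likelihood of each n-gram under this class.
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set: invented sentences with a hypothetical geographic scope.
train_docs = [
    "The samba parade in Rio de Janeiro drew crowds to Copacabana",
    "Sao Paulo and Brasilia host the national football league",
    "The Louvre in Paris sits beside the river Seine",
    "Lyon and Marseille are large French cities",
]
train_labels = ["Brazil", "Brazil", "France", "France"]

model = NgramNaiveBayes(n=3).fit(train_docs, train_labels)
prediction = model.predict("A weekend in Paris near the Seine")
```

Because the features are raw character sequences, no place-name list and no language-specific tokeniser are needed, which is the property the abstract highlights. A realistic setup would train on far more text per class.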

Abstract:

Electroencephalographic (EEG) signals of the human brain represent electrical activity on a number of channels recorded over the scalp. The main purpose of this thesis is to investigate the interactions and causality between different parts of the brain using EEG signals recorded while subjects performed verbal fluency tasks. Subjects who have Parkinson's Disease (PD) have difficulties with mental tasks, such as switching between one behavior task and another. The behavior tasks include phonemic fluency, semantic fluency, category semantic fluency and reading fluency. This method uses verbal generation skills, activating Broca's area (Brodmann areas BA44 and BA45). Advanced signal processing techniques are used to determine the frequency bands activated in the Granger causality for verbal fluency tasks. A graph learning technique for channel strength is used to characterize the complex graph of Granger causality. In addition, the support vector machine (SVM) method is used to train a classifier that separates two subjects with PD from two healthy controls. Neural data for the study were recorded at the Colorado Neurological Institute (CNI). The study reveals significant differences between PD subjects and healthy controls in brain connectivity in Broca's area (BA44 and BA45), as measured at the corresponding EEG electrodes. The results also demonstrate that classification based on the flow of information and causality in the brain during verbal fluency tasks is possible. These methods have the potential to be applied in the future to identify pathological information flow and causality in neurological diseases.

Abstract:

This paper discusses the impact of machine translation on the language industry, specifically addressing its effect on translators. It summarizes the history of the development of machine translation, explains the underlying theory that ties machine translation to its practical applications, and describes the different types of machine translation as well as other tools familiar to translators. There are arguments for and against its use, as well as evaluation methods for testing it. Internet and real-time communication are featured for their role in the increase of machine translation use. The potential that this technology has in the future of professional translation is examined. This paper shows that machine translation will continue to be increasingly used whether translators like it or not.

Abstract:

The analysis of Web 2.0 texts is a relevant research topic today. However, many problems arise when current tools are applied to this type of text. To be able to measure these difficulties, we first need to know the different registers or degrees of informality that can be found. In this work we therefore attempt to characterise levels of informality for English texts on the Web 2.0 using unsupervised machine learning techniques, obtaining an F1 of 68%.

Abstract:

Paper presented at the IV Jornadas TIMM, Torres (Jaén), 7-8 April 2011.

Abstract:

The geographic focus of a document identifies the place or places on which the content of the text centres. This work presents a corpus-based approach to detecting geographic focus in text. Unlike other approaches, which rely on purely geographic information to detect the focus, our proposal uses all the textual information in the documents of the working corpus, starting from the hypothesis that the occurrence of particular people, events, dates and even common terms can be fundamental to this task. To validate the hypothesis, a study was carried out on a corpus of geolocated news items about events that took place between 2008 and 2011. This temporal distribution also allowed us to analyse how the performance of the classifier, and the most representative terms for different locations, evolve over time.

Abstract:

In this paper, we present a Text Summarisation tool, compendium, capable of generating the most common types of summaries. Regarding the input, single- and multi-document summaries can be produced; regarding the output, the summaries can be extractive or abstractive-oriented; and finally, concerning their purpose, the summaries can be generic, query-focused, or sentiment-based. The proposed architecture for compendium is divided into several stages, distinguishing between core and additional stages. The former constitute the backbone of the tool and are common to the generation of any type of summary, whereas the latter are used to enhance the capabilities of the tool. The main contributions of compendium with respect to state-of-the-art summarisation systems are that (i) it specifically deals with the problem of redundancy, by means of textual entailment; (ii) it combines statistical and cognitive-based techniques for determining relevant content; and (iii) it proposes an abstractive-oriented approach to face the challenge of abstractive summarisation. The evaluation, performed in different domains and textual genres, comprising traditional texts as well as texts extracted from the Web 2.0, shows that compendium is very competitive and suitable as a tool for generating summaries.
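As a rough illustration of two of the stages named above, redundancy removal followed by statistical relevance scoring, here is a minimal extractive sketch. It is not compendium's method: redundancy is approximated with plain word overlap (Jaccard similarity) instead of textual entailment, relevance is plain term frequency, and the function names, stopword list and `redundancy_threshold` value are assumptions chosen for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "it", "on", "for"}


def tokenize(sentence):
    """Lower-case a sentence and return its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def jaccard(a, b):
    """Word-set overlap between two token lists, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def summarise(sentences, max_sentences=2, redundancy_threshold=0.6):
    """Pick high-scoring sentences, skipping any too similar to one already chosen."""
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for toks in tokens for w in toks)
    # Relevance: average document frequency of the sentence's content words.
    scores = [sum(freq[w] for w in toks) / len(toks) if toks else 0.0 for toks in tokens]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if all(jaccard(tokens[i], tokens[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)
        if len(chosen) == max_sentences:
            break
    return [sentences[i] for i in sorted(chosen)]  # restore document order


document = [
    "Solar panels convert sunlight into electricity.",
    "Solar panels convert sunlight into electricity efficiently.",
    "Wind turbines also generate electricity.",
    "Many people enjoy hiking on weekends.",
]
summary = summarise(document)
```

In this toy document the second sentence scores well but is skipped because it nearly repeats the first, so the summary keeps the first and third sentences. Word overlap only catches surface repetition; an entailment-based check, as described in the abstract, can also catch paraphrases.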

Abstract:

Topic 6. Text Mining with Topic Modeling.

Abstract:

This thesis contributes to research towards artificial intelligence using connectionist methods. Recurrent neural networks are an increasingly popular family of sequential models capable, in principle, of learning arbitrary algorithms. These models perform deep learning, a type of machine learning. Their generality and empirical success make them an interesting subject for research and a promising tool for creating more general artificial intelligence. The first chapter of this thesis gives a brief overview of the background topics: artificial intelligence, machine learning, deep learning and recurrent neural networks. The next three chapters cover these topics in increasingly specific detail. Finally, we present some contributions to recurrent neural networks. Chapter \ref{arxiv1} presents our work on regularising recurrent neural networks. Regularisation aims to improve the generalisation ability of the model and plays a key role in the performance of several applications of recurrent neural networks, in particular speech recognition. Our approach achieves state-of-the-art results on TIMIT, a standard benchmark for this task. Chapter \ref{cpgp} presents a second line of work, still in progress, which explores a new architecture for recurrent neural networks. Recurrent neural networks maintain a hidden state that represents their previous observations. The idea of this work is to encode certain abstract dynamics in the hidden state, giving the network a natural way to encode coherent trends in the state of its environment. Our work builds on an existing model; we describe that work and our contributions, including a preliminary experiment.
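The notion of a hidden state that summarises previous observations can be illustrated with the textbook Elman-style update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). The sketch below is a generic recurrent cell in plain Python, not the regularised model or the new architecture the thesis proposes; the weights are hand-picked hypothetical values and nothing is trained.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b):
    """One Elman-style update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    h_new = []
    for i in range(len(h_prev)):
        total = b[i]
        total += sum(w_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(w_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(inputs, w_xh, w_hh, b):
    """Feed a sequence through the cell, starting from a zero hidden state."""
    h = [0.0] * len(b)
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b)
    return h


# Hypothetical hand-picked weights: 1 input feature, 2 hidden units.
w_xh = [[1.0], [-1.0]]
w_hh = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

# The same two inputs in a different order leave different final states,
# because each state depends on everything the cell has seen so far.
h_ab = run_rnn([[1.0], [0.0]], w_xh, w_hh, b)
h_ba = run_rnn([[0.0], [1.0]], w_xh, w_hh, b)
```

With these weights, an input seen one step earlier still influences the final state, but attenuated by the recurrent weight of 0.5 and the tanh squashing. That fading memory is what the hidden state contributes beyond the current input.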

Abstract:

Internet traffic classification is a relevant and mature research field that is nonetheless of growing importance and still has open technical challenges, partly because of the pervasive presence of Internet-connected devices in everyday life. We claim the need for innovative traffic classification solutions that are lightweight, adopt a domain-based approach, and classify Internet traffic by subject rather than concentrating only on application-level protocol categorization. To this end, this paper proposes a classification solution that leverages domain name information extracted from IPFIX summaries, DNS logs, and DHCP leases, and that can be applied to any kind of traffic. The proposed solution is based on an extension of Word2vec unsupervised learning techniques running on a specialized Apache Spark cluster. In particular, the learning techniques are used to generate word embeddings from a mixed dataset composed of domain names and natural language corpora, in a lightweight way and with general applicability. The paper also reports lessons learnt from our implementation and deployment experience, which demonstrates that our solution can process 5500 IPFIX summaries per second on an Apache Spark cluster with 1 slave instance in Amazon EC2 at a cost of $3860 per year. Reported experimental results for Precision, Recall, F-Measure, Accuracy, and Cohen's Kappa show the feasibility and effectiveness of the proposal. The experiments show that words contained in domain names are related to the kind of traffic directed towards them, so that specifically trained word embeddings can be used to classify them into customizable categories. We also show that training word embeddings on larger natural language corpora leads to improvements in precision of up to 180%.
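The core idea, that words inside a domain name hint at the subject of the traffic, can be sketched with a nearest-centroid classifier over word embeddings. This is only an illustration under strong assumptions: the three-dimensional vectors in `EMBEDDINGS` are hand-written stand-ins for embeddings that the paper learns with its Word2vec extension, the seed words per category are invented, and splitting domain names on non-letters is a simplification. No Spark, IPFIX, DNS or DHCP processing is shown.

```python
import math
import re

# Hypothetical 3-dimensional word embeddings, hand-written for illustration only.
EMBEDDINGS = {
    "news":   [0.9, 0.1, 0.0],
    "daily":  [0.8, 0.2, 0.1],
    "sport":  [0.1, 0.9, 0.0],
    "soccer": [0.0, 1.0, 0.1],
    "shop":   [0.1, 0.0, 0.9],
    "store":  [0.0, 0.1, 1.0],
}

# Each category is described by a few seed words (also an assumption).
CATEGORIES = {
    "news": ["news", "daily"],
    "sports": ["sport", "soccer"],
    "shopping": ["shop", "store"],
}


def domain_words(domain):
    """Split a domain name on non-letters and keep the words with a known embedding."""
    return [w for w in re.split(r"[^a-z]+", domain.lower()) if w in EMBEDDINGS]


def mean_vector(words):
    """Average the embeddings of the given words."""
    vectors = [EMBEDDINGS[w] for w in words]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(domain):
    """Return the category with the closest seed-word centroid, or None if no word is known."""
    words = domain_words(domain)
    if not words:
        return None
    vec = mean_vector(words)
    return max(CATEGORIES, key=lambda c: cosine(vec, mean_vector(CATEGORIES[c])))


labels = [classify(d) for d in ("daily-news.example.org", "soccer24.example.net", "xyz.example")]
```

The categories are just named lists of seed words, so they can be changed without retraining anything, which matches the abstract's point about customizable categories. The quality of a real classifier would depend almost entirely on the learned embeddings, which this sketch does not attempt to reproduce.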
