Biblioteca Digital

965 results for Natural language processing (Computer science)

Classifying Written Texts Through Rhythmic Features

Relevance: 100.00%

Abstract:

Rhythm analysis of written texts has focused on literary analysis and mainly considers poetry. In this paper we investigate the relevance of rhythmic features for categorizing prose texts from different genres. Our contribution is threefold. First, we define a set of rhythmic features for written texts. Second, we extract these features from three corpora: speeches, essays, and newspaper articles. Third, we perform feature selection by means of statistical analyses and determine a subset of features that efficiently discriminates between the three genres. We find that, using as few as eight rhythmic features, documents can be assigned to a genre with an accuracy of around 80%, significantly higher than the 33% baseline that results from random assignment.

Expressing Sentiments in Game Reviews

Relevance: 100.00%

Abstract:

Opinion mining and sentiment analysis are important research areas of Natural Language Processing (NLP), and their tools have become viable alternatives for automatically extracting the affective information found in texts. Our aim is to build an NLP model to analyze gamers’ sentiments and opinions expressed in a corpus of 9750 game reviews. A Principal Component Analysis using sentiment analysis features explained 51.2% of the variance of the reviews and provides an integrated view of the major sentiment and topic-related dimensions expressed in game reviews. A Discriminant Function Analysis based on the emerging components classified game reviews into positive, neutral and negative ratings with 55% accuracy.

Combining Taxonomies using Word2vec

Relevance: 100.00%

Abstract:

Taxonomies are widely used in a variety of fields because they are extensible and well suited to classification and knowledge organization. Of particular interest is the digital document management domain, in which their hierarchical structure can be used to organize documents into content-specific categories. Common or standard taxonomies (e.g., the ACM Computing Classification System) contain concepts that are too general for conceptualizing specific knowledge domains. In this paper we introduce a novel automated approach that combines sub-trees from general taxonomies with specialized seed taxonomies by using specific Natural Language Processing techniques. We provide an extensible and generalizable model for combining taxonomies in the practical context of two very large European research projects. Because the manual combination of taxonomies by domain experts is a highly time-consuming task, our model measures the semantic relatedness between concept labels in CBOW or skip-gram Word2vec vector spaces. A preliminary quantitative evaluation of the resulting taxonomies is performed after applying a greedy algorithm with incremental thresholds used for matching and combining topic labels.
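The matching step described in this abstract can be illustrated with a minimal sketch. It assumes that each concept label has already been mapped to a vector; the three-dimensional vectors, the label names, and the `greedy_match` helper below are hypothetical stand-ins, not the authors' Word2vec models or their exact algorithm. Relatedness is measured with cosine similarity, and pairs are accepted greedily, starting from the strictest threshold.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def greedy_match(seed, general, thresholds):
    """Pair each seed label with at most one general label.

    Thresholds are visited from strictest to loosest; at each level the
    best-scoring unmatched pairs at or above the threshold are accepted.
    """
    matches = {}
    used = set()
    for t in sorted(thresholds, reverse=True):
        pairs = sorted(
            ((cosine(sv, gv), s, g)
             for s, sv in seed.items() if s not in matches
             for g, gv in general.items() if g not in used),
            reverse=True,
        )
        for score, s, g in pairs:
            if score < t:
                break  # remaining pairs score even lower
            if s in matches or g in used:
                continue
            matches[s] = (g, round(score, 3), t)
            used.add(g)
    return matches

# Hypothetical toy vectors standing in for Word2vec embeddings of labels.
seed_taxonomy = {
    "game design": [0.9, 0.1, 0.0],
    "learning analytics": [0.1, 0.9, 0.2],
}
general_taxonomy = {
    "computer games": [0.8, 0.2, 0.1],
    "data mining": [0.2, 0.8, 0.3],
    "hardware": [0.0, 0.1, 0.9],
}

result = greedy_match(seed_taxonomy, general_taxonomy, thresholds=[0.9, 0.7])
for seed_label, (general_label, score, level) in result.items():
    print(seed_label, "->", general_label, score, "accepted at threshold", level)
```

In a real setting the vectors would come from a trained CBOW or skip-gram model, and multi-word labels would need some aggregation rule (for example, averaging word vectors), which is a further assumption.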

D6.3 – Semantic Content Annotation Support

Relevance: 100.00%

Abstract:

The Semantic Annotation component is a software application that provides support for automated text classification, a process grounded in a cohesion-centered representation of discourse that facilitates topic extraction. The component enables the semantic meta-annotation of text resources, including automated classification, thus facilitating information retrieval within the RAGE ecosystem. It is available in the ReaderBench framework (http://readerbench.com/), which integrates advanced Natural Language Processing (NLP) techniques. The component makes use of Cohesion Network Analysis (CNA) in order to ensure an in-depth representation of discourse, useful for mining keywords and performing automated text categorization. Our component automatically classifies documents into the categories provided by the ACM Computing Classification System (http://dl.acm.org/ccs_flat.cfm), but also into the categories from a high-level serious games categorization provisionally developed by RAGE. English and French are already covered by the provided web service, and the entire framework can be extended to support additional languages.

Software Components for Serious Game Development

Relevance: 100.00%

Abstract:

The large upfront investments required for game development pose a severe barrier to the wider uptake of serious games in education and training. There is also a lack of well-established methods and tools that support game developers in preserving and enhancing the games’ pedagogical effectiveness. The RAGE project, a Horizon 2020 funded research project on serious games, addresses these issues by making available reusable software components that aim to support the pedagogical qualities of serious games. In order to easily deploy and integrate these game components in a multitude of game engines, platforms and programming languages, RAGE has developed and validated a hybrid component-based software architecture that preserves component portability and interoperability. While a first set of software components is being developed, this paper presents selected examples to explain the overall system’s concept and its practical benefits. First, the Emotion Detection component uses the learners’ webcams to capture their emotional states from facial expressions. Second, the Performance Statistics component is an add-on for learning analytics data processing, which allows instructors to track and inspect learners’ progress without having to deal with the required statistics computations. Third, a set of language processing components accommodates the analysis of learners’ textual inputs, facilitating comprehension assessment and prediction. Fourth, the Shared Data Storage component provides a technical solution for data storage (e.g., for player data or game world data) across multiple software components. The presented components are examples of the anticipated RAGE library, which will include up to forty reusable software components for serious gaming, addressing diverse pedagogical dimensions.

MixKMeans: Clustering Question-Answer Archives

Relevance: 100.00%

Abstract:

Community-driven Question Answering (CQA) systems crowdsource experiential information in the form of questions and answers and have accumulated valuable reusable knowledge. Clustering of QA datasets from CQA systems provides a means of organizing the content to ease tasks such as manual curation and tagging. In this paper, we present a clustering method that exploits the two-part question-answer structure in QA datasets to improve clustering quality. Our method, MixKMeans, composes question and answer space similarities in such a way that the space on which the match is higher is allowed to dominate. This construction is motivated by our observation that semantic similarity between question-answer data (QAs) could get localized in either space. We empirically evaluate our method on a variety of real-world labeled datasets. Our results indicate that our method significantly outperforms state-of-the-art clustering methods for the task of clustering question-answer archives.
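The idea of letting the better-matching space dominate can be illustrated with a small sketch. The abstract does not give the composition formula, so the power mean used below is an assumption chosen only because it has the stated property: with exponent p = 1 it is the plain average, and as p grows it approaches the larger of the two similarities. The function names and the similarity values are hypothetical.

```python
def mixed_similarity(sim_q, sim_a, p=8.0):
    """Combine question-space and answer-space similarities (both in [0, 1]).

    Power mean with exponent p: p = 1 is the arithmetic mean, and larger p
    lets the higher of the two similarities dominate. This is an
    illustrative choice, not necessarily the formula used by MixKMeans.
    """
    return ((sim_q ** p + sim_a ** p) / 2.0) ** (1.0 / p)

def assign(similarities, p=8.0):
    """Return the index of the cluster with the highest mixed similarity.

    `similarities` holds one (question_sim, answer_sim) pair per cluster.
    """
    scores = [mixed_similarity(sq, sa, p) for sq, sa in similarities]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical similarities of one QA pair to two cluster centroids:
# cluster 0 matches moderately in both spaces, cluster 1 matches strongly
# on the question side only.
to_centroids = [(0.5, 0.5), (0.9, 0.1)]

print(mixed_similarity(0.9, 0.1, p=1.0))  # plain average of the two spaces
print(mixed_similarity(0.9, 0.1, p=8.0))  # pulled toward the stronger space
print(assign(to_centroids, p=8.0))
```

Under a plain average the two clusters score the same, whereas the higher exponent favors the cluster with a strong match in one space, which is the localization effect the abstract describes.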

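Erweiterung des Dokumentenservers DSpace: Internationalisierung und Implementation des URN-Systems zur permanenten Dokumentenadressierung [Extending the DSpace Document Server: Internationalization and Implementation of the URN System for Persistent Document Addressing]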

Relevance: 100.00%

Abstract:

First, a brief overview is given of the role of document servers in the scientific publication process. Institutional repositories are then introduced through their key features and the benefits of establishing them as a central repository in the university context. A specific solution was chosen based on the requirements of the University Library of Kassel, Germany. The software DSpace was chosen, but it needs to be extended with internationalization and with use of the urn:nbn scheme as a persistent identifier. DSpace’s features are briefly described, followed by the reverse-engineering process used to derive the requirements for implementing the missing functionality. Subsequent tasks implement the needed features, using Sun’s Standard Tag Library for internationalization and modifications to two classes for the use of the urn:nbn scheme as a persistent identifier. Finally, a brief look is taken at the future of institutional repositories, and some local long-term objectives for DSpace are discussed.

Model-driven development of sensor network applications with optimization of non-functional constraints

Relevance: 100.00%

Abstract:

Wireless sensor networks (WSNs) differ from conventional distributed systems in many aspects. The resource limitations of sensor nodes and the ad hoc communication and topology of the network, coupled with an unpredictable deployment environment, are difficult non-functional constraints that must be carefully taken into account when developing software systems for a WSN. Thus, more research needs to be done on designing, implementing and maintaining software for WSNs. This thesis aims to contribute to research in this area by presenting an approach to WSN application development that improves the reusability, flexibility, and maintainability of the software. Firstly, we present a programming model and software architecture aimed at describing WSN applications independently of the underlying operating system and hardware. The proposed architecture is described and realized using the Model-Driven Architecture (MDA) standard in order to achieve satisfactory levels of encapsulation and abstraction when programming sensor nodes. In addition, we study different non-functional constraints of WSN applications and propose two approaches for optimizing an application to satisfy these constraints. A prototype framework was built to demonstrate the solutions developed in the thesis. The framework implements the programming model and the multi-layered software architecture as components. A graphical interface, code generation components and supporting tools are also included to help developers design, implement, optimize, and test the WSN software. Finally, we evaluate and critically assess the proposed concepts. Two case studies are provided to support the evaluation.
The first case study, a framework evaluation, is designed to assess the ease with which novice and intermediate users can develop correct and power-efficient WSN applications, the level of portability achieved by developing applications at a high level of abstraction, and the estimated overhead due to use of the framework in terms of the footprint and executable code size of the application. In the second case study, we discuss the design, implementation and optimization of a real-world application named TempSense, in which a sensor network is used to monitor the temperature within an area.

Sentiment classification with case-base approach

Relevance: 100.00%

Abstract:

The growth of networks, blogs, and users of social review sites makes the Internet an enormous source of data, particularly about how people think, feel, and act toward various issues. These days, people’s opinions play an important role in politics, industry, education, etc. Consequently, governments, large and small industries, academic institutes, companies, and individuals are looking into automatic techniques for extracting the information they need from large volumes of data. Sentiment analysis is a genuine answer to this need. It is an application of natural language processing and computational linguistics that draws on advanced techniques such as machine learning and language models to capture positive, negative, or neutral evaluations, with or without their strength, in plain text. In this thesis, we study a case-based approach to document-level sentiment analysis. Our case-based approach generates a binary classifier that uses a set of labeled documents and five different sentiment lexicons to extract the polarity from the scores corresponding to the reviews. Since sentiment analysis is inherently a domain-dependent task, which makes the work difficult and costly, we apply a cross-domain approach by basing our classifier on six different domains instead of limiting it to a single domain. To improve classification accuracy, we add negation detection as part of our algorithm. In addition, to improve the performance of our approach, some innovative modifications are applied.
It is worth mentioning that our approach opens the way to new developments by adding more sentiment lexicons and datasets in the future.

Validation of Electronic Health Record Phenotyping of Bipolar Disorder Cases and Controls

Relevance: 100.00%

Abstract:

Objective: The study was designed to validate use of electronic health records (EHRs) for diagnosing bipolar disorder and classifying control subjects. Method: EHR data were obtained from a health care system of more than 4.6 million patients spanning more than 20 years. Experienced clinicians reviewed charts to identify text features and coded data consistent or inconsistent with a diagnosis of bipolar disorder. Natural language processing was used to train a diagnostic algorithm with 95% specificity for classifying bipolar disorder. Filtered coded data were used to derive three additional classification rules for case subjects and one for control subjects. The positive predictive value (PPV) of EHR-based bipolar disorder and subphenotype diagnoses was calculated against diagnoses from direct semi-structured interviews of 190 patients by trained clinicians blind to EHR diagnosis. Results: The PPV of bipolar disorder defined by natural language processing was 0.85. Coded classification based on strict filtering achieved a value of 0.79, but classifications based on less stringent criteria performed less well. No EHR-classified control subject received a diagnosis of bipolar disorder on the basis of direct interview (PPV=1.0). For most subphenotypes, values exceeded 0.80. The EHR-based classifications were used to accrue 4,500 bipolar disorder cases and 5,000 controls for genetic analyses. Conclusions: Semiautomated mining of EHRs can be used to ascertain bipolar disorder patients and control subjects with high specificity and predictive value compared with diagnostic interviews. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.
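For readers unfamiliar with the two metrics reported in this abstract, the sketch below shows how positive predictive value and specificity are computed from confusion-matrix counts. The counts in the example are invented purely for illustration; the abstract does not report the underlying counts, and these numbers are not the study's data.

```python
def positive_predictive_value(true_pos, false_pos):
    """PPV: share of algorithm-positive subjects confirmed by the reference
    standard (here, that would be the direct clinical interview)."""
    return true_pos / (true_pos + false_pos)

def specificity(true_neg, false_pos):
    """Specificity: share of true non-cases that the algorithm labels negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts, chosen only so the arithmetic is easy to follow.
example_ppv = positive_predictive_value(true_pos=17, false_pos=3)
example_spec = specificity(true_neg=95, false_pos=5)

print(f"PPV = {example_ppv:.2f}")
print(f"Specificity = {example_spec:.2f}")
```

PPV depends on how common the condition is in the evaluated sample, so a PPV measured on one cohort does not transfer automatically to a population with a different prevalence.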

Ein Programm für die Parallelisierung dynamisch adaptiver Mehrgitterverfahren [A Program for Parallelizing Dynamically Adaptive Multigrid Methods]

Relevance: 100.00%

Abstract:

In this work, dynamically adaptive multigrid methods are parallelized. In dynamically adaptive multigrid methods, a domain is covered with a grid, and computation proceeds on this grid by using neighboring grid points to determine the value at the next time step. Coarser and finer grids are then generated and used, with the finer grids concentrating on subdomains. These subdomains change over time. Using the additional grids improves the numerical properties. Such methods are usually parallelized by bisection. In the present work, the redistribution of the domains is realized by sending sets of individual grid points. This is a scheduling method. The multigrid structures are built so that almost arbitrary grid-point distributions can exist on the grid levels. The structures are created once and changed only when necessary, so that no memory allocations are needed during the iterations. In addition to the grid, further structures are required, such as the boundary structures. A structure called Farbenfeld (color field) records the core on which an outer boundary point resides. In the parallel adaptive refinement, 5 x 5 point coverings are applied for individual grid points selected by a decision criterion. For this purpose, the available decision information is used to determine more complex structures. As a result, the refinement grid does not have to be completely torn down and rebuilt; only the changes to the grid need to be made. This saves a great deal of computation time. The last step is to carry out the load balancing. First, the load transfer values are determined, which specify how many grid points are to be sent from where to where. This is done using a method called PLB or a variant of it.
Until now, PLB has been used mainly for combinatorial problems. Next, the grid points to be sent are selected using a strategy that determines which points of a core should be transferred to which neighboring cores. In the final step, the selected points are migrated; all grid-point structures are rebuilt, and the information must be packed so that the receiver can rebuild its own grid-point structures. In addition to the grid-point structures, the structures for the parallel adaptive refinement must also be modified. It must be possible to forward grid points when load is sent over the load edges in several rounds. During load balancing, additional work is performed by a structure called Zwischenkorrektur (intermediate correction), which makes it possible to keep the color field intact when neighboring grid points are sent at the same time.

Exploration des réseaux de neurones à base d'autoencodeur dans le cadre de la modélisation des données textuelles [Exploring Autoencoder-Based Neural Networks for Modeling Textual Data]

Relevance: 100.00%

Abstract:

Since the mid-2000s, a new approach in machine learning, the training of deep networks (deep learning), has been gaining popularity. This approach has proven effective at solving a variety of problems, improving on the results obtained by other techniques that were then considered the state of the art. This is the case in object recognition as well as in speech recognition. Given this, using deep networks in Natural Language Processing (NLP; in French, Traitement Automatique du Langage Naturel, TALN) is a logical next step. This thesis explores different neural network structures for modeling written text, focusing on models that are simple, powerful, and fast to train.

KeyCrime: il “conclusive reasoning” nell’attività anticrimine della Polizia di Stato [KeyCrime: “Conclusive Reasoning” in the Crime-Fighting Activity of the Polizia di Stato]

Relevance: 100.00%

Abstract:

To speak of KeyCrime is to engage with software founded primarily on a scientific method that adopts conclusive reasoning, applied to decision making and therefore to investigative intelligence and predictive policing. We could think of KeyCrime as an operational paradigm that sits in synergy between philosophy, legal cognitivism, and the applied sciences (Romeo F., 2006). When analysis and decision making find in a single context the fertile ground in which to develop and, even more, to create premises for reasoning, it becomes easy to see from which other condition they were born, activated, and above all used in pursuit of a result: this is none other than “observation”. If done well, in depth, and scientifically, observation offers a systematic and useful groundwork for investigations, especially preventive ones for crime fighting and security.

Motor de recomendación construido sobre implicaciones [Recommendation Engine Built on Implications]

Relevance: 100.00%

Abstract:

This Trabajo Fin de Grado (TFG, a final-year undergraduate project) aims to create a framework for use in recommender systems. It was carried out by two people working as a team. The tasks of this TFG are divided into two parts, one done jointly and the other individually. The joint part focuses on building a system that, starting from comments and opinions about points of interest (POIs) and using the natural language processing tool AlchemyAPI, can build formal contexts and many-valued formal contexts. Creating the latter requires the use of ontologies. The many-valued formal context is the starting point of the second (individual) part, which consists of using the many-valued context to obtain a set of functional dependencies by means of a Java implementation of the FDMine algorithm. These dependencies can be used in a recommendation engine. The system was implemented as a Java EE 6 web application and an API for working with many-valued formal contexts. Current technologies such as Spring and jQuery were used for web development. This project is presented as initial work that sets out, in addition to the system built, various problems related to the creation of valid datasets. Finally, directions for future TFGs are also proposed.
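To make the notion of a functional dependency in a many-valued context concrete, here is a minimal sketch. It is not the FDMine algorithm and not the project's Java code; it only checks whether one given dependency holds in a small table. The POI attributes and values are hypothetical.

```python
def dependency_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds.

    The dependency holds when any two rows that agree on every attribute
    in `lhs` also agree on every attribute in `rhs`.
    """
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # Remember the first rhs value seen for this lhs value and
        # compare every later row against it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical many-valued context: one row per point of interest (POI).
pois = [
    {"category": "museum", "price": "low", "sentiment": "positive"},
    {"category": "museum", "price": "low", "sentiment": "positive"},
    {"category": "restaurant", "price": "high", "sentiment": "negative"},
    {"category": "restaurant", "price": "low", "sentiment": "positive"},
]

print(dependency_holds(pois, ["price"], ["sentiment"]))     # price -> sentiment
print(dependency_holds(pois, ["category"], ["sentiment"]))  # category -> sentiment
```

A mining algorithm such as FDMine searches the space of attribute sets for all dependencies of this kind while pruning redundant candidates; the check above is only the basic test applied to a single candidate.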

Testu arteko koherentziazko erlazio-egitura: lehen urratsak euskaraz [Cross-Text Coherence Relation Structure: First Steps in Basque]

Relevance: 100.00%

Abstract:

Understanding what makes a text coherent is very useful for understanding the text itself, since coherence and coherence relations help us infer whether one or more texts are coherent. In this work, 3 coherence relations from Cross Document Structure Theory, or CST (Radev, 2000), holding between different texts on the same topic are analyzed and classified. To make this possible, guidelines are proposed for segmenting texts written in Basque on the same topic and for labeling the relations between them. A corpus of 10 texts has been annotated; 3 of its clusters were analyzed by two annotators. We report the inter-annotator agreement. Developing coherence relations is very important for several Language Processing systems, such as information extraction systems, machine translation, question-answering systems, and automatic summarization. If all CST relations were analyzed on a sizable corpus in the future, the work done here would be the first step toward the automatic processing of cross-text coherence relations in Basque texts.
