960 resultados para Word Sense Disambguaion, WSD, Natural Language Processing


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This research studies the application of syntagmatic analysis of written texts in the language of Brazilian Portuguese as a methodology for the automatic creation of extractive summaries. The automation of abstracts, while linked to the area of natural language processing (PLN) is studying ways the computer can autonomously construct summaries of texts. For this we use as presupposed the idea that switch to the computer the way a language is structured, in our case the Brazilian Portuguese, it will help in the discovery of the most relevant sentences, and consequently build extractive summaries with higher informativeness. In this study, we propose the definition of a summarization method that automatically perform the syntagmatic analysis of texts and through them, to build an automatic summary. The phrases that make up the syntactic structures are then used to analyze the sentences of the text, so the count of these elements determines whether or not a sentence will compose the summary to be generated

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The semantic model described in this paper is based on ones developed for arithmetic (e.g. McCloskey et al. 1985, Cohene and Dehaene 1995), natural language processing (Fodor 1975, Chomsky 1981) and work by the author on how learners parse mathematical structures. The semantic model highlights the importance of the parsing process and the relationship between this process and the mathematical lexicon/grammar. It concludes by demonstrating that for a learner to become an efficient, competent mathematician a process of top-down parsing is essential.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The semantic model developed in this research was in response to the difficulty a group of mathematics learners had with conventional mathematical language and their interpretation of mathematical constructs. In order to develop the model ideas from linguistics, psycholinguistics, cognitive psychology, formal languages and natural language processing were investigated. This investigation led to the identification of four main processes: the parsing process, syntactic processing, semantic processing and conceptual processing. The model showed the complex interdependency between these four processes and provided a theoretical framework in which the behaviour of the mathematics learner could be analysed. The model was then extended to include the use of technological artefacts into the learning process. To facilitate this aspect of the research, the theory of instrumentation was incorporated into the semantic model. The conclusion of this research was that although the cognitive processes were interdependent, they could develop at different rates until mastery of a topic was achieved. It also found that the introduction of a technological artefact into the learning environment introduced another layer of complexity, both in terms of the learning process and the underlying relationship between the four cognitive processes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

People recommenders are a widespread feature of social networking sites and educational social learning platforms alike. However, when these systems are used to extend learners’ Personal Learning Networks, they often fall short of providing recommendations of learning value to their users. This paper proposes a design of a people recommender based on content-based user profiles, and a matching method based on dissimilarity therein. It presents the results of an experiment conducted with curators of the content curation site Scoop.it!, where curators rated personalized recommendations for contacts. The study showed that matching dissimilarity of interpretations of shared interests is more successful in providing positive experiences of breakdown for the curator than is matching on similarity. The main conclusion of this paper is that people recommenders should aim to trigger constructive experiences of breakdown for their users, as the prospect and potential of such experiences encourage learners to connect to their recommended peers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The current study builds upon a previous study, which examined the degree to which the lexical properties of students’ essays could predict their vocabulary scores. We expand on this previous research by incorporating new natural language processing indices related to both the surface- and discourse-levels of students’ essays. Additionally, we investigate the degree to which these NLP indices can be used to account for variance in students’ reading comprehension skills. We calculated linguistic essay features using our framework, ReaderBench, which is an automated text analysis tools that calculates indices related to linguistic and rhetorical features of text. University students (n = 108) produced timed (25 minutes), argumentative essays, which were then analyzed by ReaderBench. Additionally, they completed the Gates-MacGinitie Vocabulary and Reading comprehension tests. The results of this study indicated that two indices were able to account for 32.4% of the variance in vocabulary scores and 31.6% of the variance in reading comprehension scores. Follow-up analyses revealed that these models further improved when only considering essays that contained multiple paragraph (R2 values = .61 and .49, respectively). Overall, the results of the current study suggest that natural language processing techniques can help to inform models of individual differences among student writers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Text cohesion is an important element of discourse processing. This paper presents a new approach to modeling, quantifying, and visualizing text cohesion using automated cohesion flow indices that capture semantic links among paragraphs. Cohesion flow is calculated by applying Cohesion Network Analysis, a combination of semantic distances, Latent Semantic Analysis, and Latent Dirichlet Allocation, as well as Social Network Analysis. Experiments performed on 315 timed essays indicated that cohesion flow indices are significantly correlated with human ratings of text coherence and essay quality. Visualizations of the global cohesion indices are also included to support a more facile understanding of how cohesion flow impacts coherence in terms of semantic dependencies between paragraphs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This presentation summarizes experience with the automated speech recognition and translation approach realised in the context of the European project EMMA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Rhythm analysis of written texts focuses on literary analysis and it mainly considers poetry. In this paper we investigate the relevance of rhythmic features for categorizing texts in prosaic form pertaining to different genres. Our contribution is threefold. First, we define a set of rhythmic features for written texts. Second, we extract these features from three corpora, of speeches, essays, and newspaper articles. Third, we perform feature selection by means of statistical analyses, and determine a subset of features which efficiently discriminates between the three genres. We find that using as little as eight rhythmic features, documents can be adequately assigned to a given genre with an accuracy of around 80 %, significantly higher than the 33 % baseline which results from random assignment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Opinion mining and sentiment analysis are important research areas of Natural Language Processing (NLP) tools and have become viable alternatives for automatically extracting the affective information found in texts. Our aim is to build an NLP model to analyze gamers’ sentiments and opinions expressed in a corpus of 9750 game reviews. A Principal Component Analysis using sentiment analysis features explained 51.2 % of the variance of the reviews and provides an integrated view of the major sentiment and topic related dimensions expressed in game reviews. A Discriminant Function Analysis based on the emerging components classified game reviews into positive, neutral and negative ratings with a 55 % accuracy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Taxonomies have gained a broad usage in a variety of fields due to their extensibility, as well as their use for classification and knowledge organization. Of particular interest is the digital document management domain in which their hierarchical structure can be effectively employed in order to organize documents into content-specific categories. Common or standard taxonomies (e.g., the ACM Computing Classification System) contain concepts that are too general for conceptualizing specific knowledge domains. In this paper we introduce a novel automated approach that combines sub-trees from general taxonomies with specialized seed taxonomies by using specific Natural Language Processing techniques. We provide an extensible and generalizable model for combining taxonomies in the practical context of two very large European research projects. Because the manual combination of taxonomies by domain experts is a highly time consuming task, our model measures the semantic relatedness between concept labels in CBOW or skip-gram Word2vec vector spaces. A preliminary quantitative evaluation of the resulting taxonomies is performed after applying a greedy algorithm with incremental thresholds used for matching and combining topic labels.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Semantic Annotation component is a software application that provides support for automated text classification, a process grounded in a cohesion-centered representation of discourse that facilitates topic extraction. The component enables the semantic meta-annotation of text resources, including automated classification, thus facilitating information retrieval within the RAGE ecosystem. It is available in the ReaderBench framework (http://readerbench.com/) which integrates advanced Natural Language Processing (NLP) techniques. The component makes use of Cohesion Network Analysis (CNA) in order to ensure an in-depth representation of discourse, useful for mining keywords and performing automated text categorization. Our component automatically classifies documents into the categories provided by the ACM Computing Classification System (http://dl.acm.org/ccs_flat.cfm), but also into the categories from a high level serious games categorization provisionally developed by RAGE. English and French languages are already covered by the provided web service, whereas the entire framework can be extended in order to support additional languages.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The large upfront investments required for game development pose a severe barrier for the wider uptake of serious games in education and training. Also, there is a lack of well-established methods and tools that support game developers at preserving and enhancing the games’ pedagogical effectiveness. The RAGE project, which is a Horizon 2020 funded research project on serious games, addresses these issues by making available reusable software components that aim to support the pedagogical qualities of serious games. In order to easily deploy and integrate these game components in a multitude of game engines, platforms and programming languages, RAGE has developed and validated a hybrid component-based software architecture that preserves component portability and interoperability. While a first set of software components is being developed, this paper presents selected examples to explain the overall system’s concept and its practical benefits. First, the Emotion Detection component uses the learners’ webcams for capturing their emotional states from facial expressions. Second, the Performance Statistics component is an add-on for learning analytics data processing, which allows instructors to track and inspect learners’ progress without bothering about the required statistics computations. Third, a set of language processing components accommodate the analysis of textual inputs of learners, facilitating comprehension assessment and prediction. Fourth, the Shared Data Storage component provides a technical solution for data storage - e.g. for player data or game world data - across multiple software components. The presented components are exemplary for the anticipated RAGE library, which will include up to forty reusable software components for serious gaming, addressing diverse pedagogical dimensions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Community-driven Question Answering (CQA) systems that crowdsource experiential information in the form of questions and answers and have accumulated valuable reusable knowledge. Clustering of QA datasets from CQA systems provides a means of organizing the content to ease tasks such as manual curation and tagging. In this paper, we present a clustering method that exploits the two-part question-answer structure in QA datasets to improve clustering quality. Our method, {\it MixKMeans}, composes question and answer space similarities in a way that the space on which the match is higher is allowed to dominate. This construction is motivated by our observation that semantic similarity between question-answer data (QAs) could get localized in either space. We empirically evaluate our method on a variety of real-world labeled datasets. Our results indicate that our method significantly outperforms state-of-the-art clustering methods for the task of clustering question-answer archives.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

L’augmentation de la croissance des réseaux, des blogs et des utilisateurs des sites d’examen sociaux font d’Internet une énorme source de données, en particulier sur la façon dont les gens pensent, sentent et agissent envers différentes questions. Ces jours-ci, les opinions des gens jouent un rôle important dans la politique, l’industrie, l’éducation, etc. Alors, les gouvernements, les grandes et petites industries, les instituts universitaires, les entreprises et les individus cherchent à étudier des techniques automatiques fin d’extraire les informations dont ils ont besoin dans les larges volumes de données. L’analyse des sentiments est une véritable réponse à ce besoin. Elle est une application de traitement du langage naturel et linguistique informatique qui se compose de techniques de pointe telles que l’apprentissage machine et les modèles de langue pour capturer les évaluations positives, négatives ou neutre, avec ou sans leur force, dans des texte brut. Dans ce mémoire, nous étudions une approche basée sur les cas pour l’analyse des sentiments au niveau des documents. Notre approche basée sur les cas génère un classificateur binaire qui utilise un ensemble de documents classifies, et cinq lexiques de sentiments différents pour extraire la polarité sur les scores correspondants aux commentaires. Puisque l’analyse des sentiments est en soi une tâche dépendante du domaine qui rend le travail difficile et coûteux, nous appliquons une approche «cross domain» en basant notre classificateur sur les six différents domaines au lieu de le limiter à un seul domaine. Pour améliorer la précision de la classification, nous ajoutons la détection de la négation comme une partie de notre algorithme. En outre, pour améliorer la performance de notre approche, quelques modifications innovantes sont appliquées. Il est intéressant de mentionner que notre approche ouvre la voie à nouveaux développements en ajoutant plus de lexiques de sentiment et ensembles de données à l’avenir.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-08