877 results for Natural Language
Abstract:
Ontolinguistic systems are aimed at solving complex natural language processing tasks that require semantic knowledge. The design of ontolinguistic systems is based on processes of coordinated interaction between ontological and linguistic models. The article discusses ontology-based methods for solving linguistic tasks that were developed during the design of the specialized ontolinguistic system «ЛоТА» (LoTA), intended for the analysis of specialized technical texts of the type «Логика работы системы...» ("System operation logic...").
Abstract:
In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning sentence-level embeddings from WordNet glosses using a convolutional neural network. The initialized word sense embeddings are used by a context clustering based model to generate the distributed representations of word senses. Our learned representations outperform the publicly available embeddings on 2 out of 4 metrics in the word similarity task, and 6 out of 13 subtasks in the analogical reasoning task.
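A minimal sketch of the pipeline this abstract describes, under two simplifications: random stand-in word vectors, and a bag-of-words average in place of the convolutional gloss encoder. Only the use of WordNet glosses for initialization and the cluster-and-update step follow the description; the function names and learning rate are illustrative, and nltk with the wordnet data is assumed to be available.

```python
# Gloss-initialized sense embeddings, simplified sketch (requires: nltk.download('wordnet')).
import numpy as np
from nltk.corpus import wordnet as wn

DIM = 50
rng = np.random.default_rng(0)
word_vec = {}  # stand-in word embeddings, filled lazily

def vec(word):
    return word_vec.setdefault(word, rng.normal(size=DIM))

def gloss_embedding(synset):
    """Encode a WordNet gloss; the paper uses a convolutional encoder, here an average."""
    tokens = synset.definition().lower().split()
    return np.mean([vec(t) for t in tokens], axis=0)

def init_sense_embeddings(word):
    """One embedding per WordNet sense, initialized from its gloss."""
    return {s.name(): gloss_embedding(s) for s in wn.synsets(word)}

def assign_and_update(sense_emb, context_tokens, lr=0.1):
    """Context clustering step: pick the closest sense, nudge it toward the context."""
    ctx = np.mean([vec(t) for t in context_tokens], axis=0)
    best = max(sense_emb, key=lambda s: np.dot(sense_emb[s], ctx))
    sense_emb[best] += lr * (ctx - sense_emb[best])
    return best

senses = init_sense_embeddings("bank")
print(assign_and_update(senses, ["river", "water", "shore"]))
```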
Abstract:
The principal feature of the ontology developed for text processing is a broader representation of knowledge about the external world, achieved by introducing a three-level hierarchy. This improves the semantic interpretation of natural language texts.
Abstract:
Mobile advertising is a rapidly growing sector, giving brands and marketing agencies the opportunity to connect with consumers beyond traditional and digital media by communicating directly on their mobile phones. Mobile advertising will be intrinsically linked with mobile search, which has migrated from the internet to mobile devices and is identified as an area of potential growth. The results of mobile search show that, as a general rule, such results exceed 160 characters; a dialog is therefore required to deliver the relevant portion of a response to the mobile user. In this paper we focus first on mobile search and mobile advert creation, and then on the mechanism of interaction between the user's request, the search results, advertising, and the dialog.
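A hedged sketch of the 160-character constraint mentioned above: a long search result is split into message-sized parts that a dialog could deliver one turn at a time. The sample text and function name are invented for illustration.

```python
import textwrap

SMS_LIMIT = 160  # the message-length bound cited in the abstract

def segment(text, limit=SMS_LIMIT):
    """Split a long search result into parts that each fit one mobile message."""
    return textwrap.wrap(text, width=limit)

result = ("Example search result: a stand-in paragraph that runs well past one "
          "hundred and sixty characters, so it has to be delivered to the mobile "
          "user over several turns of a dialog rather than in a single message.")

for i, part in enumerate(segment(result), start=1):
    print(f"[{i}] {part}")  # one dialog turn per part
```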
Abstract:
This work is devoted to the development of a computer-aided system for semantic text analysis of technical specifications. Its purpose is to increase the efficiency of software engineering by automating the semantic analysis of technical specification texts. The paper proposes and investigates a technique for analyzing the text of a technical specification; constructs an extended fuzzy attribute grammar of the technical specification, intended to formalize a restricted subset of Russian for the analysis of sentences in specification texts; considers the stylistic features of the technical specification as a class of documents; and formulates recommendations on preparing the text of a technical specification for automated processing. The computer-aided system for semantic text analysis of technical specifications is described. It consists of the following subsystems: preliminary text processing, syntactic and semantic analysis with construction of software models, document storage, and the user interface.
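The subsystems listed above suggest a simple pipeline shape. The sketch below only illustrates that structure (preprocessing, analysis with model construction, storage); it is not the fuzzy attribute grammar itself, all names are hypothetical, and the example sentence is English rather than the restricted Russian the system targets.

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    text: str
    tokens: list = field(default_factory=list)
    relations: list = field(default_factory=list)

def preprocess(req):
    """Preliminary text processing: lowercase and tokenize."""
    req.tokens = req.text.lower().rstrip(".").split()
    return req

def analyze(req):
    """Stand-in 'semantic analysis': record a (subject, verb, object) triple."""
    if "shall" in req.tokens:
        i = req.tokens.index("shall")
        req.relations.append((" ".join(req.tokens[:i]),
                              req.tokens[i + 1],
                              " ".join(req.tokens[i + 2:])))
    return req

store = []  # stand-in document storage subsystem
def persist(req):
    store.append(req)
    return req

req = persist(analyze(preprocess(Requirement("The system shall log every transaction."))))
print(req.relations)
```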
Abstract:
The problems of analyzing natural language objects (NLOs) are considered from the point of view of their representation and processing in computer memory. A formalization of the NLO analysis problem is proposed, and an example of a formalized representation of a domain-specific NLO is given.
Abstract:
One approach to natural language text analysis is described; it uses an explanatory dictionary of the natural language, a local dictionary of the analyzed text, and the frequency characteristics of the words in that text.
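A minimal sketch of the described combination of resources: a local dictionary with word frequencies for the analyzed text, cross-referenced against an explanatory dictionary. The dictionary contents and function name here are placeholders.

```python
from collections import Counter

# Stand-in explanatory dictionary: word -> gloss.
explanatory_dict = {"corpus": "a structured collection of texts", "token": "a unit of text"}

def build_local_dictionary(text):
    """Local dictionary of the analyzed text: frequency plus gloss (if known) per word."""
    tokens = [t.strip(".,;:").lower() for t in text.split()]
    freq = Counter(tokens)
    return {w: {"freq": n, "gloss": explanatory_dict.get(w)} for w, n in freq.items()}

print(build_local_dictionary("A corpus is a corpus of texts; each token counts."))
```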
Abstract:
One of the ultimate aims of Natural Language Processing is to automate the analysis of the meaning of text. A fundamental step in that direction consists in enabling effective ways to automatically link textual references to their referents, that is, real world objects. The work presented in this paper addresses the problem of attributing a sense to proper names in a given text, i.e., automatically associating words representing Named Entities with their referents. The method for Named Entity Disambiguation proposed here is based on the concept of semantic relatedness, which in this work is obtained via a graph-based model over Wikipedia. We show that, without building the traditional bag of words representation of the text, but instead only considering named entities within the text, the proposed method achieves results competitive with the state-of-the-art on two different datasets.
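A hedged sketch of the graph-based idea: each mention's candidate entities are scored by their relatedness to the candidates of the other mentions, over a toy stand-in for the Wikipedia link graph. The relatedness measure and the data are illustrative only, not the paper's actual model.

```python
# Toy link graph standing in for Wikipedia: article -> linked articles.
links = {
    "Paris": {"France", "Seine"},
    "Paris_(mythology)": {"Troy"},
    "France": {"Paris", "Seine"},
    "Troy": {"Paris_(mythology)"},
}

def relatedness(a, b):
    """Very rough proxy: 1.0 if either article links to the other, else 0.0."""
    return 1.0 if b in links.get(a, set()) or a in links.get(b, set()) else 0.0

def disambiguate(candidates_per_mention):
    """Pick, for each mention, the candidate most related to the other mentions' candidates."""
    chosen = {}
    for mention, cands in candidates_per_mention.items():
        others = [c for m, cs in candidates_per_mention.items() if m != mention for c in cs]
        chosen[mention] = max(cands, key=lambda c: sum(relatedness(c, o) for o in others))
    return chosen

print(disambiguate({"Paris": ["Paris", "Paris_(mythology)"], "France": ["France"]}))
```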
Abstract:
Relatively little research on dialect variation has been based on corpora of naturally occurring language. Instead, dialect variation has been studied based primarily on language elicited through questionnaires and interviews. Eliciting dialect data has several advantages, including allowing for dialectologists to select individual informants, control the communicative situation in which language is collected, elicit rare forms directly, and make high-quality audio recordings. Although far less common, a corpus-based approach to data collection also has several advantages, including allowing for dialectologists to collect large amounts of data from a large number of informants, observe dialect variation across a range of communicative situations, and analyze quantitative linguistic variation in large samples of natural language. Although both approaches allow for dialect variation to be observed, they provide different perspectives on language variation and change. The corpus-based approach to dialectology has therefore produced a number of new findings, many of which challenge traditional assumptions about the nature of dialect variation. Most important, this research has shown that dialect variation involves a wider range of linguistic variables and exists across a wider range of language varieties than has previously been assumed. The goal of this chapter is to introduce this emerging approach to dialectology. The first part of this chapter reviews the growing body of research that analyzes dialect variation in corpora, including research on variation across nations, regions, genders, ages, and classes, in both speech and writing, and from both a synchronic and diachronic perspective, with a focus on dialect variation in the English language. Although collections of language data elicited through interviews and questionnaires are now commonly referred to as corpora in sociolinguistics and dialectology (e.g. see Bauer 2002; Tagliamonte 2006; Kretzschmar et al. 2006; D'Arcy 2011), this review focuses on corpora of naturally occurring texts and discourse. The second part of this chapter presents the results of an analysis of variation in not contraction across region, gender, and time in a corpus of American English letters to the editor in order to exemplify a corpus-based approach to dialectology.
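As an illustration of the kind of measurement behind the not contraction case study, the sketch below computes a contraction rate (contracted versus full forms) per region. The regular expressions, grouping, and toy data are assumptions for illustration, not the chapter's actual procedure.

```python
import re

# Contracted forms ("don't", "can't", ...) vs. full forms ("do not", "is not", ...).
CONTRACTED = re.compile(r"\b\w+n't\b", re.IGNORECASE)
FULL = re.compile(
    r"\b(?:do|does|did|is|are|was|were|can|could|would|should|have|has|had) not\b",
    re.IGNORECASE,
)

def contraction_rate(texts_by_region):
    """Share of not-contraction among all negated forms, per region."""
    rates = {}
    for region, texts in texts_by_region.items():
        c = sum(len(CONTRACTED.findall(t)) for t in texts)
        f = sum(len(FULL.findall(t)) for t in texts)
        rates[region] = c / (c + f) if c + f else None
    return rates

print(contraction_rate({
    "Northeast": ["I don't agree.", "It is not fair."],
    "South": ["They can't say that.", "We won't accept it."],
}))
```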
Abstract:
Most research in the area of emotion detection in written text has focused on detecting explicit expressions of emotions in text. In this paper, we present a rule-based pipeline approach for detecting implicit emotions in written text without emotion-bearing words, based on the OCC Model. We have evaluated our approach on three different datasets with five emotion categories. Our results show that the proposed approach outperforms the lexicon matching method consistently across all three datasets by a large margin of 17–30% in F-measure and gives competitive performance compared to a supervised classifier. In particular, when dealing with formal text which follows grammatical rules strictly, our approach gives an average F-measure of 82.7% on “Happy”, “Angry-Disgust” and “Sad”, even outperforming the supervised baseline by nearly 17% in F-measure. Our preliminary results show the feasibility of the approach for the task of implicit emotion detection in written text.
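A hedged sketch of what an OCC-style rule can look like: the emotion label follows from appraisal variables (desirability of the event, which agent caused it) rather than from emotion-bearing words. The mapping below covers only a tiny, illustrative subset of the OCC model, not the paper's full rule pipeline.

```python
def occ_emotion(desirable, caused_by_other, about_future=False):
    """Tiny illustrative subset of an OCC-style appraisal-to-emotion mapping."""
    if about_future:
        return "hope" if desirable else "fear"
    if desirable:
        return "happy-for" if caused_by_other else "joy"
    return "anger" if caused_by_other else "distress"

# "My colleague deleted my report" -> undesirable event caused by another agent.
print(occ_emotion(desirable=False, caused_by_other=True))  # anger
```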
Abstract:
This paper introduces a quantitative method for identifying newly emerging word forms in large time-stamped corpora of natural language and then describes an analysis of lexical emergence in American social media using this method based on a multi-billion word corpus of Tweets collected between October 2013 and November 2014. In total, 29 emerging word forms, which represent various semantic classes, grammatical parts of speech, and word formation processes, were identified through this analysis. These 29 forms are then examined from various perspectives in order to begin to better understand the process of lexical emergence.
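One simple way to operationalize "emerging" in a time-stamped corpus, sketched under assumed thresholds: flag forms that are rare in the earliest months but show a clear upward frequency trend. The thresholds, slope test, and toy frequencies below are illustrative, not the paper's method.

```python
import numpy as np

def emerging(monthly_freq, early_months=3, min_slope=0.5):
    """monthly_freq: {word: [relative frequency per month]}; thresholds are illustrative."""
    flagged = []
    for word, series in monthly_freq.items():
        series = np.asarray(series, dtype=float)
        if series[:early_months].mean() > series.mean() * 0.1:
            continue  # already established in the earliest months
        slope = np.polyfit(np.arange(len(series)), series, 1)[0]  # linear trend
        if slope >= min_slope:
            flagged.append(word)
    return flagged

print(emerging({"unbothered": [0, 0, 1, 3, 6, 9, 14],
                "hello": [50, 52, 49, 51, 50, 48, 52]}))
```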
Abstract:
Storyline detection from news articles aims at summarizing events described under a certain news topic and revealing how those events evolve over time. It is a difficult task because it requires first the detection of events from news articles published in different time periods and then the construction of storylines by linking events into coherent news stories. Moreover, each storyline has different hierarchical structures which are dependent across epochs. Existing approaches often ignore the dependency of hierarchical structures in storyline generation. In this paper, we propose an unsupervised Bayesian model, called dynamic storyline detection model, to extract structured representations and evolution patterns of storylines. The proposed model is evaluated on a large scale news corpus. Experimental results show that our proposed model outperforms several baseline approaches.
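The model itself is an unsupervised Bayesian one; as a much simpler stand-in for the underlying intuition, the sketch below chains events across epochs into storylines by keyword overlap. The threshold and data are invented for illustration.

```python
def overlap(a, b):
    """Jaccard overlap between two keyword sets."""
    return len(set(a) & set(b)) / max(1, len(set(a) | set(b)))

def build_storylines(epochs, threshold=0.2):
    """epochs: per time period, a list of events, each event a list of keywords."""
    storylines = []
    for events in epochs:
        for ev in events:
            best = max(storylines, key=lambda s: overlap(s[-1], ev), default=None)
            if best is not None and overlap(best[-1], ev) >= threshold:
                best.append(ev)       # extend an existing storyline
            else:
                storylines.append([ev])  # start a new storyline
    return storylines

epochs = [[["earthquake", "nepal"]],
          [["nepal", "rescue", "earthquake"]],
          [["election", "uk"]]]
print(build_storylines(epochs))
```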
Abstract:
The first study of its kind, Regional Variation in Written American English takes a corpus-based approach to map over a hundred grammatical alternation variables across the United States. A multivariate spatial analysis of these maps shows that grammatical alternation variables follow a relatively small number of common regional patterns in American English, which can be explained based on both linguistic and extra-linguistic factors. Based on this rigorous analysis of extensive data, Grieve identifies five primary modern American dialect regions, demonstrating that regional variation is far more pervasive and complex in natural language than is generally assumed. The wealth of maps and data and the groundbreaking implications of this volume make it essential reading for students and researchers in linguistics, English language, geography, computer science, sociology and communication studies.
Abstract:
The Everglades Online Thesaurus is a structured vocabulary of concepts and terms relating to the south Florida environment. Designed as an information management tool for both researchers and metadata creators, the Thesaurus is intended to improve information retrieval across the many disparate information systems, databases, and web sites that provide Everglades-related information. The vocabulary provided by the Everglades Online Thesaurus expresses each relevant concept using a single ‘preferred term’, whereas in natural language many terms may exist to express that same concept. In this way, the Thesaurus offers the possibility of standardizing the terminology used to describe Everglades-related information — an important factor in predictable and successful resource discovery.
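A minimal sketch of preferred-term normalization as described: many natural-language variants map to a single controlled term. The terms below are made up and are not entries from the actual Everglades Online Thesaurus.

```python
# Variant term -> preferred term (invented examples).
preferred = {
    "sawgrass marsh": "Sawgrass marshes",
    "cladium marsh": "Sawgrass marshes",
    "gator": "Alligators",
    "alligator": "Alligators",
}

def normalize(term):
    """Map a free-text term to its controlled preferred term, if one is known."""
    return preferred.get(term.lower().strip(), term)

print(normalize("Gator"))  # -> "Alligators"
```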
Abstract:
There is a growing societal need to address the increasing prevalence of behavioral health issues, such as obesity, alcohol or drug use, and general lack of treatment adherence for a variety of health problems. The statistics, worldwide and in the USA, are daunting. Excessive alcohol use is the third leading preventable cause of death in the United States (with 79,000 deaths annually), and is responsible for a wide range of health and social problems. On the positive side though, these behavioral health issues (and associated possible diseases) can often be prevented with relatively simple lifestyle changes, such as losing weight with a diet and/or physical exercise, or learning how to reduce alcohol consumption. Medicine has therefore started to move toward finding ways of preventively promoting wellness, rather than solely treating already established illness. Evidence-based patient-centered Brief Motivational Interviewing (BMI) interventions have been found particularly effective in helping people find intrinsic motivation to change problem behaviors after short counseling sessions, and to maintain healthy lifestyles over the long-term. Lack of locally available personnel well-trained in BMI, however, often limits access to successful interventions for people in need. To fill this accessibility gap, Computer-Based Interventions (CBIs) have started to emerge. Success of the CBIs, however, critically relies on ensuring engagement and retention of CBI users so that they remain motivated to use these systems and come back to use them over the long term as necessary. Because of their text-only interfaces, current CBIs can only express limited empathy and rapport, which are the most important factors of health interventions. Fortunately, in the last decade, computer science research has progressed in the design of simulated human characters with anthropomorphic communicative abilities. Virtual characters interact using humans’ innate communication modalities, such as facial expressions, body language, speech, and natural language understanding. By advancing research in Artificial Intelligence (AI), we can improve the ability of artificial agents to help us solve CBI problems. To facilitate successful communication and social interaction between artificial agents and human partners, it is essential that aspects of human social behavior, especially empathy and rapport, be considered when designing human-computer interfaces. Hence, the goal of the present dissertation is to provide a computational model of rapport to enhance an artificial agent’s social behavior, and to provide an experimental tool for the psychological theories shaping the model. Parts of this thesis were already published in [LYL+12, AYL12, AL13, ALYR13, LAYR13, YALR13, ALY14].