44 results for programming language processing


Relevance:

80.00%

Publisher:

Abstract:

All aspects of the concept of collocation – the phenomenon whereby words naturally tend to occur in the company of a restricted set of other words – are covered in this book. It deals in detail with the history of the word collocation, the concepts associated with it and its use in a linguistic context. The authors show the practical means by which the collocational behaviour of words can be explored using illustrative computer programs, and examine applications in teaching, lexicography and natural language processing that use collocation information. The book investigates the place that collocation occupies in theories of language and provides a comprehensive and up-to-date survey of the current position of collocation in language studies and applied linguistics. This text presents a comprehensive description of collocation, covering both the theoretical and practical background and the implications and applications of the concept as a language model and analytical tool. It provides a definitive survey of currently available techniques and a detailed description of their implementation.
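The illustrative programs the book describes are not reproduced in this listing, but the usual first step in exploring collocational behaviour is counting co-occurrences within a fixed window around a node word. A minimal sketch (the function name and window size are illustrative assumptions, not taken from the book):

```python
from collections import Counter

def collocates(tokens, node, window=2):
    """Count words co-occurring with a node word within a +/- window span,
    a common first step in exploring collocational behaviour."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts
```

Raw co-occurrence counts like these are typically refined with an association measure (e.g. mutual information or a t-score) before being read as evidence of collocation.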

Relevance:

80.00%

Publisher:

Abstract:

We present a new method for term extraction from a domain-relevant corpus using natural language processing, for the purposes of semi-automatic ontology learning. The literature shows that topical words occur in bursts. We find that the ranking of extracted terms is insensitive to the choice of population model, but that calculating frequencies relative to the burst size, rather than the document length in words, yields significantly different results.
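The abstract does not spell out how burst spans are detected, so the sketch below assumes they are given as token index ranges and simply contrasts the two normalisations mentioned: frequency relative to the whole document versus frequency relative to the burst size.

```python
def burst_relative_frequency(doc_tokens, burst_spans, term):
    """Compare a term's frequency normalised by document length
    with its frequency normalised by the size of its bursts.
    burst_spans: list of (start, end) token-index ranges, assumed
    to come from a separate burst-detection step."""
    doc_freq = doc_tokens.count(term) / len(doc_tokens)
    burst_tokens = [t for start, end in burst_spans
                    for t in doc_tokens[start:end]]
    burst_freq = (burst_tokens.count(term) / len(burst_tokens)
                  if burst_tokens else 0.0)
    return doc_freq, burst_freq
```

A term confined to a short burst scores much higher under burst-relative normalisation, which is the effect the paper reports as significant.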

Relevance:

80.00%

Publisher:

Abstract:

Social streams have proven to be the most up-to-date and inclusive source of information on current events. In this paper we propose a novel probabilistic modelling framework, called the violence detection model (VDM), which enables the identification of text containing violent content and the extraction of violence-related topics from social media data. The proposed VDM model does not require any labelled corpora for training; instead, it only needs the incorporation of word prior knowledge capturing whether a word indicates violence or not. We propose a novel approach to deriving word prior knowledge using a relative entropy measurement of words, based on the intuition that low-entropy words are indicative of semantically coherent topics and therefore more informative, while high-entropy words are more topically diverse in usage and therefore less informative. Our proposed VDM model has been evaluated on the TREC Microblog 2011 dataset to identify topics related to violence. Experimental results show that deriving word priors using our proposed relative entropy method is more effective than the widely used information gain method. Moreover, VDM achieves higher violence classification results and produces more coherent violence-related topics compared to a few competitive baselines.
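The abstract does not give the exact relative-entropy formulation, but the underlying intuition can be sketched with plain Shannon entropy of a word's occurrence distribution over documents: a word concentrated in few documents has low entropy (more topic-indicative), while a word spread evenly has high entropy (less informative). This is a simplified stand-in, not the paper's measure.

```python
import math

def word_entropy(word, documents):
    """Shannon entropy (in bits) of a word's count distribution over
    documents. Low entropy -> the word concentrates in few documents;
    high entropy -> the word is spread evenly and less informative."""
    counts = [doc.count(word) for doc in documents]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

Ranking candidate prior words by ascending entropy would then favour the semantically coherent, violence-indicative vocabulary the model needs.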

Relevance:

80.00%

Publisher:

Abstract:

Yorick Wilks is a central figure in the fields of Natural Language Processing and Artificial Intelligence. His influence extends to many areas of these fields and includes contributions to machine translation, word sense disambiguation, dialogue modelling and information extraction. This book celebrates the work of Yorick Wilks from the perspective of his peers. It consists of original chapters, each of which analyses an aspect of his work and links it to current thinking in that area. His work has spanned over four decades but is shown to be pertinent to recent developments in language processing such as the Semantic Web. This volume forms a two-part set together with Words and Intelligence I, Selected Works by Yorick Wilks, by the same editors.

Relevance:

80.00%

Publisher:

Abstract:

Corpora—large collections of written and/or spoken text stored and accessed electronically—provide the means of investigating language that is of growing importance academically and professionally. Corpora are now routinely used in the following fields: the production of dictionaries and other reference materials; the development of aids to translation; language teaching materials; the investigation of ideologies and cultural assumptions; natural language processing; and the investigation of all aspects of linguistic behaviour, including vocabulary, grammar and pragmatics.

Relevance:

80.00%

Publisher:

Abstract:

In recent years, learning word vector representations has attracted much interest in Natural Language Processing. Word representations or embeddings learned using unsupervised methods help address the problem of traditional bag-of-words approaches, which fail to capture contextual semantics. In this paper we go beyond vector representations at the word level and propose a novel framework that learns higher-level feature representations of n-grams, phrases and sentences using a deep neural network built from stacked Convolutional Restricted Boltzmann Machines (CRBMs). These representations are shown to map syntactically and semantically related n-grams to nearby locations in the hidden feature space. We additionally incorporate these higher-level features into supervised classifier training for two sentiment analysis tasks: subjectivity classification and sentiment classification. Our results demonstrate the success of the proposed framework, with a 4% improvement in accuracy for subjectivity classification and improved results for sentiment classification over models trained without the higher-level features.

Relevance:

80.00%

Publisher:

Abstract:

In recent years, there has been an increasing interest in learning distributed representations of word sense. Traditional context-clustering-based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning sentence-level embeddings from WordNet glosses using a convolutional neural network. The initialized word sense embeddings are used by a context-clustering-based model to generate the distributed representations of word senses. Our learned representations outperform the publicly available embeddings on 2 out of 4 metrics in the word similarity task, and 6 out of 13 subtasks in the analogical reasoning task.
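The paper encodes glosses with a convolutional network; as a much simpler stand-in, the initialization idea can be sketched by averaging the word vectors of a sense's WordNet gloss. The function name and the toy vector dictionary are illustrative assumptions, not the paper's implementation.

```python
def init_sense_embedding(gloss_tokens, word_vectors, dim=2):
    """Initialise a sense embedding from its WordNet gloss by averaging
    the vectors of its gloss words -- a simplified stand-in for the
    paper's convolutional gloss encoder."""
    vecs = [word_vectors[t] for t in gloss_tokens if t in word_vectors]
    if not vecs:
        return [0.0] * dim  # no known gloss words: fall back to zeros
    # Component-wise mean over the gloss word vectors.
    return [sum(component) / len(vecs) for component in zip(*vecs)]
```

A context-clustering model would then start from these gloss-derived vectors instead of random ones, which is what helps the infrequent senses the abstract mentions.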

Relevance:

80.00%

Publisher:

Abstract:

Most research in the area of emotion detection in written text has focused on detecting explicit expressions of emotion. In this paper, we present a rule-based pipeline approach, based on the OCC model, for detecting implicit emotions in written text without emotion-bearing words. We have evaluated our approach on three different datasets with five emotion categories. Our results show that the proposed approach outperforms the lexicon-matching method consistently across all three datasets by a large margin of 17–30% in F-measure and gives competitive performance compared to a supervised classifier. In particular, when dealing with formal text which follows grammatical rules strictly, our approach gives an average F-measure of 82.7% on “Happy”, “Angry-Disgust” and “Sad”, even outperforming the supervised baseline by nearly 17% in F-measure. Our preliminary results show the feasibility of the approach for the task of implicit emotion detection in written text.

Relevance:

80.00%

Publisher:

Abstract:

Storyline detection from news articles aims at summarizing events described under a certain news topic and revealing how those events evolve over time. It is a difficult task because it requires first the detection of events from news articles published in different time periods and then the construction of storylines by linking events into coherent news stories. Moreover, each storyline has different hierarchical structures which are dependent across epochs. Existing approaches often ignore the dependency of hierarchical structures in storyline generation. In this paper, we propose an unsupervised Bayesian model, called dynamic storyline detection model, to extract structured representations and evolution patterns of storylines. The proposed model is evaluated on a large scale news corpus. Experimental results show that our proposed model outperforms several baseline approaches.

Relevance:

80.00%

Publisher:

Abstract:

The semantic model described in this paper is based on ones developed for arithmetic (e.g. McCloskey et al. 1985, Cohen and Dehaene 1995), natural language processing (Fodor 1975, Chomsky 1981) and work by the author on how learners parse mathematical structures. The semantic model highlights the importance of the parsing process and the relationship between this process and the mathematical lexicon/grammar. It concludes by demonstrating that for a learner to become an efficient, competent mathematician, a process of top-down parsing is essential.
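Top-down parsing, the process the paper argues is essential, can be illustrated with a minimal recursive-descent parser for arithmetic: the parser starts from the top grammar rule and works downward, exactly mirroring how a reader would decompose an expression. The grammar and code are a generic illustration, not taken from the paper.

```python
# Recursive-descent (top-down) parsing of arithmetic expressions.
# Grammar: expr   -> term (('+'|'-') term)*
#          term   -> factor (('*'|'/') factor)*
#          factor -> NUMBER | '(' expr ')'
import re

def parse(text):
    tokens = re.findall(r"\d+|[()+\-*/]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():  # top rule: a sum/difference of terms
        value = term()
        while peek() in ("+", "-"):
            op = eat()
            value = value + term() if op == "+" else value - term()
        return value

    def term():  # a product/quotient of factors
        value = factor()
        while peek() in ("*", "/"):
            op = eat()
            value = value * factor() if op == "*" else value / factor()
        return value

    def factor():  # bottom rule: a number or a parenthesised expr
        if peek() == "(":
            eat()
            value = expr()
            eat()  # consume the closing ')'
            return value
        return int(eat())

    return expr()
```

Because `term` is reached only through `expr`, operator precedence falls out of the rule hierarchy itself: `2+3*4` parses as `2+(3*4)` with no extra machinery.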

Relevance:

80.00%

Publisher:

Abstract:

The semantic model developed in this research was in response to the difficulty a group of mathematics learners had with conventional mathematical language and their interpretation of mathematical constructs. In order to develop the model, ideas from linguistics, psycholinguistics, cognitive psychology, formal languages and natural language processing were investigated. This investigation led to the identification of four main processes: the parsing process, syntactic processing, semantic processing and conceptual processing. The model showed the complex interdependency between these four processes and provided a theoretical framework in which the behaviour of the mathematics learner could be analysed. The model was then extended to include the use of technological artefacts in the learning process. To facilitate this aspect of the research, the theory of instrumentation was incorporated into the semantic model. The conclusion of this research was that although the cognitive processes were interdependent, they could develop at different rates until mastery of a topic was achieved. It also found that the introduction of a technological artefact into the learning environment introduced another layer of complexity, both in terms of the learning process and the underlying relationship between the four cognitive processes.

Relevance:

40.00%

Publisher:

Abstract:

We conducted a detailed study of a case of linguistic talent in the context of autism spectrum disorder, specifically Asperger syndrome. I.A. displays language strengths at the level of morphology and syntax. Yet, despite this grammar advantage, processing of figurative language and inferencing based on context presents a problem for him. The morphology advantage for I.A. is consistent with the weak central coherence (WCC) account of autism. From this account, the presence of a local processing bias is evident in the ways in which autistic individuals solve common problems, such as assessing similarities between objects and finding common patterns, and may therefore provide an advantage in some cognitive tasks compared to typical individuals. We extend the WCC account to language and provide evidence for a connection between the local processing bias and the acquisition of morphology and grammar.

Relevance:

40.00%

Publisher:

Abstract:

It is well established that speech, language and phonological skills are closely associated with literacy, and that children with a family risk of dyslexia (FRD) tend to show deficits in each of these areas in the preschool years. This paper examines what the relationships are between FRD and these skills, and whether deficits in speech, language and phonological processing fully account for the increased risk of dyslexia in children with FRD. One hundred and fifty-three 4-6-year-old children, 44 of whom had FRD, completed a battery of speech, language, phonology and literacy tasks. Word reading and spelling were retested 6 months later, and text reading accuracy and reading comprehension were tested 3 years later. The children with FRD were at increased risk of developing difficulties in reading accuracy, but not reading comprehension. Four groups were compared: good and poor readers with and without FRD. In most cases good readers outperformed poor readers regardless of family history, but there was an effect of family history on naming and nonword repetition regardless of literacy outcome, suggesting a role for speech production skills as an endophenotype of dyslexia. Phonological processing predicted spelling, while language predicted text reading accuracy and comprehension. FRD was a significant additional predictor of reading and spelling after controlling for speech production, language and phonological processing, suggesting that children with FRD show additional difficulties in literacy that cannot be fully explained in terms of their language and phonological skills. © 2014 John Wiley & Sons Ltd.

Relevance:

40.00%

Publisher:

Abstract:

It has been proposed that language impairments in children with Autism Spectrum Disorders (ASD) stem from atypical neural processing of speech and/or nonspeech sounds. However, the strength of this proposal is compromised by the unreliable outcomes of previous studies of speech and nonspeech processing in ASD. The aim of this study was to determine whether there was an association between poor spoken language and atypical event-related field (ERF) responses to speech and nonspeech sounds in children with ASD (n = 14) and controls (n = 18). Data from this developmental population (ages 6-14) were analysed using a novel combination of methods to maximize the reliability of our findings while taking into consideration the heterogeneity of the ASD population. The results showed that poor spoken language scores were associated with atypical left hemisphere brain responses (200 to 400 ms) to both speech and nonspeech in the ASD group. These data support the idea that some children with ASD may have an immature auditory cortex that affects their ability to process both speech and nonspeech sounds. Their poor speech processing may impair their ability to process the speech of other people, and hence reduce their ability to learn the phonology, syntax, and semantics of their native language.