849 results for Suda lexicon.


Relevance:

10.00%

Publisher:

Abstract:

We propose a novel framework in which an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon. Preferences on the expected sentiment labels of the lexicon words are expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for the automatic acquisition of domain-specific features. The word-class distributions of these self-learned features are estimated from the pseudo-labeled examples and used to train a second classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie review data and the multi-domain sentiment dataset show that our approach performs comparably to or better than existing weakly supervised sentiment classification methods, despite using no labeled documents.
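The pseudo-labeling and feature-estimation steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the GE-trained initial classifier is replaced by plain lexicon scoring, and `LEXICON`, `pseudo_label`, `word_class_distributions`, the score `margin` used as a confidence threshold, and the `min_count` cutoff are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"great": 1, "excellent": 1, "boring": -1, "awful": -1}


def lexicon_score(tokens):
    """Sum the prior polarities of the lexicon words in a tokenized document."""
    return sum(LEXICON.get(t, 0) for t in tokens)


def pseudo_label(docs, margin=2):
    """Label only documents whose |score| reaches `margin` (a stand-in for high confidence)."""
    labeled = []
    for tokens in docs:
        score = lexicon_score(tokens)
        if abs(score) >= margin:
            labeled.append((tokens, "pos" if score > 0 else "neg"))
    return labeled


def word_class_distributions(labeled, min_count=2):
    """Estimate P(class | word) for non-lexicon words seen in at least `min_count` pseudo-labeled docs."""
    counts = defaultdict(Counter)
    for tokens, label in labeled:
        for t in set(tokens):
            if t not in LEXICON:
                counts[t][label] += 1
    return {
        w: {c: n / sum(cnt.values()) for c, n in cnt.items()}
        for w, cnt in counts.items()
        if sum(cnt.values()) >= min_count
    }


docs = [
    "great excellent plot".split(),
    "excellent great acting".split(),
    "awful boring plot".split(),
    "boring film".split(),
]
labeled = pseudo_label(docs)  # the last document (score -1) stays unlabeled
print(word_class_distributions(labeled))  # {'plot': {'pos': 0.5, 'neg': 0.5}}
```

Only the estimation of word-class distributions is shown; in the framework described above, such distributions then act as constraints when training the second classifier.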

Relevance:

10.00%

Publisher:

Abstract:

Previous research into formulaic language has focussed on specialised groups of people (e.g. L1 acquisition by infants and adult L2 acquisition), while ordinary adult native speakers of English have received less attention. Additionally, whilst some features of formulaic language have been used as evidence of authorship (e.g. the Unabomber’s use of “you can’t eat your cake and have it too”), there has been no systematic investigation into formulaic language as a potential marker of authorship. This thesis reports the first full-scale study into the use of formulaic sequences by individual authors. The theory of formulaic language hypothesises that the formulaic sequences stored in the mental lexicon are shaped by experience combined with what each individual has found to be communicatively effective. Each author’s repertoire of formulaic sequences should therefore differ. To test this assertion, three automated approaches to identifying formulaic sequences are tested on a specially constructed corpus of 100 short narratives. The first approach explores a limited subset of formulaic sequences, using recurrence across a series of texts as the criterion for identification. The second approach focuses on a word that frequently occurs as part of formulaic sequences and also investigates alternative, non-formulaic realisations of the same semantic content. Finally, a reference list approach is used. Whilst claiming authority for any reference list can be difficult, the proposed method uses internet examples derived from lists prepared by others, a procedure which, it is argued, is akin to asking large groups of judges to reach consensus about what is formulaic. The empirical evidence supports the notion that formulaic sequences have potential as a marker of authorship, since in some cases a Questioned Document was correctly attributed.
Although this marker of authorship is not universally applicable, it does promise to become a viable new tool in the forensic linguist’s tool-kit.
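The first approach, recurrence across a series of texts, can be approximated as below. This sketch assumes lower-cased whitespace tokenization, a fixed n-gram length `n`, and a hypothetical threshold `min_texts`; the thesis's actual identification criteria may differ.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """All contiguous n-token sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurrent_sequences(texts, n=3, min_texts=2):
    """Return n-grams found in at least `min_texts` distinct texts, with the ids of those texts."""
    seen_in = defaultdict(set)
    for text_id, text in texts.items():
        for gram in ngrams(text.lower().split(), n):
            seen_in[gram].add(text_id)
    return {g: ids for g, ids in seen_in.items() if len(ids) >= min_texts}


# Hypothetical mini-corpus keyed by text id.
texts = {
    "A1": "At the end of the day we went home",
    "A2": "At the end of the day it rained",
    "B1": "We walked home in the rain",
}
for gram, ids in recurrent_sequences(texts).items():
    print(" ".join(gram), sorted(ids))
```

Counting distinct texts rather than raw frequency keeps a sequence repeated many times inside one narrative from qualifying on its own.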

Relevance:

10.00%

Publisher:

Abstract:

Browsing is an important part of how users search for information on the Web. In this paper, we present ESpotter, a browser plug-in that recognizes entities of various types on Web pages and highlights them by type to assist users in browsing. ESpotter uses a range of standard named entity recognition techniques. In addition, a key new feature of ESpotter is that it addresses the problem of multiple domains on the Web by adapting its lexicon and patterns to each domain.
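Adapting the lexicon and patterns to a domain might look like the following sketch. The per-domain resource table, entity types, regular expressions, and `spot_entities` function are hypothetical illustrations, not ESpotter's actual resources or interface; the highlighting step is omitted.

```python
import re

# Hypothetical per-domain resources: a gazetteer (lexicon) and regex patterns per entity type.
EMAIL = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
DOMAIN_RESOURCES = {
    "default": {
        "lexicon": {"London": "Location"},
        "patterns": {"Email": EMAIL},
    },
    "academic": {
        "lexicon": {"Semantic Web": "ResearchArea"},
        "patterns": {"Email": EMAIL, "Person": r"\b(?:Dr|Prof)\. [A-Z][a-z]+"},
    },
}


def spot_entities(text, domain="default"):
    """Return sorted (start, end, surface, type) spans found with the domain's lexicon and patterns."""
    resources = DOMAIN_RESOURCES.get(domain, DOMAIN_RESOURCES["default"])
    spans = []
    for term, etype in resources["lexicon"].items():
        for m in re.finditer(re.escape(term), text):
            spans.append((m.start(), m.end(), m.group(), etype))
    for etype, pattern in resources["patterns"].items():
        for m in re.finditer(pattern, text):
            spans.append((m.start(), m.end(), m.group(), etype))
    return sorted(spans)


page = "Contact Prof. Smith at smith@example.org about the Semantic Web."
for start, end, surface, etype in spot_entities(page, domain="academic"):
    print(etype, surface)
```

Switching `domain` swaps both the gazetteer and the patterns, so the same page yields different entities under different domain resources.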

Relevance:

10.00%

Publisher:

Abstract:

The influence of text messaging on language has been hotly debated, especially with regard to spelling and the lexicon, but the impact of SMS on syntax has received less attention. This article focuses on manipulations within the verbal domain, since language evolution points to a consistent trend from synthetic to analytical forms (Bybee et al. 1994), which runs counter to the need for concision in texting. Based on an authentic corpus of about 500 SMS messages (Fairon et al. 2006b), the present study identifies condensation strategies similar to those already described, yet also reveals specific features such as the absence of aphaeresis, the scarcity of apocope, and the overuse of synthetic forms. It can thus be concluded that while SMS writing displays oral characteristics, it clearly cannot be equated with speech; moreover, it may well slow down language evolution and help preserve short standard forms.

Relevance:

10.00%

Publisher:

Abstract:

In the application of computer systems to automatic translation, the best text-processing results are obtained when texts pertain to specific thematic areas, with well-defined structures and a concise, limited lexicon. In this article we present a plan of systematic work on language analysis and generation applied to pharmaceutical leaflets, a type of document characterized by a rigid format and precise use of the lexicon. We propose a solution based on an interlingua used as a pivot language between the source and target languages; in this application, we consider Spanish and Arabic.

Relevance:

10.00%

Publisher:

Abstract:

Sentiment analysis on Twitter has recently attracted much attention due to its wide applications in both the commercial and public sectors. In this paper we present SentiCircles, a lexicon-based approach for sentiment analysis on Twitter. Unlike typical lexicon-based approaches, which assign words fixed, static prior sentiment polarities regardless of their context, SentiCircles takes into account the co-occurrence patterns of words in different contexts in tweets to capture their semantics, and updates their pre-assigned strength and polarity in sentiment lexicons accordingly. Our approach allows sentiment to be detected at both the entity level and the tweet level. We evaluate the proposed approach on three Twitter datasets, using three different sentiment lexicons to derive word prior sentiments. Results show that our approach significantly outperforms the baselines in accuracy and F-measure for entity-level subjectivity (neutral vs. polar) and polarity (positive vs. negative) detection. For tweet-level sentiment detection, our approach outperforms the state-of-the-art SentiStrength by 4–5% in accuracy on two datasets, but falls marginally behind, by 1% in F-measure, on the third dataset.
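For contrast, the kind of static lexicon baseline described above, with a subjectivity decision (neutral vs. polar) followed by a polarity decision (positive vs. negative), can be sketched as below. The lexicon, its strengths, the averaging rule, and the `neutral_band` threshold are assumptions for illustration; this is not SentiCircles.

```python
# Hypothetical prior-polarity lexicon with strengths in [-1, 1].
PRIOR = {"love": 0.8, "great": 0.6, "hate": -0.8, "slow": -0.3}


def tweet_sentiment(tokens, lexicon=PRIOR, neutral_band=0.1):
    """Static-lexicon baseline: subjectivity (neutral vs. polar), then polarity (positive vs. negative)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return "neutral"  # no opinionated words found
    mean = sum(scores) / len(scores)
    if abs(mean) < neutral_band:
        return "neutral"  # polar words cancel out or are too weak
    return "positive" if mean > 0 else "negative"


print(tweet_sentiment("i love this phone".split()))
print(tweet_sentiment("the bus was slow".split()))
```

Because `PRIOR` never changes, each word contributes the same score in every tweet, which is the limitation that context-aware approaches set out to address.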

Relevance:

10.00%

Publisher:

Abstract:

Sentiment lexicons offer a simple yet effective way to obtain prior sentiment information about opinionated words in texts. However, words' sentiment orientations and strengths often change across the different contexts in which the words appear. In this paper, we propose a lexicon adaptation approach that uses the contextual semantics of words to capture their contexts in tweet messages and updates their prior sentiment orientations and/or strengths accordingly. We evaluate our approach on one state-of-the-art sentiment lexicon using three different Twitter datasets. Results show that the lexicons adapted by our approach outperform the original lexicon in accuracy and F-measure on two datasets, but give similar accuracy and slightly lower F-measure on the third.
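One simple way to let context shift a word's prior score is to blend it with the priors of the lexicon words it co-occurs with. The sketch below is a hypothetical illustration of that general idea (co-occurrence counted per tweet, blending weight `alpha`), not the adaptation rules used in the paper.

```python
from collections import Counter, defaultdict


def cooccurrence(tweets):
    """Count, for each word, how many tweets it shares with every other word."""
    co = defaultdict(Counter)
    for tokens in tweets:
        unique = set(tokens)
        for w in unique:
            for c in unique:
                if c != w:
                    co[w][c] += 1
    return co


def adapt_lexicon(lexicon, tweets, alpha=0.5):
    """Blend each prior score with the co-occurrence-weighted mean prior of its lexicon neighbours."""
    co = cooccurrence(tweets)
    adapted = dict(lexicon)
    for word, prior in lexicon.items():
        context = {c: n for c, n in co.get(word, {}).items() if c in lexicon}
        total = sum(context.values())
        if total == 0:
            continue  # no opinionated context: keep the prior unchanged
        contextual = sum(lexicon[c] * n for c, n in context.items()) / total
        adapted[word] = (1 - alpha) * prior + alpha * contextual
    return adapted


lexicon = {"sick": -0.6, "awesome": 0.8, "love": 0.8}
tweets = [
    ["that", "song", "is", "sick", "awesome"],
    ["sick", "beat", "love", "it"],
    ["awesome", "love", "this"],
    ["love", "awesome", "vibes"],
]
adapted = adapt_lexicon(lexicon, tweets)
# 'sick' moves from -0.6 to about 0.1 because it co-occurs only with positive words here.
print(adapted)
```

The original lexicon is left untouched, so adapted and prior scores can be compared side by side.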

Relevance:

10.00%

Publisher:

Abstract:

Lexicon-based approaches to Twitter sentiment analysis are gaining popularity due to their simplicity, domain independence, and relatively good performance. These approaches rely on sentiment lexicons, in which words are assigned fixed sentiment polarities. However, a word's sentiment orientation (positive, neutral, negative) and/or sentiment strength can change depending on the context and the targeted entities. In this paper we present SentiCircle, a novel lexicon-based approach that takes into account the contextual and conceptual semantics of words when calculating their sentiment orientation and strength in Twitter. We evaluate our approach on three Twitter datasets using three different sentiment lexicons. Results show that our approach significantly outperforms two lexicon baselines. When compared with the state-of-the-art SentiStrength, results are competitive but inconclusive, and vary from one dataset to another: SentiCircle outperforms SentiStrength in accuracy on average, but falls marginally behind in F-measure. © 2014 Springer International Publishing.

Relevance:

10.00%

Publisher:

Abstract:

Most research on emotion detection in written text has focused on detecting explicit expressions of emotion. In this paper, we present a rule-based pipeline approach, based on the OCC Model, for detecting implicit emotions in written text that contains no emotion-bearing words. We evaluated our approach on three different datasets with five emotion categories. Our results show that the proposed approach consistently outperforms the lexicon matching method across all three datasets, by a large margin of 17–30% in F-measure, and performs competitively with a supervised classifier. In particular, on formal text that strictly follows grammatical rules, our approach achieves an average F-measure of 82.7% on “Happy”, “Angry-Disgust” and “Sad”, outperforming even the supervised baseline by nearly 17% in F-measure. These preliminary results show the feasibility of the approach for the task of implicit emotion detection in written text.

Relevance:

10.00%

Publisher:

Abstract:

This research aimed to identify the vocabulary difficulties faced by 9th-year students in understanding the Portuguese Language textbook (DBPL) “Vontade de Saber Português”, used at the Municipal School Sebastião Rangel. We noticed that the students had difficulties with unfamiliar vocabulary in the texts and, consequently, with text comprehension. Our hypothesis is that “difficult” words and the lexicon used by the DBPL author can hinder students' comprehension. We adopted measures intended to make the students' limited vocabulary easier to understand and to help expand it. The work was theoretically grounded in Biderman (1999), Barbosa (1989), Dias (2004), Krieger (2012), Coelho (1993) and the National Curriculum Parameters for the Portuguese Language, with the aim of combining theory and practice. The proposal was applied so that the students would understand that a word must be adapted to its context. At the beginning of the work, the students read the texts and noted down the “difficult” words, thereby selecting a corpus. We analyzed and recorded their questions. Then, we showed the students the abbreviated word-class labels given after each dictionary entry. The students sorted the words by grammatical class, focusing on “lexical words” (KRIEGER, 2012). Such words are highly significant for understanding the texts read, so it was worthwhile to look them up in online dictionaries. In the creative glossary produced by the students, the words were arranged in alphabetical order. The students transcribed the passage in which each word appeared and then copied it again, replacing the word with a clearer one. Finally, we asked the students to write a text using five words from the glossary, and we showed them that the meaning of words is not found only in the dictionary, since words can be used in different contexts. The analysis revealed the need for more effective pedagogical work on vocabulary in elementary school.
Thus, this proposal is not a closed recipe, but its application in the field allowed for reflective teaching practice in vocabulary education.

Relevance:

10.00%

Publisher:

Abstract:

Peer reviewed

Relevance:

10.00%

Publisher:

Abstract:

Peer reviewed

Relevance:

10.00%

Publisher:

Abstract:

The semantic model described in this paper is based on models developed for arithmetic (e.g. McCloskey et al. 1985, Cohene and Dehaene 1995) and for natural language processing (Fodor 1975, Chomsky 1981), and on the author's work on how learners parse mathematical structures. The model highlights the importance of the parsing process and of the relationship between this process and the mathematical lexicon/grammar. The paper concludes by demonstrating that top-down parsing is essential for a learner to become an efficient, competent mathematician.
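In the computational sense, top-down parsing starts from the grammar's start symbol and expands rules down to the tokens. A minimal recursive-descent parser for integer arithmetic illustrates the idea; the grammar, the `Parser` class, and the tuple-based tree are assumptions chosen for this example, and the paper's model concerns human learners rather than software.

```python
import re


class Parser:
    """Recursive-descent (top-down) parser for integers, + - * /, and parentheses."""

    def __init__(self, src):
        self.tokens = re.findall(r"\d+|\S", src)  # integers or single non-space characters
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        tok = self.peek()
        if tok == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        if tok is not None and tok.isdigit():
            return int(self.take())
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("2 + 3 * 4").parse())    # ('+', 2, ('*', 3, 4))
print(Parser("(2 + 3) * 4").parse())  # ('*', ('+', 2, 3), 4)
```

The tree for `2 + 3 * 4` shows how expanding the grammar from the top fixes operator precedence before any arithmetic is performed.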

Relevance:

10.00%

Publisher:

Abstract:

There is a contemporary scepticism towards vision-based metaphors in management and organization studies that reflects a more general pattern across the social sciences. In short, there has been a shift away from ocularcentrism. This shift provides a useful basis for metatheoretical analysis of the philosophical discourse that informs organizational analysis. The article begins by briefly discussing the vision-generated, vision-centred interpretation of knowledge, truth, and reality that has characterized the western philosophical tradition. Taking late 18th-century rationalism as the high point of ocularcentrism, the article then presents a metatheoretical framework based on three trajectories that critiques of ocularcentrism have subsequently taken. The first exposes the limits of the metaphor by, paradoxically, taking it to its limits. The second seeks to displace the primordial position of the ocular metaphor and replace it with an alternative lexicon based on the other human senses. Finally, the third describes how the Enlightenment's ocular characterization of the visual and mental worlds has effectively been inverted in the postmodern moment.