942 results for erotic-obscene lexicon
Abstract:
Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge, defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort. In this paper, we propose a novel framework where an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on expectations of sentiment labels of those lexicon words being expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie-review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.
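The pseudo-labeling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a simple lexicon vote stands in for the classifier trained with generalized expectation criteria, and the toy lexicon, the 0.8 confidence threshold, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon; a real system would load an existing sentiment lexicon.
LEXICON = {"good": "pos", "great": "pos", "bad": "neg", "awful": "neg"}
CLASSES = ("pos", "neg")


def lexicon_vote(tokens):
    """Return (label, confidence) from lexicon hits, or (None, 0.0) if there are none."""
    hits = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    total = sum(hits.values())
    if total == 0:
        return None, 0.0
    label, count = hits.most_common(1)[0]
    return label, count / total


def pseudo_label(docs, threshold=0.8):
    """Keep only documents whose lexicon vote reaches the confidence threshold."""
    labeled = []
    for tokens in docs:
        label, conf = lexicon_vote(tokens)
        if label is not None and conf >= threshold:
            labeled.append((tokens, label))
    return labeled


def word_class_distributions(labeled, smoothing=1.0):
    """Estimate P(class | word) from pseudo-labeled documents with additive smoothing."""
    counts = defaultdict(Counter)
    for tokens, label in labeled:
        for t in set(tokens):
            counts[t][label] += 1
    dist = {}
    for word, cnt in counts.items():
        denom = sum(cnt.values()) + smoothing * len(CLASSES)
        dist[word] = {c: (cnt[c] + smoothing) / denom for c in CLASSES}
    return dist


docs = [
    "great plot and gripping pacing".split(),
    "awful acting and tedious pacing".split(),
    "bad script tedious dialogue".split(),
    "good but also bad".split(),  # mixed vote (confidence 0.5), so it is discarded
]
labeled = pseudo_label(docs)
dist = word_class_distributions(labeled)
print(dist["tedious"])
```

In this toy run, "tedious" is not in the lexicon but picks up a negative-leaning distribution from the pseudo-labeled documents, which is the kind of domain-specific feature the abstract refers to.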
Abstract:
This article presents two novel approaches for incorporating sentiment prior knowledge into the topic model for weakly supervised sentiment analysis where sentiment labels are considered as topics. One is by modifying the Dirichlet prior for topic-word distribution (LDA-DP), the other is by augmenting the model objective function through adding terms that express preferences on expectations of sentiment labels of the lexicon words using generalized expectation criteria (LDA-GE). We conducted extensive experiments on English movie review data and a multi-domain sentiment dataset as well as Chinese product reviews about mobile phones, digital cameras, MP3 players, and monitors. The results show that while both LDA-DP and LDA-GE perform comparably to existing weakly supervised sentiment classification algorithms, they are much simpler and more computationally efficient, rendering them more suitable for online and real-time sentiment classification on the Web. We observed that LDA-GE is more effective than LDA-DP, suggesting that it should be preferred when considering employing the topic model for sentiment analysis. Moreover, both models are able to extract highly domain-salient polarity words from text.
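The idea behind LDA-DP, modifying the Dirichlet prior of the topic-word distribution with lexicon knowledge, can be sketched as follows. The abstract does not give the exact weighting scheme, so the `base` and `damp` values, the function name, and the toy vocabulary are assumptions for illustration: a lexicon word keeps the base prior under its own polarity and receives a much smaller prior under the opposite one.

```python
def build_topic_word_prior(vocab, lexicon, topics=("pos", "neg"), base=0.01, damp=0.01):
    """Return beta[topic][word], an asymmetric Dirichlet prior over words per sentiment topic.

    Words outside the lexicon get `base` under every topic. Lexicon words get
    `base` under their own polarity and `base * damp` under the other topics.
    """
    beta = {}
    for topic in topics:
        row = {}
        for word in vocab:
            polarity = lexicon.get(word)
            if polarity is None or polarity == topic:
                row[word] = base
            else:
                row[word] = base * damp  # kept strictly positive so the prior stays valid
        beta[topic] = row
    return beta


# Hypothetical toy vocabulary and lexicon.
vocab = ["excellent", "poor", "battery"]
lexicon = {"excellent": "pos", "poor": "neg"}
beta = build_topic_word_prior(vocab, lexicon)
for topic, row in beta.items():
    print(topic, row)
```

A prior of this shape could then be handed to a topic-model implementation that accepts a per-topic, per-word prior, so that sentiment topics start out biased toward the matching lexicon words.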
Abstract:
We propose a novel framework where an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon. Preferences on expectations of sentiment labels of those lexicon words are expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.
Abstract:
Previous research into formulaic language has focussed on specialised groups of people (e.g. L1 acquisition by infants and adult L2 acquisition) with ordinary adult native speakers of English receiving less attention. Additionally, whilst some features of formulaic language have been used as evidence of authorship (e.g. the Unabomber’s use of you can’t eat your cake and have it too) there has been no systematic investigation into this as a potential marker of authorship. This thesis reports the first full-scale study into the use of formulaic sequences by individual authors. The theory of formulaic language hypothesises that formulaic sequences contained in the mental lexicon are shaped by experience combined with what each individual has found to be communicatively effective. Each author’s repertoire of formulaic sequences should therefore differ. To test this assertion, three automated approaches to the identification of formulaic sequences are tested on a specially constructed corpus containing 100 short narratives. The first approach explores a limited subset of formulaic sequences using recurrence across a series of texts as the criterion for identification. The second approach focuses on a word which frequently occurs as part of formulaic sequences and also investigates alternative non-formulaic realisations of the same semantic content. Finally, a reference list approach is used. Whilst claiming authority for any reference list can be difficult, the proposed method utilises internet examples derived from lists prepared by others, a procedure which, it is argued, is akin to asking large groups of judges to reach consensus about what is formulaic. The empirical evidence supports the notion that formulaic sequences have potential as a marker of authorship since in some cases a Questioned Document was correctly attributed. 
Although this marker of authorship is not universally applicable, it does promise to become a viable new tool in the forensic linguist’s tool-kit.
Abstract:
Browsing constitutes an important part of the user information searching process on the Web. In this paper, we present a browser plug-in called ESpotter, which recognizes entities of various types on Web pages and highlights them according to their types to assist user browsing. ESpotter uses a range of standard named entity recognition techniques. In addition, a key new feature of ESpotter is that it addresses the problem of multiple domains on the Web by adapting lexicon and patterns to these domains.
Abstract:
This study for the first time demonstrates and analyses the full extent of Danish impressionist writer Herman Bang’s influence on one of Germany’s major authors, Thomas Mann. Mann was an avid reader of Bang’s works and he regarded the Scandinavian writer as a kindred spirit, a “brother up north”, who “taught [him] much”. It has previously been accepted that Bang was an inspiration for Mann in his formative years. However, as this study conclusively shows, references to Bang’s works occur throughout Mann’s writings, from the early novellas to the late novels. The book argues that Mann was not only impressed by Bang’s highly individual style of impressionist writing but that his fascination for Bang’s works was to a large extent based on this author’s recurrent depiction of decadence, his handling of artistic motifs and his treatment of erotic themes. Bang’s topical focus on the problematically isolated lives of artists and aristocrats as well as his insights on the destructive nature of love and sexuality – particularly of homoerotic desire – were surprisingly similar to Mann’s own views on these topics and yet provoked him to produce heavily referenced counter versions of Bang’s works. This phenomenon is explored in the context of Mann’s struggle with his own homosexuality and the attraction that death and decadence exerted over him. Most of Mann’s writings are in that way indebted to Bang. In addition, Mann’s frequent use of homoerotic subtexts and his depiction of female characters were noticeably influenced by Bang’s literary techniques. All these different, yet closely interlinked, aspects of Mann’s creative appropriation of Bang’s works are analysed and discussed in this study. To conclude, Mann’s references to Bang’s works are schematised and an attempt is made to characterise Mann’s intertextual practice in general in the context of his famous use of irony.
Abstract:
The influence of text messaging on language has been hotly debated, especially in relation to spelling and the lexicon, but the impact of SMS on syntax has received less attention. This article focuses on manipulations within the verbal domain, as language evolution points towards a consistent trend going from synthetic to analytical forms (Bybee et al. 1994), which goes against the need for concision in texting. Based on an authentic corpus of about 500 SMS (Fairon et al. 2006b), the present study shows condensation strategies that are similar to those already described, yet reveals specific features such as the absence of aphaeresis and the scarcity of apocope, as well as the overuse of synthetic forms. It can thus be concluded that while SMS writing displays oral characteristics, it clearly cannot be equated with speech; in addition, it may well slow down language evolution and support the conservation of short standard forms.
Abstract:
The best results in the application of computer systems to machine translation are obtained when texts pertain to specific thematic areas, with well-defined structures and a concise, limited lexicon. In this article we present a plan of systematic work for the analysis and generation of language applied to the field of pharmaceutical leaflets, a type of document characterized by a rigid format and precise use of the lexicon. We propose a solution based on the use of an interlingua as a pivot language between the source and target languages; in this application we consider Spanish and Arabic.
Abstract:
Sentiment analysis on Twitter has attracted much attention recently due to its wide applications in both the commercial and public sectors. In this paper we present SentiCircles, a lexicon-based approach for sentiment analysis on Twitter. Different from typical lexicon-based approaches, which offer fixed and static prior sentiment polarities of words regardless of their context, SentiCircles takes into account the co-occurrence patterns of words in different contexts in tweets to capture their semantics and update their pre-assigned strength and polarity in sentiment lexicons accordingly. Our approach allows for the detection of sentiment at both entity-level and tweet-level. We evaluate our proposed approach on three Twitter datasets using three different sentiment lexicons to derive word prior sentiments. Results show that our approach significantly outperforms the baselines in accuracy and F-measure for entity-level subjectivity (neutral vs. polar) and polarity (positive vs. negative) detection. For tweet-level sentiment detection, our approach performs better than the state-of-the-art SentiStrength by 4-5% in accuracy in two datasets, but falls marginally behind by 1% in F-measure in the third dataset.
Abstract:
Sentiment lexicons for sentiment analysis offer a simple, yet effective way to obtain the prior sentiment information of opinionated words in texts. However, words' sentiment orientations and strengths often change throughout various contexts in which the words appear. In this paper, we propose a lexicon adaptation approach that uses the contextual semantics of words to capture their contexts in tweet messages and update their prior sentiment orientations and/or strengths accordingly. We evaluate our approach on one state-of-the-art sentiment lexicon using three different Twitter datasets. Results show that the sentiment lexicons adapted by our approach outperform the original lexicon in accuracy and F-measure in two datasets, but give similar accuracy and slightly lower F-measure in one dataset.
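The general idea of updating prior sentiment scores from tweet context can be sketched as below. The abstract does not specify the update rule, so the blending formula, the `weight` parameter, the function name, and the toy scores are all hypothetical: each lexicon word's prior is mixed with the mean prior of the lexicon words it co-occurs with.

```python
from collections import defaultdict


def adapt_lexicon(tweets, prior, weight=0.5):
    """Blend each word's prior score with the mean prior score of its co-occurring lexicon words.

    This blending rule is an illustrative assumption, not the update rule of the paper.
    """
    context_sum = defaultdict(float)
    context_n = defaultdict(int)
    for tokens in tweets:
        unique = set(tokens)
        for word in unique:
            if word not in prior:
                continue
            for other in unique:
                if other != word and other in prior:
                    context_sum[word] += prior[other]
                    context_n[word] += 1
    adapted = dict(prior)  # words never seen in context keep their prior score
    for word, n in context_n.items():
        context_mean = context_sum[word] / n
        adapted[word] = (1 - weight) * prior[word] + weight * context_mean
    return adapted


# Hypothetical prior scores in [-1, 1] and toy tweets.
prior = {"sick": -0.6, "awesome": 0.9, "love": 0.8, "flu": -0.5}
tweets = [
    "that trick was sick awesome".split(),
    "love this sick beat".split(),
    "sick with flu".split(),
]
adapted = adapt_lexicon(tweets, prior)
print(round(adapted["sick"], 3))
```

Here "sick" appears mostly next to positive words, so its adapted score moves from its negative prior toward neutral, while words that never co-occur with other lexicon entries are left unchanged.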
Abstract:
Lexicon-based approaches to Twitter sentiment analysis are gaining much popularity due to their simplicity, domain independence, and relatively good performance. These approaches rely on sentiment lexicons, where a collection of words is marked with fixed sentiment polarities. However, words' sentiment orientation (positive, neutral, negative) and/or sentiment strengths could change depending on context and targeted entities. In this paper we present SentiCircle, a novel lexicon-based approach that takes into account the contextual and conceptual semantics of words when calculating their sentiment orientation and strength in Twitter. We evaluate our approach on three Twitter datasets using three different sentiment lexicons. Results show that our approach significantly outperforms two lexicon baselines. Results are competitive but inconclusive when compared to the state-of-the-art SentiStrength, and vary from one dataset to another. SentiCircle outperforms SentiStrength in accuracy on average, but falls marginally behind in F-measure. © 2014 Springer International Publishing.
Abstract:
Sex work is a subject of significant contestation across academic disciplines, as well as within legal, medical, moral, feminist, political and socio-cultural discourses. A large body of research exists, but much of this focuses on the sale of sex by women to men and ignores other performances, practices, meanings and embodiments in the contemporary sex industry. A queer agenda is important in order to challenge hetero-centric gender norms and to develop new insights into how gender, sex, power, crime, work, migration, space/place, health and intimacy are understood in the context of commercial sexual encounters. Queer Sex Work explores what it might mean to 'be', 'do' and 'think' queer(ly) in the study and practice of commercial sex. It brings together a multiplicity of empirical case studies - including erotic dance venues, online sex working, pornography, grey sexual economies, and BDSM - and offers a variety of perspectives from academic scholars, policy practitioners, activists and sex workers themselves. In so doing, the book advances a queer politics of sex work that aims to disrupt heteronormative logics whilst also making space for different voices in academic and political debates about commercial sex. This unique and multidisciplinary volume will be indispensable for scholars and students of the global sex trade and of gender, sexuality, feminism and queer theory more broadly, as well as policymakers, activists and practitioners interested in the politics and practice of sex work in local, national and international contexts.
Abstract:
Most research in the area of emotion detection in written text has focused on detecting explicit expressions of emotions. In this paper, we present a rule-based pipeline approach, based on the OCC Model, for detecting implicit emotions in written text that contains no emotion-bearing words. We have evaluated our approach on three different datasets with five emotion categories. Our results show that the proposed approach outperforms the lexicon matching method consistently across all three datasets by a large margin of 17–30% in F-measure and gives competitive performance compared to a supervised classifier. In particular, when dealing with formal text which follows grammatical rules strictly, our approach gives an average F-measure of 82.7% on “Happy”, “Angry-Disgust” and “Sad”, even outperforming the supervised baseline by nearly 17% in F-measure. Our preliminary results show the feasibility of the approach for the task of implicit emotion detection in written text.
Abstract:
The aim of this dissertation is to identify, describe, and explain the common experiences defining the crack abuser's life-world. Its method is phenomenological. Using basic cybernetic premises, a neurophysiologically oriented phenomenological framework concerning the constitution of thoughts, memories, and perceptions is first written. The framework is designed to hypothetically represent the neuropathology of crack abuse within a perspective that prescinds and describes the constitution, flow, and interdependence of experience. After the framework is written, the dissertation outlines the neuro-psychopharmacology of crack abuse and delimits crack abusers as a specific group within the more general population of cocaine users. It then represents the neuropathology of crack abuse within its phenomenological framework and uses the first-person accounts of forty-two crack dependents to actualize a phenomenological sketch of the crack abuser's life-world. The ethnographies afford the possibility of writing a “thick” description of the crack abuser's daily life, one that communicates the substance, order, and subjective and cultural dimensions of the dependent's defining experiences. The dissertation's goals are successfully realized. The framework written and the ethnographies recorded and transcribed, the dissertation is able to identify, describe, and to a certain extent explain some of the common experiences defining the crack abuser's life-world. The dissertation concludes that the crack abuser's life-world is organized around three primary and four secondary experiences. His primary experiences include: (1) an almost complete, yet fleeting, satisfaction of the ego's innate insufficiency and sublime, erotic-like stimulation of its core, (2) a fundamental inclination and expansion of the uniquely oriented euphoria-dysphoria dynamic that vivifies and orients the flow of consciousness, and (3) a change in the ego's innate structure. 
His secondary experiences include: (a) a characteristic aiming of projects, actions, and conduct toward the procurement and consumption of crack, (b) a denigration in the hold of legitimations and institutionalizations on the thematic field, (c) a strict alignment and a contraction in the scope of logical types pointing to the salient experiences within the stock of knowledge, and (d) for some crack abusers, ontological insecurity, despair, and exhaustion.