890 results for Computational linguistics
Abstract:
ACM SIGIR; ACM SIGWEB
Abstract:
Knowledge Systems Institute Graduate School
Abstract:
The papers collected in this book cover a range of topics in semantics and pragmatics of dialogue. All these papers were presented at SemDial 2010, the 14th Workshop on the Semantics and Pragmatics of Dialogue. This 14th edition in the SemDial series, also known as PozDial, took place in Poznań (Poland) in June 2010, and was organized by the Chair of Logic and Cognitive Science (Institute of Psychology, Adam Mickiewicz University). From over 30 submissions overall, 14 were accepted as full papers for plenary presentation at the workshop, and all are included in this book. In addition, 10 were accepted as posters, and are included here as 2-4 page short papers. Finally, we also include abstracts from our keynote speakers. We hope that the ideas gathered in this book will be a valuable source of up-to-date achievements in the field, and will become a valuable inspiration for new ones. We would like to express our thanks to all those who submitted to and participated in SemDial 2010, especially the invited speakers: Dale Barr (University of Glasgow), Jonathan Ginzburg (King's College London), Jeroen Groenendijk (University of Amsterdam) and Henry Prakken (Utrecht University, The University of Groningen). Last but not least, we would like to thank everybody engaged in the workshop organization -- the chairs, the local organizing committee for their hard work in Poznań, and the programme committee members for their thorough and helpful reviews.
Abstract:
Nistor, N., Dascalu, M., Stavarache, L.L., Serafin, Y., & Trausan-Matu, S. (2015). Informal Learning in Online Knowledge Communities: Predicting Community Response to Visitor Inquiries. In G. Conole, T. Klobucar, C. Rensing, J. Konert & É. Lavoué (Eds.), 10th European Conf. on Technology Enhanced Learning (pp. 447–452). Toledo, Spain: Springer.
Abstract:
In this paper, we introduce an application of matrix factorization to produce corpus-derived, distributional models of semantics that demonstrate cognitive plausibility. We find that word representations learned by Non-Negative Sparse Embedding (NNSE), a variant of matrix factorization, are sparse, effective, and highly interpretable. To the best of our knowledge, this is the first approach which yields semantic representations of words satisfying these three desirable properties. Through extensive experimental evaluations on multiple real-world tasks and datasets, we demonstrate the superiority of semantic models learned by NNSE over other state-of-the-art baselines.
Abstract:
In most previous research on distributional semantics, Vector Space Models (VSMs) of words are built either from topical information (e.g., documents in which a word is present), or from syntactic/semantic types of words (e.g., dependency parse links of a word in sentences), but not both. In this paper, we explore the utility of combining these two representations to build VSMs for the task of semantic composition of adjective-noun phrases. Through extensive experiments on benchmark datasets, we find that even though a type-based VSM is effective for semantic composition, it is often outperformed by a VSM built using a combination of topic- and type-based statistics. We also introduce a new evaluation task wherein we predict the composed vector representation of a phrase from the brain activity of a human subject reading that phrase. We exploit a large syntactically parsed corpus of 16 billion tokens to build our VSMs, with vectors for both phrases and words, and make them publicly available.
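The idea of combining topic-based and type-based statistics can be illustrated with a minimal sketch. The abstract does not say how the two representations are merged or how phrases are composed, so the choices here are assumptions made only for illustration: each word's two count vectors are L2-normalized and concatenated, and an adjective-noun phrase is composed by element-wise addition. All names and toy counts are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(topic_vec, type_vec):
    """Concatenate the normalized topic- and type-based vectors.

    Concatenation is an assumed combination scheme, not the paper's method.
    """
    return l2_normalize(topic_vec) + l2_normalize(type_vec)


def compose_additive(adj_vec, noun_vec):
    """Compose a phrase vector by element-wise addition (assumed scheme)."""
    return [a + n for a, n in zip(adj_vec, noun_vec)]


# Hypothetical toy counts: document co-occurrence (topic) and
# dependency-link co-occurrence (type) for two words.
topic_counts = {"red": [3.0, 0.0, 4.0], "car": [0.0, 5.0, 12.0]}
type_counts = {"red": [1.0, 0.0], "car": [0.0, 2.0]}

combined = {w: combine(topic_counts[w], type_counts[w]) for w in topic_counts}
red_car = compose_additive(combined["red"], combined["car"])
```

Normalizing each block before concatenation keeps either source of statistics from dominating the combined vector merely because its raw counts are larger.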
Abstract:
Vector space models (VSMs) represent word meanings as points in a high-dimensional space. VSMs are typically created from large text corpora, and so represent word semantics as observed in text. We present a new algorithm (JNNSE) that can incorporate a measure of semantics not previously used to create VSMs: brain activation data recorded while people read words. The resulting model takes advantage of the complementary strengths and weaknesses of corpus and brain activation data to give a more complete representation of semantics. Evaluations show that the model 1) matches a behavioral measure of semantics more closely, 2) can be used to predict corpus data for unseen words and 3) has predictive power that generalizes across brain imaging technologies and across subjects. We believe that the model is thus a more faithful representation of mental vocabularies.
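To make the notion of words as points in a vector space concrete, the sketch below compares toy word vectors by cosine similarity, a common way to measure closeness in a VSM. It does not implement JNNSE; the three-dimensional vectors and the helper names are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(word, vsm):
    """Return the other word in the VSM whose vector is most similar."""
    others = (w for w in vsm if w != word)
    return max(others, key=lambda w: cosine(vsm[word], vsm[w]))


# Hypothetical toy VSM; real models have hundreds or thousands of dimensions.
vsm = {
    "dog": [4.0, 1.0, 0.0],
    "cat": [3.0, 2.0, 0.0],
    "hammer": [0.0, 1.0, 5.0],
}

closest_to_dog = nearest("dog", vsm)
```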
Abstract:
We present the results of exploratory experiments using lexical valence extracted from the brain using electroencephalography (EEG) for sentiment analysis. We selected 78 English words (36 for training and 42 for testing), presented as stimuli to 3 native English speakers. EEG signals were recorded from the subjects while they performed a mental imaging task for each word stimulus. Wavelet decomposition was employed to extract EEG features from the time-frequency domain. The extracted features were used as inputs to a sparse multinomial logistic regression (SMLR) classifier for valence classification, after univariate ANOVA feature selection. After mapping EEG signals to sentiment valences, we exploited the lexical polarity extracted from brain data for the prediction of the valence of 12 sentences taken from the SemEval-2007 shared task, and compared it against existing lexical resources.
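The wavelet feature-extraction step can be sketched as follows. The abstract does not name the wavelet family, the number of decomposition levels, or the features derived from the coefficients, so this example assumes a Haar wavelet, two levels, and per-band energy as the feature. The sample values are hypothetical, and the ANOVA selection and SMLR classification stages are not shown.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length signal."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Repeatedly split the approximation band; return (approx, detail bands)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def band_energies(signal, levels):
    """Energy of each detail band (finest first), then of the final approximation."""
    approx, details = haar_decompose(signal, levels)
    energies = [sum(c * c for c in band) for band in details]
    energies.append(sum(c * c for c in approx))
    return energies


# Hypothetical 8-sample segment from a single EEG channel.
samples = [1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 0.0, 4.0]
features = band_energies(samples, levels=2)
```

Because the Haar transform used here is orthonormal, the band energies sum to the energy of the original segment, which gives a quick sanity check on the decomposition.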
Abstract:
This paper presents a machine learning approach to sarcasm detection on Twitter in two languages: English and Czech. Although there has been some research in sarcasm detection in languages other than English (e.g., Dutch, Italian, and Brazilian Portuguese), our work is the first attempt at sarcasm detection in the Czech language. We created a large Czech Twitter corpus consisting of 7,000 manually labeled tweets and provide it to the community. We evaluate two classifiers with various combinations of features on both the Czech and English datasets. Furthermore, we tackle the issues of rich Czech morphology by examining different preprocessing techniques. Experiments show that our language-independent approach significantly outperforms adapted state-of-the-art methods in English (F-measure 0.947) and also represents a strong baseline for further research in Czech (F-measure 0.582).
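For readers unfamiliar with the reported metric, the sketch below computes the balanced F-measure (the harmonic mean of precision and recall) for one class from gold and predicted labels. The abstract does not state whether its scores are per-class or averaged across classes, so the single-class setup, the label names, and the toy data are assumptions for illustration only.

```python
def f_measure(gold, predicted, positive="sarcastic"):
    """Balanced F-measure (F1) for the `positive` class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for six tweets.
gold = ["sarcastic", "sarcastic", "sarcastic", "normal", "normal", "normal"]
predicted = ["sarcastic", "sarcastic", "normal", "sarcastic", "normal", "normal"]

score = f_measure(gold, predicted)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F-measure is also 2/3.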