Multilingual author profiling using SVMs and linguistic features


Author(s): Bayot, Roy; Gonçalves, Teresa
Date(s)

06/02/2017

06/02/2017

2016

Abstract

This paper describes experiments on author profiling of tweets in four languages: English, Dutch, Italian, and Spanish. Profiling consists of age and gender classification, as well as regression on five personality dimensions: extroversion, stability, agreeableness, openness, and conscientiousness. Several feature sets were tested: bag-of-words, word n-grams, POS n-grams, and averaged word embeddings. An SVM was used as the classifier. TF-IDF worked best for most English tasks, while for most tasks in the other languages a combination of the best features worked better.
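As an illustration of two of the feature sets named in the abstract, the following is a minimal sketch of word n-gram extraction with TF-IDF weighting, using only the Python standard library. The function names, the whitespace tokenization, the smoothed idf formula, and the toy tweets are all assumptions made for this example; the record does not state the paper's actual preprocessing or weighting variant.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return lowercase word n-grams of length 1..n_max as space-joined strings."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n_max=2):
    """Map each document to a sparse {ngram: tf-idf weight} dict.

    Uses raw term frequency and a smoothed idf, ln((1 + N) / (1 + df)) + 1,
    which is one common variant and not necessarily the one used in the paper.
    """
    grams_per_doc = [Counter(word_ngrams(d, n_max)) for d in docs]
    # Document frequency: iterating a Counter yields each distinct n-gram once.
    df = Counter(g for grams in grams_per_doc for g in grams)
    n_docs = len(docs)
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
    return [{g: tf * idf[g] for g, tf in grams.items()} for grams in grams_per_doc]


# Toy tweets, invented for illustration only.
tweets = ["good morning world", "good night world", "coffee first then code"]
vectors = tfidf_vectors(tweets)
```

Sparse vectors of this kind would then be passed to an SVM for classification or regression. N-grams that occur in fewer documents receive a higher weight, which is the usual motivation for TF-IDF over raw counts.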

Identifier

Roy Bayot and Teresa Gonçalves. Multilingual author profiling using SVMs and linguistic features. International Journal of Computational Linguistics and Applications, vol. 7, 2016.

http://hdl.handle.net/10174/20659

498

Language(s)

eng

Publisher

Bahri Publications

Rights

restrictedAccess

Type

article