Author Profiling using SVMs and Word Embedding Averages — Notebook for PAN at CLEF 2016


Author(s): Bayot, Roy; Gonçalves, Teresa
Date(s)

06/02/2017

06/02/2017

01/09/2016

Abstract

In this paper, we describe one of the approaches from Universidade de Évora's participation. Our approach follows the usual pipeline: text is preprocessed, features are extracted, and the features are used to train SVMs with cross-validation. The main difference is that the features are averages of word embeddings, specifically word2vec vectors. On the PAN 2016 dataset, we achieved accuracies of 44.8% and 68.2% for English age and gender classification, respectively, and 51.3% and 67.1% for Spanish age and gender classification. Finally, we report 71.9% accuracy for Dutch age classification.
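The feature-extraction step mentioned in the abstract, averaging the word embeddings of a text, can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding table `TOY_EMBEDDINGS`, the whitespace tokenizer, and the function names are hypothetical stand-ins. In a real setup the vectors would come from a trained word2vec model, and the resulting averages would be passed to an SVM classifier.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embedding table; real word2vec vectors
# would be loaded from a trained model and have far more dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "good": [1.0, 0.0, 2.0],
    "morning": [3.0, 2.0, 0.0],
    "world": [0.0, 4.0, 4.0],
}


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def average_embedding(
    text: str, embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Return the element-wise mean of the vectors of all known tokens.

    Tokens missing from the embedding table are skipped. If no token is
    known, a zero vector is returned (an assumption of this sketch).
    """
    vectors = [embeddings[tok] for tok in tokenize(text) if tok in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# "everyone" is not in the toy table, so only two vectors are averaged.
features = average_embedding("Good morning everyone", TOY_EMBEDDINGS, dim=3)
```

Each text is thereby mapped to one fixed-length vector regardless of its length, which is what allows the vectors to be used directly as SVM input.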

Erasmus Mundus EMMA-WEST project

Identifier

Roy Bayot and Teresa Gonçalves. Author Profiling using SVMs and Word Embedding Averages — Notebook for PAN at CLEF 2016. In Krisztian Balog, Linda Cappellato, Nicola Ferro, and Craig Macdonald, editors, Working Notes of CLEF’2016 – Conference and Labs of the Evaluation forum, Évora, Portugal, 5-8 September, 2016, volume 1609, pages 815–823, Évora, PT, September 2016. CEUR.

http://hdl.handle.net/10174/20667

498

Language(s)

eng

Publisher

CEUR

Rights

openAccess

Type

article