5 resultados para Gender classification model
em Repositório Científico da Universidade de Évora - Portugal
Resumo:
This paper presents our work at 2016 FIRE CHIS. Given a CHIS query and a document associated with that query, the task is to classify the sentences in the document as relevant to the query or not; and further classify the relevant sentences to be supporting, neutral or opposing to the claim made in the query. In this paper, we present two different approaches to do the classification. With the first approach, we implement two models to satisfy the task. We first implement an information retrieval model to retrieve the sentences that are relevant to the query; and then we use supervised learning method to train a classification model to classify the relevant sentences into support, oppose or neutral. With the second approach, we only use machine learning techniques to learn a model and classify the sentences into four classes (relevant & support, relevant & neutral, relevant & oppose, irrelevant & neutral). Our submission for CHIS uses the first approach.
Resumo:
In this paper, we describe one of the approaches of the participation of Universidade de Évora. Our approach is similar to usual methods where text is preprocessed, features are extracted, and then used in SVMs with cross validation. The main difference is that features used come from averages of word embeddings, specifically word2vec vectors. Using PAN 2016 dataset, we were able to achieve 44.8% and 68.2% for English age and gender classification respectively. We were also able to achieve 51.3% and 67.1% accuracy for Spanish age and gender classification. Finally, we report 71.9% accuracy for Dutch age classification.
Resumo:
This paper describes various experiments done to investigate author profiling of tweets in 4 different languages – English, Dutch, Italian, and Spanish. Profiling consists of age and gender classification, as well as regression on 5 different person- ality dimensions – extroversion, stability, agreeableness, open- ness, and conscientiousness. Different sets of features were tested – bag-of-words, word ngrams, POS ngrams, and average of word embeddings. SVM was used as the classifier. Tfidf worked best for most English tasks while for most of the tasks from the other languages, the combination of the best features worked better.
Resumo:
This paper presents our approach of identifying the profile of an unknown user based on the activities of known users. The aim of author profiling task of PAN@CLEF 2016 is cross-genre identification of the gender and age of an unknown user. This means training the system using the behavior of different users from one social media platform and identifying the profile of other user on some different platform. Instead of using single classifier to build the system we used a combination of different classifiers, also known as stacking. This approach allowed us explore the strength of all the classifiers and minimize the bias or error enforced by a single classifier.
Resumo:
Models based on species distributions are widely used and serve important purposes in ecology, biogeography and conservation. Their continuous predictions of environmental suitability are commonly converted into a binary classification of predicted (or potential) presences and absences, whose accuracy is then evaluated through a number of measures that have been the subject of recent reviews. We propose four additional measures that analyse observation-prediction mismatch from a different angle – namely, from the perspective of the predicted rather than the observed area – and add to the existing toolset of model evaluation methods. We explain how these measures can complete the view provided by the existing measures, allowing further insights into distribution model predictions. We also describe how they can be particularly useful when using models to forecast the spread of diseases or of invasive species and to predict modifications in species’ distributions under climate and land-use change