5 results for bag-of-features

in Repositório Científico da Universidade de Évora - Portugal


Relevance:

100.00%

Publisher:

Abstract:

This paper presents a study in a field that is poorly explored for the Portuguese language: modality and its automatic tagging. Our main goal was to find a set of attributes for creating automatic taggers that outperform the bag-of-words (bow) approach. Performance was measured using precision, recall and F1. Because the field is relatively unexplored, the study covers the creation of the corpus (composed of eleven verbs), the use of a parser to extract syntactic and semantic information from the sentences, and a machine learning approach to identify modality values. Based on three different sets of attributes (from the trigger itself, from the trigger's path in the parse tree, and from the context), the system creates a tagger for each verb, achieving an improvement in F1 for almost every verb when compared to the traditional bow approach.
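The three evaluation measures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `precision_recall_f1` and the example labels are assumptions introduced here.

```python
def precision_recall_f1(gold, predicted, positive):
    """Return (precision, recall, F1) for one target label.

    gold and predicted are equal-length sequences of labels;
    positive is the label being evaluated.
    """
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    # Guard against empty denominators by falling back to 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical modality labels, used only for illustration.
gold = ["epistemic", "deontic", "epistemic", "deontic"]
predicted = ["epistemic", "epistemic", "epistemic", "deontic"]
precision, recall, f1 = precision_recall_f1(gold, predicted, "epistemic")
```

F1 is the harmonic mean of precision and recall, so a tagger only scores well when both are high.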

Relevance:

100.00%

Publisher:

Abstract:

This paper describes experiments on author profiling of tweets in four languages: English, Dutch, Italian, and Spanish. Profiling consists of age and gender classification, as well as regression on five personality dimensions: extroversion, stability, agreeableness, openness, and conscientiousness. Different sets of features were tested: bag-of-words, word n-grams, POS n-grams, and the average of word embeddings. An SVM was used as the classifier. Tf-idf worked best for most English tasks, while for most tasks in the other languages the combination of the best features worked better.
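To make the bag-of-words and tf-idf features concrete, here is a minimal sketch using only the standard library. The abstract does not say which tf-idf variant or tokenizer was used, so the unsmoothed weighting (relative term frequency times log of inverse document frequency), the whitespace tokenizer, and the function names below are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    This is one common unsmoothed variant, chosen here as an assumption.
    """
    bags = [bag_of_words(doc) for doc in documents]
    n_docs = len(bags)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for bag in bags for term in bag)

    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return vectors


# Hypothetical tweets, used only for illustration.
tweets = ["good morning twitter", "good night"]
vectors = tfidf_vectors(tweets)
```

With this weighting, a term that occurs in every document ("good" here) gets weight zero, while terms specific to one document keep a positive weight. The resulting sparse vectors could then be fed to a classifier such as an SVM.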

Relevance:

90.00%

Publisher:

Abstract:

This paper proposes a process for the classification of new residential electricity customers. The current state of the art is extended by using a combination of smart metering and survey data and by using model-based feature selection for the classification task. Firstly, the normalized representative consumption profiles of the population are derived through the clustering of data from households. Secondly, new customers are classified using survey data and a limited amount of smart metering data. Thirdly, regression analysis and model-based feature selection results explain the importance of the variables and identify the drivers of different consumption profiles, enabling the extraction of appropriate models. The results of a case study show that the use of survey data significantly increases the accuracy of the classification task (up to 20%). Considering four consumption groups, more than half of the customers are correctly classified with only one week of metering data; with more weeks, the accuracy improves significantly. Model-based feature selection resulted in a significantly lower number of features, allowing an easy interpretation of the derived models.

Relevance:

90.00%

Publisher:

Abstract:

Markets are increasingly competitive, and companies feel the urge to improve their manufacturing processes. Combined with tighter control of quality and safety, this has created a need for new methods of analysis that are ever more accurate, faster and cheaper. Alentejo is a region with a wide variety of soils, most of them rich in calcium and potassium. In the production of sparkling wine, many wineries use yeast encapsulated in alginate beads instead of the traditional method, champenoise. The first method is faster, allows more versatile production, reduces the risk of contamination and yields organoleptic characteristics similar to those of the traditional method (yeast free). However, encapsulated yeast spheres should only be used if the base wine meets a number of requirements, among them calcium content. In this study, the calcium content of the wine was determined by atomic absorption spectroscopy (AAS) and by near-infrared spectroscopy. AAS is a high-sensitivity method that produces reliable results; however, it is very time consuming and produces large quantities of environmental waste. The possibility of using near-infrared spectroscopy as a fast, simple and clean alternative to AAS was therefore studied. A calibration model with a variation coefficient higher than 0.80 was obtained, which indicates that near-infrared spectroscopy is an adequate alternative to AAS.

Relevance:

90.00%

Publisher:

Abstract:

The main purpose of this study is to evaluate the best set of features for automatically identifying argumentative sentences in unstructured text. As the corpus, we use case law from the European Court of Human Rights (ECHR). Three kinds of experiments are conducted: Basic Experiments, Multi Feature Experiments and Tree Kernel Experiments. These experiments are categorized according to the type of features available in the corpus. The features are extracted from the corpus, and Support Vector Machine (SVM) and Random Forest are used as the machine learning algorithms. We achieved an F1 score of 0.705 for identifying argumentative sentences, which is a quite promising result and can be used as the basis for a general argument-mining framework.