7 resultados para implicit categorization

em Universidad Politécnica de Madrid


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Espanola: LSE) translation system when dealing with sparse training data. This preprocessing module replaces Spanish words with associated tags. The list with Spanish words (vocabulary) and associated tags used by this module is computed automatically considering those signs that show the highest probability of being the translation of every Spanish word. This automatic tag extraction has been compared to a manual strategy achieving almost the same improvement. In this analysis, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not assigned to any sign. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). This system has been developed for a specific application domain: the renewal of Identity Documents and Driver's License. In order to evaluate the system a parallel corpus made up of 4080 Spanish sentences and their LSE translation has been used. The evaluation results revealed a significant performance improvement when including this preprocessing module. In the phrase-based system, the proposed module has given rise to an increase in BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and an increase in the human evaluation score from 0.64 to 0.83. In the case of SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper discusses a novel hybrid approach for text categorization that combines a machine learning algorithm, which provides a base model trained with a labeled corpus, with a rule-based expert system, which is used to improve the results provided by the previous classifier, by filtering false positives and dealing with false negatives. The main advantage is that the system can be easily fine-tuned by adding specific rules for those noisy or conflicting categories that have not been successfully trained. We also describe an implementation based on k-Nearest Neighbor and a simple rule language to express lists of positive, negative and relevant (multiword) terms appearing in the input text. The system is evaluated in several scenarios, including the popular Reuters-21578 news corpus for comparison to other approaches, and categorization using IPTC metadata, EUROVOC thesaurus and others. Results show that this approach achieves a precision that is comparable to top ranked methods, with the added value that it does not require a demanding human expert workload to train

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper describes a categorization module for improving the performance of a Spanish into Spanish Sign Language (LSE) translation system. This categorization module replaces Spanish words with associated tags. When implementing this module, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not relevant in the translation process. The categorization module has been incorporated into a phrase-based system and a Statistical Finite State Transducer (SFST). The evaluation results reveal that the BLEU has increased from 69.11% to 78.79% for the phrase-based system and from 69.84% to 75.59% for the SFST.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we investigate whether conventional text categorization methods may suffice to infer different verbal intelligence levels. This research goal relies on the hypothesis that the vocabulary that speakers make use of reflects their verbal intelligence levels. Automatic verbal intelligence estimation of users in a spoken language dialog system may be useful when defining an optimal dialog strategy by improving its adaptation capabilities. The work is based on a corpus containing descriptions (i.e. monologs) of a short film by test persons yielding different educational backgrounds and the verbal intelligence scores of the speakers. First, a one-way analysis of variance was performed to compare the monologs with the film transcription and to demonstrate that there are differences in the vocabulary used by the test persons yielding different verbal intelligence levels. Then, for the classification task, the monologs were represented as feature vectors using the classical TF–IDF weighting scheme. The Naive Bayes, k-nearest neighbors and Rocchio classifiers were tested. In this paper we describe and compare these classification approaches, define the optimal classification parameters and discuss the classification results obtained.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we describe new results and improvements to a lan-guage identification (LID) system based on PPRLM previously introduced in [1] and [2]. In this case, we use as parallel phone recognizers the ones provided by the Brno University of Technology for Czech, Hungarian, and Russian lan-guages, and instead of using traditional n-gram language models we use a lan-guage model that is created using a ranking with the most frequent and discrim-inative n-grams. In this language model approach, the distance between the ranking for the input sentence and the ranking for each language is computed, based on the difference in relative positions for each n-gram. This approach is able to model reliably longer span information than in traditional language models obtaining more reliable estimations. We also describe the modifications that we have being introducing along the time to the original ranking technique, e.g., different discriminative formulas to establish the ranking, variations of the template size, the suppression of repeated consecutive phones, and a new clus-tering technique for the ranking scores. Results show that this technique pro-vides a 12.9% relative improvement over PPRLM. Finally, we also describe re-sults where the traditional PPRLM and our ranking technique are combined.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the last decades, software systems have become an intrinsic element in our daily lives. Software exists in our computers, in our cars, and even in our refrigerators. Today’s world has become heavily dependent on software and yet, we still struggle to deliver quality software products, on-time and within budget. When searching for the causes of such alarming scenario, we find concurrent voices pointing to the role of the project manager. But what is project management and what makes it so challenging? Part of the answer to this question requires a deeper analysis of why software project managers have been largely ineffective. Answering this question might assist current and future software project managers in avoiding, or at least effectively mitigating, problematic scenarios that, if unresolved, will eventually lead to additional failures. This is where anti-patterns come into play and where they can be a useful tool in identifying and addressing software project management failure. Unfortunately, anti-patterns are still a fairly recent concept, and thus, available information is still scarce and loosely organized. This thesis will attempt to help remedy this scenario. The objective of this work is to help organize existing, documented software project management anti-patterns by answering our two research questions: · What are the different anti-patterns in software project management? · How can these anti-patterns be categorized?

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Comparación de los esquemas de integración temporal explícito e implícito, en la simulación del flujo sanguíneo y su interacción con la pared arterial. There are two major strategies in FSI coupling techniques: implicit and explicit. The general difference between these methodologies is how many times the data is exchanged between the fluid and solid domains at each FSI time-step. In both coupling strategies, the pressure values coming from fluid domain calculations at each time-step are exported to the solid domain, and consequently, the solid domain is analyzed with these imported forces. In contrast to the explicit coupling, in the implicit approach the fluid and solid domain’s data is exchanged several times until the convergence is achieved. Although this method may boost the numerical stabilization, it increases the computational cost due to the extra data exchanges. In cardiovascular simulations, depending on the analysis objectives, one may choose an explicit or implicit approach. In the current work, the advantage of an explicit coupling strategy is highlighted when simulation of pulsatile blood flow in elastic arteries is desired.