37 resultados para trä
Resumo:
[EU]Lan honetan, adibideetan oinarritutako patroi batzuk sortu ditugu, erregeletan oinarritutako itzulpen-sistema automatiko bat hobetzeko asmoz. Patroirik erabilgarrienak emango zituzten adibideak bakarrik hartzeko, euren erabilera-maiztasunari eta itzulpen automatikoen egokitasunari erreparatu diegu. Ondoren, adibideetako entitate-izenak eta zenbakiak orokortu ditugu, elementu horiek aldatuta ere, patroiak erabili ahal izateko.
Resumo:
Accurate and fast decoding of speech imagery from electroencephalographic (EEG) data could serve as a basis for a new generation of brain computer interfaces (BCIs), more portable and easier to use. However, decoding of speech imagery from EEG is a hard problem due to many factors. In this paper we focus on the analysis of the classification step of speech imagery decoding for a three-class vowel speech imagery recognition problem. We empirically show that different classification subtasks may require different classifiers for accurately decoding and obtain a classification accuracy that improves the best results previously published. We further investigate the relationship between the classifiers and different sets of features selected by the common spatial patterns method. Our results indicate that further improvement on BCIs based on speech imagery could be achieved by carefully selecting an appropriate combination of classifiers for the subtasks involved.
Resumo:
The Linear Ordering Problem is a popular combinatorial optimisation problem which has been extensively addressed in the literature. However, in spite of its popularity, little is known about the characteristics of this problem. This paper studies a procedure to extract static information from an instance of the problem, and proposes a method to incorporate the obtained knowledge in order to improve the performance of local search-based algorithms. The procedure introduced identifies the positions where the indexes cannot generate local optima for the insert neighbourhood, and thus global optima solutions. This information is then used to propose a restricted insert neighbourhood that discards the insert operations which move indexes to positions where optimal solutions are not generated. In order to measure the efficiency of the proposed restricted insert neighbourhood system, two state-of-the-art algorithms for the LOP that include local search procedures have been modified. Conducted experiments confirm that the restricted versions of the algorithms outperform the classical designs systematically. The statistical test included in the experimentation reports significant differences in all the cases, which validates the efficiency of our proposal.
Resumo:
[EN]The Mallows and Generalized Mallows models are compact yet powerful and natural ways of representing a probability distribution over the space of permutations. In this paper we deal with the problems of sampling and learning (estimating) such distributions when the metric on permutations is the Cayley distance. We propose new methods for both operations, whose performance is shown through several experiments. We also introduce novel procedures to count and randomly generate permutations at a given Cayley distance both with and without certain structural restrictions. An application in the field of biology is given to motivate the interest of this model.
Resumo:
[EN]In this paper we deal with distributions over permutation spaces. The Mallows model is the mode l in use. The associated distance for permutations is the Hamming distance.
Resumo:
[EN]In this paper we deal with probability distributions over permutation spaces. The Probability model in use is the Mallows model. The distance for permutations that the model uses in the Ulam distance.
Resumo:
[EN]Probability models on permutations associate a probability value to each of the permutations on n items. This paper considers two popular probability models, the Mallows model and the Generalized Mallows model. We describe methods for making inference, sampling and learning such distributions, some of which are novel in the literature. This paper also describes operations for permutations, with special attention in those related with the Kendall and Cayley distances and the random generation of permutations. These operations are of key importance for the efficient computation of the operations on distributions. These algorithms are implemented in the associated R package. Moreover, the internal code is written in C++.
Resumo:
[EN]Based on the theoretical tools of Complex Networks, this work provides a basic descriptive study of a synonyms dictionary, the Spanish Open Thesaurus represented as a graph. We study the main structural measures of the network compared with those of a random graph. Numerical results show that Open-Thesaurus is a graph whose topological properties approximate a scale-free network, but seems not to present the small-world property because of its sparse structure. We also found that the words of highest betweenness centrality are terms that suggest the vocabulary of psychoanalysis: placer (pleasure), ayudante (in the sense of assistant or worker), and regular (to regulate).
Resumo:
The learning of probability distributions from data is a ubiquitous problem in the fields of Statistics and Artificial Intelligence. During the last decades several learning algorithms have been proposed to learn probability distributions based on decomposable models due to their advantageous theoretical properties. Some of these algorithms can be used to search for a maximum likelihood decomposable model with a given maximum clique size, k, which controls the complexity of the model. Unfortunately, the problem of learning a maximum likelihood decomposable model given a maximum clique size is NP-hard for k > 2. In this work, we propose a family of algorithms which approximates this problem with a computational complexity of O(k · n^2 log n) in the worst case, where n is the number of implied random variables. The structures of the decomposable models that solve the maximum likelihood problem are called maximal k-order decomposable graphs. Our proposals, called fractal trees, construct a sequence of maximal i-order decomposable graphs, for i = 2, ..., k, in k − 1 steps. At each step, the algorithms follow a divide-and-conquer strategy based on the particular features of this type of structures. Additionally, we propose a prune-and-graft procedure which transforms a maximal k-order decomposable graph into another one, increasing its likelihood. We have implemented two particular fractal tree algorithms called parallel fractal tree and sequential fractal tree. These algorithms can be considered a natural extension of Chow and Liu’s algorithm, from k = 2 to arbitrary values of k. Both algorithms have been compared against other efficient approaches in artificial and real domains, and they have shown a competitive behavior to deal with the maximum likelihood problem. Due to their low computational complexity they are especially recommended to deal with high dimensional domains.
Resumo:
Recently, probability models on rankings have been proposed in the field of estimation of distribution algorithms in order to solve permutation-based combinatorial optimisation problems. Particularly, distance-based ranking models, such as Mallows and Generalized Mallows under the Kendall’s-t distance, have demonstrated their validity when solving this type of problems. Nevertheless, there are still many trends that deserve further study. In this paper, we extend the use of distance-based ranking models in the framework of EDAs by introducing new distance metrics such as Cayley and Ulam. In order to analyse the performance of the Mallows and Generalized Mallows EDAs under the Kendall, Cayley and Ulam distances, we run them on a benchmark of 120 instances from four well known permutation problems. The conducted experiments showed that there is not just one metric that performs the best in all the problems. However, the statistical test pointed out that Mallows-Ulam EDA is the most stable algorithm among the studied proposals.
Resumo:
To interpret the temporal information on texts, a mark-up language that will code that information is needed, in order to make that information automatically reachable. The most used mark-up language is TimeML (Pustejovsky et al., 2003), which has also been choosen for Basque. In this guidelines we present the Basque version of ISO-TimeML (ISO-TimeML working group, 2008). After having analysed the tags, attributes and values created for English, we describe the most appropriate ones to represent Basque time structures’ information.
Resumo:
More and more users aim at taking advantage of the existing Linked Open Data environment to formulate a query over a dataset and to then try to process the same query over different datasets, one after another, in order to obtain a broader set of answers. However, the heterogeneity of vocabularies used in the datasets on the one side, and the fact that the number of alignments among those datasets is scarce on the other, makes that querying task difficult for them. Considering this scenario we present in this paper a proposal that allows on demand translations of queries formulated over an original dataset, into queries expressed using the vocabulary of a targeted dataset. Our approach relieves users from knowing the vocabulary used in the targeted datasets and even more it considers situations where alignments do not exist or they are not suitable for the formulated query. Therefore, in order to favour the possibility of getting answers, sometimes there is no guarantee of obtaining a semantically equivalent translation. The core component of our proposal is a query rewriting model that considers a set of transformation rules devised from a pragmatic point of view. The feasibility of our scheme has been validated with queries defined in well known benchmarks and SPARQL endpoint logs, as the obtained results confirm.
Resumo:
[EN]In this report we present the tags we use when annotating the gold standard of syntactic functions and the decisions taken during its annotation. The gold standard is a necessary resource to evaluate the rulebased surface syntactic parser (the one based on the Constraint Grammar formalism), and, moreover, it can be useful to develop and evaluate statistical parsers. The tags we are presenting here follow the Constraint Grammar (CG) formalism (Karlsson et al., 1995). In fact, last experiments show that good results have been obtained when parsing with CG (Karlsson et al., 1995; Samuelsson and Voutilainen,1997; Tapanainen and Järvinen, 1997; Bick, 2000).
Resumo:
In this report we present the results obtained analysing the use, frequency of use and the position of adverbial clauses. This analysis has been performed in the Basque Dependency Treebank (BDT). We also have used the descriptive grammars of Euskaltzaindia, the Royal Academy of the Basque.
Resumo:
In recent years, the performance of semi-supervised learning has been theoretically investigated. However, most of this theoretical development has focussed on binary classification problems. In this paper, we take it a step further by extending the work of Castelli and Cover [1] [2] to the multi-class paradigm. Particularly, we consider the key problem in semi-supervised learning of classifying an unseen instance x into one of K different classes, using a training dataset sampled from a mixture density distribution and composed of l labelled records and u unlabelled examples. Even under the assumption of identifiability of the mixture and having infinite unlabelled examples, labelled records are needed to determine the K decision regions. Therefore, in this paper, we first investigate the minimum number of labelled examples needed to accomplish that task. Then, we propose an optimal multi-class learning algorithm which is a generalisation of the optimal procedure proposed in the literature for binary problems. Finally, we make use of this generalisation to study the probability of error when the binary class constraint is relaxed.