Digital Library

833 results for "Edit distance"

PartSS : An efficient partition-based filtering for edit distance constraints

Relevance: 100.00%

Abstract:

This paper introduces PartSS, a new partition-based filtering method for tasks that perform string comparisons under edit distance constraints. PartSS improves on the state-of-the-art method NGPP through a new partitioning scheme, and it further strengthens filtering by exploiting theoretical results on shifting and scaling ranges, thus accelerating edit distance computation between strings. PartSS filtering has been implemented within two major data integration tasks: similarity join and approximate membership extraction under edit distance constraints. An evaluation on an extensive range of real-world datasets demonstrates major efficiency gains over the NGPP and QGrams approaches.
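The abstract does not spell out the PartSS partitioning scheme, so the sketch below shows only the generic pigeonhole idea that partition-based filters for edit distance build on; it is not PartSS or NGPP. Assumptions: unit-cost Levenshtein distance and an integer threshold `tau`. If a string `s` is cut into `tau + 1` disjoint segments, at most `tau` edits can touch at most `tau` of them, so any `t` with ed(s, t) <= tau must contain at least one segment of `s` verbatim. Pairs failing this cheap test are pruned, and survivors are verified with the exact dynamic program. The names `partition`, `passes_filter` and `similarity_join` are hypothetical.

```python
# Generic pigeonhole partition filter for edit distance (illustrative assumption,
# NOT the PartSS scheme described in the abstract).

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def partition(s: str, k: int) -> list[str]:
    """Split s into k contiguous, near-equal segments (some are empty if len(s) < k)."""
    q, r = divmod(len(s), k)
    segs, start = [], 0
    for i in range(k):
        end = start + q + (1 if i < r else 0)
        segs.append(s[start:end])
        start = end
    return segs


def passes_filter(s: str, t: str, tau: int) -> bool:
    """If ed(s, t) <= tau, one of the tau + 1 segments of s is untouched and occurs in t."""
    if abs(len(s) - len(t)) > tau:          # length filter: ed >= length difference
        return False
    return any(seg in t for seg in partition(s, tau + 1))


def similarity_join(R: list[str], S: list[str], tau: int) -> list[tuple[str, str]]:
    """Return all pairs (r, s) with ed(r, s) <= tau, verifying only filtered candidates."""
    return [(r, s) for r in R for s in S
            if passes_filter(r, s, tau) and edit_distance(r, s) <= tau]


print(similarity_join(["abcd", "kitten"], ["abce", "sitting", "wxyz"], 1))
# prints [('abcd', 'abce')]
```

Because every surviving candidate is still verified exactly, the filter only saves work and never changes the join result; a stronger filter simply prunes more pairs before the quadratic-time verification.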

Task-specific minimum Bayes-risk decoding using learned edit distance

Relevance: 100.00%

Extending the Edit Distance for Musical Applications

Relevance: 100.00%

Edit distance based kernel functions for structural pattern classification

Relevance: 100.00%

A Random Walk Kernel Derived from Graph Edit Distance

Relevance: 100.00%

Fast suboptimal algorithms for the computation of graph edit distance

Relevance: 100.00%

Bridging the Gap between Graph Edit Distance and Kernel Machines

Relevance: 100.00%

Speeding up Graph Edit Distance Computation with a Bipartite Heuristic

Relevance: 100.00%

Automatic learning of cost functions for graph edit distance

Relevance: 100.00%

A quadratic programming approach to the graph edit distance problem

Relevance: 100.00%

Bipartite graph matching for computing the edit distance of graphs

Relevance: 100.00%

Graph edit distance - optimal and suboptimal algorithms with application

Relevance: 100.00%

Approximate graph edit distance computation by means of bipartite graph matching

Relevance: 100.00%

Bounds on edit metric codes with combinatorial DNA constraints

Relevance: 70.00%

Abstract:

The design of a large and reliable DNA codeword library is a key problem in DNA-based computing. DNA codes, namely sets of fixed-length edit metric codewords over the alphabet {A, C, G, T}, satisfy certain combinatorial constraints reflecting the biological and chemical restrictions of DNA strands. The primary constraints we consider are the reverse-complement constraint and the fixed GC-content constraint, together with the basic edit distance constraint between codewords. We focus on exploring the theory underlying DNA codes and discuss several approaches to searching for optimal DNA codes. We use Conway's lexicode algorithm and an exhaustive search algorithm to produce provably optimal DNA codes for small parameter values. We also propose a genetic algorithm that searches for suboptimal DNA codes with relatively large parameter values; the sizes of these codes can be regarded as reasonable lower bounds on the sizes of optimal DNA codes. Furthermore, we provide tables of bounds on the sizes of DNA codes with lengths from 1 to 9 and minimum distances from 1 to 9.
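As a rough illustration of how the three constraints interact, the sketch below builds a greedy lexicode in the spirit of Conway's construction: it scans all length-`n` words over {A, C, G, T} in lexicographic order and keeps each word that satisfies the GC-content and edit distance constraints against the words already kept. Assumptions: fixed GC-content means exactly `gc` positions are G or C, and the reverse-complement constraint is taken as ed(x, rc(y)) >= d for all codewords x and y, including x = y (definitions in the literature vary). A single greedy pass yields a valid code whose size is only a lower bound; it is neither the exhaustive search nor the genetic algorithm from the abstract, and all function names are hypothetical.

```python
from itertools import product

# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def reverse_complement(w: str) -> str:
    return "".join(COMPLEMENT[c] for c in reversed(w))


def gc_content(w: str) -> int:
    """Number of positions holding G or C."""
    return sum(c in "GC" for c in w)


def dna_lexicode(n: int, d: int, gc: int) -> list[str]:
    """Greedy lexicode under assumed edit distance, reverse-complement and GC constraints."""
    code: list[str] = []
    for letters in product("ACGT", repeat=n):   # lexicographic order A < C < G < T
        w = "".join(letters)
        if gc_content(w) != gc:
            continue
        rc = reverse_complement(w)
        # Assumed RC constraint also applies to a word and its own reverse complement.
        if edit_distance(w, rc) < d:
            continue
        # ed(rc(w), c) == ed(w, rc(c)), so one check per pair covers both directions.
        if all(edit_distance(w, c) >= d and edit_distance(rc, c) >= d for c in code):
            code.append(w)
    return code


print(len(dna_lexicode(4, 2, 2)))
```

Exhaustive enumeration of all 4^n words is only practical for small `n`, which is consistent with the abstract's use of heuristic search for larger parameter values.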

Methods for Answer Extraction in Textual Question Answering

Relevance: 60.00%

Abstract:

In this thesis we present and evaluate two pattern-matching-based methods for answer extraction in textual question answering systems. A textual question answering system seeks answers to natural language questions in unstructured text. Textual question answering is an important research problem: as the amount of natural language text in digital format keeps growing, the need for novel methods for pinpointing important knowledge in vast textual databases becomes increasingly urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns, and we also develop a new type of extraction pattern. The chosen pattern-matching approach is attractive because it is language- and application-independent. The answer extraction methods are developed within the framework of our own question answering system. Publicly available English datasets are used as training and evaluation data. The techniques developed are based on the well-known methods of sequence alignment and hierarchical clustering, and the similarity metric used is based on edit distance.

The main conclusions of the research are that answer extraction patterns consisting of the most important words of the question, together with the following information extracted from the answer context (plain words, part-of-speech tags, punctuation marks and capitalization patterns), can be used in the answer extraction module of a question answering system. This type of pattern and the two new methods for generating answer extraction patterns yield average results compared with those of other systems evaluated on the same dataset. However, most answer extraction methods in the question answering systems tested on the same dataset are both hand-crafted and based on a system-specific, fine-grained question classification. The new methods developed in this thesis require no manual creation of answer extraction patterns.
As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and is already provided in the publicly available data.
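The abstract states only that the similarity metric is based on edit distance. One common way to apply edit distance to text patterns, sketched here as an assumption rather than the thesis's actual metric, is to run the Levenshtein dynamic program over word tokens instead of characters and normalize by the length of the longer sequence, so that 1.0 means identical and 0.0 means completely different. The tokenization (lowercasing plus whitespace splitting) and the names `token_edit_distance` and `similarity` are hypothetical choices.

```python
# Token-level edit distance similarity (illustrative assumption, not the thesis's metric).

def token_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over word tokens (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        cur = [i]
        for j, tb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete token ta
                           cur[j - 1] + 1,              # insert token tb
                           prev[j - 1] + (ta != tb)))   # substitute or match
        prev = cur
    return prev[-1]


def similarity(x: str, y: str) -> float:
    """Normalized similarity in [0, 1]; assumed tokenization is lowercase + whitespace split."""
    a, b = x.lower().split(), y.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    # Edit distance never exceeds the longer length, so the result stays in [0, 1].
    return 1.0 - token_edit_distance(a, b) / longest


print(similarity("When was Mozart born", "When was Haydn born"))  # 0.75
```

Working on tokens rather than characters makes a single differing word cost one edit regardless of its length, which suits patterns built from words, part-of-speech tags and punctuation marks.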
