22 resultados para DATABASES
Resumo:
The design of programs for broadcast disks which incorporate real-time and fault-tolerance requirements is considered. A generalized model for real-time fault-tolerant broadcast disks is defined. It is shown that designing programs for broadcast disks specified in this model is closely related to the scheduling of pinwheel task systems. Some new results in pinwheel scheduling theory are derived, which facilitate the efficient generation of real-time fault-tolerant broadcast disk programs.
Resumo:
ImageRover is a search by image content navigation tool for the world wide web. To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that selects the distance metrics appropriate for a particular query.
Resumo:
There is an increased interest in using broadcast disks to support mobile access to real-time databases. However, previous work has only considered the design of real-time immutable broadcast disks, the contents of which do not change over time. This paper considers the design of programs for real-time mutable broadcast disks - broadcast disks whose contents are occasionally updated. Recent scheduling-theoretic results relating to pinwheel scheduling and pfair scheduling are used to design algorithms for the efficient generation of real-time mutable broadcast disk programs.
Resumo:
The problem of discovering frequent poly-regions (i.e. regions of high occurrence of a set of items or patterns of a given alphabet) in a sequence is studied, and three efficient approaches are proposed to solve it. The first one is entropy-based and applies a recursive segmentation technique that produces a set of candidate segments which may potentially lead to a poly-region. The key idea of the second approach is the use of a set of sliding windows over the sequence. Each sliding window covers a sequence segment and keeps a set of statistics that mainly include the number of occurrences of each item or pattern in that segment. Combining these statistics efficiently yields the complete set of poly-regions in the given sequence. The third approach applies a technique based on the majority vote, achieving linear running time with a minimal number of false negatives. After identifying the poly-regions, the sequence is converted to a sequence of labeled intervals (each one corresponding to a poly-region). An efficient algorithm for mining frequent arrangements of intervals is applied to the converted sequence to discover frequently occurring arrangements of poly-regions in different parts of DNA, including coding regions. The proposed algorithms are tested on various DNA sequences producing results of significant biological meaning.
Resumo:
A common problem in many types of databases is retrieving the most similar matches to a query object. Finding those matches in a large database can be too slow to be practical, especially in domains where objects are compared using computationally expensive similarity (or distance) measures. This paper proposes a novel method for approximate nearest neighbor retrieval in such spaces. Our method is embedding-based, meaning that it constructs a function that maps objects into a real vector space. The mapping preserves a large amount of the proximity structure of the original space, and it can be used to rapidly obtain a short list of likely matches to the query. The main novelty of our method is that it constructs, together with the embedding, a query-sensitive distance measure that should be used when measuring distances in the vector space. The term "query-sensitive" means that the distance measure changes depending on the current query object. We report experiments with an image database of handwritten digits, and a time-series database. In both cases, the proposed method outperforms existing state-of-the-art embedding methods, meaning that it provides significantly better trade-offs between efficiency and retrieval accuracy.
Resumo:
Nearest neighbor classifiers are simple to implement, yet they can model complex non-parametric distributions, and provide state-of-the-art recognition accuracy in OCR databases. At the same time, they may be too slow for practical character recognition, especially when they rely on similarity measures that require computationally expensive pairwise alignments between characters. This paper proposes an efficient method for computing an approximate similarity score between two characters based on their exact alignment to a small number of prototypes. The proposed method is applied to both online and offline character recognition, where similarity is based on widely used and computationally expensive alignment methods, i.e., Dynamic Time Warping and the Hungarian method respectively. In both cases significant recognition speedup is obtained at the expense of only a minor increase in recognition error.
Resumo:
This article introduces a new neural network architecture, called ARTMAP, that autonomously learns to classify arbitrarily many, arbitrarily ordered vectors into recognition categories based on predictive success. This supervised learning system is built up from a pair of Adaptive Resonance Theory modules (ARTa and ARTb) that are capable of self-organizing stable recognition categories in response to arbitrary sequences of input patterns. During training trials, the ARTa module receives a stream {a^(p)} of input patterns, and ARTb receives a stream {b^(p)} of input patterns, where b^(p) is the correct prediction given a^(p). These ART modules are linked by an associative learning network and an internal controller that ensures autonomous system operation in real time. During test trials, the remaining patterns a^(p) are presented without b^(p), and their predictions at ARTb are compared with b^(p). Tested on a benchmark machine learning database in both on-line and off-line simulations, the ARTMAP system learns orders of magnitude more quickly, efficiently, and accurately than alternative algorithms, and achieves 100% accuracy after training on less than half the input patterns in the database. It achieves these properties by using an internal controller that conjointly maximizes predictive generalization and minimizes predictive error by linking predictive success to category size on a trial-by-trial basis, using only local operations. This computation increases the vigilance parameter ρa of ARTa by the minimal amount needed to correct a predictive error at ARTb· Parameter ρa calibrates the minimum confidence that ARTa must have in a category, or hypothesis, activated by an input a^(p) in order for ARTa to accept that category, rather than search for a better one through an automatically controlled process of hypothesis testing. Parameter ρa is compared with the degree of match between a^(p) and the top-down learned expectation, or prototype, that is read-out subsequent to activation of an ARTa category. Search occurs if the degree of match is less than ρa. ARTMAP is hereby a type of self-organizing expert system that calibrates the selectivity of its hypotheses based upon predictive success. As a result, rare but important events can be quickly and sharply distinguished even if they are similar to frequent events with different consequences. Between input trials ρa relaxes to a baseline vigilance pa When ρa is large, the system runs in a conservative mode, wherein predictions are made only if the system is confident of the outcome. Very few false-alarm errors then occur at any stage of learning, yet the system reaches asymptote with no loss of speed. Because ARTMAP learning is self stabilizing, it can continue learning one or more databases, without degrading its corpus of memories, until its full memory capacity is utilized.