980 resultados para Databases


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis describes a system that synthesizes regularity exposing attributes from large protein databases. After processing primary and secondary structure data, this system discovers an amino acid representation that captures what are thought to be the three most important amino acid characteristics (size, charge, and hydrophobicity) for tertiary structure prediction. A neural network trained using this 16 bit representation achieves a performance accuracy on the secondary structure prediction problem that is comparable to the one achieved by a neural network trained using the standard 24 bit amino acid representation. In addition, the thesis describes bounds on secondary structure prediction accuracy, derived using an optimal learning algorithm and the probably approximately correct (PAC) model.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we propose a new class of Concurrency Control Algorithms that is especially suited for real-time database applications. Our approach relies on the use of (potentially) redundant computations to ensure that serializable schedules are found and executed as early as possible, thus, increasing the chances of a timely commitment of transactions with strict timing constraints. Due to its nature, we term our concurrency control algorithms Speculative. The aforementioned description encompasses many algorithms that we call collectively Speculative Concurrency Control (SCC) algorithms. SCC algorithms combine the advantages of both Pessimistic and Optimistic Concurrency Control (PCC and OCC) algorithms, while avoiding their disadvantages. On the one hand, SCC resembles PCC in that conflicts are detected as early as possible, thus making alternative schedules available in a timely fashion in case they are needed. On the other hand, SCC resembles OCC in that it allows conflicting transactions to proceed concurrently, thus avoiding unnecessary delays that may jeopardize their timely commitment.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Various concurrency control algorithms differ in the time when conflicts are detected, and in the way they are resolved. In that respect, the Pessimistic and Optimistic Concurrency Control (PCC and OCC) alternatives represent two extremes. PCC locking protocols detect conflicts as soon as they occur and resolve them using blocking. OCC protocols detect conflicts at transaction commit time and resolve them using rollbacks (restarts). For real-time databases, blockages and rollbacks are hazards that increase the likelihood of transactions missing their deadlines. We propose a Speculative Concurrency Control (SCC) technique that minimizes the impact of blockages and rollbacks. SCC relies on the use of added system resources to speculate on potential serialization orders and to ensure that if such serialization orders materialize, the hazards of blockages and roll-backs are minimized. We present a number of SCC-based algorithms that differ in the level of speculation they introduce, and the amount of system resources (mainly memory) they require. We show the performance gains (in terms of number of satisfied timing constraints) to be expected when a representative SCC algorithm (SCC-2S) is adopted.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we discuss a new type of query in Spatial Databases, called Trip Planning Query (TPQ). Given a set of points P in space, where each point belongs to a category, and given two points s and e, TPQ asks for the best trip that starts at s, passes through exactly one point from each category, and ends at e. An example of a TPQ is when a user wants to visit a set of different places and at the same time minimize the total travelling cost, e.g. what is the shortest travelling plan for me to visit an automobile shop, a CVS pharmacy outlet, and a Best Buy shop along my trip from A to B? The trip planning query is an extension of the well-known TSP problem and therefore is NP-hard. The difficulty of this query lies in the existence of multiple choices for each category. In this paper, we first study fast approximation algorithms for the trip planning query in a metric space, assuming that the data set fits in main memory, and give the theory analysis of their approximation bounds. Then, the trip planning query is examined for data sets that do not fit in main memory and must be stored on disk. For the disk-resident data, we consider two cases. In one case, we assume that the points are located in Euclidean space and indexed with an Rtree. In the other case, we consider the problem of points that lie on the edges of a spatial network (e.g. road network) and the distance between two points is defined using the shortest distance over the network. Finally, we give an experimental evaluation of the proposed algorithms using synthetic data sets generated on real road networks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We describe a method for shape-based image database search that uses deformable prototypes to represent categories. Rather than directly comparing a candidate shape with all shape entries in the database, shapes are compared in terms of the types of nonrigid deformations (differences) that relate them to a small subset of representative prototypes. To solve the shape correspondence and alignment problem, we employ the technique of modal matching, an information-preserving shape decomposition for matching, describing, and comparing shapes despite sensor variations and nonrigid deformations. In modal matching, shape is decomposed into an ordered basis of orthogonal principal components. We demonstrate the utility of this approach for shape comparison in 2-D image databases.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose and evaluate an admission control paradigm for RTDBS, in which a transaction is submitted to the system as a pair of processes: a primary task, and a recovery block. The execution requirements of the primary task are not known a priori, whereas those of the recovery block are known a priori. Upon the submission of a transaction, an Admission Control Mechanism is employed to decide whether to admit or reject that transaction. Once admitted, a transaction is guaranteed to finish executing before its deadline. A transaction is considered to have finished executing if exactly one of two things occur: Either its primary task is completed (successful commitment), or its recovery block is completed (safe termination). Committed transactions bring a profit to the system, whereas a terminated transaction brings no profit. The goal of the admission control and scheduling protocols (e.g., concurrency control, I/O scheduling, memory management) employed in the system is to maximize system profit. We describe a number of admission control strategies and contrast (through simulations) their relative performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This report summarizes the technical presentations and discussions that took place during RTDB'96: the First International Workshop on Real-Time Databases, which was held on March 7 and 8, 1996 in Newport Beach, California. The main goals of this project were to (1) review recent advances in real-time database systems research, (2) to promote interaction among real-time database researchers and practitioners, and (3) to evaluate the maturity and directions of real-time database technology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. At the same time, finding nearest neighbors accurately and efficiently can be challenging, especially when the database contains a large number of objects, and when the underlying distance measure is computationally expensive. This thesis proposes new methods for improving the efficiency and accuracy of nearest neighbor retrieval and classification in spaces with computationally expensive distance measures. The proposed methods are domain-independent, and can be applied in arbitrary spaces, including non-Euclidean and non-metric spaces. In this thesis particular emphasis is given to computer vision applications related to object and shape recognition, where expensive non-Euclidean distance measures are often needed to achieve high accuracy. The first contribution of this thesis is the BoostMap algorithm for embedding arbitrary spaces into a vector space with a computationally efficient distance measure. Using this approach, an approximate set of nearest neighbors can be retrieved efficiently - often orders of magnitude faster than retrieval using the exact distance measure in the original space. The BoostMap algorithm has two key distinguishing features with respect to existing embedding methods. First, embedding construction explicitly maximizes the amount of nearest neighbor information preserved by the embedding. Second, embedding construction is treated as a machine learning problem, in contrast to existing methods that are based on geometric considerations. The second contribution is a method for constructing query-sensitive distance measures for the purposes of nearest neighbor retrieval and classification. In high-dimensional spaces, query-sensitive distance measures allow for automatic selection of the dimensions that are the most informative for each specific query object. It is shown theoretically and experimentally that query-sensitivity increases the modeling power of embeddings, allowing embeddings to capture a larger amount of the nearest neighbor structure of the original space. The third contribution is a method for speeding up nearest neighbor classification by combining multiple embedding-based nearest neighbor classifiers in a cascade. In a cascade, computationally efficient classifiers are used to quickly classify easy cases, and classifiers that are more computationally expensive and also more accurate are only applied to objects that are harder to classify. An interesting property of the proposed cascade method is that, under certain conditions, classification time actually decreases as the size of the database increases, a behavior that is in stark contrast to the behavior of typical nearest neighbor classification systems. The proposed methods are evaluated experimentally in several different applications: hand shape recognition, off-line character recognition, online character recognition, and efficient retrieval of time series. In all datasets, the proposed methods lead to significant improvements in accuracy and efficiency compared to existing state-of-the-art methods. In some datasets, the general-purpose methods introduced in this thesis even outperform domain-specific methods that have been custom-designed for such datasets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In an outsourced database system the data owner publishes information through a number of remote, untrusted servers with the goal of enabling clients to access and query the data more efficiently. As clients cannot trust servers, query authentication is an essential component in any outsourced database system. Clients should be given the capability to verify that the answers provided by the servers are correct with respect to the actual data published by the owner. While existing work provides authentication techniques for selection and projection queries, there is a lack of techniques for authenticating aggregation queries. This article introduces the first known authenticated index structures for aggregation queries. First, we design an index that features good performance characteristics for static environments, where few or no updates occur to the data. Then, we extend these ideas and propose more involved structures for the dynamic case, where the database owner is allowed to update the data arbitrarily. Our structures feature excellent average case performance for authenticating queries with multiple aggregate attributes and multiple selection predicates. We also implement working prototypes of the proposed techniques and experimentally validate the correctness of our ideas.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper shows how knowledge, in the form of fuzzy rules, can be derived from a self-organizing supervised learning neural network called fuzzy ARTMAP. Rule extraction proceeds in two stages: pruning removes those recognition nodes whose confidence index falls below a selected threshold; and quantization of continuous learned weights allows the final system state to be translated into a usable set of rules. Simulations on a medical prediction problem, the Pima Indian Diabetes (PID) database, illustrate the method. In the simulations, pruned networks about 1/3 the size of the original actually show improved performance. Quantization yields comprehensible rules with only slight degradation in test set prediction performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we look at ways of delivering and assessing learning on database units offered on higher degree programmes (MSc) in the School of Computing and Mathematical Sciences at the University of Greenwich. Of critical importance is the teaching methods employed for verbal disposition, practical laboratory exercises and a careful evaluation of assessment methods and assessment tools in view of the fact that databases involve not only database design but also use of practical tools, such as database management systems (DBMSs) software, human designers, database administrators (DBA) and end users. Our goal is to clearly identify potential key success factors in delivering and assessing learning in both practical and theoretical aspects of database course units.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador: