980 resultados para Databases


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Projeto de Pós-Graduação/Dissertação apresentado à Universidade Fernando Pessoa como parte dos requisitos para obtenção do grau de Mestre em Ciências Farmacêuticas

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The SIEGE (Smoking Induced Epithelial Gene Expression) database is a clinical resource for compiling and analyzing gene expression data from epithelial cells of the human intra-thoracic airway. This database supports a translational research study whose goal is to profile the changes in airway gene expression that are induced by cigarette smoke. RNA is isolated from airway epithelium obtained at bronchoscopy from current-, former- and never-smoker subjects, and hybridized to Affymetrix HG-U133A Genechips, which measure the level of expression of ~22 500 human transcripts. The microarray data generated along with relevant patient information is uploaded to SIEGE by study administrators using the database's web interface, found at http://pulm.bumc.bu.edu/siegeDB. PERL-coded scripts integrated with SIEGE perform various quality control functions including the processing, filtering and formatting of stored data. The R statistical package is used to import database expression values and execute a number of statistical analyses including t-tests, correlation coefficients and hierarchical clustering. Values from all statistical analyses can be queried through CGI-based tools and web forms found on the �Search� section of the database website. Query results are embedded with graphical capabilities as well as with links to other databases containing valuable gene resources, including Entrez Gene, GO, Biocarta, GeneCards, dbSNP and the NCBI Map Viewer.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

BACKGROUND:In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database, such as the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top to bottom annotation rules which protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is through the use of transitive closure to predictions.RESULTS:We propose a probabilistic framework to integrate information in relational data, in the form of a protein-protein interaction network, and a hierarchically structured database of terms, in the form of the GO database, for the purpose of protein function prediction. At the heart of our framework is a factorization of local neighborhood information in the protein-protein interaction network across successive ancestral terms in the GO hierarchy. We introduce a classifier within this framework, with computationally efficient implementation, that produces GO-term predictions that naturally obey a hierarchical 'true-path' consistency from root to leaves, without the need for further post-processing.CONCLUSION:A cross-validation study, using data from the yeast Saccharomyces cerevisiae, shows our method offers substantial improvements over both standard 'guilt-by-association' (i.e., Nearest-Neighbor) and more refined Markov random field methods, whether in their original form or when post-processed to artificially impose 'true-path' consistency. Further analysis of the results indicates that these improvements are associated with increased predictive capabilities (i.e., increased positive predictive value), and that this increase is consistent uniformly with GO-term depth. Additional in silico validation on a collection of new annotations recently added to GO confirms the advantages suggested by the cross-validation study. Taken as a whole, our results show that a hierarchical approach to network-based protein function prediction, that exploits the ontological structure of protein annotation databases in a principled manner, can offer substantial advantages over the successive application of 'flat' network-based methods.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Paper published in PLoS Medicine in 2007.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We describe our work on shape-based image database search using the technique of modal matching. Modal matching employs a deformable shape decomposition that allows users to select example objects and have the computer efficiently sort the set of objects based on the similarity of their shape. Shapes are compared in terms of the types of nonrigid deformations (differences) that relate them. The modal decomposition provides deformation "control knobs" for flexible matching and thus allows for selecting weighted subsets of shape parameters that are deemed significant for a particular category or context. We demonstrate the utility of this approach for shape comparison in 2-D image databases; however, the general formulation is applicable to signals of any dimensionality.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recent work in sensor databases has focused extensively on distributed query problems, notably distributed computation of aggregates. Existing methods for computing aggregates broadcast queries to all sensors and use in-network aggregation of responses to minimize messaging costs. In this work, we focus on uniform random sampling across nodes, which can serve both as an alternative building block for aggregation and as an integral component of many other useful randomized algorithms. Prior to our work, the best existing proposals for uniform random sampling of sensors involve contacting all nodes in the network. We propose a practical method which is only approximately uniform, but contacts a number of sensors proportional to the diameter of the network instead of its size. The approximation achieved is tunably close to exact uniform sampling, and only relies on well-known existing primitives, namely geographic routing, distributed computation of Voronoi regions and von Neumann's rejection method. Ultimately, our sampling algorithm has the same worst-case asymptotic cost as routing a point-to-point message, and thus it is asymptotically optimal among request/reply-based sampling methods. We provide experimental results demonstrating the effectiveness of our algorithm on both synthetic and real sensor topologies.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A problem with Speculative Concurrency Control algorithms and other common concurrency control schemes using forward validation is that committing a transaction as soon as it finishes validating, may result in a value loss to the system. Haritsa showed that by making a lower priority transaction wait after it is validated, the number of transactions meeting their deadlines is increased, which may result in a higher value-added to the system. SCC-based protocols can benefit from the introduction of such delays by giving optimistic shadows with high value-added to the system more time to execute and commit instead of being aborted in favor of other validating transactions, whose value-added to the system is lower. In this paper we present and evaluate an extension to SCC algorithms that allows for commit deferments.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The design of programs for broadcast disks which incorporate real-time and fault-tolerance requirements is considered. A generalized model for real-time fault-tolerant broadcast disks is defined. It is shown that designing programs for broadcast disks specified in this model is closely related to the scheduling of pinwheel task systems. Some new results in pinwheel scheduling theory are derived, which facilitate the efficient generation of real-time fault-tolerant broadcast disk programs.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

ImageRover is a search by image content navigation tool for the world wide web. To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that selects the distance metrics appropriate for a particular query.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

There is an increased interest in using broadcast disks to support mobile access to real-time databases. However, previous work has only considered the design of real-time immutable broadcast disks, the contents of which do not change over time. This paper considers the design of programs for real-time mutable broadcast disks - broadcast disks whose contents are occasionally updated. Recent scheduling-theoretic results relating to pinwheel scheduling and pfair scheduling are used to design algorithms for the efficient generation of real-time mutable broadcast disk programs.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The problem of discovering frequent poly-regions (i.e. regions of high occurrence of a set of items or patterns of a given alphabet) in a sequence is studied, and three efficient approaches are proposed to solve it. The first one is entropy-based and applies a recursive segmentation technique that produces a set of candidate segments which may potentially lead to a poly-region. The key idea of the second approach is the use of a set of sliding windows over the sequence. Each sliding window covers a sequence segment and keeps a set of statistics that mainly include the number of occurrences of each item or pattern in that segment. Combining these statistics efficiently yields the complete set of poly-regions in the given sequence. The third approach applies a technique based on the majority vote, achieving linear running time with a minimal number of false negatives. After identifying the poly-regions, the sequence is converted to a sequence of labeled intervals (each one corresponding to a poly-region). An efficient algorithm for mining frequent arrangements of intervals is applied to the converted sequence to discover frequently occurring arrangements of poly-regions in different parts of DNA, including coding regions. The proposed algorithms are tested on various DNA sequences producing results of significant biological meaning.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A common problem in many types of databases is retrieving the most similar matches to a query object. Finding those matches in a large database can be too slow to be practical, especially in domains where objects are compared using computationally expensive similarity (or distance) measures. This paper proposes a novel method for approximate nearest neighbor retrieval in such spaces. Our method is embedding-based, meaning that it constructs a function that maps objects into a real vector space. The mapping preserves a large amount of the proximity structure of the original space, and it can be used to rapidly obtain a short list of likely matches to the query. The main novelty of our method is that it constructs, together with the embedding, a query-sensitive distance measure that should be used when measuring distances in the vector space. The term "query-sensitive" means that the distance measure changes depending on the current query object. We report experiments with an image database of handwritten digits, and a time-series database. In both cases, the proposed method outperforms existing state-of-the-art embedding methods, meaning that it provides significantly better trade-offs between efficiency and retrieval accuracy.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Nearest neighbor classifiers are simple to implement, yet they can model complex non-parametric distributions, and provide state-of-the-art recognition accuracy in OCR databases. At the same time, they may be too slow for practical character recognition, especially when they rely on similarity measures that require computationally expensive pairwise alignments between characters. This paper proposes an efficient method for computing an approximate similarity score between two characters based on their exact alignment to a small number of prototypes. The proposed method is applied to both online and offline character recognition, where similarity is based on widely used and computationally expensive alignment methods, i.e., Dynamic Time Warping and the Hungarian method respectively. In both cases significant recognition speedup is obtained at the expense of only a minor increase in recognition error.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This article introduces a new neural network architecture, called ARTMAP, that autonomously learns to classify arbitrarily many, arbitrarily ordered vectors into recognition categories based on predictive success. This supervised learning system is built up from a pair of Adaptive Resonance Theory modules (ARTa and ARTb) that are capable of self-organizing stable recognition categories in response to arbitrary sequences of input patterns. During training trials, the ARTa module receives a stream {a^(p)} of input patterns, and ARTb receives a stream {b^(p)} of input patterns, where b^(p) is the correct prediction given a^(p). These ART modules are linked by an associative learning network and an internal controller that ensures autonomous system operation in real time. During test trials, the remaining patterns a^(p) are presented without b^(p), and their predictions at ARTb are compared with b^(p). Tested on a benchmark machine learning database in both on-line and off-line simulations, the ARTMAP system learns orders of magnitude more quickly, efficiently, and accurately than alternative algorithms, and achieves 100% accuracy after training on less than half the input patterns in the database. It achieves these properties by using an internal controller that conjointly maximizes predictive generalization and minimizes predictive error by linking predictive success to category size on a trial-by-trial basis, using only local operations. This computation increases the vigilance parameter ρa of ARTa by the minimal amount needed to correct a predictive error at ARTb· Parameter ρa calibrates the minimum confidence that ARTa must have in a category, or hypothesis, activated by an input a^(p) in order for ARTa to accept that category, rather than search for a better one through an automatically controlled process of hypothesis testing. Parameter ρa is compared with the degree of match between a^(p) and the top-down learned expectation, or prototype, that is read-out subsequent to activation of an ARTa category. Search occurs if the degree of match is less than ρa. ARTMAP is hereby a type of self-organizing expert system that calibrates the selectivity of its hypotheses based upon predictive success. As a result, rare but important events can be quickly and sharply distinguished even if they are similar to frequent events with different consequences. Between input trials ρa relaxes to a baseline vigilance pa When ρa is large, the system runs in a conservative mode, wherein predictions are made only if the system is confident of the outcome. Very few false-alarm errors then occur at any stage of learning, yet the system reaches asymptote with no loss of speed. Because ARTMAP learning is self stabilizing, it can continue learning one or more databases, without degrading its corpus of memories, until its full memory capacity is utilized.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

To investigate micronutrient intakes and the role of nutritional supplements in the diets of Irish adults aged 18-64 years and pre-school children aged 1-4 years. Analysis is based on data from the National Adult Nutrition Survey (NANS) (n=1274) and the National Pre-School Nutrition Survey (NPNS) (n=500). Food and beverage intakes and nutritional supplement use were recorded using 4-day food records. Nutrients were estimated using WISP© which is based on McCance and Widdowson’s The Composition of Foods, 6thEd and the Irish Food Composition Database. “Meats”, “milk/yoghurt”, “breads”, “fruit/fruit juices” and “breakfast cereals” made important contributions to the intakes of a number of micronutrients. Micronutrient intakes were generally adequate, with the exception of iron (in adult females and 1 year olds) and vitamin D (in all population groups). For iron, zinc, copper and vitamin B6, up to 2% of adults had intakes that exceeded the upper limit (UL). Small proportions of children had intakes of zinc (11%), copper (2%), retinol (4%) and folic acid (5%) exceeding the UL. Nutritional supplements (predominantly multivitamin and/or mineral preparations) were consumed by 28% of adults and 20% of pre-school children. Among users, supplements were effective in reducing the % with inadequate intakes for vitamins A and D (both population groups) and iron (adult females only). Supplement users had a lower prevalence of inadequate intakes for vitamin A and iron compared to non-users. In adults only, users had a lower prevalence of inadequate intakes for magnesium, calcium and zinc, and displayed better compliance with dietary recommendations and lifestyle characteristics compared with non-users. There is poor compliance among women of childbearing age for the recommendation to take a supplement containing 400µg/day of folic acid. These findings are important for the development of nutrition policies and future recommendations for adults and pre-school children in Ireland and the EU.