939 results for approximate string matching


Relevance:

100.00%

Publisher:

Abstract:

Variations in different types of genomes have been found to be responsible for a large degree of physical diversity, such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies can sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome, but they must first be mapped to a reference sequence. To align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. To handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall-time speedup in comparison to multithreaded BWA on high-performance computing clusters and will thus facilitate the analysis of genome sequencing data.
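The abstract names approximate string matching as a building block of read mapping. As a minimal illustration (this is not BWA's method, which relies on a Burrows-Wheeler index of the reference), the Python sketch below uses the classic semi-global edit-distance dynamic program to find where a short read fits best inside a reference with the fewest edits. The function name `best_approximate_match` is hypothetical.

```python
def best_approximate_match(pattern: str, text: str) -> tuple[int, int]:
    """Return (edits, end) for the best approximate occurrence of pattern in text.

    Semi-global edit distance: the whole pattern must be aligned, but the
    match may start and end anywhere in the text. `end` is the exclusive
    end index in `text` of the first best-scoring occurrence.
    """
    m = len(pattern)
    # prev[i] = fewest edits aligning pattern[:i] to some substring of text
    # that ends just before the current text position.
    prev = list(range(m + 1))  # before any text is read, pattern[:i] costs i edits
    best = (prev[m], 0)
    for j, ch in enumerate(text, start=1):
        curr = [0] * (m + 1)  # curr[0] = 0: a match may begin at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # pattern character missing from the text
            )
        if curr[m] < best[0]:  # strict '<' keeps the earliest best end position
            best = (curr[m], j)
        prev = curr
    return best


if __name__ == "__main__":
    read, reference = "ACGT", "TTACCTTT"
    print(best_approximate_match(read, reference))  # (1, 6): one substitution
```

This scan costs time proportional to the read length times the reference length, which is far too slow for billions of reads against a whole genome; that is why mappers such as BWA index the reference first.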

Relevance:

80.00%

Publisher:

Abstract:

MHCPEP (http://wehih.wehi.edu.au/mhcpep/) is a curated database comprising over 13 000 peptide sequences known to bind MHC molecules. Entries are compiled from published reports as well as from direct submissions of experimental data. Each entry contains the peptide sequence, its MHC specificity and, where available, experimental method, observed activity, binding affinity, source protein and anchor positions, as well as publication references. The present format of the database allows text string matching searches but can easily be converted for use in conjunction with sequence analysis packages. The database can be accessed via the Internet using WWW or FTP.

Relevance:

80.00%

Publisher:

Abstract:

MHCPEP is a curated database comprising over 9000 peptide sequences known to bind MHC molecules. Entries are compiled from published reports as well as from direct submissions of experimental data. Each entry contains the peptide sequence, its MHC specificity and, when available, experimental method, observed activity, binding affinity, source protein, anchor positions and publication references. The present format of the database allows text string matching searches but can easily be converted for use in conjunction with sequence analysis packages. The database can be accessed via the Internet using WWW, FTP or Gopher.

Relevance:

80.00%

Publisher:

Abstract:

Most current-generation Wireless Sensor Network (WSN) nodes are equipped with multiple sensors of various types, and therefore support for multi-tasking and multiple concurrent applications is becoming increasingly common. This trend has been fostering the design of WSNs that allow several concurrent users to deploy applications with dissimilar requirements. In this paper, we extend the advantages of a holistic programming scheme by designing a novel compiler-assisted scheduling approach (called REIS) able to identify and eliminate redundancies across applications. To achieve this high-level optimization, we model each user application as a linear sequence of executable instructions. We show how well-known string-matching algorithms such as the Longest Common Subsequence (LCS) and the Shortest Common Super-sequence (SCS) can be used to produce an optimal merged monolithic sequence of the deployed applications that takes into account embedded scheduling information. We show that our approach can achieve average energy savings of about 60% in processor usage compared to the normal execution of concurrent applications.
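The LCS/SCS merging idea can be sketched in Python under two simplifying assumptions: each application is a list of instruction names, and two instructions are redundant exactly when their names are equal. The sketch ignores the embedded scheduling information that REIS also takes into account, and the function name and instruction names are hypothetical. It fills the LCS table by dynamic programming and then backtracks to emit a shortest common supersequence, in which the shared (LCS) instructions appear only once.

```python
def shortest_common_supersequence(a: list[str], b: list[str]) -> list[str]:
    """Merge two instruction sequences so that both remain subsequences.

    Instructions in their longest common subsequence (LCS) are emitted once,
    so the result has len(a) + len(b) - LCS-length instructions.
    """
    n, m = len(a), len(b)
    # lcs[i][j] = length of the LCS of a[:i] and b[:j]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])

    # Backtrack from the end, collecting the merged sequence in reverse.
    merged: list[str] = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            merged.append(a[i - 1])  # shared instruction: keep a single copy
            i, j = i - 1, j - 1
        elif lcs[i - 1][j] >= lcs[i][j - 1]:
            merged.append(a[i - 1])  # instruction needed only by a here
            i -= 1
        else:
            merged.append(b[j - 1])  # instruction needed only by b here
            j -= 1
    merged.extend(reversed(a[:i]))  # leftover prefix of a (if any)
    merged.extend(reversed(b[:j]))  # leftover prefix of b (if any)
    merged.reverse()
    return merged


if __name__ == "__main__":
    app1 = ["read_temp", "filter", "send"]
    app2 = ["read_temp", "read_light", "send"]
    print(shortest_common_supersequence(app1, app2))
    # ['read_temp', 'read_light', 'filter', 'send']
```

Here the merged program has four instructions instead of six, because `read_temp` and `send` form the LCS and would be executed only once.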

Relevance:

30.00%

Publisher:

Abstract:

Non-linear methods for estimating variability in time series are currently in widespread use. Among such methods are approximate entropy (ApEn) and sample approximate entropy (SampEn). The applicability of ApEn and SampEn in analyzing data is evident, and their use is increasing. However, consistency is a point of concern with these tools: the classification of the temporal organization of a data set might indicate that one series is less ordered than another when the opposite is true. As their proponents themselves have highlighted, ApEn and SampEn might produce incorrect results due to this lack of consistency. In this study, we present a method that gains consistency by applying ApEn repeatedly over a wide range of combinations of window length and matching error tolerance. The tool is called volumetric approximate entropy (vApEn). We analyze nine artificially generated prototypical time series with different degrees of temporal order (combinations of sine waves, logistic maps with different control parameter values, and random noise). While ApEn and SampEn clearly fail to identify the temporal order of the sequences consistently, vApEn does so correctly. To validate the tool, we performed shuffled and surrogate data analyses. Statistical analysis confirmed the consistency of the method. (C) 2008 Elsevier Ltd. All rights reserved.
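For reference, ApEn in its usual formulation is ApEn(m, r) = Φ^m(r) − Φ^(m+1)(r), where Φ^m(r) averages, over all length-m templates, the logarithm of the fraction of templates lying within tolerance r (Chebyshev distance, self-matches counted). A direct Python transcription is sketched below, together with a helper that evaluates it over a grid of window lengths `m` and tolerances `r`. The abstract does not say how vApEn combines such a grid into a single value, so `apen_grid` only returns the raw values; both function names are hypothetical.

```python
import math


def approximate_entropy(x: list[float], m: int, r: float) -> float:
    """ApEn(m, r) = Phi^m(r) - Phi^(m+1)(r) for the series x."""
    n = len(x)
    if n < m + 1:
        raise ValueError("series too short for the chosen window length m")

    def phi(k: int) -> float:
        # All overlapping templates (windows) of length k.
        templates = [x[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for t in templates:
            # Count templates within tolerance r under the Chebyshev
            # (max-abs) distance; the self-match keeps every count >= 1.
            matches = sum(
                1 for u in templates
                if max(abs(p - q) for p, q in zip(t, u)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


def apen_grid(x: list[float], ms: list[int], rs: list[float]) -> list[list[float]]:
    """ApEn for every (m, r) pair; rows follow ms, columns follow rs."""
    return [[approximate_entropy(x, m, r) for r in rs] for m in ms]


if __name__ == "__main__":
    alternating = [float(i % 2) for i in range(50)]  # perfectly regular series
    print(approximate_entropy(alternating, 2, 0.5))  # close to zero
```

Each evaluation compares every template with every other, so the cost grows quadratically with series length; sweeping a large (m, r) grid multiplies that cost by the number of grid points.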

Relevance:

30.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

20.00%

Publisher:

Abstract:

The expansion of the three-term contingency into units of four or more elements has opened new perspectives for understanding complex behaviors, such as the emergence of responses that derive from the formation of equivalent stimulus classes and that model symbolic and conceptual behaviors. In experimental research, the matching-to-sample procedure has frequently been used to establish conditional discriminations. In particular, obtaining generalized identity matching is considered a demonstration of the acquisition of the concepts of sameness and difference. As we argue, the fact that these concepts have been approached through conditional discriminative processes may have been responsible for the frequent failures to demonstrate them in nonhuman subjects. In this sense, the lack of correspondence between the discriminative processes responsible for establishing the reflexivity relation among stimuli that form equivalence classes and generalized identity matching is reviewed here across empirical studies and discussed with respect to its implications.

Relevance:

20.00%

Publisher:

Abstract:

In this article, we evaluate the use of simple Lee-Goldburg cross-polarization (LG-CP) NMR experiments for obtaining quantitative information on molecular motion in the intermediate regime. In particular, we introduce the measurement of Hartmann-Hahn matching profiles for the assessment of heteronuclear dipolar couplings and dynamics as a reliable and robust alternative to the more common analysis of build-up curves. We have carried out spin dynamics simulations that include molecular motion in order to test the method's sensitivity to intermediate motion and to address its limitations concerning possible experimental imperfections. We further demonstrate the successful use of simple theoretical concepts, most prominently Anderson-Weiss (AW) theory, to analyze the data. We also propose an alternative way to estimate the activation energies of molecular motions, based on acquiring only two LG-CP spectra at each of several temperatures. As experimental tests, we investigated molecular jumps in imidazole methyl sulfonate, trimethylsulfoxonium iodide, and bisphenol A polycarbonate with the new method.
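The abstract does not detail how the two spectra per temperature are converted into a motional rate. Assuming that step yields one rate k(T) per temperature and that the motion follows the Arrhenius law k = A·exp(−E_a/(RT)), the activation energy follows from the slope of ln k against 1/T. The hypothetical Python helper below performs that least-squares fit; it illustrates only the final temperature-dependence step, not the LG-CP analysis itself.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)


def arrhenius_activation_energy(temps_k: list[float], rates: list[float]) -> float:
    """Least-squares E_a (J/mol) from rates k(T), using ln k = ln A - E_a / (R T)."""
    if len(temps_k) != len(rates) or len(temps_k) < 2:
        raise ValueError("need at least two (temperature, rate) pairs")
    xs = [1.0 / t for t in temps_k]    # 1/T in 1/K
    ys = [math.log(k) for k in rates]  # ln k
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # slope of ln k vs 1/T equals -E_a / R
    return -slope * R


if __name__ == "__main__":
    # Synthetic, noise-free rates generated with E_a = 50 kJ/mol and A = 1e12 1/s.
    temps = [250.0, 275.0, 300.0, 325.0]
    rates = [1e12 * math.exp(-50000.0 / (R * t)) for t in temps]
    print(arrhenius_activation_energy(temps, rates))  # about 50000 J/mol
```

With real data the rates carry experimental error, so the spread of temperatures matters: a wider 1/T range constrains the slope, and therefore E_a, more tightly.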

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a family of algorithms for approximate inference in credal networks (that is, models based on directed acyclic graphs and set-valued probabilities) that contain only binary variables. Such networks can represent incomplete or vague beliefs, lack of data, and disagreements among experts; they can also encode models based on belief functions and possibilistic measures. All algorithms for approximate inference in this paper rely on exact inferences in credal networks based on polytrees with binary variables, as these inferences have polynomial complexity. We are inspired by approximate algorithms for Bayesian networks; thus the Loopy 2U algorithm resembles Loopy Belief Propagation, while the Iterated Partial Evaluation and Structured Variational 2U algorithms are, respectively, based on Localized Partial Evaluation and variational techniques. (C) 2007 Elsevier Inc. All rights reserved.
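To make exact inference with set-valued probabilities concrete, the Python sketch below handles the smallest possible polytree, X → Y with binary variables, where each probability is only known to lie in an interval. Because P(Y=1) = P(Y=1|X=1)·P(X=1) + P(Y=1|X=0)·(1 − P(X=1)) is linear in each parameter separately, its lower and upper values are reached at interval endpoints, so enumerating the endpoint combinations gives exact bounds. This brute-force enumeration is only an illustration with a hypothetical function name; it is not the 2U algorithm, and its cost grows exponentially with network size.

```python
from itertools import product


def bounds_p_y1(
    p_x1: tuple[float, float],
    p_y1_given_x1: tuple[float, float],
    p_y1_given_x0: tuple[float, float],
) -> tuple[float, float]:
    """Exact lower/upper P(Y=1) for the two-node binary credal network X -> Y.

    Each argument is an interval (lower, upper). P(Y=1) is linear in each
    parameter separately, so its extremes occur at interval endpoints.
    """
    candidates = [
        a * p + b * (1.0 - p)  # law of total probability for one precise network
        for p, a, b in product(p_x1, p_y1_given_x1, p_y1_given_x0)
    ]
    return min(candidates), max(candidates)


if __name__ == "__main__":
    print(bounds_p_y1((0.2, 0.4), (0.7, 0.9), (0.1, 0.3)))
    # approximately (0.22, 0.54), up to floating-point rounding
```

The interval result expresses exactly the kind of vague belief the abstract mentions: the model commits only to P(Y=1) lying between 0.22 and 0.54.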

Relevance:

20.00%

Publisher:

Abstract:

This paper addresses the non-preemptive single machine scheduling problem to minimize total tardiness. We are interested in the online version of this problem, where orders arrive at the system at random times. Jobs have to be scheduled without knowledge of what jobs will come afterwards. The processing times and the due dates become known when the order is placed. The order release date occurs only at the beginning of periodic intervals. A customized approximate dynamic programming method is introduced for this problem. The authors also present numerical experiments that assess the reliability of the new approach and show that it performs better than a myopic policy.
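The objective and a simple myopic baseline can be sketched in Python. The example assumes that an order placed at time a is released at the first period boundary at or after a, and it uses earliest-due-date (EDD) dispatching among released jobs as a stand-in myopic rule. The abstract does not specify which myopic policy was used, and the approximate dynamic programming method itself is not reproduced here; `Job`, `total_tardiness`, and `simulate_myopic` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    processing_time: float
    due_date: float


def total_tardiness(sequence: list[Job], start: float = 0.0) -> float:
    """Sum of max(0, completion - due) when jobs run back to back from `start`."""
    t, total = start, 0.0
    for job in sequence:
        t += job.processing_time
        total += max(0.0, t - job.due_date)
    return total


def simulate_myopic(arrivals: list[tuple[float, Job]], period: float) -> float:
    """Online, non-preemptive EDD dispatching with periodic releases.

    Whenever the machine is free, it starts the released job with the
    earliest due date, ignoring any orders that have not arrived yet.
    """
    released = sorted(
        ((math.ceil(a / period) * period, job) for a, job in arrivals),
        key=lambda pair: pair[0],
    )
    t, total, idx = 0.0, 0.0, 0
    queue: list[Job] = []
    while idx < len(released) or queue:
        # Move every job released by time t into the queue.
        while idx < len(released) and released[idx][0] <= t:
            queue.append(released[idx][1])
            idx += 1
        if not queue:
            t = released[idx][0]  # machine idles until the next release
            continue
        queue.sort(key=lambda job: job.due_date)
        job = queue.pop(0)
        t += job.processing_time  # runs to completion (non-preemptive)
        total += max(0.0, t - job.due_date)
    return total


if __name__ == "__main__":
    orders = [(0.0, Job("a", 4, 5)), (3.0, Job("b", 2, 12)), (3.0, Job("c", 3, 8))]
    print(simulate_myopic(orders, period=10.0))  # 8.0
```

In the example, jobs `b` and `c` are placed at time 3 but only released at time 10, so `c` finishes at 13 (5 units late) and `b` at 15 (3 units late). A policy that anticipates future arrivals, such as the paper's approximate dynamic programming approach, aims to do better than this kind of rule.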