999 results for CHECKING SEQUENCES


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Discovering patterns in temporal data is an important task in data mining. A successful method for this was proposed by Mannila et al. [1] in 1997. In their framework, mining for temporal patterns in a database of event sequences is done by discovering the so-called frequent episodes. These episodes characterize interesting collections of events occurring relatively close to each other in some partial order. However, in this framework (and in many others for finding patterns in event sequences), the ordering of events in an event sequence is the only temporal information allowed. In many applications, though, events are not instantaneous; they have durations. Interesting episodes that we want to discover may need to contain information about event durations and the like. In this paper we extend Mannila et al.'s framework to tackle such issues. In our generalized formulation, episodes are defined so that much more temporal information about events can be incorporated into the structure of an episode. This significantly enhances the expressive capability of the rules that can be discovered in the frequent episode framework. We also present algorithms for discovering such generalized frequent episodes.
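As an illustration of the baseline that this abstract extends, the following is a minimal sketch of window-based frequency counting for a serial episode (event types that must occur in a given order). It is not the authors' generalized algorithm: event durations are not modelled, and the function names, the integer timestamps, and the window convention `[start, start + width)` are assumptions made for the example.

```python
def occurs_in_order(events, episode):
    """True if the episode's event types appear in `events` in the given order."""
    position = 0
    for event_type, _time in events:
        if position < len(episode) and event_type == episode[position]:
            position += 1
    return position == len(episode)


def serial_episode_frequency(events, episode, width):
    """Fraction of sliding windows of the given width that contain the episode.

    `events` is a list of (event_type, integer_time) pairs sorted by time.
    Windows start at every integer time from first - width + 1 to last, so
    every event is covered by exactly `width` windows (an assumed convention).
    """
    first, last = events[0][1], events[-1][1]
    starts = range(first - width + 1, last + 1)
    hits = 0
    for start in starts:
        window = [(e, t) for e, t in events if start <= t < start + width]
        if occurs_in_order(window, episode):
            hits += 1
    return hits / len(starts)


# Hypothetical toy event sequence: (event type, time stamp).
toy_events = [("A", 1), ("B", 2), ("A", 4), ("C", 5), ("B", 6)]
freq = serial_episode_frequency(toy_events, ("A", "B"), width=3)
print(freq)  # 0.375: three of the eight windows contain A followed by B
```

An episode would be called frequent when this fraction reaches a user-chosen threshold; the re-scan of every window here is deliberately naive and only meant to make the definition concrete.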

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Over the past two decades, many ingenious efforts have been made in protein remote homology detection. Because homologous proteins often diversify extensively in sequence, it is challenging to demonstrate such relatedness through entirely sequence-driven searches. Here, we describe a computational method for the generation of 'protein-like' sequences that serves to bridge gaps in protein sequence space. Sequence profile information, as embodied in a position-specific scoring matrix of multiply aligned sequences of bona fide family members, serves as the starting point in this algorithm. The observed amino acid propensity and the selection of a random number dictate the selection of a residue for each position in the sequence. In a systematic manner, and by applying a 'roulette-wheel' selection approach at each position, we generate parent family-like sequences and thus facilitate an enlargement of sequence space around the family. When generated for a large number of families, we demonstrate that they expand the utility of natural intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5-10% over searches made in their absence. Furthermore, with several examples from proteins adopting folds such as TIM, globin, lipocalin and others, we demonstrate that the success of including designed sequences in a database positively sensitized methods such as PSI-BLAST and Cascade PSI-BLAST and is a promising opportunity for enormously improved remote homology recognition using sequence information alone.
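The per-position roulette-wheel step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the profile is assumed to be a list of dictionaries mapping residues to non-negative propensities, and `roulette_pick`, `generate_sequence`, and the toy profile are hypothetical names and data.

```python
import random


def roulette_pick(propensities, rng):
    """Pick one residue with probability proportional to its propensity."""
    total = sum(propensities.values())
    threshold = rng.random() * total  # the random number that "spins the wheel"
    cumulative = 0.0
    for residue, weight in propensities.items():
        cumulative += weight
        if threshold < cumulative:
            return residue
    return residue  # guard against floating-point round-off at the upper edge


def generate_sequence(profile, rng):
    """Draw one residue per profile column to build a family-like sequence."""
    return "".join(roulette_pick(column, rng) for column in profile)


# Hypothetical three-column profile; a real one would come from a PSSM.
toy_profile = [
    {"A": 0.7, "G": 0.3},
    {"L": 1.0},
    {"K": 0.5, "R": 0.5},
]
rng = random.Random(0)  # seeded so that runs are repeatable
designed = [generate_sequence(toy_profile, rng) for _ in range(5)]
print(designed)
```

Residues with a high propensity at a position are drawn often, while rarer ones still appear occasionally, which is what lets the generated sequences spread around the family rather than collapse onto its consensus.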

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Learning your αβγ's: The diversity of hydrogen-bonding patterns in backbone-expanded hybrid helices is shown by crystal-structure determination of several oligomeric peptides (see scheme; C=gray; H=white; O=red; N=blue). C12 helices were observed in the αγ peptide series for n=2-8. In comparison, the αα peptide and αβ peptide sequences show C10 and mixed C14/C15 helices, respectively. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Receive antenna selection (AS) has been shown to maintain the diversity benefits of multiple antennas while potentially reducing hardware costs. However, the promised diversity gains of receive AS depend on the assumptions of perfect channel knowledge at the receiver and slowly time-varying fading. By explicitly accounting for practical constraints imposed by the next-generation wireless standards such as training, packetization and antenna switching time, we propose a single receive AS method for time-varying fading channels. The method exploits the low training overhead and accuracy possible from the use of discrete prolate spheroidal (DPS) sequences based reduced rank subspace projection techniques. It only requires knowledge of the Doppler bandwidth, and does not require detailed correlation knowledge. Closed-form expressions for the channel prediction and estimation error as well as symbol error probability (SEP) of M-ary phase-shift keying (MPSK) for symbol-by-symbol receive AS are also derived. It is shown that the proposed AS scheme, after accounting for the practical limitations mentioned above, outperforms the ideal conventional single-input single-output (SISO) system with perfect CSI and no AS at the receiver and AS with conventional estimation based on complex exponential basis functions.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
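The windowed n-gram counting mentioned in this abstract can be illustrated with a short sketch. It swaps the toolkit's suffix array for a plain hash-map counter, which is simpler but far less memory-efficient on a whole genome, and it does not compute perplexity. The function names, the toy sequence, and the window and step sizes are assumptions for the example.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count every overlapping n-gram in the sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def windowed_ngram_counts(sequence, n, window, step):
    """Yield (window_start, n-gram counts) for each full window."""
    for start in range(0, len(sequence) - window + 1, step):
        yield start, ngram_counts(sequence[start:start + window], n)


# Hypothetical toy sequence; the second window is dominated by an "A" run.
toy = "ACGTACGTAAAA"
profiles = list(windowed_ngram_counts(toy, n=2, window=8, step=4))
for start, counts in profiles:
    print(start, counts.most_common(1))
```

Windows whose most common n-gram is far more frequent than in neighbouring windows, such as the run of `AA` in the second window here, are the kind of local repetitiveness such a scan is meant to surface.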

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A palindrome is a sequence of characters that reads the same forwards and backwards. Since the discovery of palindromic peptide sequences two decades ago, little effort has been made to understand their structural, functional and evolutionary significance. Therefore, an algorithm has been developed to identify all perfect palindromes (excluding the palindromic subset and tandem repeats) in a single protein sequence. The proposed algorithm does not impose any restriction on the number of residues in the input sequence. This avant-garde algorithm will aid in the identification of palindromic peptide sequences of varying lengths in a single protein sequence.
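One common way to find perfect palindromes, which may differ from the algorithm in this abstract, is to expand outwards from every possible centre and keep only the longest palindrome at each one. The sketch below does that; reporting only the maximal palindrome per centre avoids listing palindromes nested at the same centre, but it does not filter tandem repeats. The function name, the `min_len` cut-off, and the toy sequence are assumptions.

```python
def maximal_palindromes(sequence, min_len=3):
    """Return (start, palindrome) for the longest palindrome at each centre."""
    found = []
    n = len(sequence)
    # 2n - 1 centres: every residue (odd length) and every gap (even length).
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < n and sequence[left] == sequence[right]:
            left -= 1
            right += 1
        start, end = left + 1, right  # half-open span of the palindrome
        if end - start >= min_len:
            found.append((start, sequence[start:end]))
    return found


# Hypothetical toy peptide containing two perfect palindromes.
hits = maximal_palindromes("GKAYAKGLEVEL")
print(hits)  # [(0, 'GKAYAKG'), (7, 'LEVEL')]
```

The scan takes quadratic time in the worst case, which is usually acceptable for a single protein sequence.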

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The solution conformations of the αγ-hybrid oligopeptides Boc-[Aib-γ4(R)Val]n-OMe (n = 1-8) in organic solvents have been probed by NMR, IR, and CD spectroscopic methods. In the solid state, this peptide series favors C12-helical conformations, which are backbone-expanded analogues of 3₁₀ helices in α-peptide sequences. NMR studies of the six- (n = 3) and 16-residue (n = 8) peptides reveal that only two NH protons, attached to the N-terminus residues Aib(1) and γ4(R)Val(2), are solvent-exposed. Sequential N(i)H-N(i+1)H NOEs characteristic of local helical conformations are also observed at the residues. IR studies establish that chain extension leads to a large enhancement in the intensities of the hydrogen-bonded NH stretching bands (3343-3280 cm⁻¹), which suggests elongation of intramolecularly hydrogen-bonded structures. The development of C12-helical structures upon lengthening of the sequence is supported by the NMR and IR observations. The CD spectra of the (αγ)n peptides reveal a negative maximum at ca. 206 nm and a positive maximum at ca. 192 nm, spectral features that are distinct from those of 3₁₀ helices in α-peptides.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Bisimulation-based information flow properties were introduced by Focardi and Gorrieri [1] as a way of specifying security properties for transition system models. These properties were shown to be decidable for finite-state systems. In this paper, we study the problem of verifying these properties for some well-known classes of infinite state systems. We show that all the properties are undecidable for each of these classes of systems.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Large software systems are developed by composing multiple programs. If the programs manipulate and exchange complex data, such as network packets or files, it is essential to establish that they follow compatible data formats. Most of the complexity of data formats is associated with the headers. In this paper, we address compatibility of programs operating over headers of network packets, files, images, etc. As format specifications are rarely available, we infer the format associated with headers by a program as a set of guarded layouts. In terms of these formats, we define and check compatibility of (a) producer-consumer programs and (b) different versions of producer (or consumer) programs. A compatible producer-consumer pair is free of type mismatches and logical incompatibilities such as the consumer rejecting valid outputs generated by the producer. A backward compatible producer (resp. consumer) is guaranteed to be compatible with consumers (resp. producers) that were compatible with its older version. With our prototype tool, we identified 5 known bugs and 1 potential bug in (a) sender-receiver modules of Linux network drivers of 3 vendors and (b) different versions of a TIFF image library.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Development of simple functionalization methods to attach biomolecules such as proteins and DNA on inexpensive substrates is important for widespread use of low cost, disposable biosensors. Here, we describe a method based on polyelectrolyte multilayers to attach single stranded DNA molecules to conventional glass slides as well as a completely non-standard substrate, namely flexible plastic transparency sheets. We then use the functionalized transparency sheets to specifically detect single stranded Hepatitis B DNA sequences from samples. We also demonstrate a blocking method for reducing non-specific binding of target DNA sequences using negatively charged polyelectrolyte molecules. The polyelectrolyte based functionalization method, which relies on surface charge as opposed to covalent surface linkages, could be an attractive platform to develop assays on inexpensive substrates for low cost biosensing.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Genomic sequences are far from random; they are made up of systematically ordered, information-rich patterns. These repeated sequence patterns have been widely used because of their fundamental importance in understanding genome function and organization. To this end, a comprehensive toolkit, RepEx, has been developed which extracts repeat (inverted, everted and mirror) patterns from the given genome sequence(s) without any constraints. The toolkit can also be used to fetch the inverted repeats present in protein sequence(s). Further, it is capable of extracting exact and degenerate repeats with user-defined spacer intervals. It is remarkably more precise and sensitive than the existing tools. An example with comprehensive case studies and a performance evaluation of the proposed toolkit is presented to demonstrate its efficiency and accuracy. (C) 2013 Elsevier Inc. All rights reserved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Protein functional annotation relies on the identification of accurate relationships, sequence divergence being a key factor. This is especially evident when distant protein relationships are demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through directed design of protein-like "linker" sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another one additionally containing designed sequences. Augmented database searches showed up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established the sequence continuum to demonstrate 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be "plugged into" routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our richly statistically supported findings show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold. (C) 2013 Elsevier Ltd. All rights reserved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Analytical closed-form expressions for harmonic distortion factors corresponding to various pulsewidth modulation (PWM) techniques for a two-level inverter have been reported in the literature. This paper derives such analytical closed-form expressions, pertaining to centered space-vector PWM (CSVPWM) and eight different advanced bus-clamping PWM (ABCPWM) schemes, for a three-level neutral-point-clamped (NPC) inverter. These ABCPWM schemes switch each phase at twice the nominal switching frequency in certain intervals of the line cycle while clamping each phase to one of the dc terminals over certain other intervals. The harmonic spectra of the output voltages, corresponding to the eight ABCPWM schemes, are studied and compared experimentally with that of CSVPWM over the entire modulation range. The measured values of weighted total harmonic distortion (WTHD) of the line voltage V-WTHD are used to validate the analytical closed-form expressions derived. The analytical expressions, pertaining to two of the ABCPWM methods, are also validated by measuring the total harmonic distortion (THD) in the line current I-THD on a 2.2-kW constant volts-per-hertz induction motor drive.