953 results for RIBOSOMAL SEQUENCES
Abstract:
In this paper we consider the process of discovering frequent episodes in event sequences. The most computationally intensive part of this process is counting the frequencies of a set of candidate episodes. We present two new frequency counting algorithms for speeding up this part. These, referred to as non-overlapping and non-interleaved frequency counts, are based on directly counting suitable subsets of the occurrences of an episode. Hence they differ from the frequency counts of Mannila et al. [1], which count the number of windows in which the episode occurs. Our new frequency counts offer a speed-up factor of 7 or more on real and synthetic datasets. We also show how the new frequency counts can be used when the events in episodes have time durations as well.
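To illustrate the idea of counting occurrences directly instead of counting windows, here is a minimal sketch for a single serial episode (an ordered list of event types). It is not the authors' algorithm, which handles whole sets of candidate episodes; the function name `count_non_overlapping` and the greedy single-automaton strategy are assumptions made for illustration.

```python
from typing import Hashable, Sequence


def count_non_overlapping(events: Sequence[Hashable],
                          episode: Sequence[Hashable]) -> int:
    """Count non-overlapping occurrences of a serial episode.

    A single automaton waits for the episode's event types in order.
    Each time it reaches the last one, the count goes up and the
    automaton resets, so no event is shared between two occurrences.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")
    count = 0
    pos = 0  # index of the next event type the automaton is waiting for
    for ev in events:
        if ev == episode[pos]:
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0  # start looking for a fresh occurrence
    return count
```

For example, `count_non_overlapping(list("ABCABC"), list("ABC"))` finds two occurrences, while `list("AABBCC")` yields only one because the interleaved events cannot form two disjoint occurrences in order.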
Abstract:
Discovering patterns in temporal data is an important task in Data Mining. A successful method for this was proposed by Mannila et al. [1] in 1997. In their framework, mining for temporal patterns in a database of sequences of events is done by discovering the so-called frequent episodes. These episodes characterize interesting collections of events occurring relatively close to each other in some partial order. However, in this framework (and in many others for finding patterns in event sequences), the ordering of events in an event sequence is the only allowed temporal information. But there are many applications where the events are not instantaneous; they have time durations. Interesting episodes that we want to discover may need to contain information regarding event durations and similar properties. In this paper we extend Mannila et al.'s framework to tackle such issues. In our generalized formulation, episodes are defined so that much more temporal information about events can be incorporated into the structure of an episode. This significantly enhances the expressive capability of the rules that can be discovered in the frequent episode framework. We also present algorithms for discovering such generalized frequent episodes.
Abstract:
The cell cycle phase at starvation influences post-starvation differentiation and morphogenesis in Dictyostelium discoideum. We found that when expressed in Saccharomyces cerevisiae, a D. discoideum cDNA that encodes the ribosomal protein S4 (DdS4) rescues mutations in the cell cycle genes cdc24, cdc42 and bem1. The products of these genes affect morphogenesis in yeast via a coordinated moulding of the cytoskeleton during bud site selection. D. discoideum cells that over- or under-expressed DdS4 did not show detectable changes in protein synthesis but displayed similar developmental aberrations whose intensity was graded with the extent of over- or under-expression. This suggested that DdS4 might influence morphogenesis via a stoichiometric effect - specifically, by taking part in a multimeric complex similar to the one involving Cdc24p, Cdc42p and Bem1p in yeast. In support of the hypothesis, the S. cerevisiae proteins Cdc24p, Cdc42p and Bem1p as well as their D. discoideum cognates could be co-precipitated with antibodies to DdS4. Computational analysis and mutational studies explained these findings: a C-terminal domain of DdS4 is the functional equivalent of an SH3 domain in the yeast scaffold protein Bem1p that is central to constructing the bud site selection complex. Thus in addition to being part of the ribosome, DdS4 has a second function, also as part of a multi-protein complex. We speculate that the existence of the second role can act as a safeguard against perturbations to ribosome function caused by spontaneous variations in DdS4 levels.
Abstract:
Over the past two decades, many ingenious efforts have been made in protein remote homology detection. Because homologous proteins often diversify extensively in sequence, it is challenging to demonstrate such relatedness through entirely sequence-driven searches. Here, we describe a computational method for the generation of 'protein-like' sequences that serves to bridge gaps in protein sequence space. Sequence profile information, as embodied in a position-specific scoring matrix of multiply aligned sequences of bona fide family members, serves as the starting point in this algorithm. The observed amino acid propensity and the selection of a random number dictate the selection of a residue for each position in the sequence. In a systematic manner, and by applying a 'roulette-wheel' selection approach at each position, we generate parent family-like sequences and thus facilitate an enlargement of sequence space around the family. When generated for a large number of families, we demonstrate that they expand the utility of natural intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5-10% over searches made in their absence. Furthermore, with several examples from proteins adopting folds such as TIM, globin, lipocalin and others, we demonstrate that including designed sequences in a database positively sensitized methods such as PSI-BLAST and Cascade PSI-BLAST, and is a promising opportunity for enormously improved remote homology recognition using sequence information alone.
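The roulette-wheel step described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the profile is assumed to be a list of per-position dictionaries mapping residues to non-negative propensities, and the names `roulette_pick` and `generate_sequence` are hypothetical.

```python
import random
from typing import Dict, List


def roulette_pick(propensities: Dict[str, float], rng: random.Random) -> str:
    """Pick one residue with probability proportional to its propensity."""
    total = sum(propensities.values())
    if total <= 0:
        raise ValueError("at least one residue needs a positive propensity")
    threshold = rng.random() * total  # where the wheel stops
    cumulative = 0.0
    last_positive = None
    for residue, weight in propensities.items():
        if weight > 0:
            last_positive = residue
        cumulative += weight
        if threshold < cumulative:
            return residue
    # Reached only through floating-point rounding at the very end of the wheel.
    return last_positive


def generate_sequence(profile: List[Dict[str, float]],
                      rng: random.Random) -> str:
    """Draw one residue per profile position to build a family-like sequence."""
    return "".join(roulette_pick(column, rng) for column in profile)
```

Calling `generate_sequence(profile, random.Random(0))` repeatedly with different seeds yields different family-like sequences; positions dominated by one residue in the profile stay conserved, while variable positions are sampled according to their observed propensities.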
Abstract:
Learning your αβγ's: The diversity of hydrogen-bonding patterns in backbone-expanded hybrid helices is shown by crystal-structure determination of several oligomeric peptides (see scheme; C=gray; H=white; O=red; N=blue). C12 helices were observed in the αγ peptide series for n=2-8. In comparison, the αα peptide and αβ peptide sequences show C10 and mixed C14/C15 helices, respectively. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Abstract:
Receive antenna selection (AS) has been shown to maintain the diversity benefits of multiple antennas while potentially reducing hardware costs. However, the promised diversity gains of receive AS depend on the assumptions of perfect channel knowledge at the receiver and slowly time-varying fading. By explicitly accounting for practical constraints imposed by the next-generation wireless standards such as training, packetization and antenna switching time, we propose a single receive AS method for time-varying fading channels. The method exploits the low training overhead and accuracy possible from the use of discrete prolate spheroidal (DPS) sequences based reduced rank subspace projection techniques. It only requires knowledge of the Doppler bandwidth, and does not require detailed correlation knowledge. Closed-form expressions for the channel prediction and estimation error as well as symbol error probability (SEP) of M-ary phase-shift keying (MPSK) for symbol-by-symbol receive AS are also derived. It is shown that the proposed AS scheme, after accounting for the practical limitations mentioned above, outperforms the ideal conventional single-input single-output (SISO) system with perfect CSI and no AS at the receiver and AS with conventional estimation based on complex exponential basis functions.
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
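As a rough illustration of windowed n-gram perplexity, the sketch below counts n-grams with a plain dictionary rather than the suffix arrays the toolkit uses, and applies add-one smoothing. The function names, the smoothing choice, and the default alphabet size of 4 are assumptions for illustration, not part of the toolkit.

```python
import math
from collections import Counter
from typing import List, Tuple


def ngram_counts(seq: str, n: int) -> Counter:
    """Count every overlapping n-gram in the sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def context_counts(ngrams: Counter) -> Counter:
    """Sum n-gram counts by their (n-1)-symbol prefix."""
    contexts = Counter()
    for gram, c in ngrams.items():
        contexts[gram[:-1]] += c
    return contexts


def window_perplexity(window: str, n: int, ngrams: Counter,
                      contexts: Counter, alphabet_size: int = 4) -> float:
    """Perplexity of a window under an add-one smoothed n-gram model."""
    log_sum, terms = 0.0, 0
    for i in range(len(window) - n + 1):
        gram = window[i:i + n]
        p = (ngrams[gram] + 1) / (contexts[gram[:-1]] + alphabet_size)
        log_sum += math.log(p)
        terms += 1
    if terms == 0:
        raise ValueError("window is shorter than n")
    return math.exp(-log_sum / terms)


def scan_windows(seq: str, n: int, width: int, step: int,
                 alphabet_size: int = 4) -> List[Tuple[int, float]]:
    """Return (start, perplexity) for each window along the sequence."""
    ngrams = ngram_counts(seq, n)
    contexts = context_counts(ngrams)
    return [(start, window_perplexity(seq[start:start + width], n,
                                      ngrams, contexts, alphabet_size))
            for start in range(0, len(seq) - width + 1, step)]
```

Windows made of n-grams that are frequent in the sequence, such as repeats, get a low perplexity, so a dip in the `scan_windows` output is one simple signal of a repetitive region.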
Abstract:
A palindrome is a set of characters that reads the same forwards and backwards. Since the discovery of palindromic peptide sequences two decades ago, little effort has been made to understand their structural, functional and evolutionary significance. Therefore, an algorithm has been developed to identify all perfect palindromes (excluding the palindromic subset and tandem repeats) in a single protein sequence. The proposed algorithm does not impose any restriction on the number of residues to be given in the input sequence. This avant-garde algorithm will aid in the identification of palindromic peptide sequences of varying lengths in a single protein sequence.
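A palindrome search of this kind can be sketched with the standard expand-around-center technique. The abstract does not give the authors' algorithm, so the details here are assumptions: only the longest palindrome at each center is reported (one reading of "excluding the palindromic subset"), runs of a single repeated residue are skipped as a stand-in for tandem repeats, and the name `maximal_palindromes` and the `min_len` parameter are hypothetical.

```python
from typing import List, Tuple


def maximal_palindromes(seq: str, min_len: int = 3) -> List[Tuple[int, str]]:
    """Return (start, palindrome) for the longest palindrome at each center.

    Shorter palindromes nested at the same center are not reported, and
    runs made of a single repeated residue are skipped.
    """
    results = []
    n = len(seq)
    # Centers 0, 2, 4, ... sit on a residue; odd centers sit between residues.
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < n and seq[left] == seq[right]:
            left -= 1
            right += 1
        start, end = left + 1, right  # end is exclusive
        pal = seq[start:end]
        if len(pal) >= min_len and len(set(pal)) > 1:
            results.append((start, pal))
    return results
```

The scan takes quadratic time in the worst case and places no limit on the input length, which is adequate for single protein sequences.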
Abstract:
The accuracy of pairing of the anticodon of the initiator tRNA (tRNA(fMet)) and the initiation codon of an mRNA, in the ribosomal P-site, is crucial for determining the translational reading frame. However, a direct role of any ribosomal element(s) in scrutinizing this pairing is unknown. The P-site elements, m(2)G966 (methylated by RsmD), m(5)C967 (methylated by RsmB) and the C-terminal tail of the protein S9 lie in the vicinity of tRNA(fMet). We investigated the role of these elements in initiation from various codons, namely, AUG, GUG, UUG, CUG, AUA, AUU, AUC and ACG with tRNA(CAU)(fMet) (tRNA(fMet) with CAU anticodon); CAC and CAU with tRNA(GUG)(fMet); UAG with tRNA(GAU)(fMet) using in vivo and computational methods. Although RsmB deficiency did not impact initiation from most codons, RsmD deficiency increased initiation from AUA, CAC and CAU (2- to 3.6-fold). Deletion of the S9 C-terminal tail resulted in poorer initiation from UUG, GUG and CUG, but in increased initiation from CAC, CAU and UAC codons (up to 4-fold). Also, the S9 tail suppressed initiation with tRNA(CAU)(fMet) lacking the 3GC base pairs in the anticodon stem. These observations suggest distinctive roles of 966/967 methylations and the S9 tail in initiation.
Abstract:
The solution conformations of the αγ-hybrid oligopeptides Boc-[Aib-γ4(R)Val]n-OMe (n = 1-8) in organic solvents have been probed by NMR, IR, and CD spectroscopic methods. In the solid state, this peptide series favors C12-helical conformations, which are backbone-expanded analogues of 3₁₀ helices in α-peptide sequences. NMR studies of the six- (n = 3) and 16-residue (n = 8) peptides reveal that only two NH protons, attached to the N-terminus residues Aib(1) and γ4(R)Val(2), are solvent-exposed. Sequential N(i)H-N(i+1)H NOEs characteristic of local helical conformations are also observed at the α residues. IR studies establish that chain extension leads to a large enhancement in the intensities of the hydrogen-bonded NH stretching bands (3343-3280 cm⁻¹), which suggests elongation of intramolecularly hydrogen-bonded structures. The development of C12-helical structures upon lengthening of the sequence is supported by the NMR and IR observations. The CD spectra of the (αγ)n peptides reveal a negative maximum at ca. 206 nm and a positive maximum at ca. 192 nm, spectral features that are distinct from those of 3₁₀ helices in α-peptides.
Abstract:
The ribosomal P-site hosts the peptidyl-tRNAs during translation elongation. Which P-site elements support these tRNA species to maintain codon-anticodon interactions has remained unclear. We investigated the roles of three P-site features, namely the methylations of G966 and C967 and the conserved C-terminal tail sequence Ser, Lys, and Arg (SKR) of the S9 ribosomal protein, in the maintenance of the translational reading frame of an mRNA. We generated Escherichia coli strains deleted for the SKR sequence in S9 ribosomal protein, RsmB (which methylates C967), and RsmD (which methylates G966) and used them to translate LacZ from its +1 and -1 out-of-frame constructs. We show that the S9 SKR tail prevents both the +1 and -1 frameshifts and plays a general role in holding the P-site tRNA/peptidyl-tRNA in place. In contrast, the G966 and C967 methylations did not make a direct contribution to the maintenance of the translational frame of an mRNA. However, deletion of rsmB in the S9Δ3 background caused significantly increased -1 frameshifting at 37 °C. Interestingly, the effects of the deficiency of C967 methylation were annulled when the E. coli strain was grown at 30 °C, supporting its context-dependent role.
Abstract:
In all domains of life, initiator tRNA functions exclusively at the first step of protein synthesis while elongator tRNAs extend the polypeptide chain. Unique features of initiator tRNA enable it to preferentially bind the ribosomal P site and initiate translation. Recently, we showed that the abundance of initiator tRNA also contributes to its specialized role. This motivates the question, can a cell also use elongator tRNA to initiate translation under certain conditions? To address this, we introduced non-AUG initiation codons CCC (Pro), GAG (Glu), GGU (Gly), UCU (Ser), UGU (Cys), ACG (Thr), AAU (Asn), and AGA (Arg) into the uracil DNA glycosylase gene (ung) used as a reporter gene. Enzyme assays from log-phase cells revealed initiation from non-AUG codons when intracellular initiator tRNA levels were reduced. The activity increased significantly in stationary phase. Further increases in initiation from non-AUG codons occurred in both growth phases upon introduction of plasmid-borne genes of cognate elongator tRNAs. Since purine-rich Shine-Dalgarno sequences occur frequently on mRNAs (in places other than the canonical AUG codon initiation contexts), initiation with elongator tRNAs from the alternate contexts may generate proteome diversity under stress without compromising genomic integrity. Thus, by changing the relative amounts of initiator and elongator tRNAs within the cell, we have blurred the distinction between the two classes of tRNAs thought to be frozen through years of evolution.
Abstract:
Development of simple functionalization methods to attach biomolecules such as proteins and DNA on inexpensive substrates is important for widespread use of low-cost, disposable biosensors. Here, we describe a method based on polyelectrolyte multilayers to attach single-stranded DNA molecules to conventional glass slides as well as a completely non-standard substrate, namely flexible plastic transparency sheets. We then use the functionalized transparency sheets to specifically detect single-stranded Hepatitis B DNA sequences from samples. We also demonstrate a blocking method for reducing non-specific binding of target DNA sequences using negatively charged polyelectrolyte molecules. The polyelectrolyte-based functionalization method, which relies on surface charge as opposed to covalent surface linkages, could be an attractive platform to develop assays on inexpensive substrates for low-cost biosensing.