29 results for SEQUENCE DATA
at Indian Institute of Science - Bangalore - India
Abstract:
Elucidation of the detailed structural features and sequence requirements for alpha helices of various lengths could be very important in understanding secondary structure formation in proteins and, hence, in the protein folding mechanism. An algorithm to characterize the geometry of an alpha helix from its C-alpha coordinates has been developed and used to analyze the structures of long alpha helices (number of residues greater than or equal to 25) found in globular proteins, the crystal structure coordinates of which are available from the Brookhaven Protein Data Bank. All long alpha helices can be unambiguously characterized as belonging to one of three classes: linear, curved, or kinked, with a majority being curved. Analysis of the sequences of these helices reveals that the long alpha helices have unique sequence characteristics that distinguish them from the short alpha helices in globular proteins. The distribution and statistical propensities of individual amino acids to occur in long alpha helices are different from those found in short alpha helices, with amino acids having longer side chains and/or having a greater number of functional groups occurring more frequently in these helices. The sequences of the long alpha helices can be correlated with their gross structural features, i.e., whether they are curved, linear, or kinked, and in the case of the curved helices, with their curvature.
Abstract:
Tetrapeptide sequences of the type Z-Pro-Y-X were obtained from the crystal structure data on 34 globular proteins, and used in an analysis of the positional preferences of the individual amino acid residues in the β-turn conformation. The effect of fixing proline as the second position residue in the tetrapeptide sequence was studied by comparing the data obtained on the positional preferences with the corresponding data obtained by Chou and Fasman using the Z-R-Y-X sequence, where no particular residue was fixed in any of the four positions. While, in general, several amino acid residues having relatively very high or very low preferences for specific positions were found to be common to both the Z-Pro-Y-X and Z-R-Y-X sequences, many significant differences were found between the two sets of data, which are to be attributed to specific interactions arising from the presence of the proline residue.
Abstract:
Antibodies were raised in rabbits against the bovine serum albumin conjugate of dpApT. Analysis by double diffusion in agar gel and quantitative precipitation test showed the presence of antibodies specific to the hapten in the antisera. Quantitative data on the specificity of the antibodies were obtained by studying the inhibition of the binding of 3H-dpApT to the anti-sera by various nonradioactive mono- and oligonucleotides, using a nitrocellulose membrane binding assay. The antibodies were found to be highly specific for the dinucleotide sequence dpApT. The antibodies were able to bind to synthetic oligonucleotides containing the sequence dpApT and to denatured calf thymus DNA.
Abstract:
In recent years, the identification of sequence patterns has received considerable attention as a way to better understand their significance with respect to genomic organization and evolutionary processes. To this end, an algorithm has been derived to identify all similar sequence repeats present in a protein sequence. The proposed algorithm is useful for correlating the three-dimensional structures of various similar sequence repeats available in the Protein Data Bank with the same sequence repeats present in other databases such as SWISS-PROT, PIR and genome databases.
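The abstract does not describe how the algorithm works, so the following is only a minimal sketch of the simplest version of the task: finding exact repeated substrings of a fixed length in a protein sequence. The function name `find_exact_repeats`, the window length `k`, and the example sequence are all hypothetical, and the restriction to exact (rather than similar) repeats is a simplifying assumption.

```python
from collections import defaultdict


def find_exact_repeats(sequence, k):
    """Return {k-mer: [start positions]} for every k-mer that occurs more than once.

    Assumption: only exact repeats of fixed length k are reported; the
    algorithm in the abstract also handles *similar* repeats.
    """
    positions = defaultdict(list)
    # Slide a window of length k over the sequence and record where each k-mer starts.
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    # Keep only k-mers seen at two or more positions.
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Hypothetical one-letter-code protein sequence, made up for illustration.
repeats = find_exact_repeats("MKTAYIAKQRMKTAYGGAKQR", 4)
print(repeats)
```

Extending this to similar repeats would require a scoring scheme (for example, a substitution matrix and a similarity threshold), which the abstract does not specify.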
Abstract:
Software packages NUPARM and NUCGEN are described, which can be used to understand sequence-directed structural variations in nucleic acids by analysis and generation of non-uniform structures. A set of local inter-basepair parameters (viz. tilt, roll, twist, shift, slide and rise) has been defined, which use the geometry and coordinates of two successive basepairs only and can be used to generate polymeric structures with varying geometries for each of the 16 possible dinucleotide steps. The intra-basepair parameters (propeller, buckle, opening and the C6...C8 distance) can also be varied, if required, while the sugar phosphate backbone atoms are fixed in some standard conformation in each of the nucleotides. NUPARM can be used to analyse both DNA and RNA structures, with single as well as double stranded helices. The NUCGEN software generates double helical models with the backbone fixed in B-form DNA, but with appropriate modifications in the input data, it can also generate A-form DNA and RNA duplex structures.
Abstract:
Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns from vast amounts of symbolic time series data. In this paper we show that the frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each of which is characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of discovered frequent episodes from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
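To make the event-sequence view concrete, here is a minimal sketch of counting occurrences of a serial episode in a time-sorted stream of (event type, time) pairs. It is not the paper's algorithm: the form of the temporal constraint (a lower and upper bound on the delay between consecutive events), the greedy non-overlapped counting, and all names (`find_occurrence`, `count_non_overlapped`, `min_delay`, `max_delay`) are assumptions made for illustration.

```python
def find_occurrence(events, episode, min_delay, max_delay, start):
    """Return event indices of the first occurrence of `episode` whose first
    event is at index >= start, or None if there is none.

    events: list of (event_type, time) pairs sorted by time.
    Assumed constraint: each consecutive delay lies in [min_delay, max_delay].
    """
    def extend(idx, pos):
        # events[idx] matches episode[pos - 1]; try to match episode[pos:].
        if pos == len(episode):
            return [idx]
        t_prev = events[idx][1]
        for j in range(idx + 1, len(events)):
            etype, t = events[j]
            delay = t - t_prev
            if delay > max_delay:
                break  # events are time-sorted, so later ones are too late as well
            if etype == episode[pos] and delay >= min_delay:
                rest = extend(j, pos + 1)  # backtrack if this choice cannot be completed
                if rest is not None:
                    return [idx] + rest
        return None

    for i in range(start, len(events)):
        if events[i][0] == episode[0]:
            occurrence = extend(i, 1)
            if occurrence is not None:
                return occurrence
    return None


def count_non_overlapped(events, episode, min_delay, max_delay):
    """Greedy count of occurrences that do not share any span of the stream."""
    count, start = 0, 0
    while True:
        occurrence = find_occurrence(events, episode, min_delay, max_delay, start)
        if occurrence is None:
            return count
        count += 1
        start = occurrence[-1] + 1  # resume after the last event of this occurrence


# Hypothetical spike stream: event types stand for neurons, times are in ms.
spikes = [("A", 1), ("B", 3), ("C", 5), ("A", 10), ("B", 18),
          ("A", 20), ("B", 22), ("C", 24)]
count = count_non_overlapped(spikes, ("A", "B", "C"), min_delay=1, max_delay=4)
print(count)
```

In this toy stream the episode A then B then C occurs twice within the delay bounds; the A at time 10 starts no occurrence because the next B arrives 8 ms later.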
Abstract:
The nucleotide sequence of a 714 bp BamHI-EcoRI fragment of cucumber chloroplast DNA was determined. The fragment contained a gene for tRNA(Leu) together with its flanking regions. The trnL(CAA) gene sequence is about 99% similar to the corresponding genes of broad bean, cauliflower, maize, spinach and tobacco. The relative expression level of the gene was determined by Northern (tRNA) gel blot and Northern (total cellular RNA) slot-blot analyses using the trnL gene probe in 6-day old etiolated cucumber seedlings and the seedlings that had been kept in the dark (dark-grown), treated with benzyladenine (BA) and kept in the dark (BA-treated dark-grown), illuminated (light-grown), and treated with BA and illuminated (BA-treated light-grown), for an additional 4, 8 or 12 hr. The trnL transcripts and tRNA(Leu) levels in BA-treated dark-grown seedlings were 5 and 3 times higher, respectively, after 4 hr BA treatment, while in the BA-treated light-grown seedlings the level of trnL transcripts was only 3 times higher and there was no detectable effect on mature tRNA(Leu) when compared to the time-4 hr dark-grown seedlings. However, the level of mature tRNA(Leu) did not show marked changes in the light-grown seedlings, whereas the level of trnL transcripts increased 3 times after 8 hr illumination of dark-grown seedlings. These data indicate that both light and cytokinin can signal changes in plastid tRNA gene expression. The possible regulatory mechanisms for such changes are discussed.
Abstract:
The problem of scheduling divisible loads in distributed computing systems in the presence of processor release times is considered. The objective is to find the optimal sequence of load distribution and the optimal load fractions assigned to each processor in the system such that the processing time of the entire processing load is minimized. This is a difficult combinatorial optimization problem, and hence a genetic algorithm approach is presented for its solution.
Abstract:
Automatic identification of software faults has enormous practical significance. This requires characterizing program execution behavior and the use of appropriate data mining techniques on the chosen representation. In this paper, we use the sequence of system calls to characterize program execution. The data mining tasks addressed are learning to map system call streams to fault labels and automatic identification of fault causes. Spectrum kernels and SVM are used for the former, while latent semantic analysis is used for the latter. The techniques are demonstrated for the intrusion dataset containing system call traces. The results show that kernel techniques are as accurate as the best available results but are faster by orders of magnitude. We also show that latent semantic indexing is capable of revealing fault-specific features.
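As an illustration of the spectrum kernel mentioned above, the sketch below computes the k-spectrum kernel between two system call traces as the inner product of their k-gram count vectors. The traces, the choice `k = 2`, and the function names are hypothetical; the abstract does not state the k used or any normalization.

```python
from collections import Counter


def spectrum(calls, k):
    """Count every contiguous k-gram of system calls in a trace."""
    return Counter(tuple(calls[i:i + k]) for i in range(len(calls) - k + 1))


def spectrum_kernel(trace_a, trace_b, k):
    """k-spectrum kernel: inner product of the two k-gram count vectors."""
    counts_a, counts_b = spectrum(trace_a, k), spectrum(trace_b, k)
    # A Counter returns 0 for missing keys, so unshared k-grams contribute nothing.
    return sum(count * counts_b[gram] for gram, count in counts_a.items())


# Hypothetical system call traces, made up for illustration.
trace_a = ["open", "read", "write", "close"]
trace_b = ["open", "read", "read", "write", "close"]
similarity = spectrum_kernel(trace_a, trace_b, k=2)
print(similarity)
```

A matrix of such kernel values between all pairs of traces could then be passed to an SVM that accepts a precomputed kernel.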
Abstract:
The sinusoidal structured light projection (SSLP) technique, specifically the phase-stepping method, is in widespread use to obtain accurate, dense 3-D data. However, if the object under investigation possesses surface discontinuities, the phase unwrapping stage (an intermediate step in SSLP) requires several additional images of the object with projected fringes of different spatial frequencies as input to generate a reliable 3-D shape. On the other hand, the color-coded structured light projection (CSLP) technique is known to require a single image as input, but generates sparse 3-D data. Thus we propose the use of CSLP in conjunction with SSLP to obtain dense 3-D data with a minimum number of images as input. This approach is shown to be significantly faster and more reliable than the temporal phase unwrapping procedure that uses a complete exponential sequence. For example, if a measurement with the accuracy obtained by interrogating the object with 32 fringes in the projected pattern is carried out with both methods, the proposed strategy requires only 5 frames, compared to the 24 frames required by the latter method.
Abstract:
Determining the sequence of amino acid residues in a heteropolymer chain of a protein with a given conformation is a discrete combinatorial problem that is not generally amenable to gradient-based continuous optimization algorithms. In this paper we present a new approach to this problem using continuous models. In this modeling, continuous "state functions" are proposed to designate the type of each residue in the chain. Such a continuous model helps define a continuous sequence space in which a chosen criterion is optimized to find the most appropriate sequence. Searching a continuous sequence space using a deterministic optimization algorithm makes it possible to find the optimal sequences with much less computation than many other approaches. The computational efficiency of this method is further improved by combining it with a graph spectral method, which explicitly takes into account the topology of the desired conformation and also helps make the combined method more robust. The continuous modeling used here appears to have additional advantages in mimicking the folding pathways and in creating the energy landscapes that help find sequences with high stability and kinetic accessibility. To illustrate the new approach, a widely used simplifying assumption is made by considering only two types of residues: hydrophobic (H) and polar (P). Self-avoiding compact lattice models are used to validate the method with known results in the literature and data that can be practically obtained by exhaustive enumeration on a desktop computer. We also present examples of sequence design for the HP models of some real proteins, which are solved in less than five minutes on a single-processor desktop computer. Some open issues and future extensions are noted.
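For readers unfamiliar with HP lattice models, the following sketch evaluates the usual HP contact energy of a sequence on a fixed self-avoiding conformation of a 2-D square lattice: minus the number of H-H pairs that are lattice neighbours but not chain neighbours. This is only the discrete objective commonly used with HP models, taken here as an assumption; the abstract does not state its optimization criterion, and the continuous state functions it proposes are not reproduced. The function name and the example fold are hypothetical.

```python
def hp_energy(sequence, coords):
    """Return the HP contact energy of `sequence` folded onto `coords`.

    sequence: string over {'H', 'P'}; coords: one (x, y) lattice point per residue.
    Assumption: energy is -1 per non-bonded H-H lattice contact, 0 otherwise.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and conformation must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    # Consecutive residues must occupy neighbouring lattice sites.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("chain is not connected on the lattice")

    contacts = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    contacts += 1
    return -contacts


# Hypothetical four-residue chain folded around a unit square.
square_fold = [(0, 0), (1, 0), (1, 1), (0, 1)]
energy = hp_energy("HPPH", square_fold)
print(energy)
```

Sequence design for a given conformation then amounts to choosing the H/P string that optimizes a criterion of this kind on the fixed `coords`, which for short chains can be checked by exhaustive enumeration.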
Abstract:
The importance and usefulness of local doublet parameters in understanding sequence dependent effects has been described for A- and B-DNA oligonucleotide crystal structures. Each of the two sets of local parameters described by us in the NUPARM algorithm, namely the local doublet parameters, calculated with reference to the mean z-axis, and the local helical parameters, calculated with reference to the local helix axis, is sufficient to describe the oligonucleotide structures, with the local helical parameters giving a slightly magnified picture of the variations in the structures. The values of local doublet parameters calculated by the NUPARM algorithm are similar to those calculated by the NEWHELIX90 program only if the oligonucleotide fragment is not too distorted. The mean values obtained using all the available data for B-DNA crystals are not significantly different from those obtained when a limited data set is used, consisting only of structures with a data resolution of better than 2.4 Å and without any bound drug molecule. Thus the variation observed in the oligonucleotide crystals appears to be independent of the quality of their crystallinity. No strong correlation is seen between any pair of local doublet parameters, but the local helical parameters are interrelated by geometric relationships. An interesting feature that emerges from this analysis is that the local rise along the z-axis is highly correlated with the difference in the buckle values of the two basepairs in the doublet, as suggested earlier for the dodecamer structures (Bansal and Bhattacharyya, in Structure & Methods: DNA & RNA, Vol. 3 (Eds., R.H. Sarma and M.H. Sarma), pp. 139-153 (1990)). In fact the local rise values become almost constant for both A- and B-forms if a correction is applied for the buckling of the basepairs.
In B-DNA the AA, AT, TA and GA basepair sequences generally have a smaller local rise (3.25 Å) compared to the other sequences (3.4 Å), and this seems to be an intrinsic feature of the basepair stacking interaction and not related to any other local doublet parameter. The roll angles in B-DNA oligonucleotides have small values (less than +/- 8 degrees), while the mean local twist varies from 24 degrees to 45 degrees. The CA/TG doublet sequences show two types of preferred geometries, one with positive roll, small positive slide and reduced twist and another with negative roll, large positive slide and increased twist. (ABSTRACT TRUNCATED AT 400 WORDS)
Abstract:
Sesbania mosaic virus (SMV) is a plant virus infecting Sesbania grandiflora plants in Andhra Pradesh, India. Amino acid sequences of the tryptic peptides of SMV coat protein were determined using a gas phase sequenator. These sequences showed identical amino acids at 69% of the positions when aligned with the corresponding residues of southern bean mosaic virus (SBMV). Crystals diffracting to better than 3 Å resolution were obtained by precipitating the virus with ammonium sulphate. The crystals belonged to rhombohedral space group R3 with a = 291·4 Å and α = 61·9°. Three-dimensional X-ray diffraction data on these crystals were collected to a resolution of 4·7 Å, using a Siemens-Nicolet area detector system. Self-rotation function studies revealed the icosahedral symmetry of the virus particles, as well as their precise orientation in the unit cell. Cross-rotation function and modelling studies with SBMV showed that it is a valid starting model for SMV structure determination. Low resolution phases computed using a polyalanine model of SBMV were subjected to refinement and extension by real-space electron density averaging and solvent flattening. The final electron density map revealed a polypeptide fold similar to SBMV. The single disulphide bridge of SBMV coat protein is retained in SMV. Four icosahedrally independent cation binding sites have been tentatively identified. Three of these sites, related by a quasi threefold axis, are also found in SBMV. The fourth site is situated on the quasi threefold axis. Aspartic acid residues, which replace Ile218 of SBMV from the quasi threefold-related subunits, are suitable ligands to the cation at this site.
Abstract:
In this article we describe and demonstrate the versatility of a computer program, GENOME MAPPING, that uses interactive graphics and runs on an IRIS workstation. The program helps to visualize as well as analyse global and local patterns of genomic DNA sequences. It was developed keeping in mind the requirements of the human genome sequencing programme, which requires rapid analysis of the data. Using GENOME MAPPING one can discern signature patterns of different kinds of sequences and analyse such patterns for repetitive as well as rare sequence strings. Further, one can visualize the extent of global homology between different genomic sequences. An application of our method to the published yeast mitochondrial genome data shows similar sequence organizations in the entire sequence and in smaller subsequences.
Abstract:
In a mobile ad-hoc network scenario, where communication nodes are mounted on moving platforms (like jeeps, trucks, tanks, etc.), use of V-BLAST requires that the number of receive antennas in a given node be greater than or equal to the sum of the number of transmit antennas of all its neighbor nodes. This limits the achievable spatial multiplexing gain (data rate) for a given node. In such a scenario, we propose to achieve high data rates per node through multicode direct sequence spread spectrum techniques in conjunction with V-BLAST. In the considered multicode V-BLAST system, the receiver experiences code domain interference (CDI) in frequency selective fading, in addition to the space domain interference (SDI) experienced in conventional V-BLAST systems. We propose two interference cancelling receivers that employ a linear parallel interference cancellation approach to handle the CDI, followed by a conventional V-BLAST detector to handle the SDI, and then evaluate their bit error rates.