Biblioteca Digital

891 results for sequence database

Highly specific protein sequence motifs for genome analysis

Relevance:

30.00%

Publisher:

Abstract:

We present a method for discovering conserved sequence motifs from families of aligned protein sequences. The method has been implemented as a computer program called emotif (http://motif.stanford.edu/emotif). Given an aligned set of protein sequences, emotif generates a set of motifs with a wide range of specificities and sensitivities. emotif can also generate motifs that describe possible subfamilies of a protein superfamily. A disjunction of such motifs often can represent the entire superfamily with high specificity and sensitivity. We have used emotif to generate sets of motifs from all 7,000 protein alignments in the blocks and prints databases. The resulting database, called identify (http://motif.stanford.edu/identify), contains more than 50,000 motifs. For each alignment, the database contains several motifs whose probability of matching a false positive ranges from 10⁻¹⁰ to 10⁻⁵. Highly specific motifs are well suited for searching entire proteomes while generating very few false predictions. identify assigns biological functions to 25–30% of all proteins encoded by the Saccharomyces cerevisiae genome and by several bacterial genomes. In particular, identify assigned functions to 172 proteins of unknown function in the yeast genome.

A unified statistical framework for sequence comparison and structure comparison

Relevance:

30.00%

Publisher:

Abstract:

We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (scop) database] and then fitting a simple distribution function to the observed scores. By using this distribution, we can attach a statistical significance to each comparison score in the form of a P value, the probability that a better score would occur by chance. As expected, we find that the scores for sequence matching follow an extreme-value distribution. Moreover, the agreement between the P values that we derive from this distribution and those reported by standard programs (e.g., blast and fasta) validates our approach. Structure comparison scores also follow an extreme-value distribution when the statistics are expressed in terms of a structural alignment score (essentially the sum of reciprocated distances between aligned atoms minus gap penalties). We find that the traditional metric of structural similarity, the rms deviation in atom positions after fitting aligned atoms, follows a different distribution of scores and does not perform as well as the structural alignment score. Comparison of the sequence and structure statistics for pairs of proteins known to be distantly related shows that structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. The comparison also indicates that there are very few pairs with significant similarity in terms of sequence but not structure, whereas many pairs have significant similarity in terms of structure but not sequence.
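
The step of attaching a P value to a comparison score can be illustrated with a minimal sketch. It assumes a Gumbel (type I extreme-value) form and a method-of-moments fit; the abstract only says that "a simple distribution function" is fitted, so the fitting procedure and the function names `fit_gumbel` and `p_value` are illustrative assumptions, not the authors' implementation.

```python
import math
import statistics

# Euler-Mascheroni constant, used by the method-of-moments Gumbel fit.
EULER_GAMMA = 0.5772156649015329


def fit_gumbel(scores):
    """Estimate Gumbel location (mu) and scale (beta) from observed scores.

    Method of moments (an assumption of this sketch):
    variance = (pi * beta)^2 / 6 and mean = mu + gamma * beta.
    """
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    beta = std * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """Probability that a chance score is at least as high as `score`."""
    z = (score - mu) / beta
    if z < -30:
        # exp(-z) is huge here, so the P value is 1.0 to machine precision.
        return 1.0
    # 1 - exp(-exp(-z)), written with expm1 for accuracy at small P values.
    return -math.expm1(-math.exp(-z))
```

In use, `fit_gumbel` would be called once on the scores from the all-vs.-all comparison, and `p_value` would then be evaluated for each individual comparison score.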

Expressed Sequence Tags from a Root-Hair-Enriched Medicago truncatula cDNA Library

Relevance:

30.00%

Publisher:

Abstract:

The root hair is a specialized cell type involved in water and nutrient uptake in plants. In legumes the root hair is also the primary site of recognition and infection by symbiotic nitrogen-fixing Rhizobium bacteria. We have studied the root hairs of Medicago truncatula, which is emerging as an increasingly important model legume for studies of symbiotic nodulation. However, only 27 genes from M. truncatula were represented in GenBank/EMBL as of October, 1997. We report here the construction of a root-hair-enriched cDNA library and single-pass sequencing of randomly selected clones. Expressed sequence tags (899 total, 603 of which have homology to known genes) were generated and made available on the Internet. We believe that the database and the associated DNA materials will provide a useful resource to the community of scientists studying the biology of roots, root tips, root hairs, and nodulation.

Sequence similarity analysis of Escherichia coli proteins: functional and evolutionary implications.

Relevance:

30.00%

Publisher:

Abstract:

A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins--86%--shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or Archaeal proteins. For > 90% of the E. coli proteins, either functional information or sequence similarity, or both, are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters--namely, permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and methods of their screening, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.

Sequence analysis of a mannitol dehydrogenase cDNA from plants reveals a function for the pathogenesis-related protein ELI3.

Relevance:

30.00%

Publisher:

Abstract:

Mannitol is the most abundant sugar alcohol in nature, occurring in bacteria, fungi, lichens, and many species of vascular plants. Celery (Apium graveolens L.), a plant that forms mannitol photosynthetically, has high photosynthetic rates thought to result from intrinsic differences in the biosynthesis of hexitols vs. sugars. Celery also exhibits high salt tolerance due to the function of mannitol as an osmoprotectant. A mannitol catabolic enzyme that oxidizes mannitol to mannose (mannitol dehydrogenase, MTD) has been identified. In celery plants, MTD activity and tissue mannitol concentration are inversely related. MTD provides the initial step by which translocated mannitol is committed to central metabolism and, by regulating mannitol pool size, is important in regulating salt tolerance at the cellular level. We have now isolated, sequenced, and characterized a Mtd cDNA from celery. Analyses showed that Mtd RNA was more abundant in cells grown on mannitol and less abundant in salt-stressed cells. A protein database search revealed that the previously described ELI3 pathogenesis-related proteins from parsley and Arabidopsis are MTDs. Treatment of celery cells with salicylic acid resulted in increased MTD activity and RNA. Increased MTD activity results in an increased ability to utilize mannitol. Among other effects, this may provide an additional source of carbon and energy for response to pathogen attack. These responses of the primary enzyme controlling mannitol pool size reflect the importance of mannitol metabolism in plant responses to divergent types of environmental stress.

Quantifying the contamination by old main-sequence stars in young moving groups: the case of the Local Association

Relevance:

30.00%

Publisher:

Abstract:

Context. The associations and moving groups of young stars are excellent laboratories for investigating stellar formation in the solar neighborhood. Previous results have confirmed that a non-negligible fraction of old main-sequence stars is present in the lists of possible members of young stellar kinematic groups. A detailed study of the properties of these samples is needed to separate the young stars from old main-sequence stars with similar space motion, and to identify the origin of these structures. Aims. Our intention is to characterize members of the young moving groups, determine their age distribution, and quantify the contamination by old main-sequence stars, in particular for the Local Association. Methods. We used stars from the literature that are possible members of the young (~10-650 Myr) moving groups. To determine the age of the stars, we used several suitable age indicators for young main-sequence stars, i.e., X-ray fluxes from the ROSAT All-Sky Survey database and photometric data from the Tycho-2, Hipparcos, and 2MASS databases. We also used spectroscopic data, in particular the equivalent width of the lithium line Li I λ6707.8 Å and Hα, to constrain the range of ages of the stars. Results. By combining photometric and spectroscopic data, we were able to separate the young stars (10-650 Myr) from the old (> 1 Gyr) field ones. We found, in particular, that the Local Association is contaminated by old field stars at the level of ~30%. This value must be considered as the contamination for our particular sample, and not of the entire Local Association. For other young moving groups, it is more difficult to estimate the fraction of old stars among possible members. However, the level of X-ray emission can, at least, help to separate two age populations: stars younger than 200 Myr and stars older than this. Conclusions. Among the candidate members of the classical moving groups, there is a non-negligible fraction of old field stars that should be taken into account when studying the stellar birthrate in the solar neighborhood. Our results are consistent with a scenario in which the moving groups contain both groups of young stars formed in a recent star-formation episode and old field stars with similar space motion. Only by combining X-ray and optical spectroscopic data is it possible to distinguish between these two age populations.

List of size fractionated eukaryotic plankton community samples and associated metadata (Database W1)

Relevance:

30.00%

Publisher:

Abstract:

The present data set provides an Excel file in a zip archive. The file lists 334 samples of size-fractionated eukaryotic plankton communities with a suite of associated metadata (Database W1). Note that although most samples represented the piconano- (0.8-5 µm, 73 samples), nano- (5-20 µm, 74 samples), micro- (20-180 µm, 70 samples), and meso- (180-2000 µm, 76 samples) planktonic size fractions, some represented different organismal size fractions: 0.2-3 µm (1 sample), 0.8-20 µm (6 samples), 0.8 µm - infinity (33 samples), and 3-20 µm (1 sample). The table contains the following fields: a unique sample sequence identifier; the sampling station identifier; the Tara Oceans sample identifier (TARA_xxxxxxxxxx); an INSDC accession number that allows raw sequence data to be retrieved from the major nucleotide databases (short read archives at EBI, NCBI or DDBJ); the depth of sampling (Subsurface - SUR or Deep Chlorophyll Maximum - DCM); the targeted size range; the sequence template (either DNA, or WGA/DNA if the DNA extracted from the filters was Whole Genome Amplified); the latitude of the sampling event (decimal degrees); the longitude of the sampling event (decimal degrees); the time and date of the sampling event; the device used to collect the sample; the logsheet event corresponding to the sampling event; and the volume of water sampled (liters). Then follows information on the bioinformatic cleaning pipeline shown in Figure W2 of the supplementary literature publication: the number of merged pairs present in the raw sequence file; the number of those sequences matching both primers; the number of sequences after quality-check filtering; the number of sequences after chimera removal; and finally the number of sequences after selecting only barcodes present in at least three copies in total and in at least two samples.
Finally, the following are given for each sequence sample: the number of distinct sequences (metabarcodes); the number of OTUs; the average number of barcodes per OTU; the Shannon diversity index based on barcodes for each sample (URL of W4 dataset in PANGAEA); and the Shannon diversity index based on each OTU (URL of W5 dataset in PANGAEA).
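
As an illustration of the last two fields, the Shannon diversity index of a sample can be computed from per-barcode (or per-OTU) abundances. The sketch below is a generic implementation; the use of the natural logarithm and the function name `shannon_index` are assumptions, since the record does not state which logarithm base the data set uses.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over abundance counts.

    `counts` holds the abundance of each barcode (or OTU) in one sample.
    Zero counts are skipped because they contribute nothing to the sum.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

A sample dominated by a single barcode gives a value near zero, while a sample with many equally abundant barcodes gives a value near the logarithm of the number of barcodes.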

Total V9 rDNA information organized at the metabarcode level (Database W4)

Relevance:

30.00%

Publisher:

Abstract:

The present data set provides a tab-separated text file compressed in a zip archive. The file includes metadata for each TaraOceans V9 rDNA metabarcode, with the following fields: md5sum = unique identifier; lineage = taxonomic path associated with the metabarcode; pid = % identity to the closest reference barcode from V9_PR2; sequence = nucleotide sequence of the metabarcode; refs = identity of the best hit reference sequence(s); TARA_xxx = number of occurrences of this barcode in each of the 334 samples; totab = total abundance of the barcode; cid = identifier of the OTU to which the barcode belongs; and taxogroup = high-taxonomic level assignation of this barcode. The file also includes three categories of functional annotations: (1) Chloroplast: yes, presence of permanent chloroplast; no, absence of permanent chloroplast; NA, undetermined. (2) Symbiont (small partner): parasite, the species is a parasite; commensal, the species is a commensal; mutualist, the species is a mutualist symbiont, most often a microalgal taxon involved in photosymbiosis; no, the species is not involved in a symbiosis as small partner; NA, undetermined. (3) Symbiont (host): photo, the host species relies on a mutualistic microalgal photosymbiont to survive (obligatory photosymbiosis); photo_falc, same as photo, but facultative relationship; photo_klep, the host species maintains chloroplasts from microalgal prey(s) to survive; photo_klep_falc, same as photo_klep, but facultative; Nfix, the host species must interact with a mutualistic symbiont providing N2 fixation to survive; Nfix_falc, same as Nfix, but facultative; no, the species is not involved in any mutualistic symbioses; NA, undetermined.
For example, the collodarian/Brandtodinium symbiosis is annotated: Chloroplast, "no"; Symbiont (small), "no"; Symbiont (host), "photo", for the collodarian host; and: Chloroplast, "yes"; Symbiont (small), "mutualist"; Symbiont (host), "no", for the dinoflagellate microalgal endosymbiont. chloroplast = "yes", "no" or "NA"; symbiont.small = "parasite", "commensal", "mutualist", "no" or "NA"; symbiont.host = "photo", "photo_falc", "photo_klep", "Nfix", "no" or "NA"; benef = "Nfix", "no" or "NA"; trophism = "Metazoa", "heterotroph", "NA", "photosymbiosis" or "phototroph", according to the previous fields.

Better prediction of protein contact number using a support vector regression analysis of amino acid sequence

Relevance:

30.00%

Publisher:

Abstract:

Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
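
The definition of contact number given in the Background can be written out as a short sketch: for each residue, count the C-beta atoms of other residues that fall within a sphere around its own C-beta atom. The sphere radius is not stated in the abstract, so the default of 10 Å below is a placeholder assumption, as is the function name `contact_numbers`. This computes observed contact numbers from known coordinates; it is not the support vector regression predictor.

```python
import math


def contact_numbers(cb_coords, radius=10.0):
    """Return the contact number of every residue.

    cb_coords: list of (x, y, z) C-beta coordinates, one per residue.
    radius: sphere radius in the same units as the coordinates
            (10.0 is an assumed placeholder, not a value from the source).
    """
    result = []
    for i, centre in enumerate(cb_coords):
        # Count C-beta atoms of all *other* residues inside the sphere.
        count = sum(
            1
            for j, other in enumerate(cb_coords)
            if j != i and math.dist(centre, other) <= radius
        )
        result.append(count)
    return result
```

The double loop is quadratic in chain length, which is adequate for a single protein; a spatial index would be the usual choice for large structure sets.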

Ultraconserved elements in insect genomes: A highly conserved intronic sequence implicated in the control of homothorax mRNA splicing

Relevance:

30.00%

Publisher:

Abstract:

Recently, we identified a large number of ultraconserved (uc) sequences in noncoding regions of human, mouse, and rat genomes that appear to be essential for vertebrate and amniote ontogeny. Here, we used similar methods to identify ultraconserved genomic regions between the insect species Drosophila melanogaster and Drosophila pseudoobscura, as well as the more distantly related Anopheles gambiae. As with vertebrates, ultraconserved sequences in insects appear to occur primarily in intergenic and intronic sequences, and at intron-exon junctions. The sequences are significantly associated with genes encoding developmental regulators and transcription factors, but are less frequent and are smaller in size than in vertebrates. The longest identical, nongapped orthologous match between the three genomes was found within the homothorax (hth) gene. This sequence spans an internal exon-intron junction, with the majority located within the intron, and is predicted to form a highly stable stem-loop RNA structure. Real-time quantitative PCR analysis of different hth splice isoforms and Northern blotting showed that the conserved element is associated with a high incidence of intron retention in hth pre-mRNA, suggesting that the conserved intronic element is critically important in the post-transcriptional regulation of hth expression in Diptera.

Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB

Relevance:

30.00%

Publisher:

Abstract:

A 16S rRNA gene database (http://greengenes.bl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.

Identifying sequence regions undergoing conformational change via predicted continuum secondary structure

Relevance:

30.00%

Publisher:

Abstract:

Motivation: Conformational flexibility is essential to the function of many proteins, e.g. catalytic activity. To assist efforts in determining and exploring the functional properties of a protein, it is desirable to automatically identify regions that are prone to undergo conformational changes. It was recently shown that a probabilistic predictor of continuum secondary structure is more accurate than categorical predictors for structurally ambivalent sequence regions, suggesting that such models are suited to characterize protein flexibility. Results: We develop a computational method for identifying regions that are prone to conformational change directly from the amino acid sequence. The method uses the entropy of the probabilistic output of an 8-class continuum secondary structure predictor. Results for 171 unique amino acid sequences with well-characterized variable structure (identified in the 'Macromolecular movements database') indicate that the method is highly sensitive at identifying flexible protein regions, but false positives remain a problem. The method can be used to explore conformational flexibility of proteins (including hypothetical or synthetic ones) whose structure is yet to be determined experimentally.
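
The core quantity in the Results, the entropy of an 8-class probabilistic prediction at each residue, can be sketched as follows. The predictor itself is not reproduced; the per-residue probability rows are taken as given input, and the threshold of 2.0 bits and the function names are illustrative assumptions rather than values from the paper.

```python
import math


def residue_entropy(probs):
    """Shannon entropy (in bits) of one residue's class probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def flexible_positions(prob_rows, threshold=2.0):
    """Indices of residues whose prediction entropy reaches the threshold.

    prob_rows: one row of 8 class probabilities per residue.
    threshold: cut-off in bits (an assumed value for illustration only).
    """
    return [
        i for i, row in enumerate(prob_rows)
        if residue_entropy(row) >= threshold
    ]
```

With eight classes the entropy ranges from 0 bits (a confident single-class prediction) to 3 bits (a uniform prediction), so high-entropy residues are those the predictor considers structurally ambivalent.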

SCORPION2: A database for structure-function analysis of scorpion toxins

Relevance:

30.00%

Publisher:

Abstract:

Scorpion toxins are important experimental tools for the characterization of a vast array of ion channels and serve as scaffolds for drug design. General public database entries contain limited annotation, and rich structure-function information from mutation studies is typically not available. SCORPION2 contains more than 800 records of native and mutant toxin sequences enriched with binding affinity and toxicity information, 624 three-dimensional structures and some 500 references. SCORPION2 has a set of search and prediction tools that allow users to extract and perform specific queries: text searches of scorpion toxin records, sequence similarity search, extraction of sequences, visualization of scorpion toxin structures, analysis of toxic activity, and functional annotation of previously uncharacterized scorpion toxins. The SCORPION2 database is available at http://sdmc.i2r.a-star.edu.sg/scorpion/. (c) 2006 Elsevier Ltd. All rights reserved.

AntigenDB: an immunoinformatics database of pathogen antigens

Relevance:

30.00%

Publisher:

Abstract:

The continuing threat of infectious disease and future pandemics, coupled with the continuous increase of drug-resistant pathogens, makes the discovery of new and better vaccines imperative. For effective vaccine development, antigen discovery and validation is a prerequisite. The compilation of information concerning pathogens, virulence factors and antigenic epitopes has resulted in many useful databases. However, most such immunological databases focus almost exclusively on antigens where epitopes are known and ignore those for which epitope information is unavailable. We have compiled more than 500 antigens into the AntigenDB database, making use of the literature and other immunological resources. These antigens come from 44 important pathogenic species. In AntigenDB, a database entry contains information regarding the sequence, structure, origin, etc. of an antigen with additional information such as B and T-cell epitopes, MHC binding, function, gene-expression and post translational modifications, where available. AntigenDB also provides links to major internal and external databases. We shall update AntigenDB on a rolling basis, regularly adding antigens from other organisms and extra data analysis tools. AntigenDB is available freely at http://www.imtech.res.in/raghava/antigendb and its mirror site http://www.bic.uams.edu/raghava/antigendb.

A novel multilocus sequence typing scheme for the opportunistic pathogen Propionibacterium acnes and characterisation of type 1 cell surface-associated antigens

Relevance:

30.00%

Publisher:

Abstract:

We have developed a novel multilocus sequence typing (MLST) scheme and database (http://pubmlst.org/pacnes/) for Propionibacterium acnes based on the analysis of seven core housekeeping genes. The scheme, which was validated against previously described antibody, single locus and random amplification of polymorphic DNA typing methods, displayed excellent resolution and differentiated 123 isolates into 37 sequence types (STs). An overall clonal population structure was detected with six eBURST groups representing the major clades I, II and III, along with two singletons. Two highly successful and global clonal lineages, ST6 (type IA) and ST10 (type IB1), representing 64% of this current MLST isolate collection were identified. The ST6 clone and closely related single locus variants, which comprise a large clonal complex CC6, dominated isolates from patients with acne, and were also significantly associated with ophthalmic infections. Our data therefore support an association between acne and P. acnes strains from the type IA cluster and highlight the role of a widely disseminated clonal genotype in this condition. Characterization of type I cell surface-associated antigens that are not detected in ST10 or strains of type II and III identified two dermatan-sulphate-binding proteins with putative phase/antigenic variation signatures. We propose that the expression of these proteins by type IA organisms contributes to their role in the pathophysiology of acne and helps explain the recurrent nature of the disease. The MLST scheme and database described in this study should provide a valuable platform for future epidemiological and evolutionary studies of P. acnes.
