891 results for sequence database


Relevance:

60.00%

Publisher:

Abstract:

A human genome contains more than 20 000 protein-encoding genes. The human proteome, in contrast, has been estimated to be much more complex and dynamic. The most powerful tool for studying proteins today is mass spectrometry (MS). MS-based proteomics relies on measuring the masses of charged peptide ions in the gas phase. The peptide amino acid sequence can be deduced, and matching proteins can be found, using software to correlate MS data with sequence database information. Quantitative proteomics allows the estimation of the absolute or relative abundance of a given protein in a sample. Label-free quantification methods use the intrinsic MS peptide signals to calculate the quantitative values, enabling the comparison of peptide signals from numerous patient samples. In this work, a quantitative MS methodology was established to study aromatase overexpressing (AROM+) male mouse liver and ovarian endometriosis tissue samples. The workflow of label-free quantitative proteomics was optimized in terms of sensitivity and robustness, allowing the quantification of 1500 proteins with a low coefficient of variation in both sample types. Additionally, five statistical methods were evaluated for use with label-free quantitative proteomics data. The proteome data were integrated with other omics datasets, such as mRNA microarray and metabolite datasets. As a result, an altered lipid metabolism in liver was discovered in male AROM+ mice. The results suggest a reduced beta oxidation of long chain phospholipids in the liver and increased levels of pro-inflammatory fatty acids in the circulation in these mice. Conversely, in the endometriosis tissues, a set of proteins highly specific for ovarian endometrioma was discovered, many of which were under the regulation of the growth factor TGF-β1. This finding supports subsequent biomarker verification in a larger number of endometriosis patient samples.
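As an illustration of the variability measure mentioned in this abstract, the following minimal Python sketch computes a per-protein coefficient of variation (sample standard deviation divided by the mean) across replicate runs. The protein names and intensity values are hypothetical and do not come from the study, and the abstract does not say how the authors computed the statistic.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Return the sample standard deviation divided by the mean (as a fraction)."""
    if len(intensities) < 2:
        raise ValueError("need at least two replicate measurements")
    average = mean(intensities)
    if average == 0:
        raise ValueError("mean intensity is zero; the CV is undefined")
    return stdev(intensities) / average


# Hypothetical label-free intensities for two proteins across three replicate runs.
replicates = {
    "protein_A": [100.0, 110.0, 90.0],
    "protein_B": [50.0, 50.0, 50.0],
}

cv_by_protein = {
    name: coefficient_of_variation(values) for name, values in replicates.items()
}
```

A lower value means the replicate measurements of that protein agree more closely.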

Relevance:

40.00%

Publisher:

Abstract:

ACTIVITY is a database on DNA/RNA site sequences with known activity magnitudes, measurement systems, sequence-activity relationships under fixed experimental conditions and procedures to adapt these relationships from one measurement system to another. This database deposits information on DNA/RNA affinities to proteins and cell nuclear extracts, cutting efficiencies, gene transcription activity, mRNA translation efficiencies, mutability and other biological activities of natural sites occurring within promoters, mRNA leaders, and other regulatory regions in pro- and eukaryotic genomes, their mutant forms and synthetic analogues. Since activity magnitudes are heavily system-dependent, the current version of ACTIVITY is supplemented by three novel sub-databases: (i) SYSTEM, measurement systems; (ii) KNOWLEDGE, sequence-activity relationships under fixed experimental conditions; and (iii) CROSS_TEST, procedures adapting a relationship from one measurement system to another. These databases are useful in molecular biology, pharmacogenetics, metabolic engineering, drug design and biotechnology. The databases can be queried using SRS and are available through the Web, http://wwwmgs.bionet.nsc.ru/systems/Activity/.

Relevance:

40.00%

Publisher:

Abstract:

The Plasmodium falciparum Genome Database (http://PlasmoDB.org) integrates sequence information, automated analyses and annotation data emerging from the P. falciparum genome sequencing consortium. To date, raw sequence coverage is available for >90% of the genome, and two chromosomes have been finished and annotated. Data in PlasmoDB are organized by chromosome (1–14), and can be accessed using a variety of tools for graphical and text-based browsing or downloaded in various file formats. The GUS (Genomics Unified Schema) implementation of PlasmoDB provides a multi-species genomic relational database, incorporating data from human and mouse, as well as P. falciparum. The relational schema uses a highly structured format to accommodate diverse data sets related to genomic sequence and gene expression. Tools have been designed to facilitate complex biological queries, including many that are specific to Plasmodium parasites and malaria as a disease. Additional projects seek to integrate genomic information with the rich data sets now becoming available for RNA transcription, protein expression, metabolic pathways, genetic and physical mapping, antigenic and population diversity, and phylogenetic relationships with other apicomplexan parasites. The overall goal of PlasmoDB is to facilitate Internet- and CD-ROM-based access to both finished and unfinished sequence information by the global malaria research community.

Relevance:

40.00%

Publisher:

Abstract:

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/.
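The first step described in this abstract, an optimal ungapped alignment score for every diagonal of the alignment matrix, can be sketched sequentially in Python. This is not the ParAlign implementation: it uses no SIMD parallelism, and the function name `diagonal_scores` and the simple match/mismatch scoring are assumptions made for illustration in place of a real substitution matrix. Each diagonal is scanned once, keeping the best-scoring contiguous segment (a running sum that resets to zero when it would go negative).

```python
def diagonal_scores(query, target, match=1, mismatch=-1):
    """Map each diagonal offset (j - i) to its best ungapped local score.

    i indexes the query and j indexes the target. The scoring scheme is a
    plain match/mismatch assumption, not a substitution matrix.
    """
    scores = {}
    for offset in range(-(len(query) - 1), len(target)):
        best = running = 0
        i = max(0, -offset)   # first query position on this diagonal
        j = i + offset        # corresponding target position
        while i < len(query) and j < len(target):
            step = match if query[i] == target[j] else mismatch
            running = max(0, running + step)  # restart the segment if it drops below zero
            best = max(best, running)
            i += 1
            j += 1
        scores[offset] = best
    return scores


# Hypothetical sequences: the query occurs exactly at offset 2 in the target.
example = diagonal_scores("ACGT", "TTACGTTT")
best_offset = max(example, key=example.get)
```

In a heuristic of the kind the abstract outlines, per-diagonal scores like these would then be combined to estimate a gapped score; that combination step is not shown here because the abstract does not specify it.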

Relevance:

40.00%

Publisher:

Abstract:

CyBase is a curated database and information source for backbone-cyclized proteins. The database incorporates naturally occurring cyclic proteins as well as synthetic derivatives, grafted analogues and acyclic permutants. The database provides a centralized repository of information on all aspects of cyclic protein biology and addresses issues pertaining to the management and searching of topologically circular sequences. The database is freely available at http://research.imb.uq.edu.au/cybase.

Relevance:

30.00%

Publisher:

Abstract:

The availability of chloroplast genome (cpDNA) sequences of Atropa belladonna, Nicotiana sylvestris, N. tabacum, N. tomentosiformis, Solanum bulbocastanum, S. lycopersicum and S. tuberosum, all of which are Solanaceae species, allowed us to analyze the organization of cpSSRs in their genic and intergenic regions. In general, the number of cpSSRs in cpDNA ranged from 161 in S. tuberosum to 226 in N. tabacum, and the number of intergenic cpSSRs was higher than that of genic cpSSRs. Mononucleotide repeats were the most frequent in the studied species, but we also identified di-, tri-, tetra-, penta- and hexanucleotide repeats. Multiple alignments of all cpSSR sequences from the Solanaceae species made it possible to identify nucleotide variability, and the phylogeny was estimated by maximum parsimony. Our study showed that the plastome database can be exploited for phylogenetic analyses and biotechnological approaches.
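To make the notion of a mononucleotide repeat concrete, here is a minimal Python sketch that locates runs of a single repeated base with a regular expression. The function name, the example sequence and the minimum run length of 8 are assumptions chosen for illustration; the abstract does not state the thresholds or the software the authors used.

```python
import re


def find_mononucleotide_ssrs(sequence, min_length=8):
    """Return (start, base, run_length) for each single-base run of at least min_length."""
    # One captured base followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(sequence.upper())
    ]


# Hypothetical sequence containing an A run of 10 and a T run of 8.
toy_sequence = "GC" + "A" * 10 + "CG" + "T" * 8 + "C"
ssrs = find_mononucleotide_ssrs(toy_sequence)
```

Start positions are 0-based. Di- to hexanucleotide repeats would need a pattern that captures a longer motif, which this sketch does not attempt.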

Relevance:

30.00%

Publisher:

Abstract:

Background: High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS), are powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, called sequence tags. These techniques have improved our understanding of the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, to index sequenced tags according to their reliability. This is done through a series of evaluations based on a defined rule set. S3T allows the identification and selection of tags considered more reliable for further gene expression analysis. Results: This methodology was applied to a public SAGE dataset. In order to compare data before and after filtering, a hierarchical clustering analysis was performed on samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidence suggesting that it is possible to find more congruous clusters after using the S3T scoring system. Conclusion: These results substantiate the proposed application to generate more reliable data. This is a significant contribution to the determination of global gene expression profiles. The library analysis with S3T is freely available at http://gdm.fmrp.usp.br/s3t/. S3T source code and datasets can also be downloaded from the aforementioned website.

Relevance:

30.00%

Publisher:

Abstract:

Context. A sample of 27 sources, cataloged as pre-main sequence stars by the Pico dos Dias Survey (PDS), is analyzed to investigate a possible contamination by post-AGB stars. The far-infrared excess due to dust present in the circumstellar envelope is typical of both categories: young stars and objects that have already left the main sequence and are undergoing severe mass loss. Aims. The two known post-AGB stars in our sample inspired us to search for other very likely or possible post-AGB objects among PDS sources previously suggested to be Herbig Ae/Be stars, by revisiting the observational database of this sample. Methods. In a comparative study with well-known post-AGBs, several characteristics were evaluated: (i) parameters related to the circumstellar emission; (ii) spatial distribution to verify the background contribution from dark clouds; (iii) spectral features; and (iv) optical and infrared colors. Results. These characteristics suggest that seven objects of the studied sample are very likely post-AGBs, five are possible post-AGBs, eight are unlikely post-AGBs, and the nature of seven objects remains unclear.

Relevance:

30.00%

Publisher:

Abstract:

Staphylococcus aureus is one of the most important causative agents of infectious mastitis in small ruminants. To determine the distribution of Staph. aureus strains associated with infectious mastitis in flocks of sheep in the northeast of Brazil, and to establish whether these clones are related to the strains distributed internationally, this study analysed the genetic diversity of Staph. aureus isolates from cases of clinical and subclinical mastitis in ewes by pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing (MLST). In this research, 135 ewes with mastitis from 31 sheep flocks distributed in 15 districts were examined. Staph. aureus was isolated from sheep milk in 9 (29%) out of 31 herds located in 47% of the districts surveyed. MLST analysis allowed the identification of four STs (ST750, ST1728, ST1729 and ST1730). The last three, with their respective novel alleles (glp-220, pta-182 and yqiL-180), were recently reported in the Staph. aureus MLST database (http://www.mlst.net). Each novel allele differed by only one nucleotide from those already described. The occurrence of CC133 (ST750 and ST1729) in this study is in agreement with other reports that only a few clones of Staph. aureus seem to be responsible for most cases of mastitis in dairy farms and that some of these clones may have broad geographic distribution. However, the prevalence of CC5 (ST1728 and ST1730), an important group related to cases of colonization or infection in humans, differs from previous studies by its widespread occurrence and may suggest human contamination followed by selective pressures of the allelic diversifications presented for these STs.

Relevance:

30.00%

Publisher:

Abstract:

Enteropathogenic Escherichia coli (EPEC) infections are a leading cause of infantile diarrhea in developing nations. Multilocus sequence typing (MLST) characterizes bacterial strains based on the sequences of internal fragments in housekeeping genes. Little is known about strains of EPEC analyzed by MLST from Brazil. In this study, a diverse collection of 29 EPEC strains isolated from patients with diarrhea, admitted to the University Hospital of Ribeirao Preto, was characterized by MLST. Strain analysis demonstrated 22 different sequence types (STs), of which almost half (48%) were new, indicating a high genotype diversity. The 22 STs were divided by eBURST into 12 clonal complexes. It was not possible to correlate typical and atypical EPEC with other strains in the MLST database. This is the first study that analyzed EPEC strains from South America that are included in the E. coli MLST database. Nine (31%) out of 29 strains are part of the CC10 clonal complex, the major clonal complex in the database, which comprises 174 strains and 86 different STs, suggesting that these strains might be the most important intestinal pathogenic E. coli worldwide. Genetic relationships between typical and atypical EPEC, enterohemorrhagic E. coli, and enteroaggregative E. coli strains were not established by MLST.

Relevance:

30.00%

Publisher:

Abstract:

MHCPEP (http://wehih.wehi.edu.au/mhcpep/) is a curated database comprising over 13 000 peptide sequences known to bind MHC molecules. Entries are compiled from published reports as well as from direct submissions of experimental data. Each entry contains the peptide sequence, its MHC specificity and, where available, experimental method, observed activity, binding affinity, source protein and anchor positions, as well as publication references. The present format of the database allows text string matching searches but can easily be converted for use in conjunction with sequence analysis packages. The database can be accessed via Internet using WWW or FTP.

Relevance:

30.00%

Publisher:

Abstract:

The nifH gene sequence of the nitrogen-fixing bacterium Acetobacter diazotrophicus was determined with the use of the polymerase chain reaction and universal degenerate oligonucleotide primers. The gene shows highest pair-wise similarity to the nifH gene of Azospirillum brasilense. The phylogenetic relationships of the nifH gene sequences were compared with those inferred from 16S rRNA gene sequences. Knowledge of the sequence of the nifH gene contributes to the growing database of nifH gene sequences, and will allow the detection of Acet. diazotrophicus from environmental samples with nifH gene-based primers.

Relevance:

30.00%

Publisher:

Abstract:

Conventionally, protein structure prediction via threading relies on some nonoptimal method to align a protein sequence to each member of a library of known structures. We show how a score function (force field) can be modified so as to allow the direct application of a dynamic programming algorithm to the problem. This involves an approximation whose damage can be minimized by an optimization process during score function parameter determination. The method is compared to sequence to structure alignments using a more conventional pair-wise score function and the frozen approximation. The new method produces results comparable to the frozen approximation, but is faster and has fewer adjustable parameters. It is also free of memory of the template's original amino acid sequence, and does not suffer from a problem of nonconvergence, which can be shown to occur with the frozen approximation. Alignments generated by the simplified score function can then be ranked using a second score function with the approximations removed. (C) 1999 John Wiley & Sons, Inc.

Relevance:

30.00%

Publisher:

Abstract:

MHCPEP is a curated database comprising over 9000 peptide sequences known to bind MHC molecules. Entries are compiled from published reports as well as from direct submissions of experimental data. Each entry contains the peptide sequence, its MHC specificity and, when available, experimental method, observed activity, binding affinity, source protein, anchor positions and publication references. The present format of the database allows text string matching searches but can easily be converted for use in conjunction with sequence analysis packages. The database can be accessed via Internet using WWW, FTP or Gopher.

Relevance:

30.00%

Publisher:

Abstract:

The complete nucleotide sequence of the mitochondrial (mt) DNA molecule of the liver fluke, Fasciola hepatica (phylum Platyhelminthes, class Trematoda, family Fasciolidae), was determined. It comprises 14462 bp, contains 12 protein-encoding, 2 ribosomal and 22 transfer RNA genes, and is the second complete flatworm (and the first trematode) mitochondrial sequence to be described in detail. All of the genes are transcribed from the same strand. Of the genes typically found in mitochondrial genomes of eumetazoans, only atp8 is absent. The nad4L and nad4 genes overlap by 40 nt. Most intergenic sequences are very short. Two larger non-coding regions are present. The longer one (817 nt) is located between trnG and cox3 and consists of 8 identical tandem repeats of 85 nt, rich in G and C, followed by 1 imperfect repeat. The shorter non-coding region (187 nt) exhibits no special features and is separated from the longer region by trnG. The gene arrangement resembles that of some other trematodes including the eastern Asian Schistosoma species (and cyclophyllidean cestode species) but it is strikingly different from that of the African schistosomes, represented by Schistosoma mansoni. The genetic code is as inferred previously for flatworms. Transfer RNA genes range in length from 58 to 70 nt, their products forming characteristic 'clover leaf' structures, except for tRNA(S-UCN) and tRNA(S-AGN), which lack the DHU arm.