956 results for Genomic sequence database
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The complete nucleotide sequence of the genomic RNA 1 (8745 nt) and RNA 2 (4986 nt) of Citrus leprosis virus cytoplasmic type (CiLV-C) was determined using cloned cDNA. RNA 1 contains two open reading frames (ORFs), which correspond to 286 and 29 kDa proteins. The 286 kDa protein is a polyprotein putatively involved in virus replication, which contains four conserved domains: methyltransferase, protease, helicase and polymerase. RNA 2 contains four ORFs corresponding to 15, 61, 32 and 24 kDa proteins, respectively. The 32 kDa protein is apparently involved in cell-to-cell movement of the virus, but none of the other putative proteins exhibit any conserved domain. The 5' regions of the two genomic RNAs contain a 'cap' structure and poly(A) tails were identified in the 3'-terminals. Sequence analyses and searches for structural and non-structural protein similarities revealed conserved domains with members of the genera Furovirus, Bromovirus, Tobravirus and Tobamovirus, although phylogenetic analyses strongly suggest that CiLV-C is a member of a distinct, novel virus genus and family, and definitely demonstrate that it does not belong to the family Rhabdoviridae, as previously proposed. Based on these results it was proposed that Citrus leprosis virus be considered as the type member of a new genus of viruses, Cilevirus.
Abstract:
The simultaneous existence of alternative oxidases and uncoupling proteins in plants has raised the question as to why plants need two energy-dissipating systems with apparently similar physiological functions. A probably complete plant uncoupling protein gene family is described and the expression profiles of this family compared with the multigene family of alternative oxidases in Arabidopsis thaliana and sugarcane (Saccharum sp.) employed as dicot and monocot models, respectively. In total, six uncoupling protein genes, AtPUMP1-6, were recognized within the Arabidopsis genome and five (SsPUMP1-5) in a sugarcane EST database. The recombinant AtPUMP5 protein displayed similar biochemical properties as AtPUMP1. Sugarcane possessed four Arabidopsis AOx1-type orthologues (SsAOx1a-1d); no sugarcane orthologue corresponding to Arabidopsis AOx2-type genes was identified. Phylogenetic and expression analyses suggested that AtAOx1d does not belong to the AOx1-type family but forms a new (AOx3-type) family. Tissue-enriched expression profiling revealed that uncoupling protein genes were expressed more ubiquitously than the alternative oxidase genes. Distinct expression patterns among gene family members were observed between monocots and dicots and during chilling stress. These findings suggest that the members of each energy-dissipating system are subject to different cell or tissue/organ transcriptional regulation. As a result, plants may respond more flexibly to adverse biotic and abiotic conditions, in which oxidative stress is involved. © The Author [2006]. Published by Oxford University Press [on behalf of the Society for Experimental Biology]. All rights reserved.
Abstract:
In DNA microarray experiments, the gene fragments that are spotted on the slides are usually obtained by the synthesis of specific oligonucleotides that are able to amplify genes through PCR. Shotgun library sequences are an alternative to synthesis of primers for the study of each gene in the genome. The possibility of putting thousands of gene sequences into a single slide allows the use of shotgun clones in order to proceed with microarray analysis without a completely sequenced genome. We developed an OC Identifier tool (optimal clone identifier for genomic shotgun libraries) for the identification of unique genes in shotgun libraries based on a partially sequenced genome; this allows simultaneous use of clones in projects such as transcriptome and phylogeny studies, using comparative genomic hybridization and genome assembly. The OC Identifier tool combines comparative genome analysis, biological databases and relational-database query language, and provides bioinformatics tools to identify clones that contain unique genes as alternatives to primer synthesis. The OC Identifier allows analysis of clones during the sequencing phase, making it possible to select genes of interest for construction of a DNA microarray. ©FUNPEC-RP.
Abstract:
MicroRNAs (miRNAs) are small non-coding RNAs that regulate target gene expression and hence play important roles in metabolic pathways. Recent studies have evidenced the interrelation of miRNAs with cell proliferation, differentiation, development, and diseases. Since they are involved in gene regulation, they are intrinsically related to metabolic pathways. This leads to questions that are particularly interesting for investigating medical and laboratorial applications. We developed an miRNApath online database that uses miRNA target genes to link miRNAs to metabolic pathways. Currently, databases about miRNA target genes (DIANA miRGen), genomic maps (miRNAMap) and sequences (miRBase) do not provide such correlations. Additionally, miRNApath offers five search services and a download area. For each search, there is a specific type of input, which can be a list of target genes, miRNAs, or metabolic pathways, which results in different views, depending upon the input data, concerning relationships between the target genes, miRNAs and metabolic pathways. There are also internal links that lead to a deeper analysis and cross-links to other databases with more detailed information. miRNApath is being continually updated and is available at http://lgmb.fmrp.usp.br/mirnapath. ©FUNPEC-RP.
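The linking idea described in this abstract (miRNA to target genes, target genes to metabolic pathways) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the miRNA, gene and pathway names are invented toy data, and the function `link_mirnas_to_pathways` is hypothetical, not part of miRNApath or its actual schema.

```python
from collections import defaultdict

# Hypothetical toy data; real associations would come from target-gene
# and pathway resources, not from these invented names.
mirna_targets = {
    "miR-A": {"GENE1", "GENE2"},
    "miR-B": {"GENE2", "GENE3"},
}
gene_pathways = {
    "GENE1": {"glycolysis"},
    "GENE2": {"glycolysis", "apoptosis"},
    "GENE3": {"cell cycle"},
}


def link_mirnas_to_pathways(mirna_targets, gene_pathways):
    """Join miRNA -> gene and gene -> pathway into miRNA -> pathway -> genes."""
    links = defaultdict(lambda: defaultdict(set))
    for mirna, genes in mirna_targets.items():
        for gene in genes:
            # Genes without any pathway annotation simply contribute nothing.
            for pathway in gene_pathways.get(gene, ()):
                links[mirna][pathway].add(gene)
    # Convert to plain dicts with sorted gene lists for stable output.
    return {m: {p: sorted(g) for p, g in pw.items()} for m, pw in links.items()}


links = link_mirnas_to_pathways(mirna_targets, gene_pathways)
for mirna, pathways in sorted(links.items()):
    for pathway, genes in sorted(pathways.items()):
        print(mirna, pathway, ", ".join(genes))
```

Keeping the shared target genes alongside each miRNA-pathway pair preserves the evidence for the link, which is what lets a search start from any of the three entity types.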
Abstract:
Thirteen spontaneous multiple-antibiotic-resistant (Mar) mutants of Escherichia coli AG100 were isolated on Luria-Bertani (LB) agar in the presence of tetracycline (4 µg/ml). The phenotype was linked to insertion sequence (IS) insertions in marR or acrR or unstable large tandem genomic amplifications which included acrAB and which were bordered by IS3 or IS5 sequences. Five different lon mutations, not related to the Mar phenotype, were also found in 12 of the 13 mutants. Under specific selective conditions, most drug-resistant mutants appearing late on the selective plates evolved from a subpopulation of AG100 with lon mutations. That the lon locus was involved in the evolution to low levels of multidrug resistance was supported by the following findings: (i) AG100 grown in LB broth had an important spontaneous subpopulation (about 3.7 × 10⁻⁴) of lon::IS186 mutants, (ii) new lon mutants appeared during the selection on antibiotic-containing agar plates, (iii) lon mutants could slowly grow in the presence of low amounts (about 2× MIC of the wild type) of chloramphenicol or tetracycline, and (iv) a lon mutation conferred a mutator phenotype which increased IS transposition and genome rearrangements. The association between lon mutations and mutations causing the Mar phenotype was dependent on the medium (LB versus MacConkey medium) and the antibiotic used for the selection. A previously reported unstable amplifiable high-level resistance observed after the prolonged growth of Mar mutants in a low concentration of tetracycline or chloramphenicol can be explained by genomic amplification.
Abstract:
The cell matrix adhesion regulator (CMAR) gene has been suggested to be a signal transduction molecule influencing cell adhesion to collagen and, through this, possibly involved in tumor suppression. The originally reported CMAR cDNA was 464 bp long with a tyrosine phosphorylation site at the extreme 3′ end, which mutagenesis studies had shown to be central to the function of this gene. Since the discovery of a 4-bp insertion polymorphism within the originally reported coding region, further sequence information has been obtained. The cDNA has been extended 5′ by ≈2 kb revealing a 559-bp region showing strong homology to the proposed 5′ untranslated sequence of a murine protein kinase receptor family member, variant in kinase (vik). CMAR genomic sequencing has shown the presence of an intron, the intron/exon boundary lying within this region of homology. An RNA transcript for CMAR of ≈2.5 kb has also been identified. The data suggest complex mechanisms for control of expression of two closely associated genes, CMAR and the vik-associated sequence.
Abstract:
In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25 320 structural domains and a further 160 000 sequence relatives has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153–165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.
The Zebrafish Information Network (ZFIN): a resource for genetic, genomic and developmental research
Abstract:
The Zebrafish Information Network, ZFIN, is a WWW community resource of zebrafish genetic, genomic and developmental research information (http://zfin.org). ZFIN provides an anatomical atlas and dictionary, developmental staging criteria, research methods, pathology information and a link to the ZFIN relational database (http://zfin.org/ZFIN/). The database, built on a relational, object-oriented model, provides integrated information about mutants, genes, genetic markers, mapping panels, publications and contact information for the zebrafish research community. The database is populated with curated published data, user submitted data and large dataset uploads. A broad range of data types including text, images, graphical representations and genetic maps supports the data. ZFIN incorporates links to other genomic resources that provide sequence and ortholog data. Zebrafish nomenclature guidelines and an automated registration mechanism for new names are provided. Extensive usability testing has resulted in an easy to learn and use forms interface with complex searching capabilities.
Abstract:
ACTIVITY is a database on DNA/RNA site sequences with known activity magnitudes, measurement systems, sequence-activity relationships under fixed experimental conditions and procedures to adapt these relationships from one measurement system to another. This database deposits information on DNA/RNA affinities to proteins and cell nuclear extracts, cutting efficiencies, gene transcription activity, mRNA translation efficiencies, mutability and other biological activities of natural sites occurring within promoters, mRNA leaders, and other regulatory regions in pro- and eukaryotic genomes, their mutant forms and synthetic analogues. Since activity magnitudes are heavily system-dependent, the current version of ACTIVITY is supplemented by three novel sub-databases: (i) SYSTEM, measurement systems; (ii) KNOWLEDGE, sequence-activity relationships under fixed experimental conditions; and (iii) CROSS_TEST, procedures adapting a relationship from one measurement system to another. These databases are useful in molecular biology, pharmacogenetics, metabolic engineering, drug design and biotechnology. The databases can be queried using SRS and are available through the Web, http://wwwmgs.bionet.nsc.ru/systems/Activity/.
Abstract:
VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html.
Abstract:
The iProClass database is an integrated resource that provides comprehensive family relationships and structural and functional features of proteins, with rich links to various databases. It is extended from ProClass, a protein family database that integrates PIR superfamilies and PROSITE motifs. The iProClass currently consists of more than 200 000 non-redundant PIR and SWISS-PROT proteins organized with more than 28 000 superfamilies, 2600 domains, 1300 motifs, 280 post-translational modification sites and links to more than 30 databases of protein families, structures, functions, genes, genomes, literature and taxonomy. Protein and family summary reports provide rich annotations, including membership information with length, taxonomy and keyword statistics, full family relationships, comprehensive enzyme and PDB cross-references and graphical feature display. The database facilitates classification-driven annotation for protein sequence databases and complete genomes, and supports structural and functional genomic research. The iProClass is implemented in Oracle 8i object-relational system and available for sequence search and report retrieval at http://pir.georgetown.edu/iproclass/.
Abstract:
A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors and ambiguous location of splice junctions the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT–AG junctions (22 199 entries) and 0.56% have non-canonical GC–AG splice site pairs. The remainder (0.73%) falls into many small groups (with a maximum size of 0.05%). We especially studied non-canonical splice sites, which comprise 3.73% of GenBank annotated splice pairs. EST alignments allowed us to verify only the exonic part of splice sites. To check the conserved dinucleotides we compared sequences of human non-canonical splice sites with sequences from the high throughput genome sequencing project (HTG). Out of 171 human non-canonical and EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. They can be classified after sequence analysis as: 79 GC–AG pairs (of which one was an error that was corrected to GC–AG), 61 errors corrected to GT–AG canonical pairs, six AT–AC pairs (of which two were errors corrected to AT–AC), one case was produced from a non-existent intron, seven cases were found in HTG that were deposited to GenBank and finally there were only two other cases left of supported non-canonical splice pairs. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomic Web server of the Sanger Centre: http://genomic.sanger.ac.uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html.
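The grouping of splice pairs into GT–AG, GC–AG, AT–AC and other classes can be sketched as a tally of the first and last two intron bases. This is only an illustration under stated assumptions: the intron strings are invented toy sequences, and `classify_intron` and `splice_pair_shares` are hypothetical helper names, not the procedure actually used to build SpliceDB.

```python
from collections import Counter


def classify_intron(intron: str) -> str:
    """Label an intron by its donor/acceptor dinucleotides, e.g. 'GT-AG'."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry both splice dinucleotides")
    # Donor site = first two intron bases, acceptor site = last two.
    return f"{seq[:2]}-{seq[-2:]}"


def splice_pair_shares(introns):
    """Return the percentage of introns falling into each dinucleotide class."""
    counts = Counter(classify_intron(i) for i in introns)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()}


# Invented toy introns (assumption): two GT-AG, one GC-AG, one AT-AC.
introns = ["GTAAGTCTTTCAG", "GTGAGTTTTTTAG", "GCAAGTCCCACAG", "ATATCCTTTTAC"]
shares = splice_pair_shares(introns)
for pair, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{pair}: {pct:.2f}%")
```

The same ratio arithmetic reproduces the figures quoted in the abstract: 22 199 of 22 489 entries is 98.71%, and 156 of 171 pairs is 91.23%.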
Abstract:
This report describes an efficient strategy for determining the functions of sequenced genes in microorganisms. A large population of cells is subjected to insertional mutagenesis. The mutagenized population is then divided into representative samples, each of which is subjected to a different selection. DNA is prepared from each sample population after the selection. The polymerase chain reaction is then used to determine retrospectively whether insertions into a particular sequence affected the outcome of any selection. The method is efficient because the insertional mutagenesis and each selection need only to be performed once to enable the functions of thousands of genes to be investigated, rather than once for each gene. We tested this "genetic footprinting" strategy using the model organism Saccharomyces cerevisiae.