102 results for Genomic sequence database


Relevance:

80.00%

Abstract:

The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high-quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.

Relevance:

80.00%

Abstract:

While genome sequencing projects are advancing rapidly, EST sequencing and analysis remains a primary research tool for the identification and categorization of gene sequences in a wide variety of species and an important resource for annotation of genomic sequence. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences in an attempt to identify the genes represented by that data and to provide additional information regarding those genes. Gene Indices are constructed by first clustering, then assembling EST and annotated gene sequences from GenBank for the targeted species. This process produces a set of unique, high-fidelity virtual transcripts, or Tentative Consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to mapping and genomic sequence data, to provide links between orthologous and paralogous genes and as a resource for comparative sequence analysis.
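The clustering step described above can be illustrated with a toy example. This is only a sketch under stated assumptions, not the TIGR protocol: the function names, the k-mer length `k`, and the `min_shared` threshold are all hypothetical, and the assembly of each cluster into a consensus sequence is not attempted. Sequences are linked when they share enough k-mers, and linked sequences are merged with a union-find structure.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_sequences(seqs, k=8, min_shared=3):
    """Group sequences sharing at least min_shared k-mers (single linkage).

    Returns a sorted list of clusters, each a list of indices into seqs.
    The parameters are illustrative assumptions, not published values.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(s, k) for s in seqs]
    for a, b in combinations(range(len(seqs)), 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up reads: the first two overlap by 12 bases, the third is unrelated.
ests = ["ACGTACGGTCAGTTGACC", "GGTCAGTTGACCTTAGGA", "TTTTTTCCCCCCGGGGGGAAAA"]
print(cluster_sequences(ests))
```

A production pipeline would compare sequences by alignment rather than exact k-mer matches and would then assemble each cluster; the sketch only shows how pairwise links become clusters.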

Relevance:

80.00%

Abstract:

Defects in the XPG DNA repair endonuclease gene can result in the cancer-prone disorders xeroderma pigmentosum (XP) or the XP–Cockayne syndrome complex. While the XPG cDNA sequence was known, determination of the genomic sequence was required to understand its different functions. In cells from normal donors, we found that the genomic sequence of the human XPG gene spans 30 kb, contains 15 exons that range from 61 to 1074 bp and 14 introns that range from 250 to 5763 bp. Analysis of the splice donor and acceptor sites using an information theory-based approach revealed three splice sites with low information content, which are components of the minor (U12) spliceosome. We identified six alternatively spliced XPG mRNA isoforms in cells from normal donors and from XPG patients: partial deletion of exon 8, partial retention of intron 8, two with alternative exons (in introns 1 and 6) and two that retained complete introns (introns 3 and 9). The amount of alternatively spliced XPG mRNA isoforms varied in different tissues. Most alternative splice donor and acceptor sites had a relatively high information content, but one has the U12 spliceosome sequence. A single nucleotide polymorphism has allele frequencies of 0.74 for 3507G and 0.26 for 3507C in 91 donors. The human XPG gene contains multiple splice sites with low information content in association with multiple alternatively spliced isoforms of XPG mRNA.
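The notion of splice-site "information content" can be made concrete with a small calculation. The following is a minimal sketch, not the authors' method: it computes the Shannon information (in bits) at each column of a set of aligned DNA sites as 2 minus the column entropy, without the small-sample correction a real analysis would apply. The function name and the example sites are hypothetical.

```python
import math
from collections import Counter


def position_information(sites):
    """Information content (bits) at each column of aligned DNA sites.

    Each column contributes 2 - H, where H is the Shannon entropy of the
    observed base frequencies. No small-sample correction is applied.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    info = []
    for col in range(length):
        counts = Counter(s[col] for s in sites)
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(2.0 - entropy)
    return info


# Made-up sites: three fully conserved columns, one fully variable column.
sites = ["GTAA", "GTAC", "GTAG", "GTAT"]
print(position_information(sites))
```

A fully conserved column carries 2 bits and a column with all four bases at equal frequency carries 0 bits, so a site with low total information is one that deviates from the conserved pattern at many positions.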

Relevance:

80.00%

Abstract:

The biological significance of DNA amplification in cancer is thought to be due to the selection of increased expression of a single or few important genes. However, systematic surveys of the copy number and expression of all genes within an amplified region of the genome have not been performed. Here we have used a combination of molecular, genomic, and microarray technologies to identify target genes for 17q23, a common region of amplification in breast cancers with poor prognosis. Construction of a 4-Mb genomic contig made it possible to define two common regions of amplification in breast cancer cell lines. Analysis of 184 primary breast tumors by fluorescence in situ hybridization on tissue microarrays validated these results with the highest amplification frequency (12.5%) observed for the distal region. Based on GeneMap'99 information, 17 known genes and 26 expressed sequence tags were localized to the contig. Analysis of genomic sequence identified 77 additional transcripts. A comprehensive analysis of expression levels of these transcripts in six breast cancer cell lines was carried out by using complementary DNA microarrays. The expression patterns varied from one cell line to another, and several overexpressed genes were identified. Of these, RPS6KB1, MUL, APPBP2, and TRAP240 as well as one uncharacterized expressed sequence tag were located in the two common amplified regions. In summary, comprehensive analysis of the 17q23 amplicon revealed a limited number of highly expressed genes that may contribute to the more aggressive clinical course observed in breast cancer patients with 17q23-amplified tumors.

Relevance:

80.00%

Abstract:

Since 1991, the Rice Genome Research Program in Japan has carried out rice genomics, such as large-scale cDNA analysis, construction of a fine-scale restriction fragment length polymorphism map, and physical mapping of the rice genome with yeast artificial chromosome clones. These studies have made a great impact on research into grass genomes and made rice a model plant for other cereal crop research. Starting in 1998, the Rice Genome Research Program will step into a new stage of genomics—that of genome sequencing. This project eventually should reveal all of the genomic sequence information in the rice plant and be an indispensable aid in understanding the genomics of other grass species.

Relevance:

80.00%

Abstract:

Male infertility, affecting as many as 10% of the adult population, is an extremely prevalent disorder. In most cases, the cause of the condition is unknown, and genetic factors that might affect male fertility, other than some sequences on the Y chromosome, have not been identified. We report here that male mice heterozygous for a targeted mutation of the apolipoprotein B (apo B) gene exhibit severely compromised fertility. Sperm from these mice failed to fertilize eggs both in vivo and in vitro. However, these sperm were able to fertilize eggs once the zona pellucida was removed but displayed persistent abnormal binding to the egg after fertilization. In vitro fertilization-related and other experiments revealed reduced sperm motility, survival time, and sperm count also contributed to the infertility phenotype. Recognition of the infertility phenotype led to the identification of apo B mRNA in the testes and epididymides of normal mice, and these transcripts were substantially reduced in the affected animals. Moreover, when the genomic sequence encoding human apo B was introduced into these animals, normal fertility was restored. These findings suggest that this genetic locus may have an important impact on male fertility and identify a previously unrecognized function for apo B.

Relevance:

80.00%

Abstract:

An entire gene encoding wheat (var. Hard Red Winter Tam 107) acetyl-CoA carboxylase [ACCase; acetyl-CoA:carbon-dioxide ligase (ADP-forming), EC 6.4.1.2] has been cloned and sequenced. Comparison of the 12-kb genomic sequence with the 7.4-kb cDNA sequence reported previously revealed 29 introns. Within the coding region, the exon sequence is 98% identical to the known wheat cDNA sequence. A second ACCase gene was identified by sequencing fragments of genomic clones that include the first two exons and the first intron. Additional transcripts were detected by 5' and 3' RACE analysis (rapid amplification of cDNA ends). One set of transcripts had a 5' end sequence identical to the cDNA found previously and another set was identical to the gene reported here. The 3' RACE clones fall into four distinguishable sequence sets, bringing the number of ACCase sequences to six. None of these cDNA or genomic clones encodes a chloroplast targeting signal. Identification of six different sequences suggests that either the cytosolic ACCase genes are duplicated in the three chromosome sets in hexaploid wheat or that each of the six alleles of the cytosolic ACCase gene has a readily distinguishable DNA sequence.

Relevance:

80.00%

Abstract:

Golgi alpha-mannosidase II (alpha-MII) is an enzyme involved in the processing of N-linked glycans. Using a previously isolated murine cDNA clone as a probe, we have isolated cDNA clones encompassing the human alpha-MII cDNA open reading frame and initiated isolation of human genomic clones. During the isolation of genomic clones, genes related to that encoding alpha-MII were isolated. One such gene was found to encode an isozyme, designated alpha-MIIx. A 5-kb cDNA clone encoding alpha-MIIx was then isolated from a human melanoma cDNA library. However, comparison between alpha-MIIx and alpha-MII cDNAs suggested that the cloned cDNA encodes a truncated polypeptide with 796 amino acid residues, while alpha-MII consists of 1144 amino acid residues. To reevaluate the sequence of alpha-MIIx cDNA, polymerase chain reaction (PCR) was performed with lymphocyte mRNAs. Comparison of the sequence of PCR products with the alpha-MIIx genomic sequence revealed that alternative splicing of the alpha-MIIx transcript can result in an additional transcript encoding a 1139-amino acid polypeptide. Northern analysis showed transcription of alpha-MIIx in various tissues, suggesting that the alpha-MIIx gene is a housekeeping gene. COS cells transfected with alpha-MIIx cDNA containing the full-length open reading frame showed an increase of alpha-mannosidase activity. The alpha-MIIx gene was mapped to human chromosome 15q25, whereas the alpha-MII gene was mapped to 5q21-22.

Relevance:

80.00%

Abstract:

The biosynthesis of gibberellins (GAs) after GA12-aldehyde involves a series of oxidative steps that lead to the formation of bioactive GAs. Previously, a cDNA clone encoding a GA 20-oxidase [gibberellin, 2-oxoglutarate:oxygen oxidoreductase (20-hydroxylating, oxidizing), EC 1.14.11.-] was isolated by immunoscreening a cDNA library from liquid endosperm of pumpkin (Cucurbita maxima L.) with antibodies against partially purified GA 20-oxidase. Here, we report isolation of a genomic clone for GA 20-oxidase from a genomic library of the long-day species Arabidopsis thaliana Heynh., strain Columbia, by using the pumpkin cDNA clone as a heterologous probe. This genomic clone contains a GA 20-oxidase gene that consists of three exons and two introns. The three exons are 1131 bp long and encode 377 amino acid residues. A cDNA clone corresponding to the putative GA 20-oxidase genomic sequence was constructed with the reverse transcription-PCR method, and the identity of the cDNA clone was confirmed by analyzing the capability of the fusion protein expressed in Escherichia coli to convert GA53 to GA44 and GA19 to GA20. The Arabidopsis GA 20-oxidase shares 55% identity and >80% similarity with the pumpkin GA 20-oxidase at the derived amino acid level. Both GA 20-oxidases share high homology with other 2-oxoglutarate-dependent dioxygenases (2-ODDs), but the highest homology was found between the two GA 20-oxidases. Mapping results indicated tight linkage between the cloned GA 20-oxidase and the GA5 locus of Arabidopsis. The ga5 semidwarf mutant contains a G→A point mutation that inserts a translational stop codon in the protein-coding sequence, thus confirming that the GA5 locus encodes GA 20-oxidase. Expression of the GA5 gene in Arabidopsis leaves was enhanced after plants were transferred from short to long days; it was reduced by GA4 treatment, suggesting end-product repression in the GA biosynthetic pathway.
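Two of the figures in this abstract can be checked with a few lines of code. The sketch below is illustrative only and assumes the standard genetic code and a substitution read on the coding strand; the abstract does not say which codon is affected in ga5, and the function names are hypothetical. It confirms that 1131 bp is a whole number of codons and lists the sense codons that a single G-to-A change converts into a stop codon.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def codon_count(length_bp):
    """Number of complete codons in a coding length; rejects partial codons."""
    if length_bp % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return length_bp // 3


def g_to_a_nonsense_changes():
    """List (codon, position, mutant) for sense codons that become stops
    through one G->A substitution on the coding strand."""
    changes = []
    for codon in map("".join, product("ACGT", repeat=3)):
        if codon in STOP_CODONS:
            continue  # only sense codons can gain a stop
        for pos, base in enumerate(codon):
            if base != "G":
                continue
            mutant = codon[:pos] + "A" + codon[pos + 1:]
            if mutant in STOP_CODONS:
                changes.append((codon, pos, mutant))
    return changes


print(codon_count(1131))
print(g_to_a_nonsense_changes())
```

Under these assumptions the only sense codon that qualifies is TGG (tryptophan), which becomes TAG or TGA depending on which G is changed.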

Relevance:

40.00%

Abstract:

The cell matrix adhesion regulator (CMAR) gene has been suggested to be a signal transduction molecule influencing cell adhesion to collagen and, through this, possibly involved in tumor suppression. The originally reported CMAR cDNA was 464 bp long with a tyrosine phosphorylation site at the extreme 3′ end, which mutagenesis studies had shown to be central to the function of this gene. Since the discovery of a 4-bp insertion polymorphism within the originally reported coding region, further sequence information has been obtained. The cDNA has been extended 5′ by ≈2 kb revealing a 559-bp region showing strong homology to the proposed 5′ untranslated sequence of a murine protein kinase receptor family member, variant in kinase (vik). CMAR genomic sequencing has shown the presence of an intron, the intron/exon boundary lying within this region of homology. An RNA transcript for CMAR of ≈2.5 kb has also been identified. The data suggest complex mechanisms for control of expression of two closely associated genes, CMAR and the vik-associated sequence.

Relevance:

40.00%

Abstract:

In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25 320 structural domains and a further 160 000 sequence relatives, has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153–165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.

Relevance:

40.00%

Abstract:

The Zebrafish Information Network, ZFIN, is a WWW community resource of zebrafish genetic, genomic and developmental research information (http://zfin.org). ZFIN provides an anatomical atlas and dictionary, developmental staging criteria, research methods, pathology information and a link to the ZFIN relational database (http://zfin.org/ZFIN/). The database, built on a relational, object-oriented model, provides integrated information about mutants, genes, genetic markers, mapping panels, publications and contact information for the zebrafish research community. The database is populated with curated published data, user submitted data and large dataset uploads. A broad range of data types including text, images, graphical representations and genetic maps supports the data. ZFIN incorporates links to other genomic resources that provide sequence and ortholog data. Zebrafish nomenclature guidelines and an automated registration mechanism for new names are provided. Extensive usability testing has resulted in an easy to learn and use forms interface with complex searching capabilities.

Relevance:

40.00%

Abstract:

ACTIVITY is a database on DNA/RNA site sequences with known activity magnitudes, measurement systems, sequence-activity relationships under fixed experimental conditions and procedures to adapt these relationships from one measurement system to another. This database deposits information on DNA/RNA affinities to proteins and cell nuclear extracts, cutting efficiencies, gene transcription activity, mRNA translation efficiencies, mutability and other biological activities of natural sites occurring within promoters, mRNA leaders, and other regulatory regions in pro- and eukaryotic genomes, their mutant forms and synthetic analogues. Since activity magnitudes are heavily system-dependent, the current version of ACTIVITY is supplemented by three novel sub-databases: (i) SYSTEM, measurement systems; (ii) KNOWLEDGE, sequence-activity relationships under fixed experimental conditions; and (iii) CROSS_TEST, procedures adapting a relationship from one measurement system to another. These databases are useful in molecular biology, pharmacogenetics, metabolic engineering, drug design and biotechnology. The databases can be queried using SRS and are available through the Web, http://wwwmgs.bionet.nsc.ru/systems/Activity/.

Relevance:

40.00%

Abstract:

VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html.
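For readers unfamiliar with the term, an open reading frame can be located with a simple scan. This is a hedged sketch, not how VIDA obtains its ORFs (the abstract says they come from GenBank): it searches the forward strand only, assumes the standard stop codons, and uses a hypothetical `min_codons` cutoff.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for forward-strand ORFs running ATG..stop.

    start is inclusive; end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop, ATG included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume scanning after the stop
    return orfs


# Made-up sequence with one ORF (ATG AAA TTT TAG) in frame 2.
dna = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(dna):
    print(frame, dna[start:end])
```

A full ORF finder would also scan the reverse complement and handle alternative start codons and genetic codes, which matter for viral genomes.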

Relevance:

40.00%

Abstract:

The iProClass database is an integrated resource that provides comprehensive family relationships and structural and functional features of proteins, with rich links to various databases. It is extended from ProClass, a protein family database that integrates PIR superfamilies and PROSITE motifs. The iProClass currently consists of more than 200 000 non-redundant PIR and SWISS-PROT proteins organized with more than 28 000 superfamilies, 2600 domains, 1300 motifs, 280 post-translational modification sites and links to more than 30 databases of protein families, structures, functions, genes, genomes, literature and taxonomy. Protein and family summary reports provide rich annotations, including membership information with length, taxonomy and keyword statistics, full family relationships, comprehensive enzyme and PDB cross-references and graphical feature display. The database facilitates classification-driven annotation for protein sequence databases and complete genomes, and supports structural and functional genomic research. The iProClass is implemented in the Oracle 8i object-relational system and is available for sequence search and report retrieval at http://pir.georgetown.edu/iproclass/.