956 results for Genomic sequence database
Abstract:
The recently completed genome sequence analysis of the type strain PG1 of Mycoplasma mycoides subsp. mycoides small-colony type revealed four large repeated segments of 24, 13, 12, and 8 kb that are flanked by insertion sequence (IS) elements. Genetic analysis of type strain PG1 and of African, European, and Australian field and vaccine strains revealed that the 24-kb genetic locus is repeated only in PG1 and not in other M. mycoides subsp. mycoides SC strains. In contrast, the 13-kb genetic locus was found duplicated in some strains originating from Africa and Australia but not in strains that were isolated from the European outbreaks. The 12- and 8-kb genetic loci were found in two and three copies, respectively, in all 28 strains analyzed. The flanking IS elements are presumed to give rise to these tandem duplications, thus contributing to genomic plasticity. This aspect must be considered when designing novel diagnostic approaches and recombinant vaccines.
Abstract:
The growing knowledge of the physiology, cell biology and biochemistry of the reproductive organs has provided many insights into the molecular mechanisms required for successful reproduction. Research on reproductive physiology in domestic animals was hampered in the past by a lack of species-specific genomic information. The genome sequences of dog, cattle and horse became publicly available in 2005, 2006 and 2007, respectively. Although the gene content of mammalian genomes is generally very similar, genes involved in reproduction tend to be less conserved than the average mammalian gene. The availability of genome sequences provides a valuable resource to check whether a protein known from human or mouse research is also present in cattle and/or horse. More than 200 genes are currently known to be involved in the production of fertile sperm cells. Great progress has been made in understanding the genetic aberrations that lead to male infertility. Additionally, the first genetic mechanisms that contribute to the quantitative variation of fertility traits in fertile male animals are being discovered. Here, I review selected aspects of genetic research in male fertility and offer some perspectives on the use of genomic sequence information.
Abstract:
The intensely studied MHC has become the paradigm for understanding the architectural evolution of vertebrate multigene families. The 4-Mb human MHC (also known as the HLA complex) encodes genes critically involved in the immune response, graft rejection, and disease susceptibility. Here we report the continuous 1,796,938-bp genomic sequence of the HLA class I region, linking genes between MICB and HLA-F. A total of 127 genes or potentially coding sequences were recognized within the analyzed sequence, establishing a high gene density of one per every 14.1 kb. The identification of 758 microsatellites provides tools for high-resolution mapping of HLA class I-associated disease genes. Most importantly, we establish that the repeated duplication and subsequent diversification of a minimal building block, MIC-HCGIX-3.8–1-P5-HCGIV-HLA class I-HCGII, engendered the present-day MHC. That the currently nonessential HLA-F and MICE genes have acted as progenitors to today’s immune-competent HLA-ABC and MICA/B genes provides experimental evidence for evolution by “birth and death,” which has general relevance to our understanding of the evolutionary forces driving vertebrate multigene families.
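The quoted gene density follows directly from the sequence length and the gene count given above:

```latex
\frac{1\,796\,938\ \text{bp}}{127\ \text{genes}} \approx 14\,149\ \text{bp per gene} \approx 14.1\ \text{kb per gene}
```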
Abstract:
Null mutations at the misato locus of Drosophila melanogaster are associated with irregular chromosomal segregation at cell division. The consequences for morphogenesis are that mutant larvae are almost devoid of imaginal disk tissue, have a reduction in brain size, and die before the late third-instar larval stage. To analyze these findings, we isolated cDNAs in and around the misato locus, mapped the breakpoints of chromosomal deficiencies, determined which transcript corresponded to the misato gene, rescued the cell division defects in transgenic organisms, and sequenced the genomic DNA. Database searches revealed that misato codes for a novel protein, the N-terminal half of which contains a mixture of peptide motifs found in α-, β-, and γ-tubulins, as well as a motif related to part of the myosin heavy chain proteins. The sequence characteristics of misato indicate either that it arose from an ancestral tubulin-like gene, different parts of which underwent convergent evolution to resemble motifs in the conventional tubulins, or that it arose by the capture of motifs from different tubulin genes. The Saccharomyces cerevisiae genome lacks a true homolog of the misato gene, and this finding highlights the emerging problem of assigning functional attributes to orphan genes that occur only in some evolutionary lineages.
Abstract:
We examined the MLL genomic translocation breakpoint in acute myeloid leukemia of infant twins. Southern blot analysis in both cases showed two identical MLL gene rearrangements indicating chromosomal translocation. The rearrangements were detectable in the second twin before signs of clinical disease and the intensity relative to the normal fragment indicated that the translocation was not constitutional. Fluorescence in situ hybridization with an MLL-specific probe and karyotype analyses suggested t(11;22)(q23;q11.2) disrupting MLL. Known 5′ sequence from MLL but unknown 3′ sequence from chromosome band 22q11.2 formed the breakpoint junction on the der(11) chromosome. We used panhandle variant PCR to clone the translocation breakpoint. By ligating a single-stranded oligonucleotide that was homologous to known 5′ MLL genomic sequence to the 5′ ends of BamHI-digested DNA through a bridging oligonucleotide, we formed the stem–loop template for panhandle variant PCR which yielded products of 3.9 kb. The MLL genomic breakpoint was in intron 7. The sequence of the partner DNA from band 22q11.2 was identical to the hCDCrel (human cell division cycle related) gene that maps to the region commonly deleted in DiGeorge and velocardiofacial syndromes. Both MLL and hCDCrel contained homologous CT, TTTGTG, and GAA sequences within a few base pairs of their respective breakpoints, which may have been important in uniting these two genes by translocation. Reverse transcriptase-PCR amplified an in-frame fusion of MLL exon 7 to hCDCrel exon 3, indicating that an MLL-hCDCrel chimeric mRNA had been transcribed. Panhandle variant PCR is a powerful strategy for cloning translocation breakpoints where the partner gene is undetermined. This application of the method identified a region of chromosome band 22q11.2 involved in both leukemia and a constitutional disorder.
Abstract:
Expressed sequence tags (ESTs) are randomly sequenced cDNA clones. Currently, nearly 3 million human and 2 million mouse ESTs provide valuable resources that enable researchers to investigate the products of gene expression. The EST databases have proven to be useful tools for detecting homologous genes, for exon mapping, revealing differential splicing, etc. With the increasing availability of large amounts of poorly characterised eukaryotic (notably human) genomic sequence, ESTs have now become a vital tool for gene identification, sometimes yielding the only unambiguous evidence for the existence of a gene expression product. However, BLAST-based Web servers available to the general user have not kept pace with these developments and do not provide appropriate tools for querying EST databases with large highly spliced genes, often spanning 50 000–100 000 bases or more. Here we describe Gene2EST (http://woody.embl-heidelberg.de/gene2est/), a server that brings together a set of tools enabling efficient retrieval of ESTs matching large DNA queries and their subsequent analysis. RepeatMasker is used to mask dispersed repetitive sequences (such as Alu elements) in the query, BLAST2 for searching EST databases and Artemis for graphical display of the findings. Gene2EST combines these components into a Web resource targeted at the researcher who wishes to study one or a few genes to a high level of detail.
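Masking keeps dispersed repeats such as Alu elements from flooding an EST search with spurious hits. The Python sketch below is not Gene2EST's code. It assumes the query has already been soft-masked (repeats written in lowercase, a convention that RepeatMasker and other masking tools can produce). It converts such a query to a hard-masked one with N's and reports the masked runs. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def hard_mask(seq: str) -> str:
    """Replace soft-masked (lowercase) bases with 'N' so repeats are not matched as ordinary bases."""
    return "".join("N" if base.islower() else base for base in seq)


def masked_intervals(seq: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) runs of soft-masked bases."""
    intervals: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, base in enumerate(seq):
        if base.islower():
            if start is None:
                start = i  # a masked run begins here
        elif start is not None:
            intervals.append((start, i))  # the run ended at the previous base
            start = None
    if start is not None:
        intervals.append((start, len(seq)))  # the run extends to the end of the query
    return intervals


def masked_fraction(seq: str) -> float:
    """Fraction of the query that is soft-masked (0.0 for an empty sequence)."""
    return sum(base.islower() for base in seq) / len(seq) if seq else 0.0


query = "ACGTggccaaTTGA"  # hypothetical query with one lowercase repeat run
print(hard_mask(query))         # ACGTNNNNNNTTGA
print(masked_intervals(query))  # [(4, 10)]
```

For a 50 000 to 100 000 base genomic query, the masked fraction gives a quick sense of how much of the query is left to produce EST matches.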
Abstract:
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap’99, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
Abstract:
The function of the prion protein gene (PRNP) and its normal product PrPC is elusive. We used comparative genomics as a strategy to understand the normal function of PRNP. Because the reliability of comparisons increases with the number of species and with evolutionary distance, we isolated and sequenced a 66.5 kb BAC containing the PRNP gene from a distantly related mammal, the model Australian marsupial Macropus eugenii (tammar wallaby). Marsupials are separated from eutherians such as human and mouse by roughly 180 million years of independent evolution. We found that tammar PRNP, like human PRNP, has two exons. Prion proteins encoded by the PRNP genes of the tammar wallaby and of a distantly related marsupial, Monodelphis domestica (Brazilian opossum), contain proximal PrP repeats with a distinct, marsupial-specific composition and a variable number. Comparisons of tammar wallaby PRNP with PRNPs from human, mouse, cattle and sheep allowed us to identify non-coding gene regions conserved across the marsupial-eutherian evolutionary distance, which are candidates for regulatory regions. In the PRNP 3' UTR we found a conserved signal for nuclear-specific polyadenylation and the putative cytoplasmic polyadenylation element (CPE), indicating that post-transcriptional control of PRNP mRNA activity is important. Phylogenetic footprinting revealed conserved potential binding sites for the MZF-1 transcription factor in both the upstream promoter and the intron/intron 1, and for the MEF2, MyTI, Oct-1 and NFAT transcription factors in the intron(s). The presence of a conserved NFAT-binding site and CPE indicates involvement of PrPC in signal transduction and synaptic plasticity. (c) 2004 Elsevier B.V. All rights reserved.
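Phylogenetic footprinting proper works on aligned orthologous sequences. As a much simpler illustration of the idea, the sketch below checks only whether a motif occurs anywhere in each species' 3' UTR. It assumes DNA-form input. It uses the canonical nuclear polyadenylation hexamer AATAAA and the commonly cited CPE consensus TTTTAT (UUUUAU in RNA). The function names and example sequences are hypothetical; they are not the tammar or human PRNP UTRs.

```python
from typing import Dict, List, Optional

# Consensus hexamers written in DNA form (assumed motif definitions for this sketch).
MOTIFS: Dict[str, str] = {
    "nuclear_polyA_signal": "AATAAA",
    "CPE": "TTTTAT",  # UUUUAU in the mRNA
}


def find_motif(seq: str, motif: str) -> List[int]:
    """0-based start positions of every exact (possibly overlapping) match."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def conserved_motifs(utrs: Dict[str, str],
                     motifs: Optional[Dict[str, str]] = None) -> List[str]:
    """Names of motifs present at least once in every species' 3' UTR.

    Presence only: unlike real footprinting, positions are not aligned.
    """
    motifs = MOTIFS if motifs is None else motifs
    return [name for name, motif in motifs.items()
            if all(find_motif(seq, motif) for seq in utrs.values())]


utrs = {  # hypothetical toy sequences
    "species_a": "GCTTTTATGCAATAAAGC",
    "species_b": "AATAAACCC",
}
print(find_motif(utrs["species_a"], "AATAAA"))  # [10]
print(conserved_motifs(utrs))  # ['nuclear_polyA_signal']
```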
Abstract:
Background: Approximately 40% of mammalian mRNA sequences contain AUG trinucleotides upstream of the main coding sequence, with a quarter of these AUGs demarcating open reading frames of 20 or more codons. In order to investigate whether these open reading frames may encode functional peptides, we have carried out a comparative genomic analysis of human and mouse mRNA 'untranslated regions' using sequences from the RefSeq mRNA sequence database. Results: We have identified over 200 upstream open reading frames which are strongly conserved between the human and mouse genomes. Consensus sequences associated with efficient initiation of translation are overrepresented at the AUG trinucleotides of these upstream open reading frames, while comparative analysis of their DNA and putative peptide sequences shows evidence of purifying selection. Conclusion: The occurrence of a large number of conserved upstream open reading frames, in association with features consistent with protein translation, strongly suggests evolutionary maintenance of the coding sequence and indicates probable functional expression of the peptides encoded within these upstream open reading frames.
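A short scan makes the definitions above concrete: an AUG upstream of the main coding sequence that opens a reading frame of 20 or more codons. This sketch makes several assumptions. The mRNA is a string together with the 0-based index of the main start codon. Codons are counted from the AUG up to, but not including, the stop. Upstream AUGs with no in-frame stop in the given sequence are discarded. The Kozak check tests only the two most influential positions (a purine at -3 and G at +4). These conventions and all names are assumptions, not the authors' pipeline.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna: str, cds_start: int, min_codons: int = 20) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_codons) for each upstream ORF with >= min_codons codons.

    start: 0-based index of the uORF's AUG (upstream of cds_start);
    end: index just past its stop codon; n_codons counts the AUG but not the stop.
    """
    seq = mrna.upper().replace("U", "T")  # accept RNA or DNA alphabet
    uorfs = []
    for start in range(cds_start):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in this AUG's frame until the first stop codon.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    uorfs.append((start, pos + 3, n_codons))
                break
    return uorfs


def has_strong_kozak(mrna: str, start: int) -> bool:
    """True if the AUG at `start` has a purine at -3 and G at +4 (A of AUG = +1)."""
    seq = mrna.upper().replace("U", "T")
    if start < 3 or start + 3 >= len(seq):
        return False
    return seq[start - 3] in "AG" and seq[start + 3] == "G"


# Hypothetical transcript: 5' leader with one 21-codon uORF, then the main CDS.
utr = "GCCACC" + "ATG" + "GCT" * 20 + "TAA" + "GG"
mrna = utr + "ATGAAATAG"
print(find_uorfs(mrna, cds_start=len(utr)))  # [(6, 72, 21)]
print(has_strong_kozak(mrna, 6))  # True
```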
Abstract:
The flood of new genomic sequence information, together with technological innovations in protein structure determination, has led to worldwide structural genomics (SG) initiatives. The goals of SG initiatives are to accelerate the process of protein structure determination, to fill in protein fold space and to provide information about the function of uncharacterized proteins. In the long term, these outcomes are likely to affect medical biotechnology and drug discovery, leading to a better understanding of disease as well as the development of new therapeutics. Here we describe the high-throughput pipeline established at the University of Queensland in Australia. In this focused pipeline, the targets for structure determination are proteins that are expressed in mouse macrophage cells and that are inferred to have a role in innate immunity. The aim is to characterize the molecular structure and the biochemical and cellular function of these targets by using a parallel processing pipeline. The pipeline is designed to work with tens to hundreds of target gene products and comprises target selection, cloning, expression, purification, crystallization and structure determination. The structures from this pipeline will provide insights into the function of previously uncharacterized macrophage proteins and could lead to the validation of new drug targets for chronic obstructive pulmonary disease and arthritis. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
An important topic in genomic sequence analysis is the identification of protein coding regions. In this context, several model-independent methods for detecting coding DNA, based on the occurrence of specific nucleotide patterns in coding regions, have been proposed. Nonetheless, these methods are not fully satisfactory because they depend on an empirically predefined window length required for the local analysis of a DNA region. We introduce a method based on a modified Gabor-wavelet transform (MGWT) for the identification of protein coding regions. This novel transform is tuned to analyze periodic signal components and has the advantage of being independent of the window length. We compared the performance of the MGWT with other methods by using eukaryote data sets. The results show that MGWT outperforms all assessed model-independent methods with respect to identification accuracy. These results indicate that at least part of the identification errors produced by the previous methods stems from their fixed working scale. The new method not only avoids this source of errors but also provides a tool for detailed exploration of nucleotide occurrence.
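The MGWT is not described here in enough detail to reproduce. For contrast, the sketch below implements the kind of fixed-window measure the abstract criticizes, not the MGWT. It computes the classic period-3 spectral signal of coding DNA: the summed power of the four base-indicator sequences at frequency N/3 of a sliding window. The 351 nt window and 3 nt step are illustrative assumptions. Changing them changes the profile, which is exactly the window dependence the MGWT is designed to avoid.

```python
import cmath
from typing import List, Tuple

# DFT kernel at frequency index N/3: exp(-2*pi*i*(N/3)*k/N) = W**k, with W a cube root of unity.
W = cmath.exp(-2j * cmath.pi / 3)


def period3_power(window: str) -> float:
    """Sum over A, C, G, T of |DFT at frequency N/3|^2 of each base's indicator sequence."""
    n = len(window)
    if n == 0 or n % 3:
        raise ValueError("window length must be a positive multiple of 3")
    total = 0.0
    for base in "ACGT":
        # W**(k % 3) equals W**k but avoids large exponents.
        coeff = sum(W ** (k % 3) for k, b in enumerate(window.upper()) if b == base)
        total += abs(coeff) ** 2
    return total


def period3_profile(seq: str, window: int = 351, step: int = 3) -> List[Tuple[int, float]]:
    """(window start, period-3 power) along the sequence; peaks tend to mark coding regions."""
    return [(start, period3_power(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


print(period3_power("ATG" * 117))  # strongly periodic: 3 * 117**2
print(period3_power("A" * 351))    # no 3-base periodicity: ~0
```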
Abstract:
We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.
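Maximum-likelihood fitting of a null score distribution can be sketched under one common assumption: scores of unrelated sequences follow a Gumbel (extreme-value) distribution, as Karlin-Altschul theory predicts for ungapped local alignments. The abstract does not state the authors' model. The Gumbel form, the bisection solver and the function names below are therefore assumptions. The sketch also omits the paper's refinements: mixture modeling, excluding scores with small E-values, and target-length adjustment.

```python
import math
from typing import Sequence, Tuple


def fit_gumbel_ml(scores: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood (mu, beta) of a max-type Gumbel fitted to null scores."""
    xs = list(scores)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    if sd == 0:
        raise ValueError("scores must not all be equal")
    c = min(xs)  # shift so every exponent is <= 0 (avoids overflow)

    def f(beta: float) -> float:
        # The ML estimate of beta is the root of beta - mean + sum(x*w)/sum(w),
        # with weights w = exp(-x/beta); shifting by c leaves the ratio unchanged.
        ws = [math.exp(-(x - c) / beta) for x in xs]
        return beta - mean + sum(x * w for x, w in zip(xs, ws)) / sum(ws)

    # f -> min(x) - mean < 0 as beta -> 0, and f ~ beta > 0 for large beta: bracket the root.
    hi = sd
    while f(hi) <= 0:
        hi *= 2
    lo = hi / 2
    while f(lo) >= 0:
        lo /= 2
    for _ in range(100):  # bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # Closed-form ML location given beta.
    mu = c - beta * math.log(sum(math.exp(-(x - c) / beta) for x in xs) / n)
    return mu, beta


def gumbel_pvalue(score: float, mu: float, beta: float) -> float:
    """P(S >= score) = 1 - exp(-exp(-(score - mu)/beta)); expm1 keeps small p-values accurate."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


# Deterministic quantile sample from Gumbel(mu=20, beta=3), for illustration only.
scores = [20 - 3 * math.log(-math.log((i + 0.5) / 5000)) for i in range(5000)]
mu, beta = fit_gumbel_ml(scores)
print(mu, beta, gumbel_pvalue(50.0, mu, beta))
```

Bisection is used because the likelihood equation for beta is bracketed by a value near zero and a large value, which makes it robust without derivatives; mu then follows in closed form.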
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
BACKGROUND: Cancer/testis (CT) genes are normally expressed only in germ cells, but can be activated in the cancer state. This unusual property, together with the finding that many CT proteins elicit an antigenic response in cancer patients, has established a role for this class of genes as targets in immunotherapy regimens. Many families of CT genes have been identified in the human genome, but their biological function for the most part remains unclear. While it has been shown that some CT genes are under diversifying selection, this question has not been addressed before for the class as a whole. RESULTS: To shed more light on this interesting group of genes, we exploited the generation of a draft chimpanzee (Pan troglodytes) genomic sequence to examine CT genes in an organism that is closely related to human, and generated a high-quality, manually curated set of human:chimpanzee CT gene alignments. We find that the chimpanzee genome contains homologues to most of the human CT families, and that the genes are located on the same chromosomes and at a similar copy number to those in human. Comparison of putative human:chimpanzee orthologues indicates that CT genes located on chromosome X are diverging faster and are undergoing stronger diversifying selection than those on the autosomes or than a set of control genes on either chromosome X or autosomes. CONCLUSION: Given their high level of diversifying selection, we suggest that CT genes are primarily responsible for the observed rapid evolution of protein-coding genes on the X chromosome.
Current millennium biotechniques for biomedical research on parasites and host-parasite interactions
Abstract:
The development of biotechnology in the last three decades has generated the feeling that the newest scientific achievements will deliver a high quality of life through an abundance of food and effective means of combating disease. As the new biotechnologies give access to genetic information, there is a common belief that physiological and pathological processes result from subtle modifications of gene expression. Indeed, modern genetics has produced genetic maps, physical maps and complete nucleotide sequences for 141 viruses, 51 organelles, two eubacteria, one archaeon and one eukaryote (Saccharomyces cerevisiae). In addition, during the Centennial Commemoration of the Oswaldo Cruz Institute the nearly complete human genome map was proudly announced, while the latest Brazilian keystone contribution to science was the publication of the Xylella fastidiosa genomic sequence, highlighted on the cover of Nature. There is a belief among the general public that further scientific accomplishments will rapidly lead to new drugs and methodological approaches to cure genetic diseases and other incurable ailments. Yet much evidence has accumulated showing that a large information gap exists between our knowledge of genome sequence and our knowledge of genome function. Now that many genome maps are available, people wish to know what we are going to do with them. Certainly, all these scientific accomplishments will shed light on many more secrets of life. Nevertheless, restraint in the weekly announcements of promising scientific achievements is necessary. We also need many more creative experimental biologists to discover new, as yet unenvisaged biotechnological approaches, as well as the basic resources needed to carry out the milestone research that will lead us to the "promised land" often proclaimed by the mass media.