935 results for Complete Genome Sequence
Abstract:
To identify novel quantitative trait loci (QTL) within horses, we performed genome-wide association studies (GWAS) based on sequence-level genotypes for conformation and performance traits in the Franches-Montagnes (FM) horse breed. Sequence-level genotypes of FM horses were derived by re-sequencing 30 key founders and imputing 50K data of genotyped horses. In total, we included 1077 FM horses genotyped for ~4 million SNPs and their respective de-regressed breeding values of the traits in the analysis. Based on this dataset, we identified a total of 14 QTL associated with 18 conformation traits and one performance trait. Therefore, our results suggest that the application of sequence-derived genotypes increases the power to identify novel QTL which were not identified previously based on 50K SNP chip data.
Abstract:
A complete reference genome of the Apis mellifera Filamentous virus (AmFV) was determined using Illumina HiSeq sequencing. The AmFV genome is a double-stranded DNA molecule of approximately 498,500 nucleotides with a GC content of 50.8%. It encompasses 247 non-overlapping open reading frames (ORFs), equally distributed on both strands, which cover 65% of the genome. While most of the ORFs lacked above-threshold sequence alignments to reference protein databases, twenty-eight were found to display significant homologies with proteins present in other large double-stranded DNA viruses. Remarkably, 13 ORFs had strong similarity with typical baculovirus domains such as PIFs (per os infectivity factor genes: pif-1, pif-2, pif-3 and p74) and BRO (Baculovirus Repeated Open Reading Frame). The putative AmFV DNA polymerase is of type B, but is only distantly related to those of the baculoviruses. The ORFs encoding proteins involved in nucleotide metabolism had the highest percent identity to viral proteins in GenBank. Other notable features include the presence of several collagen-like, chitin-binding, kinesin and pacifastin domains. Due to the large size of the AmFV genome and the inconsistent affiliation with other large double-stranded DNA virus families infecting invertebrates, AmFV may belong to a new virus family.
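The summary statistics quoted in the abstract above (GC content and the fraction of the genome covered by non-overlapping ORFs) can be illustrated with a minimal Python sketch. The function names, the toy sequence, and the toy ORF coordinates are assumptions made for illustration only; they are not taken from the AmFV study or its software.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases among A, C, G and T (other symbols are ignored)."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    return gc / total if total else 0.0


def orf_coverage(orfs: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of the genome covered by non-overlapping ORFs.

    Each ORF is a (start, end) pair, 1-based and inclusive.
    Overlapping ORFs would be double counted, so the input must not overlap.
    """
    covered = sum(end - start + 1 for start, end in orfs)
    return covered / genome_length


# Hypothetical toy data, not taken from the AmFV genome.
toy_seq = "ATGCGCGATATCGCGTAA"
toy_orfs = [(1, 9), (13, 18)]
print(f"GC content: {gc_content(toy_seq):.1%}")
print(f"ORF coverage: {orf_coverage(toy_orfs, len(toy_seq)):.1%}")
```

On a real genome the same two functions would be applied to the full sequence and to the list of predicted ORF coordinates.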
Abstract:
Cosmids from the 1A3–1A10 region of the complete miniset were individually subcloned by using the vector M13 mp18. Sequences of each cosmid were assembled from about 400 DNA fragments generated from the ends of these phage subclones and merged into one 189-kb contig. About 160 ORFs identified by the CodonUse program were subjected to similarity searches. The biological functions of 80 ORFs could be assigned reliably by using the WIT and Magpie genome investigation tools. Eighty percent of these recognizable ORFs were organized in functional clusters, which simplified assignment decisions and increased the strength of the predictions. A set of 26 genes for cobalamin biosynthesis, genes for polyhydroxyalkanoic acid metabolism, DNA replication and recombination, and DNA gyrase were among those identified. Most of the ORFs lacking significant similarity with reference databases also were grouped. There are two large clusters of these ORFs, one located between 45 and 67 kb of the map, and the other between 150 and 183 kb. Nine of the 15 loosely identified ORFs in the first of these clusters match ORFs from phages or transposons. The other cluster also has four ORFs of possible phage origin.
Abstract:
VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html.
Abstract:
The Plasmodium falciparum Genome Database (http://PlasmoDB.org) integrates sequence information, automated analyses and annotation data emerging from the P. falciparum genome sequencing consortium. To date, raw sequence coverage is available for >90% of the genome, and two chromosomes have been finished and annotated. Data in PlasmoDB are organized by chromosome (1–14), and can be accessed using a variety of tools for graphical and text-based browsing or downloaded in various file formats. The GUS (Genomics Unified Schema) implementation of PlasmoDB provides a multi-species genomic relational database, incorporating data from human and mouse, as well as P. falciparum. The relational schema uses a highly structured format to accommodate diverse data sets related to genomic sequence and gene expression. Tools have been designed to facilitate complex biological queries, including many that are specific to Plasmodium parasites and malaria as a disease. Additional projects seek to integrate genomic information with the rich data sets now becoming available for RNA transcription, protein expression, metabolic pathways, genetic and physical mapping, antigenic and population diversity, and phylogenetic relationships with other apicomplexan parasites. The overall goal of PlasmoDB is to facilitate Internet- and CD-ROM-based access to both finished and unfinished sequence information by the global malaria research community.
Abstract:
Reovirus genome segment S1 encodes protein σ1, which is the receptor binding protein, modulates tissue tropism, and specifies the nature of the antiviral immune response. It makes up less than 2% of reovirus particles and is synthesized in very small amounts in infected cells. Any antiviral strategy aimed specifically at reducing the expression of this genome segment should, in principle, reduce the infectivity of the virus. To test this hypothesis, we have assembled two hammerhead motif-containing ribozymes (Rzs) targeted to cleave at the conserved B and C domains of the reovirus s1 RNA. Protein-independent but Mg2+-dependent sequence-specific cleavage of s1 RNA was achieved by both Rzs in trans. Cells that transiently express these Rzs, when challenged with reovirus, were protected against the cytopathic effects caused by the virus. This protection correlated with the specific intracellular reduction of s1 transcripts that was due to their cleavage by the Rzs. Rz-treated cells that were challenged with reovirus showed almost complete disappearance of protein σ1 without significant changes in the levels of the other reovirus structural proteins. Thus, Rzs, besides acting as antiviral agents, could be exploited as biological tools to delineate specific functions of target genes.
Abstract:
We present a method for discovering conserved sequence motifs from families of aligned protein sequences. The method has been implemented as a computer program called emotif (http://motif.stanford.edu/emotif). Given an aligned set of protein sequences, emotif generates a set of motifs with a wide range of specificities and sensitivities. emotif also can generate motifs that describe possible subfamilies of a protein superfamily. A disjunction of such motifs often can represent the entire superfamily with high specificity and sensitivity. We have used emotif to generate sets of motifs from all 7,000 protein alignments in the blocks and prints databases. The resulting database, called identify (http://motif.stanford.edu/identify), contains more than 50,000 motifs. For each alignment, the database contains several motifs whose probabilities of matching a false positive range from 10^-10 to 10^-5. Highly specific motifs are well suited for searching entire proteomes, while generating very few false predictions. identify assigns biological functions to 25–30% of all proteins encoded by the Saccharomyces cerevisiae genome and by several bacterial genomes. In particular, identify assigned functions to 172 proteins of unknown function in the yeast genome.
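To make the idea of a motif derived from an alignment concrete, here is a deliberately simplified Python sketch. It is not emotif's algorithm: the abstract does not describe how emotif builds or scores motifs, so the column rule below (a character class when a column has few distinct residues, a wildcard otherwise) and the names `column_motif`, `find_motif`, and `max_residues` are assumptions made purely for illustration. Lowering `max_residues` yields a less specific motif, which loosely mirrors the specificity and sensitivity trade-off mentioned above.

```python
import re


def column_motif(alignment: list[str], max_residues: int = 3) -> str:
    """Build a regex-style motif from equal-length aligned sequences.

    Columns with at most `max_residues` distinct residues become a
    character class; more variable columns, or columns containing a
    gap ('-'), become the wildcard '.'.
    Assumes residues are plain letters (no regex metacharacters).
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    parts = []
    for column in zip(*alignment):
        residues = sorted(set(column))
        if "-" in residues or len(residues) > max_residues:
            parts.append(".")
        elif len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return "".join(parts)


def find_motif(motif: str, protein: str) -> list[int]:
    """Return 0-based start positions of matches, including overlapping ones."""
    return [m.start() for m in re.finditer(f"(?=({motif}))", protein)]


# Hypothetical toy alignment and query, not taken from blocks or prints.
toy_alignment = ["GKSTL", "GKTTL", "GRSAL"]
motif = column_motif(toy_alignment)
print(motif)
print(find_motif(motif, "MAGRTALQGKSTL"))
```

A real motif finder would also estimate how often each motif matches unrelated sequences by chance; this sketch omits that step.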
Abstract:
We have analyzed the developmental molecular programs of the mouse hippocampus, a cortical structure critical for learning and memory, by means of large-scale DNA microarray techniques. Of 11,000 genes and expressed sequence tags examined, 1,926 showed dynamic changes during hippocampal development from embryonic day 16 to postnatal day 30. Gene-cluster analysis was used to group these genes into 16 distinct clusters with striking patterns that appear to correlate with major developmental hallmarks and cellular events. These include genes involved in neuronal proliferation, differentiation, and synapse formation. A complete list of the transcriptional changes has been compiled into a comprehensive gene profile database (http://BrainGenomics.Princeton.edu), which should prove valuable in advancing our understanding of the molecular and genetic programs underlying both the development and the functions of the mammalian brain.
Abstract:
The rearrangement of antibody and T-cell receptor gene segments is indispensable to the vertebrate immune response. All extant jawed vertebrates can rearrange these gene segments. This ability is conferred by the recombination activating genes I and II (RAG I and RAG II). To elucidate their origin and function, the cDNA encoding RAG I from a member of the most ancient class of extant gnathostomes, the Carcharhine sharks, was characterized. Homology domains identified within shark RAG I prompted sequence comparison analyses that suggested similarity of the RAG I and II genes, respectively, to the integrase family genes and integration host factor genes of the bacterial site-specific recombination system. Thus, the apparent explosive evolution (or "big bang") of the ancestral immune system may have been initiated by a transfer of microbial site-specific recombinases.
Abstract:
The nucleotide sequence of the human alpha-albumin gene, including 887 bp of the 5'-flanking region and 1311 bp of the 3'-flanking region (24,454 bp in total), was determined from three overlapping lambda phage clones. The sequence spans 22,256 bp from the cap site to the polyadenylylation site, revealing a gene structure of 15 exons separated by 14 introns. The methionine initiation codon ATG is within exon 1; the termination codon TGA is within exon 14. Exon 15 is entirely untranslated and contains the polyadenylylation signal AATAAA. The deduced polypeptide chain is composed of a 21-amino-acid leader peptide, followed by 578 amino acids of the mature protein. There are seven repetitive DNA elements (Alu and Kpn) in the introns and 3'-flanking region. The sizes of the 15 alpha-albumin exons match closely those of the albumin, alpha-fetoprotein, and vitamin D-binding protein genes. The exons are symmetrically placed within the three domains of the individual proteins, and they share a characteristic codon splitting pattern that is conserved among members of the gene family. The results provide strong evidence that alpha-albumin belongs to, and most likely completes, the serum albumin gene family. Based on structural similarity, alpha-albumin appears to be most closely related to alpha-fetoprotein. The complete structure of this family of four tandemly linked genes provides a well-characterized approximately 200 kb locus in the 4q subcentromeric region of the human genome.
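The lengths quoted in the abstract above are mutually consistent, which a short Python check makes explicit. The variable names are illustrative; every number comes directly from the abstract.

```python
# Figures quoted in the abstract above.
five_prime_flank_bp = 887
transcribed_bp = 22_256        # cap site to polyadenylylation site
three_prime_flank_bp = 1_311

# Flanking regions plus the transcribed span give the total sequenced length.
total_sequenced_bp = five_prime_flank_bp + transcribed_bp + three_prime_flank_bp
print(total_sequenced_bp)

# A gene with n exons has n - 1 introns between them.
exons = 15
introns = exons - 1
print(introns)

# Leader peptide plus mature protein gives the precursor length.
leader_aa, mature_aa = 21, 578
precursor_aa = leader_aa + mature_aa
print(precursor_aa)
```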
Abstract:
We have characterized a family of repetitive DNA elements with homology to the MgPa cellular adhesion operon of Mycoplasma genitalium, a bacterium that has the smallest known genome of any free-living organism. One element, 2272 bp in length and flanked by DNA with no homology to MgPa, was completely sequenced. At least four others were partially sequenced. The complete element is a composite of six regions. Five of these regions show sequence similarity with nonadjacent segments of genes of the MgPa operon. The sixth region, located near the center of the element, is an A+T-rich sequence that has only been found in this repeat family. Open reading frames are present within the five individual regions showing sequence homology to MgPa and the adjacent open reading frame 3 (ORF3) gene. However, termination codons are found between adjacent regions of homology to the MgPa operon and in the A+T-rich sequence. Thus, these repetitive elements do not appear to be directly expressible protein coding sequences. The sequence of one region from five different repetitive elements was compared with the homologous region of the MgPa gene from the type strain G37 and four newly isolated M. genitalium strains. Recombination between repetitive elements of strain G37 and the MgPa operon can explain the majority of polymorphisms within our partial sequences of the MgPa genes of the new isolates. Therefore, we propose that the repetitive elements of M. genitalium provide a reservoir of sequence that contributes to antigenic variation in proteins of the MgPa cellular adhesion operon.
Abstract:
In this paper, we describe the accomplishments of the initial phase of the Human Genome Project, with particular attention to the progress made toward achieving the defined goals for constructing genetic and physical maps of the human genome and determining the sequence of human DNA, identifying the complete set of human genes, and analyzing the need for adequate policies for using the information about human genetics in ways that maximize the benefits for individuals and society.
Abstract:
The bithorax complex (BX-C) of Drosophila, one of two complexes that act as master regulators of the body plan of the fly, is included within a sequence of 338,234 bp (SEQ89E). This paper presents the strategy used in sequencing SEQ89E and an analysis of its open reading frames. The BX-C sequence (BXCALL) contains 314,895 bp obtained by deletion of putative genes that are located at each end of SEQ89E and appear to be functionally unrelated to the BX-C. Only 1.4% of BXCALL codes for the three homeodomain-containing proteins of the complex. Principal findings include a putative ABD-A protein (ABD-AII) larger than a previously known ABD-A protein and a putative glucose transporter-like gene (1521 bp) located at or near the bithoraxoid (bxd), infra-abdominal-2 (iab-2) boundary on the opposite strand relative to that of the homeobox-containing genes.
Abstract:
The ability to carry out high-resolution genetic mapping at high throughput in the mouse is a critical rate-limiting step in the generation of genetically anchored contigs in physical mapping projects and the mapping of genetic loci for complex traits. To address this need, we have developed an efficient, high-resolution, large-scale genome mapping system. This system is based on the identification of polymorphic DNA sites between mouse strains by using interspersed repetitive sequence (IRS) PCR. Individual cloned IRS PCR products are hybridized to a DNA array of IRS PCR products derived from the DNA of individual mice segregating DNA sequences from the two parent strains. Since gel electrophoresis is not required, large numbers of samples can be genotyped in parallel. By using this approach, we have mapped > 450 polymorphic probes with filters containing the DNA of up to 517 backcross mice, potentially allowing resolution of 0.14 centimorgan. This approach also carries the potential for a high degree of efficiency in the integration of physical and genetic maps, since pooled DNAs representing libraries of yeast artificial chromosomes or other physical representations of the mouse genome can be addressed by hybridization of filter representations of the IRS PCR products of such libraries.
Abstract:
Sequence diversity in the coat protein coding region of Australian strains of Johnsongrass mosaic virus (JGMV) was investigated. Field isolates were sampled during a seven year period from Johnsongrass, sorghum and corn across the northern grain growing region. The 23 isolates were found to have greater than 94% nucleotide and amino acid sequence identity. The Australian isolates and two strains from the U.S.A. had about 90% nucleotide sequence identity and were between 19 and 30% different in the N-terminus of the coat protein. Two amino acid residues were found in the core region of the coat protein in isolates obtained from sorghum having the Krish gene for JGMV resistance that differed from those found in isolates from other hosts which did not have this single dominant resistance gene. These amino acid changes may have been responsible for overcoming the resistance conferred by the Krish gene for JGMV resistance in sorghum. The identification of these variable regions was essential for the development of durable pathogen-derived resistance to JGMV in sorghum.