950 results for genomic sequence
Abstract:
The complete nucleotide sequence of Subterranean clover mottle virus (SCMoV) genomic RNA has been determined. The SCMoV genome is 4,258 nucleotides in length. It shares the highest nucleotide and amino acid sequence identity with the genome of Lucerne transient streak virus (LTSV). SCMoV RNA encodes four overlapping open reading frames and has a genome organisation similar to that of Cocksfoot mottle virus (CfMV). ORF1 and ORF4 are predicted to encode single proteins. ORF2 is predicted to encode two proteins that are derived from a -1 translational frameshift between two overlapping reading frames (ORF2a and ORF2b). A search of amino acid databases did not find a significant match for ORF1 and the function of this protein remains unclear. ORF2a contains a motif typical of chymotrypsin-like serine proteases and ORF2b has motifs characteristically present in positive-stranded RNA-dependent RNA polymerases. ORF4 is likely to be expressed from a subgenomic RNA and encodes the viral coat protein. The ORF2a/ORF2b overlapping gene expression strategy used by SCMoV and CfMV is similar to that of the poleroviruses and differs from that of other published sobemoviruses. These results suggest that the sobemoviruses could now be divided into two distinct subgroups: those that express the RNA-dependent RNA polymerase from a single, in-frame polyprotein, and those that express it via a -1 translational frameshifting mechanism.
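The effect of a -1 translational frameshift on the protein product can be illustrated with a minimal Python sketch. The toy sequence, the slip position and the function names below are invented for illustration and are not SCMoV data; the sketch only assumes the standard genetic code and a ribosome that steps back one nucleotide at a given position.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, start=0):
    """Translate seq from index start until a stop codon or the end."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODONS[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_with_minus1_frameshift(seq, slip_at):
    """Read frame 0 up to slip_at, step back one nucleotide, keep reading.

    slip_at is the index of the first nucleotide after the last codon
    read in frame 0 (hypothetical parameter, a multiple of 3 here).
    """
    upstream = translate(seq[:slip_at])
    downstream = translate(seq, start=slip_at - 1)
    return upstream + downstream


# Toy sequence (illustrative only): frame 0 hits a stop codon after four codons.
toy = "ATGGCTTTTAAATAGGCCTGA"
in_frame = translate(toy)
fused = translate_with_minus1_frameshift(toy, slip_at=12)
print(in_frame, fused)
```

In the toy sequence, in-frame reading terminates early, while the frameshifted reading bypasses the stop codon and yields a longer fusion product, which is the general logic by which a downstream reading frame such as ORF2b could be expressed.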
Abstract:
Subterranean clover stunt disease is an economically important aphid-borne virus disease affecting certain pasture and grain legumes in Australia. The virus associated with the disease, subterranean clover stunt virus (SCSV), was previously found to be representative of a new type of single-stranded DNA virus. Analysis of the virion DNA and restriction mapping of double-stranded cDNA synthesized from virion DNA suggested that SCSV has a segmented genome composed of 3 or 4 different species of circular ssDNA each of about 850-880 nucleotides. To further investigate the complexity of the SCSV genome, we have isolated the replicative form DNA from infected pea and from it prepared putative full-length clones representing the SCSV genome segments. Analysis of these clones by restriction mapping indicated that clones representing at least 4 distinct genomic segments were obtained. This method is thus suitable for generating an extensive genomic library of novel ssDNA viruses containing multiple genome segments such as SCSV and banana bunchy top virus. The N-terminal amino acid sequence and amino acid composition of the coat protein of SCSV were determined. Comparison of the amino acid sequence with partial DNA sequence data, and the distinctly different restriction maps obtained for the full-length clones suggested that only one of these clones contained the coat protein gene. The results confirmed that SCSV has a functionally divided genome composed of several distinct ssDNA circles each of about 1 kb.
Abstract:
Genomic sequences are fundamentally text documents, admitting various representations according to need and tokenization. Gene expression depends crucially on binding of enzymes to the DNA sequence at small, poorly conserved binding sites, limiting the utility of standard pattern search. However, one may exploit the regular syntactic structure of the enzyme's component proteins and the corresponding binding sites, framing the problem as one of detecting grammatically correct genomic phrases. In this paper we propose new kernels based on weighted tree structures, traversing the paths within them to capture the features which underpin the task. Experimentally, we find that these kernels provide performance comparable with state-of-the-art approaches for this problem, while offering significant computational advantages over earlier methods. The methods proposed may be applied to a broad range of sequence or tree-structured data in molecular biology and other domains.
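The idea of treating a genomic sequence as tokenized text and comparing sequences with a kernel can be sketched as follows. This is not the weighted tree kernel proposed in the abstract, whose details are not given here; it is a much simpler k-mer spectrum kernel, swapped in only to show what tokenization and a sequence kernel look like. The function names and the choice k = 3 are assumptions made for the example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Tokenize seq into overlapping k-mers and count each token."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of sequences a and b."""
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    shared = counts_a.keys() & counts_b.keys()
    return sum(counts_a[token] * counts_b[token] for token in shared)


# Toy sequences (illustrative only).
print(kmer_counts("ACGTACG", 3))
print(spectrum_kernel("ACGTACG", "ACGA"))
```

A kernel of this kind scores two sequences as similar when they share many short tokens, regardless of where those tokens occur; structured kernels such as the tree-based ones described above additionally take the arrangement of features into account.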
Abstract:
This article documents the public availability of (i) transcriptome sequence data, assembled and annotated contigs and unigenes, and BLAST hits from the Queensland fruit fly, Bactrocera tryoni; (ii) 75 single-nucleotide variants (SNVs) from 454 sequencing of reduced representation libraries for Phalangiidae harvestmen, Megabunus armatus, Megabunus vignai, Megabunus lesserti, and Rilaena triangularis; and (iii) expressed sequence tags from 454 sequencing of the lepidopterans Lymantria dispar and Lymantria monacha.
Abstract:
Over the past decade the mitochondrial (mt) genome has become the most widely used genomic resource available for systematic entomology. While the availability of other types of ‘-omics’ data, in particular transcriptomes, is increasing rapidly, mt genomes are still vastly cheaper to sequence and are far less demanding of high quality templates. Furthermore, almost all other ‘-omics’ approaches also sequence the mt genome, and so it can form a bridge between legacy and contemporary datasets. Mitochondrial genomes have now been sequenced for all insect orders, and in many instances representatives of each major lineage within orders (suborders, series or superfamilies depending on the group). They have also been applied to systematic questions at all taxonomic scales from resolving interordinal relationships (e.g. Cameron et al., 2009; Wan et al., 2012; Wang et al., 2012), through many intraordinal (e.g. Dowton et al., 2009; Timmermans et al., 2010; Zhao et al., 2013a) and family-level studies (e.g. Nelson et al., 2012; Zhao et al., 2013b) to population/biogeographic studies (e.g. Ma et al., 2012). Methodological issues around the use of mt genomes in insect phylogenetic analyses and the empirical results found to date have recently been reviewed by Cameron (2014), yet the technical aspects of sequencing and annotating mt genomes were not covered. Most papers which generate new mt genomes report their methods in a simplified form which can be difficult to replicate without specific knowledge of the field. Published studies utilize a sufficiently wide range of approaches, usually without justification for the one chosen, that confusion about commonly used jargon such as ‘long PCR’ and ‘primer walking’ could be a serious barrier to entry. Furthermore, sequenced mt genomes have been annotated (gene locations defined) to wildly varying standards, and improving data quality through consistent annotation procedures will benefit all downstream users of these datasets.
The aims of this review are therefore to: 1. Describe in detail the various sequencing methods used on insect mt genomes; 2. Explore the strengths and weaknesses of different approaches; 3. Outline the procedures and software used for insect mt genome annotation; and 4. Highlight quality control steps for new annotations and for improving the re-annotation of previously sequenced mt genomes used in systematic or comparative research.
Abstract:
Escherichia coli ST131 is now recognised as a leading contributor to urinary tract and bloodstream infections in both community and clinical settings. Here we present the complete, annotated genome of E. coli EC958, which was isolated from the urine of a patient presenting with a urinary tract infection in the Northwest region of England and represents the most well characterised ST131 strain. Sequencing was carried out using the Pacific Biosciences platform, which provided sufficient depth and read-length to produce a complete genome without the need for other technologies. The discovery of spurious contigs within the assembly that correspond to site-specific inversions in the tail fibre regions of prophages demonstrates the potential for this technology to reveal dynamic evolutionary mechanisms. E. coli EC958 belongs to the major subgroup of ST131 strains that produce the CTX-M-15 extended spectrum β-lactamase, are fluoroquinolone resistant and encode the fimH30 type 1 fimbrial adhesin. This subgroup includes the Indian strain NA114 and the North American strain JJ1886. A comparison of the genomes of EC958, JJ1886 and NA114 revealed that differences in the arrangement of genomic islands, prophages and other repetitive elements in the NA114 genome are not biologically relevant and are due to misassembly. The availability of a high quality uropathogenic E. coli ST131 genome provides a reference for understanding this multidrug resistant pathogen and will facilitate novel functional, comparative and clinical studies of the E. coli ST131 clonal lineage.
Abstract:
Single nucleotide polymorphisms (SNPs) are widely acknowledged as the marker of choice for many genetic and genomic applications because they show co-dominant inheritance, are highly abundant across genomes and are suitable for high-throughput genotyping. Here we evaluated the applicability of SNP markers developed from Crassostrea gigas and C. virginica expressed sequence tags (ESTs) in closely related Crassostrea and Ostrea species. A total of 213 putative interspecific level SNPs were identified from re-sequencing data in six amplicons, yielding an average of one interspecific level SNP per seven bp. The high polymorphism levels observed and the high success rate of transferability show that genic EST-derived SNP markers provide an efficient method for rapid marker development and SNP discovery in closely related oyster species. The six EST-SNP markers identified here will provide useful molecular tools for addressing questions in molecular ecology and evolution studies, including stock analysis (pedigree monitoring) in related oyster taxa.
Abstract:
Striped catfish (Pangasianodon hypophthalmus) is a commercially important freshwater fish used in inland aquaculture in the Mekong Delta, Vietnam. The culture industry is, however, facing a significant challenge from saltwater intrusion into many low-lying coastal provinces across the Mekong Delta as a result of predicted climate change impacts. Developing genomic resources for this species can facilitate the production of improved culture lines that can withstand raised salinity conditions, and so we have applied high-throughput Ion Torrent sequencing of transcriptome libraries from six target osmoregulatory organs from striped catfish as a genomic resource for use in future selection strategies. We obtained 12,177,770 reads after trimming and processing, with an average length of 97 bp. De novo assemblies were generated using CLC Genomic Workbench, Trinity and Velvet/Oases, with the best overall contig performance resulting from the CLC assembly. De novo assembly using CLC yielded 66,451 contigs with an average length of 478 bp and N50 length of 506 bp. A total of 37,969 contigs (57%) possessed significant similarity with proteins in the non-redundant database. Comparative analyses revealed that a significant number of contigs matched sequences reported in other teleost fishes, ranging in similarity from 45.2% with Atlantic cod to 52% with zebrafish. In addition, 28,879 simple sequence repeats (SSRs) and 55,721 single nucleotide polymorphisms (SNPs) were detected in the striped catfish transcriptome. The sequence collection generated in the current study represents the most comprehensive genomic resource for P. hypophthalmus available to date. Our results illustrate the utility of next-generation sequencing as an efficient tool for constructing a large genomic database for marker development in non-model species.
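For readers unfamiliar with the N50 statistic quoted above, the following Python sketch shows one common way to compute it: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the toy contig lengths are assumptions for illustration; the contig lengths from the study itself are not available here, so only the reported 57% figure is rechecked.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy assembly (illustrative only): total 20 bp, half is 10 bp.
toy_n50 = n50([2, 3, 4, 5, 6])

# Recheck of a figure reported in the abstract: 37,969 of 66,451 contigs.
annotated_percent = 100 * 37969 / 66451
print(toy_n50, round(annotated_percent))
```

Because N50 weights long contigs more heavily than a simple mean, it is usually read alongside the average contig length, as the abstract does.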
Abstract:
Chlamydia pneumoniae is an obligate intracellular bacterium implicated in a wide range of human diseases including atherosclerosis and Alzheimer's disease. Efforts to understand the relationships between C. pneumoniae detected in these diseases have been hindered by the limited availability of sequence data for non-respiratory strains. In this study, we sequenced the whole genomes of C. pneumoniae isolates from atherosclerosis and Alzheimer's disease, and compared these to previously published C. pneumoniae genomes. Phylogenetic analyses of these new C. pneumoniae strains indicate two sub-groups within human C. pneumoniae, and suggest that both recombination and mutation events have driven the evolution of human C. pneumoniae. Further fine-detailed analyses of these new C. pneumoniae sequences show several genetically variable loci. This suggests that similar strains of C. pneumoniae are found in the brain, lungs and cardiovascular system and that only minor genetic differences may contribute to the adaptation of particular strains in human disease.
Abstract:
We completed the genome sequence of Lettuce necrotic yellows virus (LNYV) by determining the nucleotide sequences of the 4a (putative phosphoprotein), 4b, M (matrix protein), G (glycoprotein) and L (polymerase) genes. The genome consists of 12,807 nucleotides and encodes six genes in the order 3′ leader-N-4a(P)-4b-M-G-L-5′ trailer. Sequences were derived from clones of a cDNA library from LNYV genomic RNA and from fragments amplified using reverse transcription-polymerase chain reaction. The 4a protein has a low isoelectric point characteristic of rhabdovirus phosphoproteins. The 4b protein has significant sequence similarities with the movement proteins of capillo- and trichoviruses and may be involved in cell-to-cell movement. The putative G protein sequence contains a predicted 25-amino-acid signal peptide and endopeptidase cleavage site, three predicted glycosylation sites and a putative transmembrane domain. The deduced L protein sequence shows similarities with the L proteins of other plant rhabdoviruses and contains polymerase module motifs characteristic of RNA-dependent RNA polymerases of negative-strand RNA viruses. Phylogenetic analysis of this motif among rhabdoviruses placed LNYV in a group with other sequenced cytorhabdoviruses, most closely related to Strawberry crinkle virus.
Abstract:
Using genome-wide data from 253,288 individuals, we identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height. By testing different numbers of variants in independent studies, we show that the most strongly associated approximately 2,000, approximately 3,700 and approximately 9,500 SNPs explained approximately 21%, approximately 24% and approximately 29% of phenotypic variance. Furthermore, all common variants together captured 60% of heritability. The 697 variants clustered in 423 loci were enriched for genes, pathways and tissue types known to be involved in growth and together implicated genes and pathways not highlighted in earlier efforts, such as signaling by fibroblast growth factors, WNT/beta-catenin and chondroitin sulfate-related genes. We identified several genes and pathways not previously connected with human skeletal growth, including mTOR, osteoglycin and binding of hyaluronic acid. Our results indicate a genetic architecture for human height that is characterized by a very large but finite number (thousands) of causal variants.
Abstract:
The number of genetic factors associated with common human traits and disease is increasing rapidly, and the general public is utilizing affordable, direct-to-consumer genetic tests. The results of these tests are often in the public domain. A combination of factors has increased the potential for the indirect estimation of an individual's risk for a particular trait. Here we explain the basic principles underlying risk estimation which allowed us to test the ability to make an indirect risk estimation from genetic data by imputing Dr. James Watson's redacted apolipoprotein E gene (APOE) information. The principles underlying risk prediction from genetic data have been well known and applied for many decades; however, the recent increase in genomic knowledge, together with advances in mathematical and statistical techniques and computational power, makes it relatively easy to make an accurate but indirect estimation of risk. There is a current hazard for indirect risk estimation that is relevant not only to the subject but also to individuals related to the subject; this risk will likely increase as more detailed genomic data and better computational tools become available.
Abstract:
Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). 
Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
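The relation between "phenotypic variation" and "heritable variation" in the figures above can be made explicit with a short Python sketch. It assumes the usual decomposition in which the share of heritable variation explained equals the share of phenotypic variation explained divided by the heritability; the heritability value is not stated in the abstract and is inferred here only from the abstract's own pairing of 16% with 20%. The function names are hypothetical.

```python
def implied_heritability(phenotypic_share, heritable_share):
    """Heritability implied by two ways of expressing the same variance."""
    return phenotypic_share / heritable_share


def share_of_heritable_variation(phenotypic_share, heritability):
    """Convert a share of phenotypic variation into a share of heritable variation."""
    if not 0 < heritability <= 1:
        raise ValueError("heritability must lie in (0, 1]")
    return phenotypic_share / heritability


# 16% of phenotypic variation is described as about 20% of heritable variation.
h2 = implied_heritability(0.16, 0.20)

# Under that inferred heritability, the 10% currently explained corresponds to:
current_share = share_of_heritable_variation(0.10, h2)
print(h2, current_share)
```

Under this reading, the identified variants would account for roughly one-eighth of the heritable variation in height, which is consistent with the abstract's point that most of the heritable variation remained unexplained at the time.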
Abstract:
Multidrug-resistant Escherichia coli sequence type 131 (ST131) has recently emerged as a globally distributed cause of extraintestinal infections in humans. Diverse factors have been investigated as explanations for ST131's rapid and successful dissemination, including transmission through animal contact and consumption of food, as suggested by the detection of ST131 in a number of nonhuman species. For example, ST131 has recently been identified as a cause of clinical infection in companion animals and poultry, and both host groups have been confirmed as faecal carriers of ST131. Moreover, a high degree of similarity has been shown among certain ST131 isolates from humans, companion animals, and poultry based on resistance characteristics and genomic background, and human and companion animal ST131 isolates tend to exhibit similar virulence genotypes. However, most ST131 isolates from poultry appear to possess specific virulence genes that are typically absent from human and companion animal isolates, including genes associated with avian pathogenic E. coli. Since the number of reported animal and food-associated ST131 isolates is quite small, the role of nonhuman host species in the emergence, dissemination, and transmission of ST131 to humans remains unclear. Nevertheless, given the profound public health importance of the emergent ST131 clonal group, even the limited available evidence indicates a pressing need for further careful study of this significant question.
Abstract:
Expressed sequence tag (EST) databases provide a primary source of nuclear DNA sequences for genetic marker development in non-model organisms. To date, the process has been relatively inefficient for several reasons: (1) priming site polymorphism in the template leads to inferior or erratic amplification; (2) introns in the target amplicon are too large and/or numerous to allow effective amplification under standard screening conditions; and (3) at least occasionally, a PCR primer straddles an exon–intron junction and is unable to bind to genomic DNA template. The first is only a minor issue for species or strains with low heterozygosity but becomes a significant problem for species with high genomic variation, such as marine organisms with extremely large effective population sizes. Problems arising from unanticipated introns are unavoidable but are most pronounced in intron-rich species, such as vertebrates and lophotrochozoans. We present an approach to marker development in the Pacific oyster Crassostrea gigas, a highly polymorphic and intron-rich species, which minimizes these problems, and should be applicable to other non-model species for which EST databases are available. Placement of PCR primers in the 3′ end of coding sequence and 3′ UTR improved PCR success rate from 51% to 97%. Almost all (37 of 39) markers developed for the Pacific oyster were polymorphic in a small test panel of wild and domesticated oysters.