931 results for Genome
Abstract:
For complex disease genetics research in human populations, remarkable progress has been made in recent times with the publication of a number of genome-wide association scans (GWAS) and subsequent statistical replications. These studies have identified new genes and pathways implicated in disease, many of which were not known before. Given these early successes, more GWAS are being conducted and planned, both for disease and quantitative phenotypes. Many researchers and clinicians have DNA samples available on collections of families, including both cases and controls. Twin registries around the world have facilitated the collection of large numbers of families, with DNA and multiple quantitative phenotypes collected on twin pairs and their relatives. In the design of a new GWAS with a fixed budget for the number of chips, the question arises whether to include or exclude related individuals. It is commonly believed to be preferable to use unrelated individuals in the first stage of a GWAS because relatives are 'over-matched' for genotypes. In this study, we quantify that for GWAS of a quantitative phenotype, relative to a sample of unrelated individuals surprisingly little power is lost when using relatives. The advantages of using relatives are manifold, including the ability to perform more quality control, the choice to perform within-family tests of association that are robust to population stratification, and the ability to perform joint linkage and association analysis. Therefore, the advantages of using relatives in GWAS for quantitative traits may well outweigh the small disadvantage in terms of statistical power.
Abstract:
Latent class analysis was performed on migraine symptom data collected in a Dutch population sample (N = 12,210, 59% female) in order to obtain empirical groupings of individuals suffering from symptoms of migraine headache. Based on these heritable groupings (h(2) = 0.49, 95% CI: 0.41-0.57) individuals were classified as affected (migrainous headache) or unaffected. Genome-wide linkage analysis was performed using genotype data from 105 families with at least 2 affected siblings. In addition to this primary phenotype, linkage analyses were performed for the individual migraine symptoms. Significance levels, corrected for the analysis of multiple traits, were determined empirically via a novel simulation approach. Suggestive linkage for migrainous headache was found on chromosomes 1 (LOD = 1.63; pointwise P = 0.0031), 13 (LOD = 1.63; P = 0.0031), and 20 (LOD = 1.85; P = 0.0018). Interestingly, the chromosome 1 peak was located close to the ATP1A2 gene, associated with familial hemiplegic migraine type 2 (FHM2). Individual symptom analysis produced a LOD score of 1.97 (P = 0.0013) on chromosome 5 (photo/phonophobia), a LOD score of 2.13 (P = 0.0009) on chromosome 10 (moderate/severe pain intensity) and a near significant LOD score of 3.31 (P = 0.00005) on chromosome 13 (pulsating headache). These peaks were all located near regions previously reported in migraine linkage studies. Our results provide important replication and support for the presence of migraine susceptibility genes within these regions, and further support the utility of an LCA-based phenotyping approach and analysis of individual symptoms in migraine genetic research. Additionally, our novel "2-step" analysis and simulation approach provides a powerful means to investigate linkage to individual trait components.
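The pointwise P values quoted above are consistent with a common conversion from LOD score to one-sided p-value: 2 ln(10) × LOD is treated as a chi-square statistic with one degree of freedom and its tail probability is halved. The abstract does not say which conversion was used, so the sketch below is an assumption; the function name `lod_to_pointwise_p` is hypothetical.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumption (not stated in the abstract): 2*ln(10)*LOD follows a
    chi-square distribution with 1 degree of freedom, and the tail
    probability is halved because linkage is a one-sided alternative.
    """
    chi_sq = 2.0 * math.log(10.0) * lod
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi_sq / 2.0))


if __name__ == "__main__":
    # LOD scores reported in the abstract.
    for lod in (1.63, 1.85, 1.97, 2.13, 3.31):
        print(f"LOD = {lod:.2f} -> pointwise P ~ {lod_to_pointwise_p(lod):.5f}")
```

This is only the asymptotic pointwise conversion. The significance levels corrected for multiple traits were obtained empirically by simulation in the study, and the sketch does not attempt to reproduce that.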
Abstract:
The propagation of herpesvirus genomes as infectious bacterial artificial chromosomes (iBAC) has enabled the application of highly efficient strategies to investigate gene function across the genome. One of these strategies, transposition, has been used successfully on a number of herpesvirus iBACs to generate libraries of gene disruption mutants. Gene deletion studies aimed at determining the dispensable gene repertoire of the Meleagrid herpesvirus 1 (MeHV-1) genome to enhance the utility of this virus as a vaccine vector have been conducted in this report. A MeHV-1 iBAC was used in combination with the Tn5 and MuA transposition systems in an attempt to generate MeHV-1 gene interruption libraries. However, these studies demonstrated that Tn5 transposition events into the MeHV-1 genome occurred at unexpectedly low frequencies. Furthermore, characterization of genomic locations of the rare Tn5 transposon insertion events indicated a nonrandom distribution within the viral genome, with seven of the 24 insertions occurring within the gene encoding infected cell protein 4. Although insertion events with the MuA system occurred at higher frequency compared with the Tn5 system, fewer insertion events were generated than has previously been reported with this system. The characterization and distribution of these MeHV-1 iBAC transposed mutants are discussed at both the nucleotide and genomic level, and the properties of the MeHV-1 genome that could influence transposition frequency are discussed. © American Association of Avian Pathologists.
Abstract:
Genotyping in DNA pools reduces the cost and the time required to complete large genotyping projects. The aim of the present study was to evaluate pooling as part of a strategy for fine mapping in regions of significant linkage. Thirty-nine single nucleotide polymorphisms (SNPs) were analyzed in two genomic DNA pools of 384 individuals each and results compared with data after typing all individuals used in the pools. There were no significant differences using data from either 2 or 8 heterozygous individuals to correct frequency estimates for unequal allelic amplification. After correction, the mean difference between estimates from the genomic pool and individual allele frequencies was .033. A major limitation of the use of DNA pools is the time and effort required to carefully adjust the concentration of each individual DNA sample before mixing aliquots. Pools were also constructed by combining DNA after Multiple Displacement Amplification (MDA). The MDA pools gave similar results to pools constructed after careful DNA quantitation (mean difference from individual genotyping .040) and MDA provides a rapid method to generate pools suitable for some applications. Pools provide a rapid and cost-effective screen to eliminate SNPs that are not polymorphic in a test population and can detect minor allele frequencies as low as 1% in the pooled samples. With current levels of accuracy, pooling is best suited to an initial screen in the SNP validation process that can provide high-throughput comparisons between cases and controls to prioritize SNPs for subsequent individual genotyping.
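The correction for unequal allelic amplification mentioned above can be illustrated with a widely used approach: the mean ratio of the two allele signals in known heterozygotes gives a factor k, and the pooled signal for the second allele is rescaled by k. The abstract does not give its exact formula, so this is a sketch under that assumption; the function names and signal values are hypothetical.

```python
from statistics import mean


def amplification_factor(het_signals):
    """Mean A/B signal ratio across known heterozygous individuals.

    Each heterozygote carries one copy of each allele, so any departure
    of this ratio from 1.0 is attributed to unequal amplification.
    """
    return mean(a / b for a, b in het_signals)


def corrected_allele_frequency(pool_a, pool_b, k):
    """Estimate the frequency of allele A in a pool, rescaling B's signal by k."""
    return pool_a / (pool_a + k * pool_b)


if __name__ == "__main__":
    # Hypothetical peak heights (allele A, allele B) from two heterozygotes.
    hets = [(2.0, 1.0), (4.0, 2.0)]
    k = amplification_factor(hets)
    raw = 2.0 / (2.0 + 1.0)
    corrected = corrected_allele_frequency(2.0, 1.0, k)
    print(f"k = {k:.2f}, raw estimate = {raw:.3f}, corrected = {corrected:.3f}")
```

In this toy case allele A amplifies twice as strongly as allele B, so the uncorrected pool estimate overstates its frequency and the correction brings it back to one half.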
Abstract:
The silver gemfish Rexea solandri is an important economic resource but vulnerable to overfishing in Australian waters. The complete mitochondrial genome sequence is described from 1.6 million reads obtained via next generation sequencing. The total length of the mitogenome is 16,350 bp comprising 2 rRNA, 13 protein-coding genes, 22 tRNA and 2 non-coding regions. The mitogenome sequence was validated against sequences of PCR fragments and BLAST queries of Genbank. Gene order was equivalent to that found in marine fishes.
Abstract:
Background: Next-generation sequencing technology is an important tool for the rapid, genome-wide identification of genetic variations. However, it is difficult to resolve the ‘signal’ of variations of interest and the ‘noise’ of stochastic sequencing and bioinformatic errors in the large datasets that are generated. We report a simple approach to identify regional linkage to a trait that requires only two pools of DNA to be sequenced from progeny of a defined genetic cross (i.e. bulk segregant analysis) at low coverage (<10×) and without parentage assignment of individual SNPs. The analysis relies on regional averaging of pooled SNP frequencies to rapidly scan polymorphisms across the genome for differential regional homozygosity, which is then displayed graphically.
Results: Progeny from defined genetic crosses of Tribolium castaneum (F4 and F19) segregating for the phosphine resistance trait were exposed to phosphine to select for the resistance trait while the remainders were left unexposed. Next generation sequencing was then carried out on the genomic DNA from each pool of selected and unselected insects from each generation. The reads were mapped against the annotated T. castaneum genome from NCBI (v3.0) and analysed for SNP variations. Since it is difficult to accurately call individual SNP frequencies when the depth of sequence coverage is low, variant frequencies were averaged across larger regions. Results from regional SNP frequency averaging identified two loci, tc_rph1 on chromosome 8 and tc_rph2 on chromosome 9, which together are responsible for high level resistance. Identification of the two loci was possible with only 5-7× average coverage of the genome per dataset. These loci were subsequently confirmed by direct SNP marker analysis and fine-scale mapping. Individually, homozygosity of tc_rph1 or tc_rph2 results in only weak resistance to phosphine (estimated at up to 1.5-2.5× and 3-5× respectively), whereas in combination they interact synergistically to provide a high-level resistance >200×. The tc_rph2 resistance allele resulted in a significant fitness cost relative to the wild type allele in unselected beetles over eighteen generations.
Conclusion: We have validated the technique of linkage mapping by low-coverage sequencing of progeny from a simple genetic cross. The approach relied on regional averaging of SNP frequencies and was used to successfully identify candidate gene loci for phosphine resistance in T. castaneum. This is a relatively simple and rapid approach to identifying genomic regions associated with traits in defined genetic crosses that does not require any specialised statistical analysis.
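The regional averaging idea can be sketched as follows: per-SNP allele frequencies from the selected and unselected pools are noisy at low coverage, so their difference is averaged within fixed windows along a chromosome. The window scheme, the use of a simple frequency difference, and the name `regional_frequency_difference` are assumptions for illustration, not the authors' pipeline.

```python
from collections import defaultdict


def regional_frequency_difference(snps, window_size):
    """Average the selected-minus-unselected allele frequency per window.

    snps: iterable of (position, freq_selected, freq_unselected) tuples
          for one chromosome (hypothetical input layout).
    Returns a dict mapping each window start to the mean difference.
    """
    sums = defaultdict(lambda: [0.0, 0])  # window start -> [total, count]
    for pos, f_sel, f_unsel in snps:
        start = (pos // window_size) * window_size
        sums[start][0] += f_sel - f_unsel
        sums[start][1] += 1
    return {start: total / n for start, (total, n) in sorted(sums.items())}


if __name__ == "__main__":
    # Hypothetical pooled frequencies; the first window mimics a linked region.
    example = [(100, 0.9, 0.5), (200, 1.0, 0.6), (1500, 0.5, 0.5)]
    for start, diff in regional_frequency_difference(example, 1000).items():
        print(f"window {start}-{start + 999}: mean difference {diff:+.2f}")
```

Windows whose mean difference departs strongly from zero would be the candidates for closer inspection, in the spirit of the graphical scan described in the abstract.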
Abstract:
Sorghum is a food and feed cereal crop adapted to heat and drought and a staple for 500 million of the world’s poorest people. Its small diploid genome and phenotypic diversity make it an ideal C4 grass model as a complement to C3 rice. Here we present high coverage (16-45×) resequenced genomes of 44 sorghum lines representing the primary gene pool and spanning dimensions of geographic origin, end-use and taxonomic group. We also report the first resequenced genome of S. propinquum, identifying 8 M high-quality SNPs, 1.9 M indels and specific gene loss and gain events in S. bicolor. We observe strong racial structure and a complex domestication history involving at least two distinct domestication events. These assembled genomes enable the leveraging of existing cereal functional genomics data against the novel diversity available in sorghum, providing an unmatched resource for the genetic improvement of sorghum and other grass species.
Abstract:
The first complete genome sequence of capsicum chlorosis virus (CaCV) from Australia was determined using a combination of Illumina HiSeq RNA and Sanger sequencing technologies. Australian CaCV had a tripartite genome structure like other CaCV isolates. The large (L) RNA was 8913 nucleotides (nt) in length and contained a single open reading frame (ORF) of 8634 nt encoding a predicted RNA-dependent RNA polymerase (RdRp) in the viral-complementary (vc) sense. The medium (M) and small (S) RNA segments were 4846 and 3944 nt in length, respectively, each containing two non-overlapping ORFs in ambisense orientation, separated by intergenic regions (IGR). The M segment contained ORFs encoding the predicted non-structural movement protein (NSm; 927 nt) and precursor of glycoproteins (GP; 3366 nt) in the viral sense (v) and vc strand, respectively, separated by a 449-nt IGR. The S segment coded for the predicted nucleocapsid (N) protein (828 nt) and non-structural suppressor of silencing protein (NSs; 1320 nt) in the vc and v strand, respectively. The S RNA contained an IGR of 1663 nt, being the largest IGR of all CaCV isolates sequenced so far. Comparison of the Australian CaCV genome with complete CaCV genome sequences from other geographic regions showed highest sequence identity with a Taiwanese isolate. Genome sequence comparisons and phylogeny of all available CaCV isolates provided evidence for at least two highly diverged groups of CaCV isolates that may warrant re-classification of AIT-Thailand and CP-China isolates as unique tospoviruses, separate from CaCV.
Abstract:
We have determined the full-length 14,491-nucleotide genome sequence of a new plant rhabdovirus, alfalfa dwarf virus (ADV). Seven open reading frames (ORFs) were identified in the antigenomic orientation of the negative-sense, single-stranded viral RNA, in the order 3′-N-P-P3-M-G-P6-L-5′. The ORFs are separated by conserved intergenic regions and the genome coding region is flanked by complementary 3′ leader and 5′ trailer sequences. Phylogenetic analysis of the nucleoprotein amino acid sequence indicated that this alfalfa-infecting rhabdovirus is related to viruses in the genus Cytorhabdovirus. When transiently expressed as GFP fusions in Nicotiana benthamiana leaves, most ADV proteins accumulated in the cell periphery, but unexpectedly P protein was localized exclusively in the nucleus. ADV P protein was shown by bimolecular fluorescence complementation to have homotypic nuclear interactions as well as heterotypic nuclear interactions with N, P3 and M proteins. ADV appears unique in that it combines properties of both cytoplasmic and nuclear plant rhabdoviruses.
Abstract:
A limited number of plant rhabdovirus genomes have been fully sequenced, making taxonomic classification, evolutionary analysis and molecular characterization of this virus group difficult. We have for the first time determined the complete genome sequence of 13,188 nucleotides of Datura yellow vein nucleorhabdovirus (DYVV). DYVV genome organization resembles that of its closest relative, Sonchus yellow net virus (SYNV), with six ORFs in antigenomic orientation, separated by highly conserved intergenic regions and flanked by complementary 3′ leader and 5′ trailer sequences. As is typical for nucleorhabdoviruses, all viral proteins, except the glycoprotein, which is targeted to the endoplasmic reticulum, are localized to the nucleus. Nucleocapsid (N) protein, matrix (M) protein and polymerase, as components of nuclear viroplasms during replication, have predicted strong canonical nuclear localization signals, and N and M proteins exclusively localize to the nucleus when transiently expressed as GFP fusions. As in all nucleorhabdoviruses studied so far, N and phosphoprotein P interact when co-expressed, significantly increasing P nuclear localization in the presence of N protein. This research adds to the list of complete genomes of plant-infecting rhabdoviruses, provides molecular tools for further characterization and supports classification of DYVV as a nucleorhabdovirus closely related to but with some distinct differences from SYNV.
Abstract:
Brassica napus is one of the most important oil crops in the world, and stem rot caused by the fungus Sclerotinia sclerotiorum results in major losses in yield and quality. To elucidate resistance genes and pathogenesis-related genes, genome-wide association analysis of 347 accessions was performed using the Illumina 60K Brassica SNP (single nucleotide polymorphism) array. In addition, the detached stem inoculation assay was used to select five highly resistant (R) and susceptible (S) B. napus lines, 48 h postinoculation with S. sclerotiorum, for transcriptome sequencing. We identified 17 significant associations for stem resistance on chromosomes A8 and C6, five of which were on A8 and 12 on C6. The SNPs identified on A8 were located in a 409-kb haplotype block, and those on C6 were consistent with previous QTL mapping efforts. Transcriptome analysis suggested that S. sclerotiorum infection activates the immune system and sulphur metabolism, especially glutathione (GSH) and glucosinolates, in both R and S genotypes. Genes found to be specific to the R genotype were related to the jasmonic acid pathway, lignin biosynthesis, defence response and signal transduction, or encoded transcription factors. Twenty-four genes were identified in both the SNP-trait association and transcriptome sequencing analyses, including a tau class glutathione S-transferase (GSTU) gene cluster. This study provides useful insight into the molecular mechanisms underlying the plant's response to S. sclerotiorum.
Abstract:
Metabolism is the cellular subsystem responsible for generation of energy from nutrients and production of building blocks for larger macromolecules. Computational and statistical modeling of metabolism is vital to many disciplines including bioengineering, the study of diseases, drug target identification, and understanding the evolution of metabolism. In this thesis, we propose efficient computational methods for metabolic modeling. The techniques presented are targeted particularly at the analysis of large metabolic models encompassing the whole metabolism of one or several organisms. We concentrate on three major themes of metabolic modeling: metabolic pathway analysis, metabolic reconstruction and the study of evolution of metabolism. In the first part of this thesis, we study metabolic pathway analysis. We propose a novel modeling framework called gapless modeling to study biochemically viable metabolic networks and pathways. In addition, we investigate the utilization of atom-level information on metabolism to improve the quality of pathway analyses. We describe efficient algorithms for discovering both gapless and atom-level metabolic pathways, and conduct experiments with large-scale metabolic networks. The presented gapless approach offers a compromise in terms of complexity and feasibility between the previous graph-theoretic and stoichiometric approaches to metabolic modeling. Gapless pathway analysis shows that microbial metabolic networks are not as robust to random damage as suggested by previous studies. Furthermore the amino acid biosynthesis pathways of the fungal species Trichoderma reesei discovered from atom-level data are shown to closely correspond to those of Saccharomyces cerevisiae. In the second part, we propose computational methods for metabolic reconstruction in the gapless modeling framework. We study the task of reconstructing a metabolic network that does not suffer from connectivity problems. 
Such problems often limit the usability of reconstructed models, and typically require a significant amount of manual postprocessing. We formulate gapless metabolic reconstruction as an optimization problem and propose an efficient divide-and-conquer strategy to solve it with real-world instances. We also describe computational techniques for solving problems stemming from ambiguities in metabolite naming. These techniques have been implemented in the web-based software ReMatch, intended for reconstruction of models for 13C metabolic flux analysis. In the third part, we extend our scope from single to multiple metabolic networks and propose an algorithm for inferring gapless metabolic networks of ancestral species from phylogenetic data. Experimenting with 16 fungal species, we show that the method is able to generate results that are easily interpretable and that provide hypotheses about the evolution of metabolism.
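The abstract does not define "gapless" formally. One plausible reading of the connectivity requirement is that a reaction is usable only if all of its substrates can be produced from the available nutrients. The sketch below illustrates that reading with a simple fixed-point expansion; the definition, the function name `producible_metabolites` and the toy network are all assumptions rather than the thesis's algorithm.

```python
def producible_metabolites(reactions, seeds):
    """Expand the set of producible metabolites from seed nutrients.

    reactions: dict mapping a reaction name to (substrates, products),
               each a set of metabolite names (hypothetical layout).
    seeds:     metabolites assumed to be available from the medium.
    Returns (producible metabolites, reactions that can fire).
    """
    available = set(seeds)
    active = set()
    changed = True
    while changed:
        changed = False
        for name, (substrates, products) in reactions.items():
            # A reaction fires only when every substrate is already producible.
            if name not in active and substrates <= available:
                active.add(name)
                available |= products
                changed = True
    return available, active


if __name__ == "__main__":
    # Toy network: r2 has a gap because "atp" is never produced or supplied.
    toy = {
        "r1": ({"glc"}, {"g6p"}),
        "r2": ({"g6p", "atp"}, {"fbp"}),
        "r3": ({"g6p"}, {"f6p"}),
    }
    metabolites, fired = producible_metabolites(toy, {"glc"})
    print(sorted(metabolites), sorted(fired))
```

Under this reading, a reconstructed network with reactions that never fire would show the kind of connectivity problem that otherwise calls for manual postprocessing.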
Abstract:
Sorghum (Sorghum bicolor) is one of the most important cereal crops globally and a potential energy plant for biofuel production. In order to explore genetic gain for a range of important quantitative traits, such as drought and heat tolerance, grain yield, stem sugar accumulation, and biomass production, via the use of molecular breeding and genomic selection strategies, knowledge of the available genetic variation and the underlying sequence polymorphisms is required.