304 results for Genomic selection
at Université de Lausanne, Switzerland
Abstract:
Genome-wide association studies (GWAS) and genomic selection (GS) methods, which use genome-wide marker data for phenotype prediction, are of considerable interest in plant breeding. However, to our knowledge, no studies have yet examined the predictive ability of these methods for structured traits when using training populations with high levels of genetic diversity. Grapevine is one such highly heterozygous, perennial species. The present study compares the accuracy of models based on GWAS or GS alone, or in combination, for predicting simple or complex traits, linked or not with population structure. To explore the relevance of these methods in this context, we performed simulations using approximately 90,000 SNPs on a population of 3,000 individuals structured into three groups and corresponding to published grapevine diversity data. To estimate the parameters of the prediction models, we defined four training populations of 1,000 individuals, corresponding to these three groups and a core collection. Finally, to estimate the accuracy of the models, we also simulated four breeding populations of 200 individuals. Although prediction accuracy was low when breeding populations were too distant from the training populations, high accuracy levels were obtained using the core collection alone as training population. The highest prediction accuracy (up to 0.9) was obtained using the combined GWAS-GS model. We thus recommend using the combined prediction model and a core collection as training population for grapevine breeding or for other economically important crops with the same characteristics.
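The abstract reports prediction accuracy of up to 0.9 without saying how accuracy is measured. In genomic selection work it is commonly reported as the Pearson correlation between predicted and observed (or simulated true) values. Assuming that convention applies here, a minimal Python sketch follows; the function name and the example values are illustrative and do not come from the study.

```python
from math import sqrt


def prediction_accuracy(predicted, observed):
    """Pearson correlation between predicted and observed trait values.

    Assumption: accuracy is defined as this correlation, which the
    abstract does not state explicitly.
    """
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equally sized samples of length >= 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_o)


if __name__ == "__main__":
    # Hypothetical values for a handful of breeding-population individuals.
    predicted = [1.2, 0.4, 2.1, 1.7, 0.9]
    observed = [1.0, 0.6, 2.4, 1.5, 1.1]
    print(round(prediction_accuracy(predicted, observed), 3))
```

A correlation is insensitive to a constant bias in the predictions, so it measures how well individuals are ranked rather than how close each predicted value is.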
Abstract:
The limited ability of common variants to account for the genetic contribution to complex disease has prompted searches for rare variants of large effect, to partly explain the 'missing heritability'. Analyses of genome-wide genotyping data have identified genomic structural variants (GSVs) as a source of such rare causal variants. Recent studies have reported multiple GSV loci associated with risk of obesity. We attempted to replicate these associations by similar analysis of two familial-obesity case-control cohorts and a population cohort, and detected GSVs at 11 out of 18 loci, at frequencies similar to those previously reported. Based on their reported frequencies and effect sizes (OR≥25), we had sufficient statistical power to detect the large majority (80%) of genuine associations at these loci. However, only one obesity association was replicated. Deletion of a 220 kb region on chromosome 16p11.2 has a carrier population frequency of 2×10⁻⁴ (95% confidence interval [9.6×10⁻⁵ to 3.1×10⁻⁴]); accounts overall for 0.5% [0.19%-0.82%] of severe childhood obesity cases (P = 3.8×10⁻¹⁰; odds ratio = 25.0 [9.9-60.6]); and results in a mean body mass index (BMI) increase of 5.8 kg/m² [1.8-10.3] in adults from the general population. We also attempted replication using BMI as a quantitative trait in our population cohort; associations with BMI at or near nominal significance were detected at two further loci near KIF2B and within FOXP2, but these did not survive correction for multiple testing. These findings emphasise several issues of importance when conducting rare GSV association studies, including the need for careful cohort selection and replication strategy, accurate GSV identification, and appropriate correction for multiple testing and/or control of false discovery rate. Moreover, they highlight the potential difficulty in replicating rare CNV associations across different populations. Nevertheless, we show that such studies are potentially valuable for the identification of variants making an appreciable contribution to complex disease.
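The abstract quotes an odds ratio with a 95% confidence interval but does not say how the interval was computed. As a rough illustration only, the Python sketch below computes an odds ratio from a 2×2 carrier table with the standard log-based (Woolf) interval. The function name and the counts are hypothetical, and the authors may well have used a different method, for example an exact one better suited to rare variants.

```python
from math import exp, log, sqrt


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf's log method).

    Assumption: all four cells are positive; the log method is a
    large-sample approximation and is shaky for very small counts.
    """
    cells = (case_carriers, case_noncarriers,
             control_carriers, control_noncarriers)
    if any(c <= 0 for c in cells):
        raise ValueError("all cell counts must be positive")
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(odds_ratio_ci(10, 90, 5, 95))
```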
Abstract:
Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
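The abstract compares performance estimates from two resampling methods, cross-validation and bootstrapping, but gives no details such as the number of folds or replicates. The Python sketch below shows, under assumed settings, how the two schemes partition sample indices into training and evaluation sets. The function names, the use of out-of-bag samples for bootstrap evaluation, and the parameter values are illustrative assumptions, not the MAQC-II protocol.

```python
import random


def kfold_splits(n_samples, n_folds, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, test


def bootstrap_splits(n_samples, n_replicates, seed=0):
    """Yield (train, test) index lists; train is drawn with replacement
    and test holds the out-of-bag samples that were never drawn."""
    rng = random.Random(seed)
    for _ in range(n_replicates):
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        in_bag = set(train)
        test = [j for j in range(n_samples) if j not in in_bag]
        yield train, test


if __name__ == "__main__":
    for train, test in kfold_splits(10, 5):
        print("cv  ", sorted(train), sorted(test))
    for train, test in bootstrap_splits(10, 3):
        print("boot", sorted(train), sorted(test))
```

In both schemes, feature selection would have to be repeated inside each training set to avoid leaking information from the evaluation samples into the estimate.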
Abstract:
An essential step of the life cycle of retroviruses is the stable insertion of a copy of their DNA genome into the host cell genome, and lentiviruses are no exception. This integration step, catalyzed by the viral-encoded integrase, ensures long-term expression of the viral genes, thus allowing productive viral replication and also making retroviral vectors attractive for gene therapy. At the same time, this ability to integrate into the host genome raises safety concerns regarding the use of retroviral-based gene therapy vectors, due to the genomic locations of integration sites. The availability of the human genome sequence made possible the analysis of integration site preferences, which proved to be nonrandom and retrovirus-specific, i.e. all lentiviruses studied so far favor integration in active transcription units, while other retroviruses have a different integration site distribution. Several mechanisms have been proposed that may influence integration targeting, which include (i) chromatin accessibility, (ii) cell cycle effects, and (iii) tethering proteins. Recent data provide evidence that integration site selection can occur via a tethering mechanism, through the recruitment of the lentiviral integrase by the cellular LEDGF/p75 protein, these two proteins being the major players in lentiviral integration targeting.
Abstract:
Understanding the genomic basis of evolutionary adaptation requires insight into the molecular basis underlying phenotypic variation. However, even changes in molecular pathways associated with extreme variation, gains and losses of specific phenotypes, remain largely uncharacterized. Here, we investigate the large interspecific differences in the ability to survive infection by parasitoids across 11 Drosophila species and identify genomic changes associated with gains and losses of parasitoid resistance. We show that a cellular immune defense, encapsulation, and the production of a specialized blood cell, lamellocytes, are restricted to a sublineage of Drosophila, but that encapsulation is absent in one species of this sublineage, Drosophila sechellia. Our comparative analyses of hemopoiesis pathway genes and of genes differentially expressed during the encapsulation response revealed that hemopoiesis-associated genes are highly conserved and present in all species independently of their resistance. In contrast, 11 genes that are differentially expressed during the response to parasitoids are novel genes, specific to the Drosophila sublineage capable of lamellocyte-mediated encapsulation. These novel genes, which are predominantly expressed in hemocytes, arose via duplications, whereby five of them also showed signatures of positive selection, as expected if they were recruited for new functions. Three of these novel genes further showed large-scale and presumably loss-of-function sequence changes in D. sechellia, consistent with the loss of resistance in this species. In combination, these convergent lines of evidence suggest that co-option of duplicated genes in existing pathways and subsequent neofunctionalization are likely to have contributed to the evolution of the lamellocyte-mediated encapsulation in Drosophila.
Abstract:
The stable insertion of a copy of their genome into the host cell genome is an essential step of the life cycle of retroviruses. The site of viral DNA integration, mediated by the viral-encoded integrase enzyme, has important consequences for both the virus and the host cell. The analysis of retroviral integration site distribution was facilitated by the availability of the human genome sequence, revealing the non-random feature of integration site selection and identifying different favored and disfavored genomic locations for individual retroviruses. This review will summarize the current knowledge about retroviral differences in their integration site preferences as well as the mechanisms involved in this process.
Abstract:
Aleppo pine (Pinus halepensis Mill.) is a relevant conifer species for studying adaptive responses to drought and fire regimes in the Mediterranean region. In this study, we performed Illumina next-generation sequencing of two phenotypically divergent Aleppo pine accessions with the aims of (i) characterizing the transcriptome through Illumina RNA-Seq on trees phenotypically divergent for adaptive traits linked to fire adaptation and drought, (ii) performing a functional annotation of the assembled transcriptome, (iii) identifying genes with accelerated evolutionary rates, (iv) studying the expression levels of the annotated genes and (v) developing gene-based markers for population genomic and association genetic studies. The assembled transcriptome consisted of 48,629 contigs and covered about 54.6 Mbp. The comparison of Aleppo pine transcripts to Picea sitchensis protein-coding sequences resulted in the detection of 34,014 SNPs across species, with an average Ka/Ks value of 0.216, suggesting that the majority of the assembled genes are under negative selection. Several genes were differentially expressed across the two pine accessions with contrasting phenotypes, including a glutathione-s-transferase, a cellulose synthase and a cobra-like protein. A large number of new markers (3,334 amplifiable SSRs and 28,236 SNPs) have been identified, which should facilitate future population genomics and association genetics in this species. A 384-SNP Oligo Pool Assay for genotyping with the Illumina VeraCode technology has been designed, which showed a high overall SNP conversion rate (76.6%). Our results showed that Illumina next-generation sequencing is a valuable technology to obtain an extensive overview of whole transcriptomes of nonmodel species with large genomes.
Abstract:
The use of molecular data to reconstruct the history of divergence and gene flow between populations of closely related taxa represents a challenging problem. It has been proposed that the long-standing debate about the geography of speciation can be resolved by comparing the likelihoods of a model of isolation with migration and a model of secondary contact. However, data are commonly only fit to a model of isolation with migration and rarely tested against the secondary contact alternative. Furthermore, most demographic inference methods have neglected variation in introgression rates and assume that the gene flow parameter (Nm) is similar among loci. Here, we show that neglecting this source of variation can give misleading results. We analysed DNA sequences sampled from populations of the marine mussels, Mytilus edulis and M. galloprovincialis, across a well-studied mosaic hybrid zone in Europe and evaluated various scenarios of speciation, with or without variation in introgression rates, using an Approximate Bayesian Computation (ABC) approach. Models with heterogeneous gene flow across loci always outperformed models assuming equal migration rates irrespective of the history of gene flow being considered. By incorporating this heterogeneity, the best-supported scenario was a long period of allopatric isolation during the first three-quarters of the time since divergence followed by secondary contact and introgression during the last quarter. By contrast, constraining migration to be homogeneous failed to discriminate among any of the different models of gene flow tested. Our simulations thus provide statistical support for the secondary contact scenario in the European Mytilus hybrid zone that the standard coalescent approach failed to confirm. Our results demonstrate that genomic variation in introgression rates can have profound impacts on the biological conclusions drawn from inference methods and needs to be incorporated in future studies.
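The abstract relies on Approximate Bayesian Computation (ABC) to compare speciation scenarios. The authors' actual pipeline (coalescent simulations, summary statistics, model choice) is not described here, so the Python sketch below only illustrates the basic ABC rejection idea on a toy problem: draw a parameter from the prior, simulate a summary statistic, and keep the draw if the simulated statistic is close enough to the observed one. All names, the toy Gaussian model, and the tolerance are assumptions made for illustration.

```python
import random


def abc_rejection(observed, prior_sample, simulate, distance,
                  tolerance, n_draws, seed=0):
    """Return the prior draws whose simulated summary lies within
    `tolerance` of the observed summary (plain rejection ABC)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        simulated = simulate(theta, rng)
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)
    return accepted


def toy_example(seed=0):
    """Toy stand-in for a demographic model: infer the mean of a
    Gaussian from an observed sample mean of 4.0."""
    def prior_sample(rng):
        return rng.uniform(0.0, 10.0)  # flat prior on the parameter

    def simulate(theta, rng):
        draws = [rng.gauss(theta, 1.0) for _ in range(50)]
        return sum(draws) / len(draws)  # summary statistic: sample mean

    def distance(a, b):
        return abs(a - b)

    return abc_rejection(4.0, prior_sample, simulate, distance,
                         tolerance=0.5, n_draws=2000, seed=seed)


if __name__ == "__main__":
    posterior = toy_example()
    print(len(posterior), sum(posterior) / len(posterior))
```

Model comparison in ABC is often done by simulating under each candidate model and comparing how frequently each one produces accepted simulations; whether the study used that exact procedure is not stated in the abstract.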
Abstract:
Natural selection can drive the repeated evolution of reproductive isolation, but the genomic basis of parallel speciation remains poorly understood. We analyzed whole-genome divergence between replicate pairs of stick insect populations that are adapted to different host plants and undergoing parallel speciation. We found thousands of modest-sized genomic regions of accentuated divergence between populations, most of which are unique to individual population pairs. We also detected parallel genomic divergence across population pairs involving an excess of coding genes with specific molecular functions. Regions of parallel genomic divergence in nature exhibited exceptional allele frequency changes between hosts in a field transplant experiment. The results advance understanding of biological diversification by providing convergent observational and experimental evidence for selection's role in driving repeatable genomic divergence.
Abstract:
The transformer (tra) gene is a key regulator in the signalling hierarchy controlling all aspects of somatic sexual differentiation in Drosophila and other insects. Here, we show that six of the seven sequenced ants have two copies of tra. Surprisingly, the two paralogues are always more similar within species than among species. Comparative sequence analyses indicate that this pattern is owing to ongoing concerted evolution after an ancestral duplication rather than independent duplications in each of the six species. In particular, there was strong support for inter-locus recombination between the paralogues of the ant Atta cephalotes. In the five species where the location of paralogues is known, they are adjacent to each other in four cases and separated by only a few genes in the fifth case. Because there have been extensive genomic rearrangements in these lineages, this suggests selection acting to conserve their synteny. In three species, we also find a signature of positive selection in one of the paralogues. In three bee species where information is available, the tra gene is also duplicated, the copies are adjacent and in at least one species there was recombination between paralogues. These results suggest that concerted evolution plays an adaptive role in the evolution of this gene family.
Abstract:
The human Me14-D12 antigen is a cell surface glycoprotein regulated by interferon-gamma (IFN-gamma) on tumor cell lines of neuroectodermal origin. It consists of two non-covalently linked subunits with apparent molecular weights of 33,000 and 38,000. Here we describe the molecular cloning of a genomic probe for the Me14-D12 gene using the gene transfer approach. Mouse Ltk- cells were stably cotransfected with human genomic DNA and the Herpes Simplex virus thymidine kinase (TK) gene. Primary and secondary transfectants expressing the Me14-D12 antigen were isolated after selection in HAT medium by repeated sorting on a fluorescence activated cell sorter (FACS). A recombinant phage harboring a 14.3 kb insert of human DNA was isolated from a genomic library made from a positive secondary transfectant cell line. A specific probe derived from the phage DNA insert allowed the identification of two mRNAs of 3.5 kb and 2.2 kb in primary and secondary L cell transfectants, as well as in human melanoma cell lines expressing the Me14-D12 antigen. The regulation of Me14-D12 antigen by IFN-gamma was retained in the L cell transfectants and could be detected both at the level of protein and mRNA expression.
Abstract:
A recent randomized EORTC phase III trial, comparing two doses of imatinib in patients with advanced gastrointestinal stromal tumours (GISTs), reported dose dependency for progression-free survival. The current analysis of that study aimed to assess whether tumour mutational status correlates with clinical response to imatinib. Pre-treatment samples of GISTs from 377 patients enrolled in the phase III study were analyzed for mutations of KIT or PDGFRA by a combination of D-HPLC and direct sequencing of tumour genomic DNA. Mutation types were correlated with patients' survival data. The presence of exon 9-activating mutations in KIT was the strongest adverse prognostic factor for response to imatinib, increasing the relative risk of progression by 171% (P<0.0001) and the relative risk of death by 190% (P<0.0001) when compared with KIT exon 11 mutants. Similarly, the relative risk of progression was increased by 108% (P<0.0001) and the relative risk of death by 76% (P=0.028) in patients without detectable KIT or PDGFRA mutations. In patients whose tumours expressed an exon 9 KIT oncoprotein, treatment with the high-dose regimen resulted in a significantly superior progression-free survival (P=0.0013), with a reduction of the relative risk of 61%. We conclude that tumour genotype is of major prognostic significance for progression-free survival and overall survival in patients treated with imatinib for advanced GISTs. Our findings suggest the need for differential treatment of patients with GISTs, with KIT exon 9 mutant patients benefiting the most from the 800 mg daily dose of the drug.
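The percentages in this abstract are easier to compare once converted to ratios. Assuming each figure is a relative change in a ratio-scale risk measure with the comparison group as the reference (the abstract does not say which estimator was used), a 171% increase corresponds to a ratio of 2.71 and a 61% reduction to a ratio of 0.39. The helper name below is hypothetical.

```python
def ratio_from_percent_change(percent_change):
    """Convert a relative change in risk (in percent) to a risk ratio.

    Positive values are increases, negative values are reductions.
    Assumption: the reference group has a ratio of 1.0.
    """
    if percent_change < -100:
        raise ValueError("risk cannot be reduced by more than 100%")
    return 1.0 + percent_change / 100.0


if __name__ == "__main__":
    # Figures quoted in the abstract, relative to the stated comparison group.
    for label, change in [
        ("progression, KIT exon 9 vs exon 11", 171),
        ("death, KIT exon 9 vs exon 11", 190),
        ("progression, no detectable mutation", 108),
        ("death, no detectable mutation", 76),
        ("progression, high dose in exon 9 mutants", -61),
    ]:
        print(f"{label}: ratio {ratio_from_percent_change(change):.2f}")
```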
Abstract:
The integration of the Human Immunodeficiency Virus (HIV) genetic information into the host genome is fundamental for its replication and long-term persistence in the host. Isolating and characterizing the integration sites can be useful for identifying the specific genomic location of integration or for understanding the forces dictating HIV integration site selection. The methods outlined in this article describe a highly efficient and precise technique for identifying HIV integration sites in the host genome on a small scale using molecular cloning techniques and standard sequencing or on a massive scale using 454 pyrosequencing.
Abstract:
How phenomena like helping, dispersal, or the sex ratio evolve depends critically on demographic and life-history factors. One phenotype that is of particular interest to biologists is genomic imprinting, which results in parent-of-origin-specific gene expression and thus deviates from the predictions of Mendel's rules. The most prominent explanation for the evolution of genomic imprinting, the kinship theory, originally specified that multiple paternity can cause the evolution of imprinting when offspring affect maternal resource provisioning. Most models of the kinship theory do not detail how population subdivision, demography, and life history affect the evolution of imprinting. In this work, we embed the classic kinship theory within an island model of population structure and allow for diverse demographic and life-history features to affect the direction of selection on imprinting. We find that population structure does not change how multiple paternity affects the evolution of imprinting under the classic kinship theory. However, if the degree of multiple paternity is not too large, we find that sex-specific migration and survival and generation overlap are the primary factors determining which allele is silenced. This indicates that imprinting can evolve purely as a result of sex-related asymmetries in the demographic structure or life history of a species.
Abstract:
BACKGROUND: Known antiretroviral restriction factors are encoded by genes that are under positive selection pressure, induced during HIV-1 infection, up-regulated by interferons, and/or interact with viral proteins. To identify potential novel restriction factors, we performed genome-wide scans for human genes sharing molecular and evolutionary signatures of known restriction factors and tested the anti-HIV-1 activity of the most promising candidates. RESULTS: Our analyses identified 30 human genes that share characteristics of known restriction factors. Functional analyses of 27 of these candidates showed that over-expression of a strikingly high proportion of them significantly inhibited HIV-1 without causing cytotoxic effects. Five factors (APOL1, APOL6, CD164, TNFRSF10A, TNFRSF10D) suppressed infectious HIV-1 production in transfected 293T cells by >90% and six additional candidates (FCGR3A, CD3E, OAS1, GBP5, SPN, IFI16) achieved this when the virus was lacking intact accessory vpr, vpu and nef genes. Unexpectedly, over-expression of two factors (IL1A, SP110) significantly increased infectious HIV-1 production. Mechanistic studies suggest that the newly identified potential restriction factors act at different steps of the viral replication cycle, including proviral transcription and production of viral proteins. Finally, we confirmed that mRNA expression of most of these candidate restriction factors in primary CD4+ T cells is significantly increased by type I interferons. CONCLUSIONS: A limited number of human genes share multiple characteristics of genes encoding known restriction factors. Most of them display anti-retroviral activity in transient transfection assays and are expressed in primary CD4+ T cells.