879 results for next-generation sequencing
Abstract:
Structural variation is variation in the structure of DNA regions that affects DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variants and their association with human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions for the study of structural variation in mouse genomes that will increase our understanding of the molecular architecture and functional consequences of structural variation.
Abstract:
Marine mammals are often reported to possess reduced variation of major histocompatibility complex (MHC) genes compared with their terrestrial counterparts. We evaluated diversity at two MHC class II B genes, DQB and DRB, in the New Zealand sea lion (Phocarctos hookeri, NZSL), a species that has suffered high mortality owing to bacterial epizootics, using Sanger sequencing and haplotype reconstruction, together with next-generation sequencing. Despite this species' prolonged history of small population size and highly restricted distribution, we demonstrate extensive diversity at MHC DRB with 26 alleles, whereas MHC DQB is dimorphic. We identify four DRB codons, predicted to be involved in antigen binding, that are undergoing adaptive evolution. Our data suggest diversity at DRB may be maintained by balancing selection, consistent with the role of this locus as an antigen-binding region and the species' recent history of mass mortality during a series of bacterial epizootics. Phylogenetic analyses of DQB and DRB sequences from pinnipeds and other carnivores revealed significant allelic diversity, but little phylogenetic depth or structure among pinniped alleles; thus, we could neither confirm nor refute the possibility of trans-species polymorphism in this group. The observed phylogenetic pattern, however, suggests some significant evolutionary constraint on these loci in the recent past, consistent with the pattern expected following an epizootic event. These data may help further elucidate some of the genetic factors underlying the unusually high susceptibility to bacterial infection of the threatened NZSL, and help us to better understand the extent and pattern of MHC diversity in pinnipeds.
Abstract:
Dengue virus (DENV) infections represent a significant concern for public health worldwide, and DENV is considered the most prevalent arthropod-borne virus by number of reported cases. In this study, we report the complete genome sequencing of a DENV serotype 4 isolate, genotype II, obtained in the city of Manaus, directly from the serum sample, using Ion Torrent sequencing technology. The use of a massively parallel sequencing technology allowed the detection of two variable sites within viral populations, one in the coding region for the viral envelope protein and the other in the nonstructural 1 coding region.
Abstract:
Aleppo pine (Pinus halepensis Mill.) is a relevant conifer species for studying adaptive responses to drought and fire regimes in the Mediterranean region. In this study, we performed Illumina next-generation sequencing of two phenotypically divergent Aleppo pine accessions with the aims of (i) characterizing the transcriptome through Illumina RNA-Seq on trees phenotypically divergent for adaptive traits linked to fire adaptation and drought, (ii) performing a functional annotation of the assembled transcriptome, (iii) identifying genes with accelerated evolutionary rates, (iv) studying the expression levels of the annotated genes and (v) developing gene-based markers for population genomic and association genetic studies. The assembled transcriptome consisted of 48,629 contigs and covered about 54.6 Mbp. The comparison of Aleppo pine transcripts to Picea sitchensis protein-coding sequences resulted in the detection of 34,014 SNPs across species, with an average Ka/Ks value of 0.216, suggesting that the majority of the assembled genes are under negative selection. Several genes were differentially expressed across the two pine accessions with contrasting phenotypes, including a glutathione S-transferase, a cellulose synthase and a cobra-like protein. A large number of new markers (3334 amplifiable SSRs and 28,236 SNPs) have been identified, which should facilitate future population genomics and association genetics in this species. A 384-SNP Oligo Pool Assay for genotyping with the Illumina VeraCode technology has been designed and showed a high overall SNP conversion rate (76.6%). Our results show that Illumina next-generation sequencing is a valuable technology for obtaining an extensive overview of whole transcriptomes of nonmodel species with large genomes.
Abstract:
Recent advances in high-throughput sequencing and genotyping protocols allow rapid investigation of Mendelian and complex diseases on a scale not previously possible. In my thesis research I took advantage of these modern techniques to study retinitis pigmentosa (RP), a rare inherited disease characterized by progressive loss of photoreceptors and leading to blindness; and hypertension, a common condition affecting 30% of the adult population. Firstly, I compared the performance of different next-generation sequencing (NGS) platforms in the sequencing of the RP-linked gene PRPF31. The gene contained a mutation in an intronic repetitive element, which presented difficulties for both classic sequencing methods and NGS. We showed that all NGS platforms are powerful tools to identify rare and common DNA variants, even in the case of more complex sequences. Moreover, we evaluated the features of different NGS platforms that are important in re-sequencing projects. The main focus of my thesis was then to investigate the involvement of pre-mRNA splicing factors in autosomal dominant RP (adRP). I screened 5 candidate genes in a large cohort of patients by using long-range PCR as the enrichment step, followed by NGS. We tested two different approaches: in one, all target PCRs from all patients were pooled and sequenced as a single DNA library; in the other, PCRs from each patient were separated within the pool by DNA barcodes. The first solution was more cost-effective, while the second yielded faster and more accurate results, but overall both proved to be effective strategies for gene screening in many samples. We were able to identify novel missense mutations in the SNRNP200 gene, encoding an RNA helicase essential for splicing catalysis. Interestingly, one of these mutations showed incomplete penetrance in one family with adRP.
Thus, we started to study the possible molecular causes underlying phenotypic differences between asymptomatic and affected members of this family. For the study of hypertension, I joined a European consortium to perform genome-wide association studies (GWAS). Thanks to the use of very informative genotyping arrays and of phenotypically well-characterized cohorts, we could identify a novel susceptibility locus for hypertension in the promoter region of the endothelial nitric oxide synthase gene (NOS3). Moreover, we have proven the direct causality of the associated SNP using three different methods: 1) targeted resequencing, 2) luciferase assay, and 3) population study. - Recent progress in high-throughput sequencing and genotyping protocols has enabled broader and faster study of Mendelian and multifactorial diseases on a scale never reached before. During my thesis research, I used these new sequencing techniques to study retinitis pigmentosa (RP), a rare hereditary disease characterized by progressive loss of the photoreceptors of the eye that leads to blindness; and hypertension, a common disease affecting 30% of the adult population. First, I compared the performance of different next-generation sequencing (NGS) platforms in sequencing PRPF31, a gene linked to RP. This gene contained a mutation in an intronic repetitive element, which presented sequencing difficulties for both the classic method and NGS. We showed that the NGS platforms analyzed are very powerful tools for identifying rare or common DNA variants, including in the case of complex sequences. In addition, we explored the characteristics of the different NGS platforms that are important in re-sequencing projects.
The main objective of my thesis was then to examine the effect of pre-mRNA splicing factors in an autosomal dominant form of RP (adRP). A screening of 5 candidate genes in a large cohort of patients was carried out using long-range PCR as the enrichment step, followed by NGS. We tested two different approaches: in the first, all target PCRs from all patients were pooled and sequenced as a single DNA library; in the second, the PCRs from each patient were separated by DNA barcodes. The first solution was the more economical, while the second yielded faster and more accurate results. Overall, both strategies proved effective for screening genes in many samples. We were able to identify new missense mutations in the SNRNP200 gene, which encodes a helicase with an essential function in splicing. Interestingly, one of these mutations shows incomplete penetrance in a family with adRP. We therefore began a study of the molecular causes underlying the phenotypic differences between affected and asymptomatic members of this family. For the study of hypertension, I joined a European consortium to carry out a genome-wide association study (GWAS). Thanks to the use of highly informative genotyping arrays and of cohorts that were extremely well characterized phenotypically, a new locus linked to hypertension was identified in the promoter region of the endothelial nitric oxide synthase gene (NOS3). Furthermore, we demonstrated the direct causality of the associated SNP by three different methods: i) resequencing the target with NGS, ii) luciferase assays, and iii) a population study.
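The barcoded pooling strategy described in this abstract relies on assigning each read back to its patient after sequencing. The following is only a minimal sketch of that idea, not the thesis pipeline: it assumes equal-length barcodes at the start of each read and exact matching, and the barcode sequences and patient labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def demultiplex(reads: List[str], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign reads to samples by an exact match on a leading barcode.

    Assumptions (not from the abstract): all barcodes have the same
    length and sit at the 5' end of the read; mismatches are not tolerated.
    """
    if not barcodes:
        return {"unassigned": list(reads)} if reads else {}
    length = len(next(iter(barcodes)))
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:length])
        if sample is None:
            # Keep the full read so unmatched data can be inspected later.
            bins["unassigned"].append(read)
        else:
            # Strip the barcode before downstream alignment.
            bins[sample].append(read[length:])
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
example_barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
example_reads = ["ACGTGGGA", "TGCATTTC", "NNNNAAAA"]
demultiplexed = demultiplex(example_reads, example_barcodes)
```

In the unbarcoded pooling approach this step is skipped, which is cheaper but means a detected variant cannot be traced to an individual patient without follow-up genotyping.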
Abstract:
Identification of small polymorphisms from next-generation sequencing short-read data is relatively easy, but detection of larger deletions is less straightforward. Here, we analyzed four divergent Arabidopsis accessions and found that the intersection of absent short-read coverage with weak tiling array hybridization signal reliably flags deletions. Interestingly, individual deletions were frequently observed in two or more of the accessions examined, suggesting that variation in gene content partly reflects a common history of deletion events.
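The intersection described in this abstract can be illustrated with a small interval computation. This is a sketch under stated assumptions rather than the authors' method: per-base read depth is given as a list, weak-hybridization regions are supplied as sorted, non-overlapping half-open intervals, and the function names and the `min_len` threshold are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def zero_coverage_runs(coverage: List[int], min_len: int = 1) -> List[Interval]:
    """Return runs of consecutive positions with read depth 0."""
    runs: List[Interval] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth == 0:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_len:
                runs.append((start, pos))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= min_len:
        runs.append((start, len(coverage)))
    return runs


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, non-overlapping interval lists."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Toy data: depth per base, and one region of weak array signal.
depth = [5, 4, 0, 0, 0, 0, 3, 0, 0, 2]
weak_signal = [(3, 8)]
candidate_deletions = intersect(zero_coverage_runs(depth, min_len=2), weak_signal)
```

Requiring both signals is what makes the call conservative: a gap in read coverage alone may reflect mapping problems, and weak hybridization alone may reflect sequence divergence rather than absence.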
Abstract:
Heterozygous mutations in the PRPF31 gene cause autosomal dominant retinitis pigmentosa (adRP), a hereditary disorder leading to progressive blindness. In some cases, such mutations display incomplete penetrance, meaning that certain carriers develop retinal degeneration while others have no symptoms at all. Asymptomatic carriers are protected from the disease by a higher than average expression of the PRPF31 allele that is not mutated, mainly through the action of an unknown modifier gene mapping to chromosome 19q13.4. We investigated a large family with adRP segregating an 11-bp deletion in PRPF31. The analysis of cell lines derived from asymptomatic and affected individuals revealed that the expression of only one gene among a number of candidates within the 19q13.4 interval correlated significantly, and inversely, with that of PRPF31, at both the mRNA and protein levels. This gene was CNOT3, encoding a subunit of the Ccr4-Not transcription complex. In cultured cells, siRNA-mediated silencing of CNOT3 provoked an increase in PRPF31 expression, confirming the repressive effect of CNOT3 on PRPF31. Furthermore, chromatin immunoprecipitation revealed that CNOT3 directly binds to a specific PRPF31 promoter sequence, while next-generation sequencing of the CNOT3 genomic region indicated that its variable expression is associated with a common intronic SNP. In conclusion, we identify CNOT3 as the main modifier gene determining penetrance of PRPF31 mutations, via a mechanism of transcriptional repression. In asymptomatic carriers CNOT3 is expressed at low levels, allowing higher amounts of wild-type PRPF31 transcripts to be produced and preventing manifestation of retinal degeneration.
Abstract:
The obesity epidemic is associated with the recent availability of highly palatable and inexpensive caloric food as well as important changes in lifestyle. Genetic factors, however, play a key role in regulating energy balance, and numerous twin studies have estimated the heritability of BMI at between 40 and 70%. While common variants identified through genome-wide association studies (GWAS) point toward new pathways, their effect sizes are too small to be of any use in the clinic. This review therefore concentrates on genes and genomic regions associated with very high risks of human obesity. Although there are no consensus guidelines, we review how the knowledge of these "causal factors" can be translated into the clinic for diagnostic purposes. We propose genetic workups guided by clinical manifestations in patients with severe early-onset obesity. While etiological diagnoses are unequivocal in a minority of patients, new genomic tools such as comparative genomic hybridization (CGH) arrays have allowed the identification of novel "causal" loci, and next-generation sequencing brings the promise of an accelerated pace of discoveries relevant to clinical practice.
Genetic Variations and Diseases in UniProtKB/Swiss-Prot: The Ins and Outs of Expert Manual Curation.
Abstract:
During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.
Abstract:
Recent technological progress has greatly facilitated de novo genome sequencing. However, de novo assemblies consist of many pieces of contiguous sequence (contigs) arranged in thousands of scaffolds instead of small numbers of chromosomes. Confirming and improving the quality of such assemblies is critical for subsequent analysis. We present a method to evaluate genome scaffolding by aligning independently obtained transcriptome sequences to the genome and visually summarizing the alignments using the Cytoscape software. Applying this method to the genome of the red fire ant Solenopsis invicta allowed us to identify inconsistencies in 7%, confirm contig order in 20% and extend 16% of scaffolds. Scripts that generate tables for visualization in Cytoscape from FASTA sequence and scaffolding information files are publicly available at https://github.com/ksanao/TGNet.
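The idea of checking scaffolds against transcript alignments can be sketched as follows. This is not the TGNet code and makes simplifying assumptions: each transcript is reduced to the ordered list of contigs it aligns to, strand and orientation are ignored, and the labels "confirms", "inconsistent" and "extends" are hypothetical names for the three outcomes mentioned in the abstract. The resulting rows form a simple edge table of the kind that can be imported into Cytoscape.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str, str]  # (source contig, target contig, label, transcript)


def classify_links(
    scaffolds: Dict[str, List[str]],
    transcript_hits: Dict[str, List[str]],
) -> List[Edge]:
    """Label each pair of contigs joined by a transcript.

    scaffolds: scaffold name -> contigs in scaffold order.
    transcript_hits: transcript name -> contigs hit, in transcript order.
    """
    # Look up each contig's scaffold and its index within that scaffold.
    position = {
        contig: (name, index)
        for name, contigs in scaffolds.items()
        for index, contig in enumerate(contigs)
    }
    rows: List[Edge] = []
    for transcript, contigs in transcript_hits.items():
        for left, right in zip(contigs, contigs[1:]):
            if left == right:
                continue  # consecutive exons on the same contig carry no link
            scaffold_l, index_l = position[left]
            scaffold_r, index_r = position[right]
            if scaffold_l != scaffold_r:
                label = "extends"  # transcript bridges two scaffolds
            elif abs(index_l - index_r) == 1:
                label = "confirms"  # neighbouring contigs, as scaffolded
            else:
                label = "inconsistent"  # same scaffold, but not neighbours
            rows.append((left, right, label, transcript))
    return rows


# Toy assembly: two scaffolds and three transcripts.
toy_scaffolds = {"S1": ["c1", "c2", "c3"], "S2": ["c4"]}
toy_hits = {"t1": ["c1", "c2"], "t2": ["c1", "c3"], "t3": ["c3", "c4"]}
edge_table = classify_links(toy_scaffolds, toy_hits)
```

A fuller treatment would also compare alignment strand with contig orientation, since a transcript can link neighbouring contigs while still contradicting their relative orientation.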
Abstract:
The discovery of genes implicated in familial forms of Parkinson's disease (PD) has provided new insights into the molecular events leading to neurodegeneration. Clinically, patients with genetically determined PD can be difficult to distinguish from those with sporadic PD. Monogenic causes include autosomal dominantly (SNCA, LRRK2, VPS35, EIF4G1) as well as recessively (PARK2, PINK1, DJ-1) inherited mutations. Additional recessive forms of parkinsonism present with atypical signs, including very early disease onset, dystonia, dementia and pyramidal signs. New techniques in the search for phenotype-associated genes (next-generation sequencing, genome-wide association studies) have expanded the spectrum of both monogenic PD and variants that alter the risk of developing PD. Examples of risk genes include the two lysosomal enzyme-coding genes GBA and SMPD1, which are associated with a 5-fold and 9-fold increased risk of PD, respectively. It is hoped that further knowledge of the genetic makeup of PD will allow the design of treatments that alter the course of the disease.
Abstract:
Cancer omics data are being generated at an exponential rate and associated with clinical variables, and important findings can be extracted using bioinformatics approaches and then experimentally validated. Many of these findings relate to a specific class of non-coding RNA molecules called microRNAs (miRNAs), which are post-transcriptional regulators of mRNA expression. The research field is quite heterogeneous: bioinformaticians, clinicians, statisticians and biologists, as well as data miners and engineers, collaborate to curate stored data and to respond to new impulses coming from the output of the latest next-generation sequencing technologies. Here we review the main findings of the first 10 years of miRNA research in colon cancer, with an emphasis on possible uses in clinical practice. This review intends to provide a road map through the jungle of publications on miRNAs in colorectal cancer, focusing on data availability and new ways to generate biologically relevant information from these huge amounts of data.
Abstract:
Progress in genomics, in particular high-throughput next-generation sequencing, is revolutionizing oncology. The impact of these techniques is seen, on the one hand, in the identification of germline mutations that predispose to a given type of cancer, allowing personalized care of patients or healthy carriers and, on the other hand, in the characterization of all acquired somatic mutations of the tumor cell, opening the door to personalized treatment targeting the driver oncogenes. In both cases, next-generation sequencing techniques allow a global approach in which the full set of genomic mutations is analyzed and correlated with the clinical data. The benefits for the quality of care delivered to our patients are extremely impressive.
Abstract:
BACKGROUND: Genetic predispositions to life-threatening cardiac arrhythmias such as congenital long-QT syndrome (LQTS) and catecholaminergic polymorphic ventricular tachycardia (CPVT) represent treatable causes of sudden cardiac death in young adults and children. Recently, mutations in calmodulin (CALM1, CALM2) have been associated with severe forms of LQTS and CPVT, with life-threatening arrhythmias occurring very early in life. Additional mutation-positive cases are needed to discern genotype-phenotype correlations associated with calmodulin mutations. METHODS AND RESULTS: We used conventional and next-generation sequencing approaches, including exome analysis, in genotype-negative LQTS probands. We identified 5 novel de novo missense mutations in CALM2 in 3 subjects with LQTS (p.N98S, p.N98I, p.D134H) and 2 subjects with clinical features of both LQTS and CPVT (p.D132E, p.Q136P). Age of onset of major symptoms (syncope or cardiac arrest) ranged from 1 to 9 years. Three of 5 probands had cardiac arrest and 1 of these subjects did not survive. The clinical severity among subjects in this series was generally less than that originally reported for CALM1 and CALM2 mutations associated with recurrent cardiac arrest during infancy. Four of 5 probands responded to β-blocker therapy, whereas 1 subject with mutation p.Q136P died suddenly during exertion despite this treatment. Mutations affect conserved residues located within Ca(2+)-binding loops III (p.N98S, p.N98I) or IV (p.D132E, p.D134H, p.Q136P) and caused reduced Ca(2+)-binding affinity. CONCLUSIONS: CALM2 mutations can be associated with LQTS and with overlapping features of LQTS and CPVT.
Abstract:
Using rice (Oryza sativa) as a model crop species, we performed an in-depth temporal transcriptome analysis, covering the early and late stages of Pi deprivation as well as Pi recovery in roots and shoots, using next-generation sequencing. Analyses of 126 paired-end RNA sequencing libraries, spanning nine time points, provided a comprehensive overview of the dynamic responses of rice to Pi stress. Differentially expressed genes were grouped into eight sets based on their responses to Pi starvation and recovery, enabling the complex signaling pathways involved in Pi homeostasis to be untangled. A reference annotation-based transcript assembly was also generated, identifying 438 unannotated loci that were differentially expressed under Pi starvation. Several genes also showed induction of unannotated splice isoforms under Pi starvation. Among these, PHOSPHATE2 (PHO2), a key regulator of Pi homeostasis, displayed a Pi starvation-induced isoform, which was associated with increased translation activity. In addition, microRNA (miRNA) expression profiles after long-term Pi starvation in roots and shoots were assessed, identifying 20 miRNA families that were not previously associated with Pi starvation, such as miR6250. In this article, we present a comprehensive spatio-temporal transcriptome analysis of plant responses to Pi stress, revealing a large number of potential key regulators of Pi homeostasis in plants.