946 results for pooled sequencing
Abstract:
BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
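As a rough illustration of coding ambiguous bases with IUPAC symbols, the sketch below maps per-base probabilities to a single symbol. It is not the authors' R package or their clustering model: the function name `call_base` and the 0.25 cutoff are assumptions made for illustration only.

```python
# Hypothetical sketch: turn per-base probabilities into one IUPAC symbol.
# The lookup table is the standard IUPAC nucleotide code.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_base(probs, threshold=0.25):
    """Return the IUPAC symbol covering every base whose probability
    reaches the threshold (an assumed cutoff, not taken from the paper)."""
    kept = frozenset(base for base, p in probs.items() if p >= threshold)
    if not kept:
        # Fall back to the single most likely base if nothing passes.
        kept = frozenset(max(probs, key=probs.get))
    return IUPAC[kept]


confident = call_base({"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02})
ambiguous = call_base({"A": 0.50, "C": 0.03, "G": 0.45, "T": 0.02})
```

Here a clear call stays a plain base, while a position split between A and G is coded as R instead of being discarded.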
Abstract:
BACKGROUND: Retinal dystrophies (RD) are a group of hereditary diseases that lead to debilitating visual impairment and are usually transmitted as a Mendelian trait. Pathogenic mutations can occur in any of the 100 or more disease genes identified so far, making molecular diagnosis a rather laborious process. In this work we explored the use of whole exome sequencing (WES) as a tool for identification of RD mutations, with the aim of assessing its applicability in a diagnostic context. METHODOLOGY/PRINCIPAL FINDINGS: We ascertained 12 Spanish families with seemingly recessive RD. All of the index patients underwent mutational pre-screening by chip-based sequence hybridization and were negative for known RD mutations. To simulate a standard diagnostic scenario, we processed by WES only the DNA from the index patient of each family (with the exception of one pedigree), followed by in silico data analysis. We successfully identified causative mutations in patients from 10 different families, which were later verified by Sanger sequencing and co-segregation analyses. Specifically, we detected pathogenic DNA variants (∼50% novel mutations) in the genes RP1, USH2A, CNGB3, NMNAT1, CHM, and ABCA4, responsible for retinitis pigmentosa, Usher syndrome, achromatopsia, Leber congenital amaurosis, choroideremia, or recessive Stargardt/cone-rod dystrophy cases. CONCLUSIONS/SIGNIFICANCE: Despite the absence of genetic information from other family members that could help exclude nonpathogenic DNA variants, we could detect causative mutations in a variety of genes known to represent a wide spectrum of clinical phenotypes in 83% of the patients analyzed. Considering the constant drop in costs for human exome sequencing and the relative simplicity of the analyses made, this technique could represent a valuable tool for molecular diagnostics or genetic research, even in cases for which no genotypes from family members are available.
Abstract:
The recent advance in high-throughput sequencing and genotyping protocols allows rapid investigation of Mendelian and complex diseases on a scale not previously possible. In my thesis research I took advantage of these modern techniques to study retinitis pigmentosa (RP), a rare inherited disease characterized by progressive loss of photoreceptors and leading to blindness; and hypertension, a common condition affecting 30% of the adult population. Firstly, I compared the performance of different next generation sequencing (NGS) platforms in the sequencing of the RP-linked gene PRPF31. The gene contained a mutation in an intronic repetitive element, which presented difficulties for both classic sequencing methods and NGS. We showed that all NGS platforms are powerful tools to identify rare and common DNA variants, even in the case of more complex sequences. Moreover, we evaluated the features of different NGS platforms that are important in re-sequencing projects. The main focus of my thesis was then to investigate the involvement of pre-mRNA splicing factors in autosomal dominant RP (adRP). I screened 5 candidate genes in a large cohort of patients by using long-range PCR as an enrichment step, followed by NGS. We tested two different approaches: in one, all target PCRs from all patients were pooled and sequenced as a single DNA library; in the other, PCRs from each patient were separated within the pool by DNA barcodes. The first solution was more cost-effective, while the second one gave faster and more accurate results, but overall they both proved to be effective strategies for gene screenings in many samples. We could in fact identify novel missense mutations in the SNRNP200 gene, encoding an essential RNA helicase for splicing catalysis. Interestingly, one of these mutations showed incomplete penetrance in one family with adRP. Thus, we started to study the possible molecular causes underlying phenotypic differences between asymptomatic and affected members of this family. For the study of hypertension, I joined a European consortium to perform genome-wide association studies (GWAS). Thanks to the use of very informative genotyping arrays and of phenotypically well-characterized cohorts, we could identify a novel susceptibility locus for hypertension in the promoter region of the endothelial nitric oxide synthase gene (NOS3). Moreover, we have proven the direct causality of the associated SNP using three different methods: 1) targeted resequencing, 2) luciferase assay, and 3) population study.
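The barcoded approach described in this thesis abstract implies a demultiplexing step that assigns each read back to a patient. The sketch below shows that idea only in its simplest form: the barcode sequences, sample names, and exact-prefix matching are assumptions for illustration, not details taken from the thesis.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by their leading barcode and strip the barcode.

    Reads that start with no known barcode are kept under 'unassigned'.
    Exact prefix matching is an assumption; real pipelines usually
    tolerate sequencing errors in the barcode.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Hypothetical barcodes and reads.
barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
reads = ["ACGTTTGA", "TGCAGGCC", "GGGGAAAA"]
result = demultiplex(reads, barcodes)
```

In the fully pooled design there is no such step, which is consistent with it being cheaper but giving no direct link between a variant and the patient who carries it.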
Abstract:
With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by "embedded bioinformaticians," i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master's students at the University of Lausanne: the "Sequence a genome" class. At the beginning of the academic year, a few bacterial species whose genomes are unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master's students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses.
Abstract:
There are suggestions of an inverse association between folate intake and serum folate levels and the risk of oral cavity and pharyngeal cancers (OPCs), but most studies are limited in sample size, with only a few reporting information on the source of dietary folate. Our study aims to investigate the association between folate intake and the risk of OPC within the International Head and Neck Cancer Epidemiology (INHANCE) Consortium. We analyzed pooled individual-level data from ten case-control studies participating in the INHANCE consortium, including 5,127 cases and 13,249 controls. Odds ratios (ORs) and the corresponding 95% confidence intervals (CIs) were estimated for the associations between total folate intake (natural, fortification and supplementation) and natural folate only, and OPC risk. We found an inverse association between total folate intake and overall OPC risk (the adjusted OR for the highest vs. the lowest quintile was 0.65, 95% CI: 0.43-0.99), with a stronger association for oral cavity (OR = 0.57, 95% CI: 0.43-0.75). A similar inverse association, though somewhat weaker, was observed for folate intake from natural sources only in oral cavity cancer (OR = 0.64, 95% CI: 0.45-0.91). The highest OPC risk was observed in heavy alcohol drinkers with low folate intake as compared to never/light drinkers with high folate (OR = 4.05, 95% CI: 3.43-4.79); the attributable proportion (AP) owing to interaction was 11.1% (95% CI: 1.4-20.8%). Lastly, we reported an OR of 2.73 (95% CI: 2.34-3.19) for ever tobacco users with low folate intake, compared with never tobacco users with high folate intake (AP of interaction = 10.6%, 95% CI: 0.41-20.8%). Our analysis of a large pool of case-control studies supports a protective effect of total folate intake on OPC risk.
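For readers unfamiliar with the measures quoted above, the following sketch shows how an unadjusted odds ratio with a Wald-type 95% confidence interval and an attributable proportion due to interaction are commonly computed. It is a simplified illustration: the study itself reports adjusted estimates, and the function names and example numbers here are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted OR and Wald (Woolf) CI from a 2x2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def attributable_proportion(or_both, or_first_only, or_second_only):
    """AP due to interaction: (OR11 - OR10 - OR01 + 1) / OR11."""
    return (or_both - or_first_only - or_second_only + 1) / or_both


# Made-up counts and odds ratios, for illustration only.
or_, lo, hi = odds_ratio_ci(10, 20, 30, 40)
ap = attributable_proportion(4.0, 2.0, 1.5)
```

An AP above zero, as reported in the abstract, indicates that the joint exposure carries more risk than the sum of the two separate exposures would suggest.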
Abstract:
Background In a previous study, the European Organisation for Research and Treatment of Cancer (EORTC) reported a scoring system to predict survival of patients with low-grade gliomas (LGGs). A major issue in the diagnosis of brain tumors is the lack of agreement among pathologists. New models in patients with LGGs diagnosed by central pathology review are needed. Methods Data from 339 EORTC patients with LGGs diagnosed by central pathology review were used to develop new prognostic models for progression-free survival (PFS) and overall survival (OS). Data from 450 patients with centrally diagnosed LGGs recruited into 2 large studies conducted by North American cooperative groups were used to validate the models. Results Both PFS and OS were negatively influenced by the presence of baseline neurological deficits, a shorter time since first symptoms (<30 wk), an astrocytic tumor type, and tumors larger than 5 cm in diameter. Early irradiation improved PFS but not OS. Three risk groups have been identified (low, intermediate, and high) and validated. Conclusions We have developed new prognostic models in a more homogeneous LGG population diagnosed by central pathology review. This population better fits with modern practice, where patients are enrolled in clinical trials based on central or panel pathology review. We could validate the models in a large, external, and independent dataset. The models can divide LGG patients into 3 risk groups and provide reliable individual survival predictions. Inclusion of other clinical and molecular factors might still improve models' predictions.
Abstract:
Mathematical methods combined with measurements of single-cell dynamics provide a means to reconstruct intracellular processes that are only partly or indirectly accessible experimentally. To obtain reliable reconstructions, the pooling of measurements from several cells of a clonal population is mandatory. However, cell-to-cell variability originating from diverse sources poses computational challenges for such process reconstruction. We introduce a scalable Bayesian inference framework that properly accounts for population heterogeneity. The method allows inference of inaccessible molecular states and kinetic parameters; computation of Bayes factors for model selection; and dissection of intrinsic, extrinsic and technical noise. We show how additional single-cell readouts such as morphological features can be included in the analysis. We use the method to reconstruct the expression dynamics of a gene under an inducible promoter in yeast from time-lapse microscopy data.
Abstract:
We performed whole genome sequencing in 16 unrelated patients with autosomal recessive retinitis pigmentosa (ARRP), a disease characterized by progressive retinal degeneration and caused by mutations in over 50 genes, in search of pathogenic DNA variants. Eight patients were from North America, whereas eight were Japanese, a population for which ARRP seems to have different genetic drivers. Using a specific workflow, we assessed both the coding and noncoding regions of the human genome, including the evaluation of highly polymorphic SNPs, structural and copy number variations, as well as 69 control genomes sequenced by the same procedures. We detected homozygous or compound heterozygous mutations in 7 genes associated with ARRP (USH2A, RDH12, CNGB1, EYS, PDE6B, DFNB31, and CERKL) in eight patients, three Japanese and five Americans. Fourteen of the 16 mutant alleles identified were previously unknown. Among these, there was a 2.3-kb deletion in USH2A and an inverted duplication of ∼446 kb in EYS, which would have likely escaped conventional screening techniques or exome sequencing. Moreover, in another Japanese patient, we identified a homozygous frameshift (p.L206fs), absent in more than 2,500 chromosomes from ethnically matched controls, in the ciliary gene NEK2, encoding a serine/threonine-protein kinase. Inactivation of this gene in zebrafish induced retinal photoreceptor defects that were rescued by human NEK2 mRNA. In addition to identifying a previously undescribed ARRP gene, our study highlights the importance of rare structural DNA variations in Mendelian diseases and advocates the need for screening approaches that transcend the analysis of the coding sequences of the human genome.
Abstract:
PURPOSE OF REVIEW: To review major findings on the T-cell receptor (TCR) repertoire diversity in response to several viral infections based on conventional methods of PCR, cloning and sequencing and to discuss their limitations in light of the recent methodological advances in deep sequencing. RECENT FINDINGS: Direct sequencing of TCR expressed by Ag-specific T cells isolated ex vivo has revealed that the TCR repertoire is not as restricted as previously estimated. Furthermore, analyses performed independently of the T-cell clonal hierarchy have brought to light an unexpected diversity. The choice of methods is critical to characterize the complexity of the repertoire. Recent advances in deep sequencing have uncovered the diversity of the TCR repertoire and shown that the size of the repertoire in naive and Ag-experienced memory T cells is three-fold to 15-fold larger than formerly estimated. Interestingly, the TCR complementarity-determining region 3 sequences are not randomly selected and a certain degree of shared TCR repertoire has been observed between different individuals. SUMMARY: Deep sequencing is a major methodological advance allowing more accurate molecular characterization of the TCR repertoire. In the near future, such technologies will further contribute to delineating the complexity of pathogen-specific T-cell response and help define correlates of protective immunity.
Abstract:
We performed exome sequencing to detect somatic mutations in protein-coding regions in seven melanoma cell lines and donor-matched germline cells. All melanoma samples had high numbers of somatic mutations, which showed the hallmark of UV-induced DNA repair. Such a hallmark was absent in tumor sample-specific mutations in two metastases derived from the same individual. Two melanomas with non-canonical BRAF mutations harbored gain-of-function MAP2K1 and MAP2K2 (MEK1 and MEK2, respectively) mutations, resulting in constitutive ERK phosphorylation and higher resistance to MEK inhibitors. Screening a larger cohort of individuals with melanoma revealed the presence of recurring somatic MAP2K1 and MAP2K2 mutations, which occurred at an overall frequency of 8%. Furthermore, missense and nonsense somatic mutations were frequently found in three candidate melanoma genes, FAT4, LRP1B and DSC1.
Abstract:
The molecular diagnosis of retinal dystrophies (RD) is difficult because of genetic and clinical heterogeneity. Previously, the molecular screening of genes was done one by one, sometimes in a scheme based on the frequency of sequence variants and the number of exons/length of the candidate genes. Payment for these procedures was complicated and the sequential billing of several genes created endless paperwork. We therefore evaluated the costs of generating and sequencing a hybridization-based DNA library enriched for the 64 most frequently mutated genes in RD, called IROme, and compared them to the costs of amplifying and sequencing these genes by the Sanger method. The production cost generated by the high-throughput (HT) sequencing of IROme was established at CHF 2,875.75 per case. Sanger sequencing of the same exons cost CHF 69,399.02. Turnaround time of the analysis was 3 days for IROme. For Sanger sequencing, it could only be estimated, as we never sequenced all 64 genes in one single patient. Sale cost for IROme calculated on the basis of the sale cost of one exon by Sanger sequencing is CHF 8,445.88, which corresponds to the sale price of 40 exons. In conclusion, IROme is cheaper and faster than Sanger sequencing and therefore represents a sound approach for the diagnosis of RD, both scientifically and economically. As a drop in the costs of HT sequencing is anticipated, target resequencing might become the new gold standard in the molecular diagnosis of RD.
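The cost comparison above reduces to two small calculations, sketched here using only the figures quoted in the abstract. The variable names are illustrative, and the per-exon sale price is derived from the abstract's statement that the IROme sale cost corresponds to 40 exons.

```python
# Figures quoted in the abstract (CHF).
irome_production_cost = 2875.75
sanger_production_cost = 69399.02
irome_sale_cost = 8445.88
equivalent_exons = 40

# How many times more expensive Sanger sequencing of the same exons is.
cost_ratio = sanger_production_cost / irome_production_cost

# Implied sale price of a single exon by Sanger sequencing.
sale_price_per_exon = irome_sale_cost / equivalent_exons

print(f"Sanger / IROme production cost: {cost_ratio:.1f}x")
print(f"Implied Sanger sale price per exon: CHF {sale_price_per_exon:.2f}")
```

On these numbers, Sanger sequencing costs roughly 24 times more to produce, and the implied sale price per exon is about CHF 211.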
Abstract:
Opsismodysplasia (OPS) is a severe autosomal-recessive chondrodysplasia characterized by pre- and postnatal micromelia with extremely short hands and feet. The main radiological features are severe platyspondyly, squared metacarpals, delayed skeletal ossification, and metaphyseal cupping. In order to identify mutations causing OPS, a total of 16 cases (7 terminated pregnancies and 9 postnatal cases) from 10 unrelated families were included in this study. We performed exome sequencing in three cases from three unrelated families and only one gene was found to harbor mutations in all three cases: inositol polyphosphate phosphatase-like 1 (INPPL1). Screening INPPL1 in the remaining cases identified a total of 12 distinct INPPL1 mutations in the 10 families, present in the homozygous state in 7 consanguineous families and in the compound heterozygous state in the 3 remaining families. Most mutations (6/12) resulted in premature stop codons, 2/12 were splice site, and 4/12 were missense mutations located in the catalytic domain, 5-phosphatase. INPPL1 belongs to the inositol-1,4,5-trisphosphate 5-phosphatase family, a family of signal-modulating enzymes that govern a plethora of cellular functions by regulating the levels of specific phosphoinositides. Our finding of INPPL1 mutations in OPS, a severe spondylodysplastic dysplasia with major growth plate disorganization, supports a key and specific role of this enzyme in endochondral ossification.
Abstract:
Although cigarette smoking and alcohol consumption increase risk for head and neck cancers, there have been few attempts to model risks quantitatively and to formally evaluate cancer site-specific risks. The authors pooled data from 15 case-control studies and modeled the excess odds ratio (EOR) to assess risk by total exposure (pack-years and drink-years) and its modification by exposure rate (cigarettes/day and drinks/day). The smoking analysis included 1,761 laryngeal, 2,453 pharyngeal, and 1,990 oral cavity cancers, and the alcohol analysis included 2,551 laryngeal, 3,693 pharyngeal, and 3,116 oral cavity cancers, with over 8,000 controls. Above 15 cigarettes/day, the EOR/pack-year decreased with increasing cigarettes/day, suggesting that greater cigarettes/day for a shorter duration was less deleterious than fewer cigarettes/day for a longer duration. Estimates of EOR/pack-year were homogeneous across sites, while the effects of cigarettes/day varied, indicating that the greater laryngeal cancer risk derived from differential cigarettes/day effects and not pack-years. EOR/drink-year estimates increased through 10 drinks/day, suggesting that greater drinks/day for a shorter duration was more deleterious than fewer drinks/day for a longer duration. Above 10 drinks/day, data were limited. EOR/drink-year estimates varied by site, while drinks/day effects were homogeneous, indicating that the greater pharyngeal/oral cavity cancer risk with alcohol consumption derived from the differential effects of drink-years and not drinks/day.
Abstract:
We describe an original case of disseminated infection with Histoplasma capsulatum (Hc) var. duboisii in an African patient with AIDS who migrated to Switzerland. The diagnosis of histoplasmosis was suggested using direct examination of tissues and confirmed in 24 h with a panfungal polymerase chain reaction assay. The variety duboisii of Hc was established using DNA sequencing of the polymorphic genomic region OLE. Molecular tools allow diagnosis of histoplasmosis in 24 h, which is drastically shorter than culture procedures.
Abstract:
BACKGROUND: The magnitude of risk conferred by the interaction between tobacco and alcohol use on the risk of head and neck cancers is not clear because studies have used various methods to quantify the excess head and neck cancer burden. METHODS: We analyzed individual-level pooled data from 17 European and American case-control studies (11,221 cases and 16,168 controls) participating in the International Head and Neck Cancer Epidemiology consortium. We estimated the multiplicative interaction parameter (psi) and population attributable risks (PAR). RESULTS: A greater than multiplicative joint effect between ever tobacco and alcohol use was observed for head and neck cancer risk (psi = 2.15; 95% confidence interval, 1.53-3.04). The PAR for tobacco or alcohol was 72% (95% confidence interval, 61-79%) for head and neck cancer, of which 4% was due to alcohol alone, 33% was due to tobacco alone, and 35% was due to tobacco and alcohol combined. The total PAR differed by subsite (64% for oral cavity cancer, 72% for pharyngeal cancer, 89% for laryngeal cancer), by sex (74% for men, 57% for women), by age (33% for cases <45 years, 73% for cases >60 years), and by region (84% in Europe, 51% in North America, 83% in Latin America). CONCLUSIONS: Our results confirm that the joint effect between tobacco and alcohol use is greater than multiplicative on head and neck cancer risk. However, a substantial proportion of head and neck cancers cannot be attributed to tobacco or alcohol use, particularly for oral cavity cancer and for head and neck cancer among women and among young-onset cases.
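The multiplicative interaction parameter can be illustrated as the ratio of the joint odds ratio to the product of the two single-exposure odds ratios, with values above 1 indicating a greater than multiplicative joint effect. The sketch below assumes that simple ratio form (the study estimates psi from its own models), uses made-up odds ratios, and also checks that the reported components of the overall PAR add up.

```python
def multiplicative_interaction(or_both, or_tobacco_only, or_alcohol_only):
    """psi = OR(both exposures) / (OR(tobacco only) * OR(alcohol only)).

    psi > 1 means the joint effect exceeds the product of the
    separate effects. The ratio form is an illustrative assumption.
    """
    return or_both / (or_tobacco_only * or_alcohol_only)


# Made-up odds ratios, for illustration only.
psi_example = multiplicative_interaction(6.0, 2.0, 1.5)

# PAR components reported in the abstract (percent).
par_alcohol_alone = 4
par_tobacco_alone = 33
par_combined = 35
par_total = par_alcohol_alone + par_tobacco_alone + par_combined
```

The three reported components sum to the 72% overall PAR for tobacco or alcohol given in the abstract.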