117 results for MiSeq paired-end technology
at Université de Lausanne, Switzerland
Abstract:
BACKGROUND: Accurate catalogs of structural variants (SVs) in mammalian genomes are necessary to elucidate the potential mechanisms that drive SV formation and to assess their functional impact. Next generation sequencing methods for SV detection are an advance on array-based methods, but are almost exclusively limited to four basic types: deletions, insertions, inversions and copy number gains. RESULTS: By visual inspection of 100 Mbp of genome to which next generation sequence data from 17 inbred mouse strains had been aligned, we identify and interpret 21 paired-end mapping patterns, which we validate by PCR. These paired-end mapping patterns reveal a greater diversity and complexity in SVs than previously recognized. In addition, Sanger-based sequence analysis of 4,176 breakpoints at 261 SV sites reveals additional complexity at approximately a quarter of structural variants analyzed. We find micro-deletions and micro-insertions at SV breakpoints, ranging from 1 to 107 bp, and SNPs that extend breakpoint micro-homology and may catalyze SV formation. CONCLUSIONS: An integrative approach using experimental analyses to train computational SV calling is essential for the accurate resolution of the architecture of SVs. We find considerable complexity in SV formation; about a quarter of SVs in the mouse are composed of a complex mixture of deletion, insertion, inversion and copy number gain. Computational methods can be adapted to identify most paired-end mapping patterns.
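The idea of reading SV types off paired-end mapping patterns can be illustrated with a minimal sketch. It covers only the four basic types named in the abstract, not the 21 patterns the authors describe. The `ReadPair` record, the `classify_pair` function, the label names, and the insert-size thresholds are all assumptions made for illustration; the distance between leftmost mate positions is used as a rough stand-in for insert size, and a forward-reverse library orientation is assumed.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical record for one aligned read pair (not a real library type)."""
    chrom1: str
    pos1: int      # leftmost aligned position of mate 1
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int      # leftmost aligned position of mate 2
    strand2: str   # '+' or '-'


def classify_pair(pair, mean_insert=300, tolerance=100):
    """Return a coarse label for one read pair.

    Assumes a forward-reverse library: a concordant pair has the upstream
    mate on '+' and the downstream mate on '-', separated by roughly
    mean_insert bases. Thresholds are illustrative only.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"

    # Order the mates by coordinate so that 'left' is the upstream mate.
    (lpos, lstrand), (rpos, rstrand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    span = rpos - lpos

    if lstrand == rstrand:
        # Both mates on the same strand: one of them lies in an inverted segment.
        return "inversion"
    if lstrand == "-" and rstrand == "+":
        # Everted pair: typical signature of a tandem copy number gain.
        return "tandem_duplication"

    # Normal orientation: the mapped distance decides.
    if span > mean_insert + tolerance:
        return "deletion"    # sample lacks sequence present in the reference
    if span < mean_insert - tolerance:
        return "insertion"   # sample carries sequence absent from the reference
    return "concordant"


if __name__ == "__main__":
    example = ReadPair("chr1", 1000, "+", "chr1", 5000, "-")
    print(classify_pair(example))
```

In practice a call would be supported by a cluster of pairs sharing the same signature rather than by a single pair, and complex SVs of the kind reported above would show several signatures at one locus.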
Abstract:
Structural variation is variation in the structure of DNA regions that affects DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variants and their associations with human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of the molecular architecture and functional consequences of structural variation.
Abstract:
The human body is composed of a huge number of cells acting together in a concerted manner. The current understanding is that proteins perform most of the activities necessary to keep a cell alive. The DNA, on the other hand, stores in the genome the information on how to produce the different proteins. Regulating gene transcription is thus the first important step that can affect the life of a cell, modify its functions and its responses to the environment. Regulation is a complex operation that involves specialized proteins, the transcription factors. Transcription factors (TFs) can bind to DNA and activate the processes leading to the expression of genes into new proteins. Errors in this process may lead to diseases. In particular, some transcription factors have been associated with a lethal pathological state, commonly known as cancer, associated with uncontrolled cellular proliferation, invasiveness of healthy tissues and abnormal responses to stimuli. Understanding cancer-related regulatory programs is a difficult task, often involving several TFs interacting together and influencing each other's activity. This thesis presents new computational methodologies to study gene regulation. In addition, we present applications of our methods to the understanding of cancer-related regulatory programs. The understanding of transcriptional regulation is a major challenge. We address this difficult question by combining computational approaches with large collections of heterogeneous experimental data. In detail, we design signal processing tools to recover transcription factor binding sites on the DNA from genome-wide surveys such as chromatin immunoprecipitation assays on tiling arrays (ChIP-chip). We then use the location of TF binding to explain expression levels of regulated genes. In this way we identify a regulatory synergy between two TFs, the oncogene C-MYC and SP1. 
C-MYC and SP1 bind preferentially at promoters, and when SP1 binds next to C-MYC on the DNA, the nearby gene is strongly expressed. The association between the two TFs at promoters is reflected by the conservation of the binding sites across mammals and by the permissive underlying chromatin states. It represents an important control mechanism involved in cellular proliferation, and thereby in cancer. Secondly, we identify the characteristics of the target genes of the TF estrogen receptor alpha (hERa) and we study the influence of hERa in regulating transcription. hERa, upon estrogen hormone signaling, binds to DNA to regulate transcription of its targets in concert with its co-factors. To overcome the scarcity of experimental data about the binding sites of other TFs that may interact with hERa, we conduct in silico analysis of the sequences underlying the ChIP sites using the collection of position weight matrices (PWMs) of hERa partners, the TFs FOXA1 and SP1. We combine ChIP-chip and ChIP-paired-end-diTags (ChIP-pet) data about hERa binding on DNA with the sequence information to explain gene expression levels in a large collection of cancer tissue samples and in studies of the response of cells to estrogen. We confirm that hERa binding sites are distributed throughout the genome. However, we distinguish between binding sites near promoters and binding sites along the transcripts. The first group shows weak binding of hERa and high occurrence of SP1 motifs, in particular near estrogen responsive genes. The second group shows strong binding of hERa and significant correlation between the number of binding sites along a gene and the strength of gene induction in the presence of estrogen. Some binding sites of the second group also show the presence of FOXA1, but the role of this TF still needs to be investigated. Different mechanisms have been proposed to explain hERa-mediated induction of gene expression. 
Our work supports the model of hERa activating gene expression from distal binding sites by interacting with promoter-bound TFs, like SP1. hERa has been associated with survival rates of breast cancer patients, though explanatory models are still incomplete: this result is important for better understanding how hERa can control gene expression. Thirdly, we address the difficult question of regulatory network inference. We tackle this problem by analyzing time series of biological measurements such as quantification of mRNA levels or protein concentrations. Our approach uses the well-established penalized linear regression models, where we impose sparseness on the connectivity of the regulatory network. We extend this method by enforcing the coherence of the regulatory dependencies: a TF must coherently behave as an activator or a repressor on all its targets. This requirement is implemented as constraints on the signs of the regressed coefficients in the penalized linear regression model. Our approach is better at reconstructing meaningful biological networks than previous methods based on penalized regression. The method is tested on the DREAM2 challenge of reconstructing a five-gene/TF regulatory network, obtaining the best performance in the "undirected signed excitatory" category. Thus, these bioinformatics methods, which are reliable, interpretable and fast enough to cover large biological datasets, have enabled us to better understand gene regulation in humans.
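The sign-constrained penalized regression described in this abstract can be sketched as follows. This is not the thesis implementation: it is a minimal coordinate-descent lasso in plain Python, under the assumption that the role of each TF (activator or repressor) is already known and passed in as `signs`. The function name `sign_constrained_lasso`, its parameters, and the toy data are hypothetical.

```python
def sign_constrained_lasso(X, y, signs, lam=0.1, n_iter=200):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by coordinate descent,
    subject to w[j] >= 0 where signs[j] > 0 (assumed activator),
    w[j] <= 0 where signs[j] < 0 (assumed repressor),
    and no sign constraint where signs[j] == 0.

    X is a list of rows (samples) and y a list of target expression values.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros

    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant-zero regulator carries no information
            # Correlation of regulator j with the residual that excludes its own term.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            if signs[j] > 0:
                new = max(rho - lam, 0.0) / z
            elif signs[j] < 0:
                new = min(rho + lam, 0.0) / z
            else:
                # Ordinary soft-thresholding when the sign is unconstrained.
                new = (max(rho - lam, 0.0) + min(rho + lam, 0.0)) / z
            delta = new - w[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                w[j] = new
    return w


if __name__ == "__main__":
    # Toy data: the target follows 2*TF1 - 1*TF2 exactly (made-up numbers).
    X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [0, 2]]
    y = [2 * a - b for a, b in X]
    print(sign_constrained_lasso(X, y, signs=[+1, -1], lam=0.01))
```

To enforce coherence across a whole network under this assumption, the same `signs` vector would be reused when regressing every target gene, so that a TF labeled as an activator can never receive a negative coefficient on any target.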
Abstract:
Using rice (Oryza sativa) as a model crop species, we performed an in-depth temporal transcriptome analysis, covering the early and late stages of Pi deprivation as well as Pi recovery in roots and shoots, using next-generation sequencing. Analyses of 126 paired-end RNA sequencing libraries, spanning nine time points, provided a comprehensive overview of the dynamic responses of rice to Pi stress. Differentially expressed genes were grouped into eight sets based on their responses to Pi starvation and recovery, enabling the complex signaling pathways involved in Pi homeostasis to be untangled. A reference annotation-based transcript assembly was also generated, identifying 438 unannotated loci that were differentially expressed under Pi starvation. Several genes also showed induction of unannotated splice isoforms under Pi starvation. Among these, PHOSPHATE2 (PHO2), a key regulator of Pi homeostasis, displayed a Pi starvation-induced isoform, which was associated with increased translation activity. In addition, microRNA (miRNA) expression profiles after long-term Pi starvation in roots and shoots were assessed, identifying 20 miRNA families that were not previously associated with Pi starvation, such as miR6250. In this article, we present a comprehensive spatio-temporal transcriptome analysis of plant responses to Pi stress, revealing a large number of potential key regulators of Pi homeostasis in plants.
Abstract:
Reference collections of multiple Drosophila lines with accumulating collections of "omics" data have proven especially valuable for the study of population genetics and complex trait genetics. Here we present a description of a resource collection of 84 strains of Drosophila melanogaster whose genome sequences were obtained after 12 generations of full-sib inbreeding. The initial rationale for this resource was to foster development of a systems biology platform for modeling metabolic regulation by the use of natural polymorphisms as perturbations. As reference lines, they are amenable to repeated phenotypic measurements, and already a large collection of metabolic traits has been assayed. Another key feature of these strains is their widespread geographic origin, coming from Beijing, Ithaca, Netherlands, Tasmania, and Zimbabwe. After obtaining 12.5× coverage of paired-end Illumina sequence reads, SNP and indel calls were made with the GATK platform. Thorough quality control was enabled by deep sequencing one line to >100×, and single-nucleotide polymorphisms and indels were validated using ddRAD-sequencing as an orthogonal platform. In addition, a series of preliminary population genetic tests was performed with these single-nucleotide polymorphism data for assessment of data quality. We found 83 segregating inversions among the lines, and as expected these were especially abundant in the African sample. We anticipate that this will make a useful addition to the set of reference D. melanogaster strains, thanks to its geographic structuring and unusually high level of genetic diversity.
Abstract:
In this work we present a first feasibility study of the ClearPEM technology for simultaneous PET-MR imaging. The mutual electromagnetic interference (EMI) effects between both systems were evaluated on a 7 T magnet by characterizing the response behavior of the ClearPEM detectors and front-end electronics to pulsed RF power and switched magnetic field gradients; and by analyzing the MR system performance degradation from noise pickup into the RF receiver chain, and from magnetic susceptibility artifacts caused by PET front-end materials.
Abstract:
Staphylococcus aureus harbors redundant adhesins mediating tissue colonization and infection. To evaluate their intrinsic role outside of the staphylococcal background, a system was designed to express them in Lactococcus lactis subsp. cremoris 1363. This bacterium is devoid of virulence factors and has a known genetic background. A new Escherichia coli-L. lactis shuttle and expression vector was constructed for this purpose. First, the high-copy-number lactococcal plasmid pIL253 was equipped with the oriColE1 origin, generating pOri253 that could replicate in E. coli. Second, the lactococcal promoters P23 or P59 were inserted at one end of the pOri253 multicloning site. Gene expression was assessed by a luciferase reporter system. The plasmid carrying P23 (named pOri23) expressed luciferase constitutively at a level 10,000 times greater than did the P59-containing plasmid. Transcription was absent in E. coli. The staphylococcal clumping factor A (clfA) gene was cloned into pOri23 and used as a model system. Lactococci carrying pOri23-clfA produced an unaltered and functional 130-kDa ClfA protein attached to their cell walls. This was indicated both by the presence of the protein in Western blots of solubilized cell walls and by the ability of ClfA-positive lactococci to clump in the presence of plasma. ClfA-positive lactococci had clumping titers (titer of 4,112) similar to those of S. aureus Newman in soluble fibrinogen and bound equally well to solid-phase fibrinogen. These experiments provide a new way to study individual staphylococcal pathogenic factors and might complement both classical knockout mutagenesis and modern in vivo expression technology and signature tag mutagenesis.