981 resultados para MiSeq paired-end technology
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Thirty-four microsatellite loci were isolated from three reef fish species; golden snapper Lutjanus johnii, blackspotted croaker Protonibea diacanthus and grass emperor Lethrinus laticaudis using a next generation sequencing approach. Both IonTorrent single reads and Illumina MiSeq paired-end reads were used, with the latter demonstrating a higher quality of reads than the IonTorrent. From the 1–1.5 million raw reads per species, we successfully obtained 10–13 polymorphic loci for each species, which satisfied stringent design criteria. We developed multiplex panels for the amplification of the golden snapper and the blackspotted croaker loci, as well as post-amplification pooling panels for the grass emperor loci. The microsatellites characterized in this work were tested across three locations of northern Australia. The microsatellites we developed can detect population differentiation across northern Australia and may be used for genetic structure studies and stock identification.
Resumo:
Objective: In Southern European countries up to one-third of the patients with hereditary hemochromatosis (HH) do not present the common HFE risk genotype. In order to investigate the molecular basis of these cases we have designed a gene panel for rapid and simultaneous analysis of 6 HH-related genes (HFE, TFR2, HJV, HAMP, SLC40A1 and FTL) by next-generation sequencing (NGS). Materials and Methods: Eighty-eight iron overload Portuguese patients, negative for the common HFE mutations, were analysed. A TruSeq Custom Amplicon kit (TSCA, by Illumina) was designed in order to generate 97 amplicons covering exons, intron/exon junctions and UTRs of the mentioned genes with a cumulative target sequence of 12115bp. Amplicons were sequenced in the MiSeq instrument (IIlumina) using 250bp paired-end reads. Sequences were aligned against human genome reference hg19 using alignment and variant caller algorithms in the MiSeq reporter software. Novel variants were validated by Sanger sequencing and their pathogenic significance were assessed by in silico studies. Results: We found a total of 55 different genetic variants. These include novel pathogenic missense and splicing variants (in HFE and TFR2), a very rare variant in IRE of FTL, a variant that originates a novel translation initiation codon in the HAMP gene, among others. Conclusion: The merging of TSCA methodology and NGS technology appears to be an appropriate tool for simultaneous and fast analysis of HH-related genes in a large number of samples. However, establishing the clinical relevance of NGS-detected variants for HH development remains a hard-working task, requiring further functional studies.
Resumo:
Mycoplasma suis, the causative agent of porcine infectious anemia, has never been cultured in vitro and mechanisms by which it causes disease are poorly understood. Thus, the objective herein was to use whole genome sequencing and analysis of M. suis to define pathogenicity mechanisms and biochemical pathways. M. suis was harvested from the blood of an experimentally infected pig. Following DNA extraction and construction of a paired end library, whole-genome sequencing was performed using GS-FLX (454) and Titanium chemistry. Reads on paired-end constructs were assembled using GS De Novo Assembler and gaps closed by primer walking; assembly was validated by PFGE. Glimmer and Manatee Annotation Engine were used to predict and annotate protein-coding sequences (CDS). The M. suis genome consists of a single, 742,431 bp chromosome with low G+C content of 31.1%. A total of 844 CDS, 3 single copies, unlinked rRNA genes and 32 tRNAs were identified. Gene homologies and GC skew graph show that M. suis has a typical Mollicutes oriC. The predicted metabolic pathway is concise, showing evidence of adaptation to blood environment. M. suis is a glycolytic species, obtaining energy through sugars fermentation and ATP-synthase. The pentose-phosphate pathway, metabolism of cofactors and vitamins, pyruvate dehydrogenase and NAD(+) kinase are missing. Thus, ribose, NADH, NADPH and coenzyme A are possibly essential for its growth. M. suis can generate purines from hypoxanthine, which is secreted by RBCs, and cytidine nucleotides from uracil. Toxins orthologs were not identified. We suggest that M. suis may cause disease by scavenging and competing for host nutrients, leading to decreased life-span of RBCs. In summary, genome analysis shows that M. suis is dependent on host cell metabolism and this characteristic is likely to be linked to its pathogenicity. The prediction of essential nutrients will aid the development of in vitro cultivation systems.
Resumo:
BACKGROUND: Accurate catalogs of structural variants (SVs) in mammalian genomes are necessary to elucidate the potential mechanisms that drive SV formation and to assess their functional impact. Next generation sequencing methods for SV detection are an advance on array-based methods, but are almost exclusively limited to four basic types: deletions, insertions, inversions and copy number gains. RESULTS: By visual inspection of 100 Mbp of genome to which next generation sequence data from 17 inbred mouse strains had been aligned, we identify and interpret 21 paired-end mapping patterns, which we validate by PCR. These paired-end mapping patterns reveal a greater diversity and complexity in SVs than previously recognized. In addition, Sanger-based sequence analysis of 4,176 breakpoints at 261 SV sites reveal additional complexity at approximately a quarter of structural variants analyzed. We find micro-deletions and micro-insertions at SV breakpoints, ranging from 1 to 107 bp, and SNPs that extend breakpoint micro-homology and may catalyze SV formation. CONCLUSIONS: An integrative approach using experimental analyses to train computational SV calling is essential for the accurate resolution of the architecture of SVs. We find considerable complexity in SV formation; about a quarter of SVs in the mouse are composed of a complex mixture of deletion, insertion, inversion and copy number gain. Computational methods can be adapted to identify most paired-end mapping patterns.
Resumo:
Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and their respective association to human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation.
Resumo:
Abstract : The human body is composed of a huge number of cells acting together in a concerted manner. The current understanding is that proteins perform most of the necessary activities in keeping a cell alive. The DNA, on the other hand, stores the information on how to produce the different proteins in the genome. Regulating gene transcription is the first important step that can thus affect the life of a cell, modify its functions and its responses to the environment. Regulation is a complex operation that involves specialized proteins, the transcription factors. Transcription factors (TFs) can bind to DNA and activate the processes leading to the expression of genes into new proteins. Errors in this process may lead to diseases. In particular, some transcription factors have been associated with a lethal pathological state, commonly known as cancer, associated with uncontrolled cellular proliferation, invasiveness of healthy tissues and abnormal responses to stimuli. Understanding cancer-related regulatory programs is a difficult task, often involving several TFs interacting together and influencing each other's activity. This Thesis presents new computational methodologies to study gene regulation. In addition we present applications of our methods to the understanding of cancer-related regulatory programs. The understanding of transcriptional regulation is a major challenge. We address this difficult question combining computational approaches with large collections of heterogeneous experimental data. In detail, we design signal processing tools to recover transcription factors binding sites on the DNA from genome-wide surveys like chromatin immunoprecipitation assays on tiling arrays (ChIP-chip). We then use the localization about the binding of TFs to explain expression levels of regulated genes. In this way we identify a regulatory synergy between two TFs, the oncogene C-MYC and SP1. C-MYC and SP1 bind preferentially at promoters and when SP1 binds next to C-NIYC on the DNA, the nearby gene is strongly expressed. The association between the two TFs at promoters is reflected by the binding sites conservation across mammals, by the permissive underlying chromatin states 'it represents an important control mechanism involved in cellular proliferation, thereby involved in cancer. Secondly, we identify the characteristics of TF estrogen receptor alpha (hERa) target genes and we study the influence of hERa in regulating transcription. hERa, upon hormone estrogen signaling, binds to DNA to regulate transcription of its targets in concert with its co-factors. To overcome the scarce experimental data about the binding sites of other TFs that may interact with hERa, we conduct in silico analysis of the sequences underlying the ChIP sites using the collection of position weight matrices (PWMs) of hERa partners, TFs FOXA1 and SP1. We combine ChIP-chip and ChIP-paired-end-diTags (ChIP-pet) data about hERa binding on DNA with the sequence information to explain gene expression levels in a large collection of cancer tissue samples and also on studies about the response of cells to estrogen. We confirm that hERa binding sites are distributed anywhere on the genome. However, we distinguish between binding sites near promoters and binding sites along the transcripts. The first group shows weak binding of hERa and high occurrence of SP1 motifs, in particular near estrogen responsive genes. The second group shows strong binding of hERa and significant correlation between the number of binding sites along a gene and the strength of gene induction in presence of estrogen. Some binding sites of the second group also show presence of FOXA1, but the role of this TF still needs to be investigated. Different mechanisms have been proposed to explain hERa-mediated induction of gene expression. Our work supports the model of hERa activating gene expression from distal binding sites by interacting with promoter bound TFs, like SP1. hERa has been associated with survival rates of breast cancer patients, though explanatory models are still incomplete: this result is important to better understand how hERa can control gene expression. Thirdly, we address the difficult question of regulatory network inference. We tackle this problem analyzing time-series of biological measurements such as quantification of mRNA levels or protein concentrations. Our approach uses the well-established penalized linear regression models where we impose sparseness on the connectivity of the regulatory network. We extend this method enforcing the coherence of the regulatory dependencies: a TF must coherently behave as an activator, or a repressor on all its targets. This requirement is implemented as constraints on the signs of the regressed coefficients in the penalized linear regression model. Our approach is better at reconstructing meaningful biological networks than previous methods based on penalized regression. The method is tested on the DREAM2 challenge of reconstructing a five-genes/TFs regulatory network obtaining the best performance in the "undirected signed excitatory" category. Thus, these bioinformatics methods, which are reliable, interpretable and fast enough to cover large biological dataset, have enabled us to better understand gene regulation in humans.
Resumo:
Using rice (Oryza sativa) as a model crop species, we performed an in-depth temporal transcriptome analysis, covering the early and late stages of Pi deprivation as well as Pi recovery in roots and shoots, using next-generation sequencing. Analyses of 126 paired-end RNA sequencing libraries, spanning nine time points, provided a comprehensive overview of the dynamic responses of rice to Pi stress. Differentially expressed genes were grouped into eight sets based on their responses to Pi starvation and recovery, enabling the complex signaling pathways involved in Pi homeostasis to be untangled. A reference annotation-based transcript assembly was also generated, identifying 438 unannotated loci that were differentially expressed under Pi starvation. Several genes also showed induction of unannotated splice isoforms under Pi starvation. Among these, PHOSPHATE2 (PHO2), a key regulator of Pi homeostasis, displayed a Pi starvation-induced isoform, which was associated with increased translation activity. In addition, microRNA (miRNA) expression profiles after long-term Pi starvation in roots and shoots were assessed, identifying 20 miRNA families that were not previously associated with Pi starvation, such as miR6250. In this article, we present a comprehensive spatio-temporal transcriptome analysis of plant responses to Pi stress, revealing a large number of potential key regulators of Pi homeostasis in plants.
Resumo:
Reference collections of multiple Drosophila lines with accumulating collections of "omics" data have proven especially valuable for the study of population genetics and complex trait genetics. Here we present a description of a resource collection of 84 strains of Drosophila melanogaster whose genome sequences were obtained after 12 generations of full-sib inbreeding. The initial rationale for this resource was to foster development of a systems biology platform for modeling metabolic regulation by the use of natural polymorphisms as perturbations. As reference lines, they are amenable to repeated phenotypic measurements, and already a large collection of metabolic traits have been assayed. Another key feature of these strains is their widespread geographic origin, coming from Beijing, Ithaca, Netherlands, Tasmania, and Zimbabwe. After obtaining 12.5× coverage of paired-end Illumina sequence reads, SNP and indel calls were made with the GATK platform. Thorough quality control was enabled by deep sequencing one line to >100×, and single-nucleotide polymorphisms and indels were validated using ddRAD-sequencing as an orthogonal platform. In addition, a series of preliminary population genetic tests were performed with these single-nucleotide polymorphism data for assessment of data quality. We found 83 segregating inversions among the lines, and as expected these were especially abundant in the African sample. We anticipate that this will make a useful addition to the set of reference D. melanogaster strains, thanks to its geographic structuring and unusually high level of genetic diversity.
Resumo:
Le but de ce projet était de développer des méthodes d'assemblage de novo dans le but d'assembler de petits génomes, principalement bactériens, à partir de données de séquençage de nouvelle-génération. Éventuellement, ces méthodes pourraient être appliquées à l'assemblage du génome de StachEndo, une Alpha-Protéobactérie inconnue endosymbiote de l'amibe Stachyamoeba lipophora. Suite à plusieurs analyses préliminaires, il fut observé que l’utilisation de lectures Illumina avec des assembleurs par graphe DeBruijn produisait les meilleurs résultats. Ces expériences ont également montré que les contigs produits à partir de différentes tailles de k-mères étaient complémentaires pour la finition des génomes. L’ajout de longues paires de lectures chevauchantes se montra essentiel pour la finition complète des grandes répétitions génomiques. Ces méthodes permirent d'assembler le génome de StachEndo (1,7 Mb). L'annotation de ce génome permis de montrer que StachEndo possède plusieurs caractéristiques inhabituelles chez les endosymbiotes. StachEndo constitue une espèce d'intérêt pour l'étude du développement endosymbiotique.
Resumo:
The objective of this paper is to utilize the SIPOC, flowchart and IDEF0 modeling techniques combined to elaborate the conceptual model of a simulation project. It is intended to identify the contribution of these techniques in the elaboration of the computational model. To illustrate such application, a practical case of a high-end technology enterprise is presented. The paper concludes that the proposed approach eases the elaboration of the computational model. © 2008 IEEE.
Resumo:
High throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly. The choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, different de novo assembly protocols were generated using various datasets and software. Both 454 and Illumina sequencing technologies were applied on RNA extracted from antennae and mouthparts from single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads for nearly 45 Gb. For the 454 reads, we used three assemblers, Newbler, CAP3 and/or MIRA and for the Illumina reads, the Trinity assembler. Ten assembly workflows were compared using these programs separately or in combination. To compare the assemblies obtained, quantitative and qualitative criteria were used, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80 %, < 1 % chimeric contigs) was a hybrid assembly leading to recommend the use of (1) a single individual with large representation of biological tissues, (2) merging both long reads and short paired-end Illumina reads, (3) several assemblers in order to combine the specific advantages of each.
Resumo:
BACKGROUND A cost-effective strategy to increase the density of available markers within a population is to sequence a small proportion of the population and impute whole-genome sequence data for the remaining population. Increased densities of typed markers are advantageous for genome-wide association studies (GWAS) and genomic predictions. METHODS We obtained genotypes for 54 602 SNPs (single nucleotide polymorphisms) in 1077 Franches-Montagnes (FM) horses and Illumina paired-end whole-genome sequencing data for 30 FM horses and 14 Warmblood horses. After variant calling, the sequence-derived SNP genotypes (~13 million SNPs) were used for genotype imputation with the software programs Beagle, Impute2 and FImpute. RESULTS The mean imputation accuracy of FM horses using Impute2 was 92.0%. Imputation accuracy using Beagle and FImpute was 74.3% and 77.2%, respectively. In addition, for Impute2 we determined the imputation accuracy of all individual horses in the validation population, which ranged from 85.7% to 99.8%. The subsequent inclusion of Warmblood sequence data further increased the correlation between true and imputed genotypes for most horses, especially for horses with a high level of admixture. The final imputation accuracy of the horses ranged from 91.2% to 99.5%. CONCLUSIONS Using Impute2, the imputation accuracy was higher than 91% for all horses in the validation population, which indicates that direct imputation of 50k SNP-chip data to sequence level genotypes is feasible in the FM population. The individual imputation accuracy depended mainly on the applied software and the level of admixture.
Resumo:
Supervisory Control & Data Acquisition (SCADA) systems are used by many industries because of their ability to manage sensors and control external hardware. The problem with commercially available systems is that they are restricted to a local network of users that use proprietary software. There was no Internet development guide to give remote users out of the network, control and access to SCADA data and external hardware through simple user interfaces. To solve this problem a server/client paradigm was implemented to make SCADAs available via the Internet. Two methods were applied and studied: polling of a text file as a low-end technology solution and implementing a Transmission Control Protocol (TCP/IP) socket connection. Users were allowed to login to a website and control remotely a network of pumps and valves interfaced to a SCADA. This enabled them to sample the water quality of different reservoir wells. The results were based on real time performance, stability and ease of use of the remote interface and its programming. These indicated that the most feasible server to implement is the TCP/IP connection. For the user interface, Java applets and Active X controls provide the same real time access.
Resumo:
Many authors point out that the front-end of new product development (NPD) is a critical success factor in the NPD process and that numerous companies face difficulties in carrying it out appropriately. Therefore, it is important to develop new theories and proposals that support the effective implementation of this earliest phase of NPD. This paper presents a new method to support the development of front-end activities based on integrating technology roadmapping (TRM) and project portfolio management (PPM). This new method, called the ITP Method, was implemented at a small Brazilian high-tech company in the nanotechnology industry to explore the integration proposal. The case study demonstrated that the ITP Method provides a systematic procedure for the fuzzy front-end and integrates innovation perspectives into a single roadmap, which allows for a better alignment of business efforts and communication of product innovation goals. Furthermore, the results indicated that the method may also improve quality, functional integration and strategy alignment. (C) 2010 Elsevier Inc. All rights reserved.