931 results for wide genome sequencing
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The bovine species have witnessed and played a major role in the drastic socio-economic changes that shaped our culture over the last 10,000 years. During this journey, cattle hitchhiked on human development and colonized the world, facing strong selective pressures such as dramatic environmental changes and disease challenge. Consequently, hundreds of specialized cattle breeds emerged and spread around the globe, making up a rich spectrum of genomic resources. Their DNA still carries the scars left from adapting to this wide range of conditions, and we are now empowered with data and analytical tools to track the milestones of past selection in their genomes. In this review paper, we provide a summary of the reconstructed demographic events that shaped cattle diversity, offer a critical synthesis of popular methodologies applied to the search for signatures of selection (SS) in genomic data, and give examples of recent SS studies in cattle. We then outline the potential and challenges of applying SS analysis in cattle and discuss future directions in this field.
Abstract:
Prevotella is one of the most abundant genera in the bovine rumen, although no genome has yet been assembled by a metagenomics approach applied to Brazilian Nelore. We report the draft genome sequence of Prevotella sp., comprising 2,971,040 bp, obtained using the Illumina sequencing platform. This genome includes 127 contigs and has a low GC content of 48%.
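As an illustration of how a GC figure such as the 48% reported above is typically derived from a set of contigs, here is a minimal sketch. It is not the authors' pipeline; the function name `gc_content` and the toy contigs are hypothetical, and the sketch assumes that ambiguous bases (for example `N`) are excluded from the denominator.

```python
def gc_content(contigs):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    gc = 0
    total = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        # Count only A, C, G, T so that N and other ambiguity codes are ignored.
        total += sum(s.count(base) for base in "ACGT")
    return gc / total if total else 0.0


# Hypothetical toy contigs, not data from the abstract.
toy_contigs = ["ATGCGC", "ATATAT", "GGCCNN"]
toy_gc = gc_content(toy_contigs)
```

Pooling counts across all contigs, rather than averaging per-contig percentages, weights each contig by its length, which is the usual convention for a genome-wide value.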
Abstract:
Transposable elements (TEs) are widespread in insect genomes. However, there are wide differences among species in the proportion of total DNA content occupied by these repetitive sequences. We analyzed the TEs present in Rhodnius prolixus (a vector of Chagas disease) and showed that 3.0% of this genome is occupied by Class II TEs, belonging mainly to the Tc1-mariner superfamily (1.65%) and MITEs (1.84%). Interestingly, most of this genomic content is due to the expansion of two subfamilies: irritans himar, a well-characterized subfamily of mariners, and prolixus1, one of the two novel subfamilies described here. The high number of sequences in these subfamilies suggests that bursts of transposition occurred during the life cycle of this family. To characterize these elements, we performed an in silico analysis of the sequences corresponding to the DDD/E domain of the transposase gene. We performed an evolutionary analysis, including network and Bayesian coalescent-based methods, to infer the dynamics of the amplification and to estimate the timing of the bursts identified in these subfamilies. Given our data, we hypothesize that the TE expansions occurred close to the time of speciation of R. prolixus, around 1.4 mya. This suggestion rests on the Transposon Model of TE evolution, in which the replicatively active members of a TE population are present at multiple loci in the genome but vary in their replicative potential, and on the Life Cycle Model, which states that when present-day TEs have been involved in amplification bursts, they share an ancestral copy that dates back to this initial amplification.
Abstract:
Recent studies have identified the genetic underpinnings of a growing number of diseases through targeted exome sequencing. However, this strategy ignores the large component of the genome that does not code for proteins but is nonetheless biologically functional. To address the possible involvement of regulatory variation in congenital heart diseases (CHDs), we searched for regulatory mutations impacting the activity of TBX5, a dosage-dependent transcription factor with well-defined roles in heart and limb development that has been associated with the Holt-Oram syndrome (heart-hand syndrome), a condition that affects 1/100,000 newborns. Using a combination of genomics, bioinformatics and mouse genetic engineering, we scanned approximately 700 kb of the TBX5 locus in search of cis-regulatory elements. We uncovered three enhancers that collectively recapitulate the endogenous expression pattern of TBX5 in the developing heart. We re-sequenced these enhancer elements in a cohort of non-syndromic patients with isolated atrial and/or ventricular septal defects, the predominant cardiac defects of the Holt-Oram syndrome, and identified a patient with a homozygous mutation in an enhancer approximately 90 kb downstream of TBX5. Notably, we demonstrate that this single-base-pair mutation abrogates the ability of the enhancer to drive expression within the heart in vivo using both mouse and zebrafish transgenic models. Given the population-wide frequency of this variant, we estimate that 1/100,000 individuals would be homozygous for this variant, highlighting that a significant number of CHDs associated with TBX5 dysfunction might arise from non-coding mutations in TBX5 heart enhancers, effectively decoupling the heart and hand phenotypes of the Holt-Oram syndrome.
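The abstract does not state the allele frequency or the method behind the homozygote estimate. A common way to relate the two is the Hardy-Weinberg expectation, under which the homozygote frequency is the square of the allele frequency. The sketch below assumes Hardy-Weinberg equilibrium, which is an assumption here rather than a detail taken from the study, and the function names are hypothetical.

```python
import math


def homozygote_frequency(allele_freq):
    """Expected homozygote frequency q**2 under Hardy-Weinberg equilibrium."""
    return allele_freq ** 2


def allele_frequency_from_homozygotes(hom_freq):
    """Allele frequency q implied by a homozygote frequency, assuming q**2 = hom_freq."""
    return math.sqrt(hom_freq)


# Back-calculate the allele frequency implied by 1 homozygote per 100,000 individuals.
implied_q = allele_frequency_from_homozygotes(1 / 100_000)
# Expected frequency of heterozygous carriers under the same assumption: 2q(1 - q).
implied_carriers = 2 * implied_q * (1 - implied_q)
```

Under this assumption, a homozygote frequency of 1/100,000 corresponds to an allele frequency of roughly 0.3%, so heterozygous carriers would be far more common than homozygotes.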
Abstract:
Background: Great efforts have been made to increase accessibility of HIV antiretroviral therapy (ART) in low- and middle-income countries. The threat of wide-scale emergence of drug resistance could severely hamper ART scale-up efforts. Population-based surveillance of transmitted HIV drug resistance ensures the use of appropriate first-line regimens to maximize efficacy of ART programs where drug options are limited. However, traditional HIV genotyping is extremely expensive, creating a cost barrier to wide-scale and frequent HIV drug resistance surveillance. Methods/Results: We have developed a low-cost laboratory-scale next-generation sequencing-based genotyping method to monitor drug resistance. We designed primers specifically to amplify protease and reverse transcriptase from Brazilian HIV subtypes and developed a multiplexing scheme using multiplex identifier tags to minimize cost while providing more robust data than traditional genotyping techniques. Using this approach, we characterized drug resistance from plasma in 81 HIV-infected individuals collected in São Paulo, Brazil. We describe the complexities of analyzing next-generation sequencing data and present a simplified open-source workflow to analyze drug resistance data. From these data, we identified drug resistance mutations in 20% of treatment-naive individuals in our cohort, which is similar to frequencies identified using traditional genotyping in Brazilian patient samples. Conclusion: The ultra-wide sequencing approach described here allows multiplexing of at least 48 patient samples per sequencing run, 4 times more than the current genotyping method. This method is also 4-fold more sensitive (5% minimal detection frequency vs. 20%) at a cost 3- to 5-fold lower than the traditional Sanger-based genotyping method. Lastly, by using a benchtop next-generation sequencer (Roche/454 GS Junior), this approach can be more easily implemented in low-resource settings.
These data provide proof-of-concept that next-generation HIV drug resistance genotyping is a feasible and low-cost alternative to current genotyping methods and may be particularly beneficial for in-country surveillance of transmitted drug resistance.
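To make the multiplexing idea concrete, the following sketch shows one simple way pooled reads can be assigned back to samples by a leading identifier tag. It is not the open-source workflow mentioned in the abstract. It assumes fixed-length tags matched exactly at the start of each read, and the tag sequences, sample names, and reads are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample):
    """Group reads by their leading tag and strip the tag from assigned reads."""
    tag_lengths = {len(tag) for tag in tag_to_sample}
    if len(tag_lengths) != 1:
        raise ValueError("this sketch assumes all tags have the same length")
    tag_len = tag_lengths.pop()

    by_sample = defaultdict(list)
    for read in reads:
        sample = tag_to_sample.get(read[:tag_len])
        if sample is None:
            # Reads with an unknown tag are kept whole for later inspection.
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[tag_len:])
    return dict(by_sample)


# Hypothetical tags and reads, for illustration only.
tags = {"ACGT": "patient_01", "TGCA": "patient_02"}
reads = ["ACGTCCTCAAATC", "TGCACCTCAGATC", "ACGTCCTCAGATC", "GGGGCCTCAAATC"]
result = demultiplex(reads, tags)
```

A production workflow would normally also tolerate sequencing errors in the tag, which this exact-match sketch deliberately omits.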
Abstract:
Background: Decreasing costs of DNA sequencing have made prokaryotic draft genome sequences increasingly common. A contig scaffold is an ordering of contigs in the correct orientation. A scaffold can help genome comparisons and guide gap closure efforts. One popular technique for obtaining contig scaffolds is to map contigs onto a reference genome. However, rearrangements that may exist between the query and reference genomes may result in incorrect scaffolds, if these rearrangements are not taken into account. Large-scale inversions are common rearrangement events in prokaryotic genomes. Even in draft genomes it is possible to detect the presence of inversions given sufficient sequencing coverage and a sufficiently close reference genome. Results: We present a linear-time algorithm that can generate a set of contig scaffolds for a draft genome sequence represented in contigs given a reference genome. The algorithm is aimed at prokaryotic genomes and relies on the presence of matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion signatures. Our algorithm is capable of correctly generating a scaffold if at least one member of every inversion signature pair is present in contigs and no inversion signatures have been overwritten in evolution. The algorithm is also capable of generating scaffolds in the presence of any kind of inversion, even though in this general case there is no guarantee that all scaffolds in the scaffold set will be correct. We compare the performance of SIS, the program that implements the algorithm, to seven other scaffold-generating programs. The results of our tests show that SIS has overall better performance. Conclusions: SIS is a new easy-to-use tool to generate contig scaffolds, available both as stand-alone and as a web server. 
The good performance of SIS in our tests adds evidence that large-scale inversions are widespread in prokaryotic genomes.
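The reference-mapping technique that the abstract describes as a popular baseline can be sketched as follows. This is explicitly not the SIS algorithm: it simply orders contigs by where they align on the reference and records their strand, and it ignores inversion signatures, which is the limitation the abstract points out. The `Hit` record and the example coordinates are hypothetical, and each contig is assumed to have a single alignment.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One contig-to-reference alignment (hypothetical minimal record)."""
    contig: str
    ref_start: int
    strand: str  # "+" if the contig aligns forward, "-" if reverse-complemented


def naive_scaffold(hits):
    """Order contigs by reference start position and keep their alignment strand."""
    ordered = sorted(hits, key=lambda h: h.ref_start)
    return [(h.contig, h.strand) for h in ordered]


# Hypothetical alignments, for illustration only.
hits = [Hit("c3", 9000, "+"), Hit("c1", 100, "+"), Hit("c2", 4200, "-")]
scaffold = naive_scaffold(hits)
```

If the query genome carries a large inversion relative to the reference, this ordering reproduces the reference arrangement rather than the query's, which is why an inversion-aware method is needed.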
Abstract:
BACKGROUND & AIMS: Homozygous loss of function mutations in interleukin-10 (IL10) and interleukin-10 receptors (IL10R) cause severe infantile (very early onset) inflammatory bowel disease (IBD). Allogeneic hematopoietic stem cell transplantation (HSCT) was reported to induce sustained remission in 1 patient with IL-10R deficiency. We investigated heterogeneity among patients with very early onset IBD, its mechanisms, and the use of allogeneic HSCT to treat this disorder. METHODS: We analyzed 66 patients with early onset IBD (younger than 5 years of age) for mutations in the genes encoding IL-10, IL-10R1, and IL-10R2. IL-10R deficiency was confirmed by functional assays on patients' peripheral blood mononuclear cells (immunoblot and enzyme-linked immunosorbent assay analyses). We assessed the therapeutic effects of standardized allogeneic HSCT. RESULTS: Using a candidate gene sequencing approach, we identified 16 patients with IL-10 or IL-10R deficiency: 3 patients had mutations in IL-10, 5 had mutations in IL-10R1, and 8 had mutations in IL-10R2. Refractory colitis became manifest in all patients within the first 3 months of life and was associated with perianal disease (16 of 16 patients). Extraintestinal symptoms included folliculitis (11 of 16) and arthritis (4 of 16). Allogeneic HSCT was performed in 5 patients and induced sustained clinical remission with a median follow-up time of 2 years. In vitro experiments confirmed reconstitution of IL-10R-mediated signaling in all patients who received the transplant. CONCLUSIONS: We identified loss of function mutations in IL-10 and IL-10R in patients with very early onset IBD. These findings indicate that infantile IBD patients with perianal disease should be screened for IL-10 and IL-10R deficiency and that allogeneic HSCT can induce remission in those with IL-10R deficiency.
Abstract:
A small supernumerary marker chromosome (sSMC) derived from chromosome 22 is a relatively common cytogenetic finding. This sSMC typically results in tetrasomy for a chromosomal region that spans the chromosome 22p arm and the proximal 2 Mb of 22q11.21. Using classical cytogenetics, fluorescence in situ hybridization, multiplex ligation-dependent probe amplification, and array techniques, 7 patients with sSMCs derived from chromosome 22 were studied: 4 non-related and 3 from the same family (mother, daughter, and son). The sSMCs in all patients were dicentric and bisatellited chromosomes with breakpoints in the chromosome 22 low-copy repeat A region, resulting in cat eye syndrome (CES) due to chromosome 22 partial tetrasomy 22pter -> q11.2 including the cat eye chromosome region. Although all subjects presented the same chromosomal abnormality, they showed a wide range of phenotypic differences, even in the 3 patients from the same family. There are no previous reports of CES occurring within 3 patients in the same family. Thus, the clinical and follow-up data presented here contribute to a better delineation of the phenotypes and outcomes of CES patients and will be useful for genetic counseling. Copyright (C) 2012 S. Karger AG, Basel
Abstract:
Schistosoma mansoni is one of the agents of schistosomiasis, a chronic and debilitating disease. Here, we present a transcriptome-wide characterization of adult S. mansoni males by high-throughput RNA sequencing. We obtained 1,620,432 high-quality ESTs from a directional strand-specific cDNA library, resulting in 26% higher coverage of genome bases than that of the public ESTs available at NCBI. With 15×-deep coverage of transcribed genomic regions, our data were able to (i) confirm for the first time 990 predictions without previous evidence of transcription; (ii) correct gene predictions; and (iii) discover 989 and 1196 RNA-seq contigs that map to intergenic and intronic genomic regions, respectively, where no gene had been predicted before. These contigs could represent new protein-coding genes or non-coding RNAs (ncRNAs). Interestingly, we identified 11 novel micro-exon genes (MEGs). These data reveal new features of the S. mansoni transcriptional landscape and significantly advance our understanding of the parasite transcriptome. (c) 2011 Elsevier Inc. All rights reserved.
Abstract:
Background: The involvement of post-transcriptional regulation by microRNAs in the molecular mechanisms underlying cancer is well documented. However, their interference at the cellular level is not fully explored. Functional in vitro studies are fundamental for understanding their role; nevertheless, results are highly dependent on the adopted cellular model. Next-generation small RNA transcriptomic sequencing data of a tumor cell line and of keratinocytes derived from primary culture were generated in order to characterize the microRNA content of these systems, thus helping in their understanding. Both constitute cell models for functional studies of microRNAs in head and neck squamous cell carcinoma (HNSCC), a smoking-related cancer. Known microRNAs were quantified and analyzed in the context of gene regulation. New microRNAs were investigated using similarity and structural search, ab initio classification, and prediction of the location of mature microRNAs within would-be precursor sequences. Results were compared with small RNA transcriptomic sequences from HNSCC samples in order to assess the applicability of these cell models for understanding the cancer phenotype and for novel molecule discovery. Results: Ten miRNAs represented over 70% of the mature molecules present in each of the cell types. The most highly expressed molecules were miR-21, miR-24 and miR-205. Accordingly, miR-21 and miR-205 have been previously shown to play a role in epithelial cell biology. Although miR-21 has been implicated in cancer development and evaluated as a biomarker in HNSCC progression, no significant expression differences were seen between cell types. We demonstrate that differentially expressed mature miRNAs target biological processes related to cell differentiation and apoptosis, indicating that they might represent, with acceptable accuracy, the genetic context from which they derive.
Most miRNAs identified in the cancer cell line and in keratinocytes were present in tumor samples and cancer-free samples, respectively, with miR-21, miR-24 and miR-205 still among the most prevalent molecules in all instances. Thirteen miRNA-like structures, containing reads identified by the deep sequencing, were predicted from putative miRNA precursor sequences. Strong evidence suggests that one of them could be a new miRNA. This molecule was mostly expressed in the tumor cell line and HNSCC samples, indicating a possible biological function in cancer. Conclusions: Critical biological features of cells must be fully understood before they can be chosen as models for functional studies. Expression levels of miRNAs relate to cell type and tissue context. This study provides insights into the miRNA content of two cell models used for cancer research. Pathways commonly deregulated in HNSCC might be targeted by the most highly expressed and also by differentially expressed miRNAs. Results indicate that the use of cell models for cancer research demands careful assessment of underlying molecular characteristics for proper data interpretation. Additionally, one new miRNA-like molecule with a potential role in cancer was identified in the cell lines and clinical samples.
Abstract:
A comparative genomic sequence analysis was performed of a region in human chromosome 11p15.3 and its homologous segment in mouse chromosome 7, between the ST5 and LMO1 genes. A total of 158,201 bases were sequenced in the mouse and compared with the syntenic region in human, partially available in the public databases. The analysed region exhibits the typical eukaryotic genomic structure and, compared with the closely neighbouring regions, strikingly reflects the mosaic distribution pattern of (G+C) and repeat content despite its relatively short size. Within this region the novel gene STK33 (Stk33 in the mouse), which codes for a serine/threonine kinase, was discovered. The finding of this gene constitutes an excellent example of the strength of the comparative sequencing approach. Poor gene predictions in the mouse genomic sequence were corrected and improved by comparison with the unordered, publicly available data from the human genomic sequence. Phylogenetic analysis suggests that STK33 belongs to the calcium/calmodulin-dependent protein kinase group and seems to be a novelty in the chordate lineage. The gene as a whole seems to evolve under purifying selection, whereas some regions appear to be under strong positive selection. Both the human and mouse versions of serine/threonine kinase 33 consist of seventeen exons that are highly conserved in the coding regions, particularly in those coding for the core protein kinase domain. The exon/intron structure in the coding regions of the gene is also conserved between human and mouse. The existence and functionality of the gene are supported by the presence of entries in the EST databases and were fully confirmed in vivo by isolating specific transcripts from human uterus total RNA and from several mouse tissues. Strong evidence for alternative splicing was found, which may result in tissue-specific starting points of transcription and, to some extent, different protein N-termini.
RT-PCR and hybridisation experiments suggest that STK33/Stk33 is differentially expressed in a few tissues and at relatively low levels. STK33 has been shown to be reproducibly down-regulated in tumor tissues, particularly in ovarian tumors. RNA in situ hybridisation experiments using mouse Stk33-specific probes showed expression in dividing cells from lung and germinal epithelium and possibly also in macrophages from kidney and lungs. Preliminary experiments with antibodies designed in this work, performed in parallel to the preparation of this manuscript, seem to confirm this expression pattern. The fact that the chromosomal region 11p15, in which STK33 is located, may be associated with several human diseases, including tumor development, suggests that further investigation is necessary to establish the role of STK33 in human health.
Abstract:
The 9p21 locus encodes important proteins involved in cell cycle regulation and apoptosis and contains the p16/CDKN2A (cyclin-dependent kinase inhibitor 2A) tumor suppressor gene and two other related genes, p14/ARF and p15/CDKN2B. This locus is a major target of inactivation in the pathogenesis of a number of human tumors, both solid and haematologic, and is also a frequent site of loss or deletion in acute lymphoblastic leukemia (ALL), with reported frequencies ranging from 18% to 45% [1]. In order to explore, at high resolution, the frequency and size of alterations affecting this locus in adult BCR-ABL1-positive ALL and to investigate their prognostic value, 112 patients (101 de novo and 11 relapse cases) were analyzed by genome-wide single nucleotide polymorphism arrays and candidate-gene deep exon sequencing. Paired diagnosis-relapse samples were further available and analyzed for 19 (19%) cases. CDKN2A/ARF and CDKN2B genomic alterations were identified in 29% and 25% of newly diagnosed patients, respectively. Deletions were monoallelic in 72% of cases, and in 43% the minimal overlapping region of the lost area spanned only the CDKN2A/2B gene locus. The analysis at the time of relapse showed a near-significant increase in the detection rate of CDKN2A/ARF loss (47%) compared to diagnosis (p = 0.06). Point mutations within the 9p21 locus were found at a very low level, with only one non-synonymous substitution in exon 2 of CDKN2A. Finally, correlation with clinical outcome showed that deletions of CDKN2A/B are significantly associated with poor outcome in terms of overall survival (p = 0.0206), disease-free survival (p = 0.0010) and cumulative incidence of relapse (p = 0.0014). The inactivation of the 9p21 locus by genomic deletions is a frequent event in BCR-ABL1-positive ALL. Deletions are frequently acquired at leukemia progression and act as a poor prognostic marker.
Abstract:
The research presented in my PhD thesis is part of a wider European project, FishPopTrace, focused on traceability of fish populations and products. My work was aimed at developing and analyzing novel genetic tools for a widely distributed marine fish species, the European hake (Merluccius merluccius), in order to investigate population genetic structure and explore potential applications to traceability scenarios. A total of 395 SNPs (single nucleotide polymorphisms) were discovered from a massive collection of expressed sequence tags, obtained by high-throughput sequencing, and validated on 19 geographic samples from the Atlantic and Mediterranean. Genome-scan approaches were applied to identify polymorphisms in genes potentially under divergent selection (outlier SNPs), which show higher genetic differentiation among populations than the average observed across loci. Comparative analyses of population structure were carried out on putatively neutral and outlier loci at wide (Atlantic and Mediterranean samples) and regional (samples within each basin) spatial scales, to disentangle the effects of demographic and adaptive evolutionary forces on the genetic structure of European hake populations. Results demonstrated the potential of outlier loci to unveil fine-scale genetic structure, possibly identifying locally adapted populations, despite the weak signal shown by putatively neutral SNPs. The application of outlier SNPs within the framework of fishery resource management was also explored. A minimum panel of SNP markers showing maximum discriminatory power was selected and applied to a traceability scenario aimed at identifying the basin (and hence the stock) of origin, Atlantic or Mediterranean, of individual fish. This case study illustrates how molecular analytical technologies have operational potential in real-world contexts and, more specifically, potential to support fisheries control and enforcement and the traceability of fish and fish products.
Abstract:
The aim of the present work was the comparative sequencing and subsequent analysis of the syntenic chromosomal segment on the short arm of human chromosome 11 in region 11p15.3, containing the genes LMO1 and TUB, and of the orthologous genomic segment of the mouse on chromosome 7 F2. The mapping of these two chromosomal regions carried out in this work made it possible to complete a genomic map spanning more than one megabase in total, which was produced in the collaborative sequencing project of the University Children's Hospital and the Institute of Molecular Genetics in Mainz. Using 28 PAC and cosmid clones, 383 kb of human genomic DNA were represented in this work, and using six BAC and PAC clones, 412 kb of mouse genomic DNA. This made it possible for the first time to determine the exact order of the genes contained in this chromosomal segment and to map precisely eight human STS markers and four mouse STS probes. The telomere-to-centromere chromosomal orientation of the orthologous region in the mouse was found to be inverted relative to that in human. Sequencing of three human clones allowed 319,119 bp of contiguous genomic DNA to be determined. This enabled the precise localization and structural elucidation of the genes LMO1, a putative tumor suppressor gene associated with the development of leukemias, and TUB, a transcription modulator involved in the regulation of fat metabolism. For the murine genome, 412,827 bp of new DNA sequence were generated, likewise by sequencing three clones. This genomic region, about 100 kb larger than the human one, additionally contained the new genes Stk33 and Eif3. These two genes were first discovered and characterized in the course of this work.
The parallel processing of both genomic regions enabled a comprehensive comparative analysis for coding, functional, and structural sequence segments in both species. The exon-intron structures of the genes LMO1/Lmo1 and TUB/Tub were clarified for both organisms. In addition, four new exons and two new species-specific splice variants were described for TUB/Tub. The identification of these new splice variants reveals new possibilities for alternative regulation and function, or for an altered protein structure, which opens up further explanations for the development of the diseases associated with these genes. In the larger sequenced mouse genomic sequence, in the flanking region that does not overlap the sequenced human sequence, the new gene Eif3 was mapped and characterized in its exon-intron structure, together with the last two exons, 11 and 12, of the Stk33 gene. The extensive sequence analysis of both sequenced genomic regions yielded a total of 229 potential exon sequences for the human segment and 527 possible exon regions for the mouse region. Of these, 21 exons in human and 31 exons in mouse were explicitly identified as expressed regions and experimentally verified by RT-PCR or by cDNA sequencing. These segments not only described the exon regions of the four genes mentioned above but could also be assigned to new, not further defined EST sequences. The interspecies comparison additionally made it possible to analyze the non-coding intergenic regions. For example, seven sequence regions with conservation of about 90% were identified in the first intron of LMO1/Lmo1. Promoter and putative regulatory sequence segments were also characterized using various bioinformatic analysis tools.
The conserved DNA sequence regions show an average homology of more than 65%. Examination of the genome organization also revealed similarities, which mostly differed only in degree. For instance, a region of just under 80 kb proximal to the human TUB gene shows a markedly elevated AT content, which likewise appears in the murine genome, though in a shortened version and less pronounced. The additional comparative analysis with a further species, using the orthologous genomic segments of Fugu, showed that the genes LMO1 and TUB studied here are highly conserved and evolutionarily old genes whose genomic organization pattern is also found in the paralogous gene family members within the same species. Overall, the mapping, sequencing, and analysis generated a comprehensive database for the genomic region under consideration and the genes described, which provides valuable information for future investigations and questions.