927 results for GENOME ASSEMBLY
Abstract:
Background: Freshwater planarians are an attractive model for regeneration and stem cell research and have become a promising tool in the field of regenerative medicine. With the availability of a sequenced planarian genome, the recent application of modern genetic and high-throughput tools has resulted in revitalized interest in these animals, long known for their amazing regenerative capabilities, which enable them to regrow even a new head after decapitation. However, a detailed description of the planarian transcriptome is essential for future investigation into regenerative processes using planarians as a model system. Results: In order to complement and improve existing gene annotations, we used a 454 pyrosequencing approach to analyze the transcriptome of the planarian species Schmidtea mediterranea. Altogether, 598,435 454-sequencing reads, with an average length of 327 bp, were assembled together with the ~10,000 sequences of the S. mediterranea UniGene set using different similarity cutoffs. The assembly was then mapped onto the current genome data. Remarkably, our Smed454 dataset contains more than 3 million novel transcribed nucleotides sequenced for the first time. A descriptive analysis of planarian splice sites was conducted on those Smed454 contigs that mapped uniquely to the current genome assembly. Sequence analysis allowed us to identify genes encoding putative proteins with defined structural properties, such as transmembrane domains. Moreover, we annotated the Smed454 dataset using Gene Ontology, and identified putative homologues of several gene families that may play a key role during regeneration, such as neurotransmitter and hormone receptors, homeobox-containing genes, and genes related to eye function. Conclusions: We report the first planarian transcript dataset, Smed454, as an open resource tool that can be accessed via a web interface. Smed454 contains significant novel sequence information about most expressed genes of S. mediterranea. Analysis of the annotated data promises to contribute to the identification of gene families that are poorly characterized at a functional level. The Smed454 transcriptome data will assist in the molecular characterization of S. mediterranea as a model organism, which will be useful to a broad scientific community.
Abstract:
The genomic loci occupied by RNA polymerase (RNAP) III have been characterized in cultured human cells by genome-wide chromatin immunoprecipitations, followed by deep sequencing (ChIP-seq). These studies have shown that only ~40% of the annotated 622 human tRNA genes and pseudogenes are occupied by RNAP-III, and that these genes are often in open chromatin regions rich in active RNAP-II transcription units. We have used ChIP-seq to characterize RNAP-III-occupied loci in a differentiated tissue, the mouse liver. Our studies define the mouse liver RNAP-III-occupied loci including a conserved mammalian interspersed repeat (MIR) as a potential regulator of an RNAP-III subunit-encoding gene. They reveal that synteny relationships can be established between a number of human and mouse RNAP-III genes, and that the expression levels of these genes are significantly linked. They establish that variations within the A and B promoter boxes, as well as the strength of the terminator sequence, can strongly affect RNAP-III occupancy of tRNA genes. They reveal correlations with various genomic features that explain the observed variation of 81% of tRNA scores. In mouse liver, loci represented in the NCBI37/mm9 genome assembly that are clearly occupied by RNAP-III comprise 50 Rn5s (5S RNA) genes, 14 known non-tRNA RNAP-III genes, nine Rn4.5s (4.5S RNA) genes, and 29 SINEs. Moreover, out of the 433 annotated tRNA genes, half are occupied by RNAP-III. Transfer RNA gene expression levels reflect both an underlying genomic organization conserved in dividing cultured human cells and resting mouse liver cells, and the particular promoter and terminator strengths of individual genes.
Abstract:
The goal of this project was to develop de novo assembly methods for assembling small genomes, mainly bacterial, from next-generation sequencing data. Ultimately, these methods could be applied to the assembly of the genome of StachEndo, an unknown alphaproteobacterium that is an endosymbiont of the amoeba Stachyamoeba lipophora. Following several preliminary analyses, the use of Illumina reads with de Bruijn graph assemblers was found to produce the best results. These experiments also showed that contigs produced with different k-mer sizes were complementary for genome finishing. Adding long overlapping read pairs proved essential for completely finishing large genomic repeats. These methods made it possible to assemble the StachEndo genome (1.7 Mb). Annotation of this genome showed that StachEndo has several characteristics that are unusual among endosymbionts. StachEndo is a species of interest for the study of endosymbiotic development.
Abstract:
Background: Human infection by the pork tapeworm Taenia solium affects more than 50 million people worldwide, particularly in underdeveloped and developing countries. Cysticercosis, which arises from larval encystation, can be life-threatening and difficult to treat. Here, we investigate for the first time the transcriptome of the clinically relevant cysticerci larval form. Results: Using the ORESTES method, we generated a total of 1,520 high-quality Expressed Sequence Tags (ESTs) from 20 ORESTES cDNA mini-libraries. Their analysis revealed fragments of genes with promising applications, including 51 ESTs matching antigens previously described in other species and 113 sequences representing proteins with potential extracellular localization, which have obvious applications in immunodiagnosis or vaccine development. Conclusion: The set of sequences described here will contribute to deciphering the expression profile of this important parasite and will be informative for the genome assembly and annotation, as well as for studies of intra- and inter-specific sequence variability. Genes of interest for developing new diagnostic and therapeutic tools are described and discussed.
Abstract:
In DNA microarray experiments, the gene fragments that are spotted on the slides are usually obtained by synthesizing specific oligonucleotides that amplify genes through PCR. Shotgun library sequences are an alternative to synthesizing primers for each gene in the genome. Because thousands of gene sequences can be placed on a single slide, shotgun clones make it possible to carry out microarray analysis without a completely sequenced genome. We developed the OC Identifier tool (optimal clone identifier for genomic shotgun libraries) to identify unique genes in shotgun libraries based on a partially sequenced genome; this allows the simultaneous use of clones in projects such as transcriptome and phylogeny studies, using comparative genomic hybridization and genome assembly. The OC Identifier tool draws on comparative genome analysis, biological databases, and relational database query languages, and provides bioinformatics tools to identify clones that contain unique genes as alternatives to primer synthesis. The OC Identifier allows analysis of clones during the sequencing phase, making it possible to select genes of interest for construction of a DNA microarray. ©FUNPEC-RP.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Bacterial artificial chromosomes (BAC) have been widely used for fluorescence in situ hybridization (FISH) mapping of chromosome landmarks in different organisms, including a few teleosts. In this study, we used BAC-FISH to consolidate the previous genetic and cytogenetic maps of the turbot (Scophthalmus maximus), a commercially important pleuronectiform. The maps consisted of 24 linkage groups (LGs) but only 22 chromosomes. All turbot LGs were assigned to specific chromosomes using BAC probes obtained from a turbot 5x genomic BAC library. The library consisted of 46,080 clones with inserts of at least 100 kb and <5% empty vectors. These BAC probes contained gene-derived or anonymous markers, most of them linked to quantitative trait loci (QTL) related to productive traits. BAC clones were mapped by FISH to unique marker-specific chromosomal positions, which showed a notable concordance with previous genetic mapping data. The two metacentric pairs were cytogenetically assigned to LG2 and LG16, and the nucleolar organizer region (NOR)-bearing pair was assigned to LG15. Double-color FISH assays enabled the consolidation of the turbot genetic map into 22 linkage groups by merging LG8 with LG18 and LG21 with LG24. In this work, a first-generation probe panel of BAC clones anchored to the turbot linkage and cytogenetic maps was developed. It is a useful tool for chromosome traceability in turbot, and is also relevant in the context of pleuronectiform karyotypes, which often show small, hardly identifiable chromosomes. This panel will also be valuable for further integrative genomics of turbot within Pleuronectiformes and teleosts, especially for fine QTL mapping for aquaculture traits, comparative genomics, and whole-genome assembly.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Pós-graduação em Genética e Melhoramento Animal - FCAV
Abstract:
Hereditary nasal parakeratosis (HNPK), an inherited monogenic autosomal recessive skin disorder, leads to crusts and fissures on the nasal planum of Labrador Retrievers. We performed a genome-wide association study (GWAS) using 13 HNPK cases and 23 controls. We obtained a single strong association signal on chromosome 2 (p(raw) = 4.4×10⁻¹⁴). The analysis of shared haplotypes among the 13 cases defined a critical interval of 1.6 Mb with 25 predicted genes. We re-sequenced the genome of one case at 38× coverage and detected 3 non-synonymous variants in the critical interval with respect to the reference genome assembly. We genotyped these variants in larger cohorts of dogs and only one was perfectly associated with the HNPK phenotype in a cohort of more than 500 dogs. This candidate causative variant is a missense variant in the SUV39H2 gene encoding a histone 3 lysine 9 (H3K9) methyltransferase, which mediates chromatin silencing. The variant c.972T>G is predicted to change an evolutionarily conserved asparagine into a lysine in the catalytically active domain of the enzyme (p.N324K). We further studied the histopathological alterations in the epidermis in vivo. Our data suggest that the HNPK phenotype is not caused by hyperproliferation, but rather by delayed terminal differentiation of keratinocytes. Thus, our data provide evidence that SUV39H2 is involved in the epigenetic regulation of keratinocyte differentiation ensuring proper stratification and tight sealing of the mammalian epidermis.
Abstract:
Hereditary footpad hyperkeratosis (HFH) represents a palmoplantar hyperkeratosis, which is inherited as a monogenic autosomal recessive trait in several dog breeds, such as Kromfohrländer and Irish Terriers. We performed genome-wide association studies (GWAS) in both breeds. In Kromfohrländer we obtained a single strong association signal on chromosome 5 (p(raw) = 1.0×10⁻¹³) using 13 HFH cases and 29 controls. The association signal replicated in an independent cohort of Irish Terriers with 10 cases and 21 controls (p(raw) = 6.9×10⁻¹⁰). The analysis of shared haplotypes among the combined Kromfohrländer and Irish Terrier cases defined a critical interval of 611 kb with 13 predicted genes. We re-sequenced the genome of one affected Kromfohrländer at 23.5× coverage. The comparison of the sequence data with 46 genomes of non-affected dogs from other breeds revealed a single private non-synonymous variant in the critical interval with respect to the reference genome assembly. The variant is a missense variant (c.155G>C) in the FAM83G gene encoding a protein with largely unknown function. It is predicted to change an evolutionarily conserved arginine into a proline residue (p.R52P). We genotyped this variant in a larger cohort of dogs and found perfect association with the HFH phenotype. We further studied the clinical and histopathological alterations in the epidermis in vivo. Affected dogs show a moderate to severe orthokeratotic hyperplasia of the palmoplantar epidermis. Thus, our data provide the first evidence that FAM83G has an essential role in maintaining the integrity of the palmoplantar epidermis.
Abstract:
In this thesis we will see that the DNA sequence is constantly shaped by interactions with its environment at multiple levels, showing footprints of DNA methylation, of its 3D organization and, in the case of bacteria, of the interaction with the host organisms. In the first chapter, we will see that by analyzing the distribution of distances between consecutive dinucleotides of the same type along the sequence, we can detect epigenetic and structural footprints. In particular, we will see that the CG distance distribution makes it possible to distinguish among organisms of different biological complexity, depending on how heavily CG sites are involved in DNA methylation. Moreover, we will see that CG and TA can be described by the same fitting function, suggesting a relationship between the two. We will also provide an interpretation of the observed trend by simulating a positioning process guided by the presence and absence of memory. Finally, we will focus on the TA distance distribution, characterizing deviations from the trend predicted by the best fitting function, and identifying specific patterns that might be related to peculiar mechanical properties of the DNA and also to epigenetic and structural processes. In the second chapter, we will see how we can map the 3D structure of the DNA onto its sequence. In particular, we devised a network-based algorithm that produces a genome assembly starting from its 3D configuration, using Hi-C contact maps as inputs. Specifically, we will see how we can identify the different chromosomes and reconstruct their sequences by exploiting the spectral properties of the Laplacian operator of a network. In the third chapter, we will see a novel method for source clustering and source attribution, based on a network approach, that makes it possible to identify host-bacteria interactions starting from the detection of Single-Nucleotide Polymorphisms along the sequence of bacterial genomes.
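The distance distribution between consecutive dinucleotides of the same type, described in the first chapter of the thesis above, can be illustrated with a short Python sketch. This is not the thesis's own code: the function names `dinucleotide_distances` and `distance_distribution` are hypothetical, the toy sequence is made up for illustration, and distances are assumed to be measured start-to-start on a single strand.

```python
from collections import Counter


def dinucleotide_distances(sequence, dinucleotide):
    """Return start-to-start distances between consecutive occurrences
    of `dinucleotide` in `sequence` (hypothetical helper, single strand)."""
    seq = sequence.upper()
    target = dinucleotide.upper()
    positions = []
    start = seq.find(target)
    while start != -1:
        positions.append(start)
        # Advance by one so overlapping occurrences (e.g. "AA" in "AAA") count.
        start = seq.find(target, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_distribution(sequence, dinucleotide):
    """Return the normalized frequency of each observed distance."""
    distances = dinucleotide_distances(sequence, dinucleotide)
    if not distances:
        return {}
    counts = Counter(distances)
    total = len(distances)
    return {d: counts[d] / total for d in sorted(counts)}


if __name__ == "__main__":
    toy_sequence = "CGATCGCGTTACG"  # made-up example, not real data
    print(dinucleotide_distances(toy_sequence, "CG"))
    print(distance_distribution(toy_sequence, "CG"))
```

On a real genome, the same functions could be run for CG and TA separately, and the resulting distributions compared or fitted; the choice of fitting function is left open here because the abstract does not name it.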
Abstract:
ABSTRACT Pneumocystis jirovecii is a fungus that causes severe pneumonia in immunocompromised patients. However, its study is hindered by the lack of an in vitro culture method. We report here the genome of P. jirovecii that was obtained from a single bronchoalveolar lavage fluid specimen from a patient. The major challenge was the in silico sorting of the reads from a mixture representing the different organisms of the lung microbiome. This genome lacks virulence factors and most amino acid biosynthesis enzymes and presents reduced GC content and size. Together with epidemiological observations, these features suggest that P. jirovecii is an obligate parasite specialized in the colonization of human lungs, which causes disease only in immune-deficient individuals. This genome sequence will boost research on this deadly pathogen. IMPORTANCE Pneumocystis pneumonia is a major cause of mortality in patients with impaired immune systems. The availability of the P. jirovecii genome sequence allows new analyses to be performed which open avenues to solve critical issues for this deadly human disease. The most important ones are (i) identification of nutritional supplements for development of culture in vitro, which is still lacking 100 years after discovery of the pathogen; (ii) identification of new targets for development of new drugs, given the paucity of present treatments and emerging resistance; and (iii) identification of targets for development of vaccines.
Abstract:
Prevotella is one of the most abundant genera in the bovine rumen, although no genome has yet been assembled by a metagenomics approach applied to Brazilian Nelore cattle. We report the draft genome sequence of Prevotella sp., comprising 2,971,040 bp, obtained using the Illumina sequencing platform. This genome includes 127 contigs and has a low GC content of 48%.
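Summary figures of the kind reported above (total length, contig count, GC content) are usually derived directly from the assembled contigs. The following Python sketch shows one way to compute them, assuming the contigs are already loaded as plain strings. The function name `assembly_summary` and the toy contigs are hypothetical, and ambiguous bases such as N are counted in the total length, which is a simplifying assumption.

```python
def assembly_summary(contigs):
    """Return (total_length, contig_count, gc_fraction) for a list of
    contig sequences. Hypothetical helper; N bases count toward length."""
    total_length = sum(len(contig) for contig in contigs)
    gc_count = sum(
        contig.upper().count("G") + contig.upper().count("C")
        for contig in contigs
    )
    gc_fraction = gc_count / total_length if total_length else 0.0
    return total_length, len(contigs), gc_fraction


if __name__ == "__main__":
    toy_contigs = ["ATGC", "GGCCAT"]  # made-up sequences, not the Prevotella draft
    length, count, gc = assembly_summary(toy_contigs)
    print(f"{length} bp in {count} contigs, GC = {gc:.0%}")
```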