931 results for wide genome sequencing
Abstract:
Data analysis, presentation and distribution are of utmost importance to a genome project. ACeDB, a public-domain software package, has been chosen as the common basis for parasite genome databases, and a first release of TcruziDB, the Trypanosoma cruzi genome database, is available by ftp from ftp://iris.dbbm.fiocruz.br/pub/genomedb/TcruziDB, as are versions of the software for different operating systems (ftp://iris.dbbm.fiocruz.br/pub/unixsoft/). Moreover, data originating from the project are available from the WWW server at http://www.dbbm.fiocruz.br. It contains biological and parasitological data on CL Brener, its karyotype, all available T. cruzi sequences from GenBank, data on the EST-sequencing project and on available libraries, a T. cruzi codon table and a listing of activities and participating groups in the genome project, as well as meeting reports. T. cruzi discussion lists (tcruzi-l@iris.dbbm.fiocruz.br and tcgenics@iris.dbbm.fiocruz.br) are being maintained for communication and to promote collaboration in the genome project.
Abstract:
Random single-pass sequencing of cDNA fragments, also known as generation of Expressed Sequence Tags (ESTs), has been highly successful in the study of the gene content of higher organisms and forms an integral part of most genome projects, with the objectives of identifying new genes and targets for disease control and prevention and of generating mapping probes. In the Trypanosoma cruzi genome project, EST sequencing has also been a starting point, and here we report data on the first 797 sequences obtained, partly from a CL Brener epimastigote non-normalized library and partly from a normalized library. Only around 30% of the sequences obtained showed similarity to entries in the GenBank and dbEST databases, and half of these matched sequences already reported for T. cruzi.
Abstract:
Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
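The replicate-based error estimate described above can be sketched as follows. This is a minimal illustration and not the Stacks workflow itself: the genotype encoding (a dict mapping a locus ID to a pair of alleles, with unrecovered loci simply absent), the function names and the simplified error definitions are all assumptions made for this example.

```python
from typing import Dict, Tuple

# Hypothetical encoding: locus ID -> the two alleles called at that locus.
Genotypes = Dict[str, Tuple[str, str]]


def locus_error_rate(rep_a: Genotypes, rep_b: Genotypes) -> float:
    """Fraction of loci recovered in only one of the two replicates."""
    all_loci = set(rep_a) | set(rep_b)
    if not all_loci:
        return 0.0
    shared = set(rep_a) & set(rep_b)
    return (len(all_loci) - len(shared)) / len(all_loci)


def genotype_error_rate(rep_a: Genotypes, rep_b: Genotypes) -> float:
    """Fraction of shared loci whose genotype differs between replicates.

    Allele order is ignored, so ("A", "G") and ("G", "A") count as identical.
    """
    shared = set(rep_a) & set(rep_b)
    if not shared:
        return 0.0
    mismatches = sum(
        1 for locus in shared if sorted(rep_a[locus]) != sorted(rep_b[locus])
    )
    return mismatches / len(shared)


if __name__ == "__main__":
    # Two made-up replicates of the same individual.
    rep1 = {"loc1": ("A", "A"), "loc2": ("A", "G"), "loc3": ("C", "C")}
    rep2 = {"loc1": ("A", "A"), "loc2": ("G", "G"), "loc4": ("T", "T")}
    print(locus_error_rate(rep1, rep2), genotype_error_rate(rep1, rep2))
```

Under these assumptions, assembly parameters could be compared by running both functions on each parameter set and preferring the set that keeps both rates low while retaining many shared loci.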
Abstract:
PURPOSE: To identify cancer-linked genes, Sjöblom et al. and Wood et al. performed genome-wide mutation screening in human breast and colorectal cancers. In breast cancer, 140 CAN-genes were found, containing 334 mutations overall. These mutations could prove useful for diagnostic and therapeutic purposes. METHODS: We used a MALDI-TOF MS 40-plex assay to test 40 loci within 21 high-ranking breast cancer CAN-genes. To confirm mutations, we performed single-plex assays and sequencing. RESULTS: In general, the mutation rate of the analyzed loci in our sample cohort was very low. None of the 40 loci analyzed was mutated in the 6 cell lines. Among the tissue samples, a single breast cancer sample showed heterozygosity at locus c.5834G>A within the ZFYVE26 gene (Zinc finger FYVE domain-containing gene 26). CONCLUSIONS: Sjöblom et al. and Wood et al. already showed that the vast majority of CAN-genes are mutated at very low frequency. Because we found only one mutation in our cohort, we assume that mutations at the selected loci are low-frequency events and are therefore rarely detected. Further studies should evaluate the CAN-gene mutations in larger cohorts.
Abstract:
BACKGROUND: DNA sequence integrity, mRNA concentrations and protein-DNA interactions have been subject to genome-wide analyses based on microarrays with ever-increasing efficiency and reliability over the past fifteen years. However, novel technologies for Ultra High-Throughput DNA Sequencing (UHTS) have very recently been harnessed to study these phenomena with unprecedented precision. As a consequence, the extensive bioinformatics environment available for array data management, analysis, interpretation and publication must be extended to include these novel sequencing data types. DESCRIPTION: MIMAS was originally conceived as a simple, convenient and local Microarray Information Management and Annotation System focused on GeneChips for expression profiling studies. MIMAS 3.0 enables users to manage data from high-density oligonucleotide SNP Chips, expression arrays (both 3'UTR and tiling) and promoter arrays, BeadArrays as well as UHTS data using MIAME-compliant standardized vocabulary. Importantly, researchers can export data in MAGE-TAB format and upload them to the EBI's ArrayExpress certified data repository using a one-step procedure. CONCLUSION: We have vastly extended the capability of the system such that it processes the data output of six types of GeneChips (Affymetrix), two different BeadArrays for mRNA and miRNA (Illumina) and the Genome Analyzer (a popular Ultra High-Throughput DNA Sequencer, Illumina), without compromising its flexibility and user-friendliness. MIMAS, appropriately renamed the Multiomics Information Management and Annotation System, is currently used by scientists working in approximately 50 academic laboratories and genomics platforms in Switzerland and France. MIMAS 3.0 is freely available via http://multiomics.sourceforge.net/.
Abstract:
Chloroquine has been the mainstay of malaria chemotherapy for the past five decades, but resistance is now widespread. Pyrimethamine or proguanil forms an important component of some alternative drug combinations used to treat uncomplicated Plasmodium falciparum infections in areas of chloroquine resistance. Both pyrimethamine and proguanil are dihydrofolate reductase (DHFR) inhibitors, with proguanil acting primarily through its major metabolite, cycloguanil. Resistance to these drugs arises from specific point mutations in the dhfr gene. Cross-resistance between cycloguanil and pyrimethamine is not absolute. It is therefore important to investigate mutation rates in P. falciparum for pyrimethamine and proguanil, so that the DHFR inhibitor associated with the lower mutation rate can be favored in drug combinations. Hence, we compared mutation rates in the P. falciparum genome for pyrimethamine and cycloguanil. Using erythrocytic stages of P. falciparum cultures, progressively drug-resistant lines were selected in vitro and their RFLP profiles were compared using a repeat sequence. Our findings suggest that pyrimethamine is associated with a higher mutation rate than cycloguanil. It enhances the degree of genomic polymorphism, leading to diversity in the natural parasite population, which in turn predisposes the parasites to faster selection of resistance to some other antimalarial drugs.
Abstract:
Simple sequence repeat anchored polymerase chain reaction amplification (SSR-PCR) is a genetic typing technique based on primers anchored at the 5' or 3' ends of microsatellites, at high primer annealing temperatures. This technique has already been used in studies of genetic variability of several organisms, using different primer designs. In order to conduct a detailed study of the SSR-PCR genomic targets, we cloned and sequenced 20 unique amplification products of two commonly used primers, CAA(CT)6 and (CA)8RY, using Biomphalaria glabrata genomic DNA as template. The sequences obtained were novel B. glabrata genomic sequences. It was observed that 15 clones contained microsatellites between priming sites. Out of 40 clones, seven contained complex sequence repetitions. One of the repeats that appeared in six of the amplified fragments generated a single band in Southern analysis, indicating that the sequence was not widespread in the genome. Most of the annealing sites for the CAA(CT)6 primer contained only the six repeats found within the primer sequence. In conclusion, SSR-PCR is a useful genotyping technique. However, the premise of the SSR-PCR technique, verified with the CAA(CT)6 primer, could not be supported since the amplification products did not result necessarily from microsatellite loci amplification.
Abstract:
Genetic diversity is the amount of variation observed between DNA sequences from distinct individuals of a given species. This pivotal concept of population genetics has implications for species health, domestication, management and conservation. Levels of genetic diversity seem to vary greatly in natural populations and species, but the determinants of this variation, and particularly the relative influences of species biology and ecology versus population history, are still largely mysterious. Here we show that the diversity of a species is predictable, and is determined in the first place by its ecological strategy. We investigated the genome-wide diversity of 76 non-model animal species by sequencing the transcriptome of two to ten individuals in each species. The distribution of genetic diversity between species revealed no detectable influence of geographic range or invasive status but was accurately predicted by key species traits related to parental investment: long-lived or low-fecundity species with brooding ability were genetically less diverse than short-lived or highly fecund ones. Our analysis demonstrates the influence of long-term life-history strategies on species response to short-term environmental perturbations, a result with immediate implications for conservation policies.
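The definition of genetic diversity given at the start of this abstract can be made concrete with nucleotide diversity, the mean proportion of sites that differ between two sequences drawn from the sample. The sketch below is an illustration only: it assumes pre-aligned, equal-length sequences with no gaps or missing data, and it is not the transcriptome-based estimator used in the study.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean per-site pairwise difference among aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(
        sum(1 for a, b in zip(s1, s2) if a != b) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Three made-up haplotypes, four sites long.
    print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```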
Abstract:
The complexity of mammalian genome organization demands a complex interplay of DNA and proteins to orchestrate proper gene regulation. CTCF, a highly conserved, ubiquitously expressed protein, has been postulated as a primary organizer of genome architecture because of its roles in transcriptional activation/repression, insulation and imprinting. Diverse regulatory functions are exerted through genome-wide binding via a central eleven-zinc-finger DNA-binding domain and an array of diverse protein-protein interactions through N- and C-terminal domains. CTCFL has been identified as a paralog of CTCF expressed only in spermatogenic cells of the testis. CTCF and CTCFL have a highly homologous DNA-binding domain, while the flanking amino acid sequences exhibit no significant similarity. Genome-wide mapping of CTCF binding sites has been carried out in many cell types, but no data exist for CTCFL apart from a few identified loci. The lack of high-quality antibodies prompted us to generate an endogenously FLAG-tagged CTCFL mouse model using BAC recombination. IHC staining using anti-FLAG antibodies confirmed CTCFL localization to type B spermatogonia and preleptotene spermatocytes and a mutually exclusive pattern of expression with CTCF. ChIP followed by high-throughput sequencing identified 10,382 binding sites showing 70% overlap but representing only 20% of CTCF sites. Consensus sequence analysis identified a significantly longer binding motif with markedly less base ambiguity at every position. The significant difference between CTCF and CTCFL genomic binding patterns suggests that their binding to DNA is differentially regulated. Analysis of CTCFL binding to methylated regions on a genome-wide scale identified approximately 1,000 loci. Methylation-independent binding of CTCFL might be at least one of the mechanisms that ensure distinct binding patterns of CTCF and CTCFL, since CTCF binding is methylation-sensitive.
Co-localization of CTCF with cohesin has been well established, and analysis of CTCFL and SMC3 overlap identified around 3,300 binding sites from which two related but distinct consensus sequence motifs were derived. Because virtually all data for cohesin binding originate from mitotically proliferating cells, the overlap is expected to be considerably higher in meiotic cells. The meiosis-specific cohesin subunit Rec8 is specific for spermatocytes, and 6 of the 12 identified binding sites are also bound by CTCFL. In conclusion, this was the first genome-wide mapping of CTCFL binding sites in spermatocytes, the only cell type where CTCF is not expressed. CTCFL has a unique binding site repertoire distinct from CTCF, binds to methylated sequences and shows a significant overlap with cohesin binding sites. Future efforts will be oriented towards deciphering the role CTCFL plays in conversion of chromatin structure and function from mitotic to meiotic chromosomes. - The complexity of mammalian genome organization requires a particular interplay between DNA and proteins to orchestrate appropriate regulation of gene expression. CTCF, a highly conserved, ubiquitous protein, is thought to be the principal organizer of genome architecture because of its role in transcriptional activation/repression, insulation and imprinting. Diverse regulatory functions are exerted, on the one hand, through interactions at different sites in the genome via a central eleven-zinc-finger DNA-binding domain and, on the other, through varied protein-protein interactions at the N- and C-terminal domains. CTCFL has been identified as a paralog of CTCF expressed only in spermatogenic cells of the testis. CTCFL and CTCF have a highly homologous DNA-binding domain, whereas the amino acid sequences flanking this domain show no similarity.
Genome-wide mapping of CTCF binding sites has been carried out for many cell types, but no data exist for CTCFL apart from the identification of a few loci. The lack of good-quality antibodies led us to generate a mouse model carrying an endogenously tagged CTCFL by means of BAC recombination. IHC staining with anti-FLAG antibodies confirmed the presence of CTCFL in type B spermatogonia and preleptotene spermatocytes, and a mutually exclusive distribution with CTCF. Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing identified 10,382 binding sites showing 70% overlap but representing only 20% of CTCF sites. Consensus sequence analysis revealed a markedly longer DNA-binding motif with far fewer ambiguous bases at each nucleotide position. The significant difference between the genomic binding sites of CTCF and CTCFL suggests that their binding to DNA is regulated differently. Applied genome-wide, the study of CTCFL interaction with methylated DNA regions identified approximately 1,000 loci. Unlike CTCFL binding, CTCF binding depends on the methylation state of the DNA; this epigenetic modification is therefore at least one of the regulatory mechanisms explaining the localization of CTCF and CTCFL to distinct sites in the genome. Since co-localization of CTCF with cohesin is established, analysis of the overlap between CTCFL and the SMC3 subunit identified approximately 3,300 binding sites, from which two related consensus motifs with distinct sequences were derived.
Because almost all data on cohesin have been obtained from mitotically proliferating cells, the overlap is likely to be even greater in meiotic cells. The meiosis-specific cohesin subunit Rec8 is expressed specifically in spermatocytes, and 6 of the 12 binding sites identified are also bound by CTCFL. In conclusion, this work is the first genome-wide mapping of CTCFL binding sites in spermatocytes, the only cell type in which CTCF is not expressed. CTCFL has a unique repertoire of DNA-binding sites distinct from that of CTCF, binds to methylated sequences and shares a substantial number of binding sites with cohesin. Future work will aim to elucidate the role of CTCFL in remodeling chromatin structure and to define its function in meiosis.
Abstract:
Ants are some of the most abundant and familiar animals on Earth, and they play vital roles in most terrestrial ecosystems. Although all ants are eusocial, and display a variety of complex and fascinating behaviors, few genomic resources exist for them. Here, we report the draft genome sequence of a particularly widespread and well-studied species, the invasive Argentine ant (Linepithema humile), which was accomplished using a combination of 454 (Roche) and Illumina sequencing and community-based funding rather than federal grant support. Manual annotation of >1,000 genes from a variety of different gene families and functional classes reveals unique features of the Argentine ant's biology, as well as similarities to Apis mellifera and Nasonia vitripennis. Distinctive features of the Argentine ant genome include remarkable expansions of gustatory (116 genes) and odorant receptors (367 genes), an abundance of cytochrome P450 genes (>110), lineage-specific expansions of yellow/major royal jelly proteins and desaturases, and complete CpG DNA methylation and RNAi toolkits. The Argentine ant genome contains fewer immune genes than Drosophila and Tribolium, which may reflect the prominent role played by behavioral and chemical suppression of pathogens. Analysis of the ratio of observed to expected CpG nucleotides for genes in the reproductive development and apoptosis pathways suggests higher levels of methylation than in the genome overall. The resources provided by this genome sequence will offer an abundance of tools for researchers seeking to illuminate the fascinating biology of this emerging model organism.
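The observed-to-expected CpG ratio mentioned above is commonly computed as (number of CpG dinucleotides × sequence length) / (number of C × number of G). The sketch below uses that common normalization as an assumption; the abstract does not state which exact formula the authors applied, and the function name is made up for this example. Because methylated cytosines tend to mutate to thymine over evolutionary time, a depleted ratio is usually read as a signature of historical methylation.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    Uses the common normalization: (#CpG * length) / (#C * #G).
    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    n_cpg = seq.count("CG")
    return n_cpg * len(seq) / (n_c * n_g)


if __name__ == "__main__":
    # Made-up sequences: the first is CpG-rich, the second has no CpG at all.
    for s in ("ACGTCGAA", "CCAAGGTT"):
        print(s, cpg_observed_expected(s))
```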
Abstract:
Lancelets ('amphioxus') are the modern survivors of an ancient chordate lineage, with a fossil record dating back to the Cambrian period. Here we describe the structure and gene content of the highly polymorphic approximately 520-megabase genome of the Florida lancelet Branchiostoma floridae, and analyse it in the context of chordate evolution. Whole-genome comparisons illuminate the murky relationships among the three chordate groups (tunicates, lancelets and vertebrates), and allow not only reconstruction of the gene complement of the last common chordate ancestor but also partial reconstruction of its genomic organization, as well as a description of two genome-wide duplications and subsequent reorganizations in the vertebrate lineage. These genome-scale events shaped the vertebrate genome and provided additional genetic variation for exploitation during vertebrate evolution.
Abstract:
Differences between genomes can be due to single nucleotide variants, translocations, inversions, and copy number variants (CNVs, gain or loss of DNA). The latter can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign but those larger than 500 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease. Hence there is a need for better-tailored and more robust tools for the detection and genome-wide analyses of CNVs. While a link between a given CNV and a disease may have often been established, the relative CNV contribution to disease progression and impact on drug response is not necessarily understood. In this review we discuss the progress, challenges, and limitations that occur at different stages of CNV analysis from the detection (using DNA microarrays and next-generation sequencing) and identification of recurrent CNVs to the association with phenotypes. We emphasize the importance of germline CNVs and propose strategies to aid clinicians to better interpret structural variations and assess their clinical implications.
Abstract:
The number of sequences generated by genome projects has increased exponentially, but gene characterization has not kept pace. Sequencing and analysis of full-length cDNAs is an important step in gene characterization that is now used by several research groups. In this work, we selected Schistosoma mansoni clones for full-length sequencing, using an algorithm that investigates the presence of the initial methionine in the parasite sequence based on the positions at which the alignment between two sequences starts. BLAST searches to produce such alignments were performed using parasite expressed sequence tags produced by the Minas Gerais Genome Network against sequences from the Eukaryotic Cluster of Orthologous Groups (KOG) database. This procedure allowed the selection of clones representing 398 proteins that have not been deposited as S. mansoni complete CDS in any public database. Dedicated sequencing of 96 of these clones, with reads from both the 5' and 3' ends, was performed. These reads were assembled using PHRAP, resulting in 33 full-length sequences that represent novel S. mansoni proteins. These results should contribute to a more complete view of the biology of this important parasite.
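One way to picture the clone-selection idea above is a rule on alignment start positions: if the translated EST has at least as many residues upstream of the alignment as the orthologous protein does, the clone plausibly reaches the initial methionine. The abstract does not give the actual rule, so the decision criterion, the `Hit` fields and the `tolerance` parameter below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Start coordinates of one alignment (hypothetical, 1-based, in residues)."""
    query_start: int    # position in the translated EST where the alignment begins
    subject_start: int  # position in the KOG protein where the alignment begins


def likely_has_initial_met(hit: Hit, tolerance: int = 0) -> bool:
    """Guess whether the clone extends to the start of the coding sequence.

    Assumed rule: residues upstream of the alignment in the EST (plus a
    tolerance) must cover the residues upstream of it in the KOG protein.
    """
    upstream_in_query = hit.query_start - 1
    upstream_needed = hit.subject_start - 1
    return upstream_in_query + tolerance >= upstream_needed


if __name__ == "__main__":
    # Made-up hits: the first clone covers the protein start, the second does not.
    print(likely_has_initial_met(Hit(query_start=10, subject_start=5)))
    print(likely_has_initial_met(Hit(query_start=1, subject_start=20)))
```

A real pipeline would also need to handle reading frames, untranslated 5' sequence and divergent N-termini, none of which this sketch attempts.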
Abstract:
Background: Within the Coleoptera, the largest order in the animal kingdom, the exclusively herbivorous Chrysomelidae are recognized as one of the most species-rich beetle families. The evolutionary processes that have fueled radiation into the more than thirty-five thousand currently recognized leaf beetle species remain partly unresolved. The prominent role of leaf beetles in the insect world, their omnipresence across all terrestrial biomes and their economic importance as common agricultural pest organisms make this family particularly interesting for studying the mechanisms that drive diversification. Here we specifically focus on two ecotypes of the alpine leaf beetle Oreina speciosissima (Scop.), which have been shown to exhibit morphological differences in male genitalia roughly corresponding to the subspecies Oreina speciosissima sensu stricto and Oreina speciosissima troglodytes. In general the two ecotypes segregate along an elevation gradient and by host plants: Oreina speciosissima sensu stricto colonizes high forb vegetation at low altitude and Oreina speciosissima troglodytes is found in stone run vegetation at higher elevations. Both host plants and leaf beetles have a patchy geographical distribution. Using gene sequencing and genome fingerprinting (AFLP), we analyzed the genetic structure and habitat use of Oreina speciosissima populations from the Swiss Alps to examine whether the two ecotypes have a genetic basis. By investigating a wide range of altitudes and focusing on the structuring effect of habitat types, we aim to provide answers regarding the factors that drive adaptive radiation in this phytophagous leaf beetle. Results: While little phylogenetic resolution was observed based on the sequencing of four DNA regions, the topology and clustering resulting from AFLP genotyping grouped specimens according to their habitat, mostly defined by plant associations.
A few specimens with intermediate morphologies clustered with one of the two ecotypes or formed separate clusters consistent with habitat differences. These results are discussed in an ecological speciation framework. Conclusions: The question of whether this case of ecological differentiation occurred in sympatry or allopatry remains open. Still, the observed pattern points towards ongoing divergence between the two ecotypes, which is likely driven by a recent shift in host plant use.
Abstract:
Next-generation sequencing offers an unprecedented opportunity to jointly analyze cellular and viral transcriptional activity without prerequisite knowledge of the nature of the transcripts. SupT1 cells were infected with a vesicular stomatitis virus G envelope protein (VSV-G)-pseudotyped HIV vector. At 24 h postinfection, both cellular and viral transcriptomes were analyzed by serial analysis of gene expression followed by high-throughput sequencing (SAGE-Seq). Read mapping resulted in 33 to 44 million tags aligning with the human transcriptome and 0.23 to 0.25 million tags aligning with the genome of the HIV-1 vector. Thus, at peak infection, 1 transcript in 143 is of viral origin (0.7%), including a small component of antisense viral transcription. Of the detected cellular transcripts, 826 (2.3%) were differentially expressed between mock- and HIV-infected samples. The approach also assessed whether HIV-1 infection modulates the expression of repetitive elements or endogenous retroviruses. We observed very active transcription of these elements, with 1 transcript in 237 being of such origin, corresponding on average to 123,123 reads in mock-infected samples (0.40%) and 129,149 reads in HIV-1-infected samples (0.45%) mapping to the genomic Repbase repository. This analysis highlights key details in the generation and interpretation of high-throughput data in the setting of HIV-1 cellular infection.
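The "1 in N" figures quoted above can be checked against the reported percentages with a few lines of arithmetic. This is only a consistency sketch: the helper names are made up for the example, and the implied transcript total is a back-calculation from the 826 and 2.3% figures, not a number stated in the abstract.

```python
def one_in_n_as_percent(n: float) -> float:
    """Convert a '1 in n' frequency into a percentage."""
    return 100.0 / n


def implied_total(count: float, percent: float) -> float:
    """Back-calculate the total implied by 'count is percent % of the total'."""
    return count / (percent / 100.0)


if __name__ == "__main__":
    viral_pct = one_in_n_as_percent(143)          # reported as 0.7%
    repeat_pct = one_in_n_as_percent(237)         # reported as 0.40% to 0.45%
    detected_transcripts = implied_total(826, 2.3)  # inferred, not reported

    print(f"viral transcripts: {viral_pct:.2f}%")
    print(f"repetitive elements / endogenous retroviruses: {repeat_pct:.2f}%")
    print(f"implied number of detected cellular transcripts: {detected_transcripts:.0f}")
```

The second value falls between the 0.40% and 0.45% reported for mock-infected and HIV-1-infected samples, which is what an average over the two conditions would be expected to give.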