922 results for High-Throughput Nucleotide Sequencing
Abstract:
Sphingomonas wittichii RW1 is a bacterium isolated for its ability to degrade the xenobiotic compounds dibenzodioxin and dibenzofuran (DBF). A number of genes involved in DBF degradation have been previously characterized, such as the dxn cluster, dbfB, and the electron transfer components fdx1, fdx3, and redA2. Here we use a combination of whole genome transcriptome analysis and transposon library screening to characterize RW1 catabolic and other genes implicated in the reaction to or degradation of DBF. To detect differentially expressed genes upon exposure to DBF, we applied three different growth exposure experiments, using either short DBF exposures to actively growing cells or growing them with DBF as sole carbon and energy source. Genome-wide gene expression was examined using a custom-made microarray. In addition, proportional abundance determination of transposon insertions in RW1 libraries grown on salicylate or DBF by ultra-high throughput sequencing was used to infer genes whose interruption caused a fitness loss for growth on DBF. Expression patterns showed that batch and chemostat growth conditions, and short or long exposure of cells to DBF produced very different responses. Numerous other uncharacterized catabolic gene clusters putatively involved in aromatic compound metabolism increased expression in response to DBF. In addition, only very few transposon insertions completely abolished growth on DBF. Some of those (e.g., in dxnA1) were expected, whereas others (in a gene cluster for phenylacetate degradation) were not. Both transcriptomic data and transposon screening suggest operation of multiple redundant and parallel aromatic pathways, depending on DBF exposure. In addition, increased expression of other non-catabolic genes suggests that during initial exposure, S. wittichii RW1 perceives DBF as a stressor, whereas after longer exposure, the compound is recognized as a carbon source and metabolized using several pathways in parallel.
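The proportional-abundance comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the pseudocount, and the toy read counts are assumptions made for illustration only. For each gene, the share of insertion reads in the DBF-grown library is compared with its share in the salicylate-grown library. A strongly negative log2 ratio suggests that interrupting the gene costs fitness on DBF.

```python
import math

def log2_fitness_ratios(control_counts, test_counts, pseudocount=1.0):
    """Return per-gene log2(test proportion / control proportion).

    control_counts / test_counts: dict mapping gene name -> number of
    sequencing reads from transposon insertions in that gene.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    genes = sorted(set(control_counts) | set(test_counts))
    control_total = sum(control_counts.get(g, 0) + pseudocount for g in genes)
    test_total = sum(test_counts.get(g, 0) + pseudocount for g in genes)
    ratios = {}
    for g in genes:
        control_prop = (control_counts.get(g, 0) + pseudocount) / control_total
        test_prop = (test_counts.get(g, 0) + pseudocount) / test_total
        ratios[g] = math.log2(test_prop / control_prop)
    return ratios

# Toy counts, invented for illustration (not data from the study).
salicylate = {"dxnA1": 400, "geneB": 300, "geneC": 300}
dbf = {"dxnA1": 0, "geneB": 500, "geneC": 500}
ratios = log2_fitness_ratios(salicylate, dbf)
```

In this toy input, insertions in dxnA1 vanish from the DBF library, so its ratio is strongly negative, while the other two genes gain share and end up slightly positive.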
Abstract:
This study aimed to investigate the circulation of Orthobunyavirus species in the state of Mato Grosso (MT), Brazil. During a dengue outbreak in 2011/2012, 529 serum samples were collected from patients with acute febrile illness with symptoms for up to five days. These samples and 387 pools of female Culex quinquefasciatus captured in 2013 were subjected to nested reverse transcription-polymerase chain reaction for segment S of the Simbu serogroup, followed by nucleotide sequencing and virus isolation in Vero cells. Patients (5/529; 0.9%) from Cuiabá (n = 3), Várzea Grande (n = 1) and Nova Mutum (n = 1) municipalities were positive for the S segment of Oropouche virus (OROV). Additionally, 8/387 Cx. quinquefasciatus pools were positive for the segment, with a minimum infection rate of 2.3. Phylogenetic analysis indicated that all the samples belong to subgenotype Ia, presenting high homology with OROV strains obtained from humans and animals in the Brazilian Amazon. The present paper reports the first detection of an Orthobunyavirus, possibly OROV, in patients and in Cx. quinquefasciatus mosquitoes in MT. This finding reinforces the notion that arboviruses frequently reported in the Amazon Region circulate sporadically in MT during dengue outbreaks.
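The two rates quoted in this abstract can be sketched as simple arithmetic. The minimum infection rate (MIR) is conventionally the number of positive pools per 1,000 mosquitoes tested. The abstract does not state how many mosquitoes the 387 pools contained, so the total used below (4,000) is a purely hypothetical value chosen for illustration, and the function name is likewise an assumption of this sketch.

```python
def minimum_infection_rate(positive_pools, mosquitoes_tested, per=1000):
    """Positive pools per `per` mosquitoes tested (conventional MIR definition)."""
    if mosquitoes_tested <= 0:
        raise ValueError("mosquitoes_tested must be positive")
    return per * positive_pools / mosquitoes_tested

# Patient positivity, using the figures given in the abstract (5 of 529).
patient_positivity_pct = 100 * 5 / 529

# MIR with 8 positive pools (from the abstract) and a HYPOTHETICAL total
# of 4,000 mosquitoes; the real total is not given in the abstract.
mir_example = minimum_infection_rate(8, 4000)
```

With the hypothetical total, the example yields 2.0 positive pools per 1,000 mosquitoes. Reproducing the reported 2.3 would require the actual mosquito count from the study.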
Abstract:
Recent advances in high-throughput sequencing and genotyping protocols allow rapid investigation of Mendelian and complex diseases on a scale not previously possible. In my thesis research I took advantage of these modern techniques to study retinitis pigmentosa (RP), a rare inherited disease characterized by progressive loss of photoreceptors and leading to blindness; and hypertension, a common condition affecting 30% of the adult population. Firstly, I compared the performance of different next generation sequencing (NGS) platforms in the sequencing of the RP-linked gene PRPF31. The gene contained a mutation in an intronic repetitive element, which presented difficulties for both classic sequencing methods and NGS. We showed that all NGS platforms are powerful tools to identify rare and common DNA variants, even in the case of more complex sequences. Moreover, we evaluated the features of different NGS platforms that are important in re-sequencing projects. The main focus of my thesis was then to investigate the involvement of pre-mRNA splicing factors in autosomal dominant RP (adRP). I screened 5 candidate genes in a large cohort of patients by using long-range PCR as an enrichment step, followed by NGS. We tested two different approaches: in one, all target PCRs from all patients were pooled and sequenced as a single DNA library; in the other, PCRs from each patient were separated within the pool by DNA barcodes. The first solution was more cost-effective, while the second one yielded faster and more accurate results, but overall they both proved to be effective strategies for gene screenings in many samples. We could in fact identify novel missense mutations in the SNRNP200 gene, encoding an essential RNA helicase for splicing catalysis. Interestingly, one of these mutations showed incomplete penetrance in one family with adRP. 
Thus, we started to study the possible molecular causes underlying phenotypic differences between asymptomatic and affected members of this family. For the study of hypertension, I joined a European consortium to perform genome-wide association studies (GWAS). Thanks to the use of very informative genotyping arrays and of phenotypically well-characterized cohorts, we could identify a novel susceptibility locus for hypertension in the promoter region of the endothelial nitric oxide synthase gene (NOS3). Moreover, we have proven the direct causality of the associated SNP using three different methods: 1) targeted resequencing, 2) luciferase assay, and 3) population study. - Recent progress in high-throughput sequencing and genotyping protocols has enabled broader and faster study of Mendelian and multifactorial diseases on a scale never reached before. During my thesis research, I used these new sequencing techniques to study retinitis pigmentosa (RP), a rare hereditary disease characterized by a progressive loss of the photoreceptors of the eye that leads to blindness, and hypertension, a common disease affecting 30% of the adult population. First, I compared the performance of different NGS (next generation sequencing) platforms in sequencing PRPF31, a gene linked to RP. This gene contained a mutation in an intronic repetitive element, which was difficult to sequence with both the classic method and NGS. We showed that the NGS platforms analyzed are very powerful tools for identifying rare or common DNA variants, including in complex sequences. In addition, we explored the features of the different NGS platforms that matter in re-sequencing projects. 
The main objective of my thesis was then to examine the effect of pre-mRNA splicing factors in an autosomal dominant form of RP (adRP). Five candidate genes were screened in a large cohort of patients using long-range PCR as an enrichment step, followed by NGS. We tested two different approaches: in the first, all target PCRs from all patients were pooled and sequenced as a single DNA library; in the second, the PCRs from each patient were separated by DNA barcodes. The first solution was the more economical, while the second gave faster and more accurate results. Overall, both strategies proved effective for screening genes in many samples. We were able to identify new missense mutations in the SNRNP200 gene, which encodes a helicase with an essential function in splicing. Interestingly, one of these mutations shows incomplete penetrance in a family with adRP. We therefore began a study of the molecular causes underlying the phenotypic differences between affected and asymptomatic members of this family. For the study of hypertension, I joined a European consortium to carry out a genome-wide association study. Thanks to the use of very informative genotyping arrays and of cohorts that are extremely well characterized phenotypically, a new locus linked to hypertension was identified in the promoter region of the endothelial nitric oxide synthase gene (NOS3). Furthermore, we demonstrated the direct causal role of the associated SNP by three different methods: i) resequencing the target with NGS, ii) luciferase assays, and iii) a population study.
Abstract:
Y chromosomes underlie sex determination in mammals, but their repeat-rich nature has hampered sequencing and associated evolutionary studies. Here we trace Y evolution across 15 representative mammals on the basis of high-throughput genome and transcriptome sequencing. We uncover three independent sex chromosome originations in mammals and birds (the outgroup). The original placental and marsupial (therian) Y, containing the sex-determining gene SRY, emerged in the therian ancestor approximately 180 million years ago, in parallel with the first of five monotreme Y chromosomes, carrying the probable sex-determining gene AMH. The avian W chromosome arose approximately 140 million years ago in the bird ancestor. The small Y/W gene repertoires, enriched in regulatory functions, were rapidly defined following stratification (recombination arrest) and erosion events and have remained considerably stable. Despite expression decreases in therians, Y/W genes show notable conservation of proto-sex chromosome expression patterns, although various Y genes evolved testis-specificities through differential regulatory decay. Thus, although some genes evolved novel functions through spatial/temporal expression shifts, most Y genes probably endured, at least initially, because of dosage constraints.
Abstract:
Human genetics has progressed at an unprecedented pace during the past 10 years. DNA microarrays currently allow screening of the entire human genome with a high level of coverage and we are now entering the era of high-throughput sequencing. These remarkable technical advances are influencing the way medical research is conducted and have boosted our understanding of the structure of the human genome as well as of disease biology. In this context, it is crucial for clinicians to understand the main concepts and limitations of modern genetics. This review will describe key concepts in genetics, including the different types of genetic markers in the human genome, review current methods to detect DNA variation, describe major online public databases in genetics, explain key concepts in statistical genetics and finally present commonly used study designs in clinical and epidemiological research. This review will therefore concentrate on human genetic variation analysis.
Abstract:
Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are “genomic fossils” valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome’s structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (∼80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.
Abstract:
Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome. Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing; an individual's DNA can be used to infer their geographic origin with surprising accuracy, often to within a few hundred kilometres.
Abstract:
Ultra-high-throughput sequencing (UHTS) techniques are evolving rapidly and may soon become an affordable and routine tool for sequencing plant DNA, even in smaller plant biology labs. Here we review recent insights into intraspecific genome variation gained from UHTS, which offers a glimpse of the rather unexpected levels of structural variability among Arabidopsis thaliana accessions. The challenges that will need to be addressed to efficiently assemble and exploit this information are also discussed.
Abstract:
MOTIVATION: High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. RESULTS: We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays.
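The two filters named in this abstract (low read complexity, excess of clonal reads on one allele) can be sketched as follows. This is a simplified illustration, not the published method: the abstract refers to a "significant" excess, which implies a statistical test, whereas this sketch substitutes fixed thresholds. The function name, the read representation, and both threshold values are assumptions.

```python
from collections import Counter

def site_passes(reads, min_complexity=0.5, max_clonal_excess=0.3):
    """Decide whether a heterozygous site survives both filters.

    reads: list of (allele, start, strand) tuples for reads covering the site.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    if not reads:
        return False
    # Read complexity: distinct (start, strand) fragments / total reads.
    distinct = {(start, strand) for _, start, strand in reads}
    if len(distinct) / len(reads) < min_complexity:
        return False
    # Clonal reads: every copy beyond the first of an identical
    # (allele, start, strand) read is treated as a likely PCR duplicate.
    copies = Counter(reads)
    totals = Counter(allele for allele, _, _ in reads)
    clonal = Counter()
    for (allele, _, _), n in copies.items():
        clonal[allele] += n - 1
    fractions = [clonal[a] / totals[a] for a in totals]
    # Reject the site if one allele carries far more duplicates than the other.
    return max(fractions) - min(fractions) <= max_clonal_excess

balanced = [("A", p, "+") for p in range(10)] + [("G", p, "-") for p in range(10)]
biased = [("A", 100, "+")] * 8 + [("G", p, "+") for p in range(101, 109)]
low_complexity = [("A", 5, "+")] * 10 + [("G", 5, "+")] * 10
```

Here `balanced` passes, `biased` is rejected because nearly all reads for allele A are copies of one fragment, and `low_complexity` is rejected by the first filter.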
Abstract:
BACKGROUND: Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, DNA microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently high-throughput sequencing of cDNA (RNA-seq) has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-seq for differential expression analysis will increase rapidly. To exploit the possibilities and address the challenges posed by this relatively new type of data, a number of software packages have been developed especially for differential expression analysis of RNA-seq data. RESULTS: We conducted an extensive comparison of eleven methods for differential expression analysis of RNA-seq data. All methods are freely available within the R framework and take as input a matrix of counts, i.e. the number of reads mapping to each genomic feature of interest in each of a number of samples. We evaluate the methods based on both simulated data and real RNA-seq data. CONCLUSIONS: Very small sample sizes, which are still common in RNA-seq experiments, impose problems for all evaluated methods and any results obtained under such conditions should be interpreted with caution. For larger sample sizes, the methods combining a variance-stabilizing transformation with the 'limma' method for differential expression analysis perform well under many different conditions, as does the nonparametric SAMseq method.
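The count-matrix input mentioned in this abstract can be illustrated with a small sketch. It shows only the data layout and a basic library-size normalization (counts per million) with a naive log2 fold change. It is not one of the eleven evaluated methods, which are R packages with proper statistical models. All names, the `prior` offset, and the toy counts are assumptions for illustration.

```python
import math

def counts_per_million(counts):
    """counts: dict gene -> list of read counts, one entry per sample."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total reads assigned to any gene in that sample.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {g: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
            for g, row in counts.items()}

def log2_fold_change(cpm_row, group_a, group_b, prior=0.5):
    """Naive log2(mean of group_b / mean of group_a); prior avoids log(0)."""
    mean_a = sum(cpm_row[j] for j in group_a) / len(group_a)
    mean_b = sum(cpm_row[j] for j in group_b) / len(group_b)
    return math.log2((mean_b + prior) / (mean_a + prior))

# Toy matrix: two genes, four samples (columns 0-1 = condition A, 2-3 = B).
counts = {"gene1": [10, 12, 40, 44], "gene2": [90, 88, 60, 56]}
cpm = counts_per_million(counts)
lfc_gene1 = log2_fold_change(cpm["gene1"], [0, 1], [2, 3])
lfc_gene2 = log2_fold_change(cpm["gene2"], [0, 1], [2, 3])
```

A fold change alone says nothing about significance. With two samples per condition, as here, the abstract's caution about very small sample sizes applies in full.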
Abstract:
Progress in genomics, in particular high-throughput next generation sequencing, is revolutionizing oncology. The impact of these techniques is seen, on the one hand, in the identification of germline mutations that predispose to a given type of cancer, allowing personalized care of patients or healthy carriers and, on the other hand, in the characterization of all acquired somatic mutations of the tumor cell, opening the door to personalized treatment targeting the driver oncogenes. In both cases, next generation sequencing techniques allow a global approach whereby the entirety of the genome's mutations is analyzed and correlated with the clinical data. The benefits for the quality of care delivered to our patients are extremely impressive.
Abstract:
Transcriptional regulation is the result of a combination of positive and negative effectors, such as transcription factors, cofactors and chromatin modifiers. During my thesis project I studied the chromatin association and the transcriptional and cell cycle regulatory functions of dHCF, the Drosophila homologue of the human protein HCF-1 (host cell factor-1). The human and Drosophila HCF proteins are synthesized as large polypeptides that are cleaved into two subunits (HCFN and HCFC), which remain associated with one another by non-covalent interactions. Studies in mammalian cells over the past 20 years have been devoted to understanding the cellular functions of HCF-1 and have revealed that it is a key regulator of transcription and of the cell cycle. In human cells, HCF-1 interacts with the histone methyltransferase Set1/Ash2 and MLL/Ash2 complexes and the histone deacetylase Sin3 complex, which are involved in transcriptional activation and repression, respectively. HCF-1 is also recruited to promoters to regulate G1-to-S phase progression during the cell cycle by the activator transcription factors E2F1 and E2F3, and by the repressor transcription factor E2F4. HCF-1 protein structure and these interactions between HCF-1 and E2F transcriptional regulator proteins are also conserved in Drosophila. In this doctoral thesis, I use proliferating Drosophila SL2 cells to study both the genomic binding sites of dHCF, using a combination of chromatin immunoprecipitation and ultra-high-throughput sequencing (ChIP-seq) analysis, and dHCF-regulated genes, employing RNAi and microarray expression analysis. I show that dHCF is bound to over 7500 chromosomal sites in proliferating SL2 cells, and is located within ±200 bp of the transcriptional start sites of about 30% of Drosophila genes. There is also a direct relationship between dHCF promoter association and promoter-associated transcriptional activity. 
Thus, dHCF binding levels at promoters correlated directly with transcriptional activity. In contrast, expression studies showed that dHCF appears to be involved in both transcriptional activation and repression. Analysis of dHCF-binding sites identified nine dHCF-associated motifs, four of which linked dHCF to (i) two insulator proteins, GAGA and BEAF, (ii) the E-box motif, and (iii) a degenerate TATA-box. The dHCF-associated motifs allowed the organization of the dHCF-bound genes into five biological processes: differentiation, cell cycle, gene expression, regulation of endocytosis, and cellular localization. I further show that different mechanisms regulate dHCF association with chromatin. Although the dHCFN and dHCFC subunits remain associated after dHCF cleavage, the two subunits showed different affinities for chromatin and differential binding to a set of tested promoters, suggesting that dHCF could target specific promoters through each of the two subunits. Moreover, in addition to the interaction between dHCF and E2F transcription factors, the dHCF binding pattern is correlated with dE2F2 genomic distribution. I show that dE2F factors are necessary for recruitment of dHCF to the promoters of a set of dHCF-regulated genes. Therefore dHCF, as in mammals, is involved in regulation of G1-to-S phase progression in collaboration with the dE2F transcription factors. In addition, gene expression arrays reveal that dHCF could indirectly regulate cell cycle progression by promoting expression of genes involved in gene expression and protein synthesis, and inhibiting expression of genes involved in cell-cell adhesion. Therefore, dHCF is an evolutionarily conserved protein, which binds to many specific sites of the Drosophila genome via interaction with DNA or chromatin-binding proteins to regulate the expression of genes involved in many different cellular functions. 
Résumé: Transcriptional regulation results from the positive and negative effects of transcription factors, cofactors and effector proteins that modify chromatin. During my thesis project, I studied the chromatin association of dHCF, the Drosophila homologue of the human protein HCF-1 (host cell factor-1), as well as its regulation of transcription and of the cell cycle. In humans and Drosophila, both HCF proteins are synthesized as a long polypeptide, which is then cleaved into two subunits at the center of the protein. The two subunits remain associated through non-covalent interactions. Studies carried out over the past 20 years have established that HCF-1 is a key factor in the regulation of transcription and of the cell cycle. In human cells, HCF-1 activates and represses transcription by interacting with protein complexes that activate transcription by methylating histones (HMT), such as Set1/Ash2 and MLL/Ash2, and with other complexes that repress transcription and are responsible for histone deacetylation (HDAC), such as the Sin3 protein. HCF-1 is also recruited to promoters by the transcriptional activators E2F1 and E2F3a, and by the transcriptional repressor E2F4, to regulate the transition between the G1 and S phases of the cell cycle. The structure of HCF-1 and the interactions between HCF-1 and transcriptional regulators are conserved in Drosophila. During my thesis I used cultured Drosophila SL2 cells to study the sites where HCF-1 binds chromatin, using chromatin immunoprecipitation and massive DNA sequencing, as well as the genes regulated by dHCF, using RNAi and microarrays. 
My results showed that dHCF binds to about 7565 sites, located within about ±200 base pairs of the transcription start sites of 30% of Drosophila genes. I observed a relationship between dHCF and the level of transcription: the level of dHCF binding at the promoter correlates with transcriptional activity. However, my expression studies showed that dHCF is involved in both activation and repression of transcription. Analysis of the DNA sequences bound by dHCF revealed nine motifs; four of these motifs linked dHCF to two insulator proteins, GAGA and BEAF, to the E-box motif, and to a degenerate TATA-box. The nine dHCF-associated motifs made it possible to assign the genes bound by dHCF at the promoter to five biological processes: differentiation, cell cycle, gene expression, regulation of endocytosis, and cellular localization. I also showed that several mechanisms regulate the association of dHCF with chromatin. Although the two subunits, dHCFN and dHCFC, remain associated after cleavage, they show different affinities for chromatin and bind a group of promoters differently; these results suggest that dHCF can bind promoters using each of its subunits. In addition to the association of dHCF with the dE2F transcription factors, the distribution of dHCF across the genome correlates with that of the transcription factor dE2F2. I also showed that the dE2Fs are necessary for the recruitment of dHCF to the promoters of a subset of dHCF-regulated genes. My results further showed that in Drosophila, as in humans, dHCF is involved in regulating progression from G1 to S phase of the cell cycle in collaboration with the dE2Fs. 
Moreover, the expression arrays suggested that dHCF could regulate the cell cycle indirectly by activating the expression of genes involved in gene expression and protein synthesis, and by inhibiting the expression of genes involved in cell adhesion. In conclusion, dHCF is an evolutionarily conserved protein that binds specifically to many sites in the Drosophila genome, through interaction with other proteins, to regulate the expression of genes involved in several cellular functions.
Abstract:
BACKGROUND: Fourmidable is an infrastructure to curate and share the emerging genetic, molecular, and functional genomic data and protocols for ants. DESCRIPTION: The Fourmidable assembly pipeline groups nucleotide sequences into clusters before independently assembling each cluster. Subsequently, assembled sequences are annotated via InterProScan and BLAST against general and insect-specific databases. Gene-specific information can be retrieved using gene identifiers, searching for similar sequences or browsing through inferred Gene Ontology annotations. The database will readily scale as ultra-high throughput sequence data and sequences from additional species become available. CONCLUSION: Fourmidable currently houses EST data from two ant species and microarray gene expression data for one of these. Fourmidable is publicly available at http://fourmidable.unil.ch.
Abstract:
Background: In recent years, planaria have emerged as an important model system for research into stem cells and regeneration. Attention is focused on their unique stem cells, the neoblasts, which can differentiate into any cell type present in the adult organism. Sequencing of the Schmidtea mediterranea genome and some expressed sequence tag projects have generated extensive data on the genetic profile of these cells. However, little information is available on their protein dynamics. Results: We developed a proteomic strategy to identify neoblast-specific proteins. Here we describe the method and discuss the results in comparison to the genomic high-throughput analyses carried out in planaria and to proteomic studies using other stem cell systems. We also show functional data for some of the candidate genes selected in our proteomic approach. Conclusions: We have developed an accurate and reliable mass-spectra-based proteomics approach to complement previous genomic studies and to further achieve a more accurate understanding and description of the molecular and cellular processes related to the neoblasts.
Abstract:
Background: Freshwater planarians are an attractive model for regeneration and stem cell research and have become a promising tool in the field of regenerative medicine. With the availability of a sequenced planarian genome, the recent application of modern genetic and high-throughput tools has resulted in revitalized interest in these animals, long known for their amazing regenerative capabilities, which enable them to regrow even a new head after decapitation. However, a detailed description of the planarian transcriptome is essential for future investigation into regenerative processes using planarians as a model system. Results: In order to complement and improve existing gene annotations, we used a 454 pyrosequencing approach to analyze the transcriptome of the planarian species Schmidtea mediterranea. Altogether, 598,435 454-sequencing reads, with an average length of 327 bp, were assembled together with the ~10,000 sequences of the S. mediterranea UniGene set using different similarity cutoffs. The assembly was then mapped onto the current genome data. Remarkably, our Smed454 dataset contains more than 3 million novel transcribed nucleotides sequenced for the first time. A descriptive analysis of planarian splice sites was conducted on those Smed454 contigs that mapped unambiguously to the current genome assembly. Sequence analysis allowed us to identify genes encoding putative proteins with defined structural properties, such as transmembrane domains. Moreover, we annotated the Smed454 dataset using Gene Ontology, and identified putative homologues of several gene families that may play a key role during regeneration, such as neurotransmitter and hormone receptors, homeobox-containing genes, and genes related to eye function. Conclusions: We report the first planarian transcript dataset, Smed454, as an open resource tool that can be accessed via a web interface. Smed454 contains significant novel sequence information about most expressed genes of S. 
mediterranea. Analysis of the annotated data promises to contribute to identification of gene families poorly characterized at a functional level. The Smed454 transcriptome data will assist in the molecular characterization of S. mediterranea as a model organism, which will be useful to a broad scientific community.