124 results for bioinformatics
Abstract:
During my PhD, my aim was to provide new tools to increase our capacity to analyse gene expression patterns, and to study the evolution of gene expression in animals on a large scale. Gene expression patterns (when and where a gene is expressed) are a key feature in understanding gene function, notably in development. It is now clear that the evolution of developmental processes and of phenotypes is shaped by evolution both at the coding sequence level and at the gene expression level. Studying gene expression evolution in animals, with complex expression patterns over tissues and developmental time, is still challenging. No tools are available to routinely compare expression patterns between different species with precision and on a large scale. Studies on gene expression evolution are therefore performed only on small gene datasets, or using imprecise descriptions of expression patterns. The aim of my PhD was thus to develop and use novel bioinformatics resources to study the evolution of gene expression. To this end, I developed the database Bgee (Base for Gene Expression Evolution). The approach of Bgee is to transform heterogeneous expression data (ESTs, microarrays, and in situ hybridizations) into present/absent calls, and to annotate them to standard representations of the anatomy and development of different species (anatomical ontologies). An extensive mapping between the anatomies of species is then developed based on hypotheses of homology. These precise annotations to anatomies, and this extensive mapping between species, are the major assets of Bgee, and have required the involvement of many co-workers over the years. My main personal contribution is the development and management of both the Bgee database and the web application. Bgee is now on its ninth release, and includes a substantial gene expression dataset for 5 species (human, mouse, drosophila, zebrafish, Xenopus), with the most data from mouse, human and zebrafish.
Using these three species, I have conducted an analysis of gene expression evolution after duplication in vertebrates. Gene duplication is thought to be a major source of novelty in evolution, and to contribute to speciation. It has been suggested that the evolution of gene expression patterns might participate in the retention of duplicate genes. I performed a large-scale comparison of the expression patterns of hundreds of duplicated genes to their singleton ortholog in an outgroup, including both small- and large-scale duplicates, in three vertebrate species (human, mouse and zebrafish), using highly accurate descriptions of expression patterns. My results showed unexpectedly high rates of de novo acquisition of expression domains after duplication (neofunctionalization), at least as high as rates of partitioning of expression domains (subfunctionalization). I found differences in the evolution of expression of small- and large-scale duplicates, with small-scale duplicates more prone to neofunctionalization. Duplicates with neofunctionalization seemed to evolve under more relaxed selective pressure on the coding sequence. Finally, even with abundant and precise expression data, the majority fate I recovered was neither neo- nor subfunctionalization of expression domains, suggesting a major role for other mechanisms in duplicate gene retention.
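To make the comparison in this abstract concrete, here is a minimal Python sketch of how a duplicate pair might be sorted into the fates it names, given sets of anatomical structures with "present" expression calls. The function name, the set-based rules and the example structures are illustrative assumptions, not the criteria or code used in the thesis or in Bgee.

```python
def classify_fate(outgroup, dup_a, dup_b):
    """Classify the expression fate of a duplicate pair (simplified, hypothetical rules).

    Each argument is a collection of anatomical structures in which expression
    was called 'present'. The outgroup singleton stands in for the ancestral state.
    """
    outgroup, dup_a, dup_b = set(outgroup), set(dup_a), set(dup_b)

    # Neofunctionalization: at least one duplicate is expressed in a domain
    # that is absent from the outgroup singleton.
    if (dup_a | dup_b) - outgroup:
        return "neofunctionalization"

    # Subfunctionalization: each duplicate lost some ancestral domains, and the
    # losses are complementary (no ancestral domain is lost by both copies).
    lost_a = outgroup - dup_a
    lost_b = outgroup - dup_b
    if lost_a and lost_b and not (lost_a & lost_b):
        return "subfunctionalization"

    # Everything else (conserved expression, shared losses, one-sided loss).
    return "other"


print(classify_fate({"brain", "liver"}, {"brain"}, {"liver"}))
print(classify_fate({"brain"}, {"brain", "fin"}, {"brain"}))
```

Under these simplified rules, a pair in which both copies keep the full ancestral pattern falls into "other", which is the category the abstract reports as the majority fate.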
Abstract:
ABSTRACT: BACKGROUND: It is accepted that a woman's lifetime risk of developing breast cancer after menopause is reduced by early full term pregnancy and multiparity. This phenomenon is thought to be associated with the development and differentiation of the breast during pregnancy. METHODS: In order to understand the underlying molecular mechanisms of pregnancy induced breast cancer protection, we profiled and compared the transcriptomes of normal breast tissue biopsies from 71 parous (P) and 42 nulliparous (NP) healthy postmenopausal women using Affymetrix Human Genome U133 Plus 2.0 arrays. To validate the results, we performed real time PCR and immunohistochemistry. RESULTS: We identified 305 differentially expressed probesets (208 distinct genes). Of these, 267 probesets were up- and 38 down-regulated in parous breast samples; bioinformatics analysis using gene ontology enrichment revealed that up-regulated genes in the parous breast represented biological processes involving differentiation and development, anchoring of epithelial cells to the basement membrane, hemidesmosome and cell-substrate junction assembly, mRNA and RNA metabolic processes and RNA splicing machinery. The down-regulated genes represented biological processes that comprised cell proliferation, regulation of IGF-like growth factor receptor signaling, somatic stem cell maintenance, muscle cell differentiation and apoptosis. CONCLUSIONS: This study suggests that the differentiation of the breast imprints a genomic signature that is centered in the mRNA processing reactome. These findings indicate that pregnancy may induce a safeguard mechanism at post-transcriptional level that maintains the fidelity of the transcriptional process.
Abstract:
Centrifuge is a user-friendly system to simultaneously access Arabidopsis gene annotations and intra- and inter-organism sequence comparison data. The tool allows rapid retrieval of user-selected data for each annotated Arabidopsis gene providing, in any combination, data on the following features: predicted protein properties such as mass, pI, cellular location and transmembrane domains; SWISS-PROT annotations; Interpro domains; Gene Ontology records; verified transcription; BLAST matches to the proteomes of A.thaliana, Oryza sativa (rice), Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. The tool lends itself particularly well to the rapid analysis of contigs or of tens or hundreds of genes identified by high-throughput gene expression experiments. In these cases, a summary table of principal predicted protein features for all genes is given followed by more detailed reports for each individual gene. Centrifuge can also be used for single gene analysis or in a word search mode. AVAILABILITY: http://centrifuge.unil.ch/ CONTACT: edward.farmer@unil.ch.
Abstract:
One major methodological problem in analysis of sequence data is the determination of costs from which distances between sequences are derived. Although this problem is currently not optimally dealt with in the social sciences, it has some similarity with problems that have been solved in bioinformatics for three decades. In this article, the authors propose an optimization of substitution and deletion/insertion costs based on computational methods. The authors provide an empirical way of determining costs for cases, frequent in the social sciences, in which theory does not clearly promote one cost scheme over another. Using three distinct data sets, the authors tested the distances and cluster solutions produced by the new cost scheme in comparison with solutions based on cost schemes associated with other research strategies. The proposed method performs well compared with other cost-setting strategies, while it alleviates the justification problem of cost schemes.
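The role that substitution and insertion/deletion costs play in a sequence distance can be sketched as follows. This is a generic weighted edit distance in Python, not the cost-optimization procedure the authors propose; the function name, the state labels and the cost values are hypothetical.

```python
def om_distance(a, b, sub_cost, indel):
    """Weighted edit distance between two state sequences.

    sub_cost maps frozenset({x, y}) to the cost of substituting x for y;
    indel is the cost of one insertion or deletion. Both are assumed inputs.
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            s = 0.0 if x == y else sub_cost[frozenset((x, y))]
            d[i][j] = min(
                d[i - 1][j] + indel,      # delete x
                d[i][j - 1] + indel,      # insert y
                d[i - 1][j - 1] + s,      # match or substitute
            )
    return d[n][m]


# Hypothetical states: E = employed, U = unemployed.
costs = {frozenset(("E", "U")): 1.5}
print(om_distance("EEU", "EUU", costs, indel=1.0))
```

Changing the substitution cost relative to the indel cost changes which alignment is cheapest, which is why the choice of cost scheme affects the resulting distances and clusters.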
Abstract:
Proteins located on the surface of the pathogenic malaria parasite Plasmodium falciparum are objects of intensive studies due to their important role in the invasion of human cells and the accessibility to host antibodies thus making these proteins attractive vaccine candidates. One of these proteins, merozoite surface protein 3 (MSP3) represents a leading component among vaccine candidates; however, little is known about its structure and function. Our biophysical studies suggest that the 40 residue C-terminal domain of MSP3 protein self-assembles into a four-stranded alpha-helical coiled coil structure where alpha-helices are packed "side-by-side". A bioinformatics analysis provides an extended list of known and putative proteins from different species of Plasmodium which have such MSP3-like C-terminal domains. This finding allowed us to extend some conclusions of our studies to a larger group of the malaria surface proteins. Possible structural and functional roles of these highly conserved oligomerization domains in the intact merozoite surface proteins are discussed.
Abstract:
EMBnet is a consortium of collaborating bioinformatics groups located mainly within Europe (http://www.embnet.org). Each member country is represented by a 'node', a group responsible for the maintenance of local services for their users (e.g. education, training, software, database distribution, technical support, helpdesk). Among these services a web portal with links and access to locally developed and maintained software is essential and different for each node. Our web portal targets biomedical scientists in Switzerland and elsewhere, offering them access to a collection of important sequence analysis tools mirrored from other sites or developed locally. We describe here the Swiss EMBnet node web site (http://www.ch.embnet.org), which presents a number of original services not available anywhere else.
Abstract:
Previous studies in the lab of Dr. Liliane Michalik have shown that the nuclear hormone receptor Peroxisome Proliferator-Activated Receptor beta/delta (PPARβ/δ) is an important regulator of skin homeostasis, being involved in the regulation of keratinocyte differentiation, inflammation, apoptosis, and mouse skin wound healing. Studies of PPARβ/δ knock-out mice have suggested a possible role for this receptor in cancer. However, contradictory observations of the role of PPARβ/δ in tumor growth have been published, depending on cellular contexts and biological models. Given the controversial role of PPARβ/δ in skin carcinoma development, the main aim of this PhD work has been to further explore the implication of PPARβ/δ in the skin response to UV and in skin tumor growth. This PhD dissertation is divided into four chapters. The first chapter describes the core part of the project, where I explored the changes in miRNA expression in the skin upon chronic UV irradiation of PPARβ/δ wild-type and knock-out mice. This analysis shed light on a miRNA-PPARβ/δ signature and also predicted that miR-21-3p (previously named miR-21*) is a key regulator of the PPARβ/δ-dependent UV response in the pre-lesional skin. Using acutely UV-irradiated mice, I further demonstrated that miR-21-3p is indirectly regulated by PPARβ/δ through activation of Transforming Growth Factor (TGFβ)-1 under UV exposure. I also show that miR-21-3p is deregulated in human cutaneous squamous cell carcinoma. In cultured keratinocytes, application of a miR-21-3p mimic oligonucleotide sequence leads to the regulation of lipid metabolism-related pathways. In the second chapter, I demonstrate that a combined mRNA/miRNA bioinformatics analysis leads to the discovery of important pathways involved in the PPARβ/δ-miRNA response of the skin to chronic UV irradiation. Indeed, I validated angiogenesis and lipid metabolism as important functions regulated by PPARβ/δ in this context.
In the third chapter, we demonstrate that PPARβ/δ knockout mice have a decreased incidence of cutaneous squamous cell carcinomas compared to wild-type mice and that PPARβ/δ directly activates the cSrc kinase gene. In the last chapter, we review novel insights into PPAR functions in keratinocytes and liver, with emphasis on PPARβ/δ but also on PPARα. In summary, this PhD study shows that i) PPARβ/δ is able to regulate biological function through regulation of miRNAs, and specifically through miR-21-3p, the passenger miRNA of the oncomiR miR-21, and that ii) the PPARβ/δ-dependent skin response to UV involves the regulation of angiogenesis and lipid metabolism. Furthermore, the bioinformatics study highlights the relevance of performing integrated mRNA and miRNA genome-wide studies in order to better screen mRNAs and/or miRNAs of interest in the biological context of diseases. - Previous studies in the laboratory of Dr. Liliane Michalik demonstrated that the nuclear receptor PPARβ/δ is an important regulator of skin homeostasis, being involved in the regulation of keratinocyte differentiation, inflammation, apoptosis and skin wound healing in mice. Studies of mice knocked out for the PPARβ/δ gene suggested a possible role for this receptor in cancer. However, conflicting observations have been published, suggesting a pro- or anti-cancer role depending on the tissue and cell type involved. Given this controversy over the role of PPARβ/δ in skin cancer development, the main aim of my research project was to explore further the role of PPARβ/δ in the skin response to UV and in cancer development. This thesis is divided into four parts.
The first part, the core of my research work, describes the discovery that microRNAs (miRNAs) are involved in the PPARβ/δ-dependent UV response, and more specifically the involvement of the miRNA miR-21-3p (previously named miR-21*). Studying a mouse model of acute UV irradiation, we show that the regulation of miR-21-3p is PPARβ/δ-dependent and that this regulation occurs via TGFβ-1. In cultured human keratinocytes, transfection of an oligonucleotide sequence similar to that of miR-21-3p (a mimic) shows that miR-21-3p is involved in functions important for cancer development, such as lipid metabolism. In a second chapter, we show that a bioinformatics method combining messenger RNA and miRNA expression reveals important biological functions in the PPARβ/δ response to chronic irradiation. Angiogenesis, oxidative stress and lipid metabolism are among the functions regulated by PPARβ/δ in UV-irradiated skin. We also demonstrate regulation of the Lpcat3 gene by PPARβ/δ in UV-irradiated skin and in human keratinocytes, suggesting a role for PPARβ/δ in membrane lipid remodelling. In a third part, we establish a link between regulation of the Src oncogene and activation of PPARβ/δ in cutaneous squamous cell carcinomas. Finally, in a fourth chapter, we review recent research on the role of PPARβ/δ and PPARα in the liver and skin. In summary, this thesis project advances research on the involvement of PPARβ/δ in the skin response to UV. For the first time, a link is established between this transcription factor and the regulation of microRNAs in the context of squamous cell carcinoma.
Until now overshadowed by miR-21-5p, miR-21-3p is in fact strongly increased both in a mouse model of UV irradiation and in human squamous cell carcinoma. This work also revealed new biological functions for PPARβ/δ, such as the regulation of angiogenesis and of lipid metabolism in the skin. In addition, this dissertation highlights the value of combining laboratory work with bioinformatics.
Abstract:
BACKGROUND: The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms allowing one to obtain parameter posterior distributions based on simulations not requiring likelihood computations. RESULTS: Here we present ABCtoolbox, a series of open source programs to perform Approximate Bayesian Computations (ABC). It implements various ABC algorithms including rejection sampling, MCMC without likelihood, a Particle-based sampler and ABC-GLM. ABCtoolbox is bundled with, but not limited to, a program that allows parameter inference in a population genetics context and the simultaneous use of different types of markers with different ploidy levels. In addition, ABCtoolbox can also interact with most simulation and summary statistics computation programs. The usability of the ABCtoolbox is demonstrated by inferring the evolutionary history of two evolutionary lineages of Microtus arvalis. Using nuclear microsatellites and mitochondrial sequence data in the same estimation procedure enabled us to infer sex-specific population sizes and migration rates and to find that males show smaller population sizes but much higher levels of migration than females. CONCLUSION: ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis, from parameter sampling from prior distributions, data simulations, computation of summary statistics, estimation of posterior distributions, model choice, validation of the estimation procedure, and visualization of the results.
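The simplest of the algorithms listed in this abstract, rejection sampling, can be sketched in a few lines of Python. This is a toy illustration of the general idea and does not use ABCtoolbox or its interface; the model (a normal distribution with unknown mean), the uniform prior, the tolerance and all function names are assumptions chosen for the example.

```python
import random
import statistics


def abc_rejection(observed_stat, simulate, prior_draw, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary statistic is close to the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)                      # 1. sample a parameter from the prior
        simulated_stat = simulate(theta, rng)        # 2. simulate data and summarise it
        if abs(simulated_stat - observed_stat) <= tolerance:
            accepted.append(theta)                   # 3. accept if close enough
    return accepted


def prior_draw(rng):
    # Hypothetical prior: uniform on [0, 10].
    return rng.uniform(0.0, 10.0)


def simulate(theta, rng):
    # Hypothetical model: 20 draws from Normal(theta, 1), summarised by their mean.
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(20)])


rng = random.Random(1)
posterior = abc_rejection(5.0, simulate, prior_draw, tolerance=0.5, n_draws=5000, rng=rng)
print(len(posterior), statistics.fmean(posterior))
```

No likelihood is evaluated anywhere: the accepted draws approximate the posterior only through simulation and comparison of summary statistics, which is the property the abstract relies on.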
Abstract:
Motivation: Genome-wide association studies have become widely used tools to study effects of genetic variants on complex diseases. While it is of great interest to extend existing analysis methods by considering interaction effects between pairs of loci, the large number of possible tests presents a significant computational challenge. The number of computations is further multiplied in the study of gene expression quantitative trait mapping, in which tests are performed for thousands of gene phenotypes simultaneously. Results: We present FastEpistasis, an efficient parallel solution extending the PLINK epistasis module, designed to test for epistasis effects when analyzing continuous phenotypes. Our results show that the algorithm scales with the number of processors and offers a reduction in computation time when several phenotypes are analyzed simultaneously. FastEpistasis is capable of testing the association of a continuous trait with all single nucleotide polymorphism (SNP) pairs from 500 000 SNPs, totaling 125 billion tests, in a population of 5000 individuals in 29, 4 or 0.5 days using 8, 64 or 512 processors, respectively.
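The figures quoted in this abstract can be checked with a short calculation. The Python sketch below counts the unordered SNP pairs and derives the parallel efficiency implied by the reported run times; the helper name is an assumption made for the example, and the timings are taken directly from the abstract.

```python
from math import comb

# Number of unordered pairs among 500 000 SNPs: n * (n - 1) / 2.
n_pairs = comb(500_000, 2)
print(n_pairs)  # 124999750000, i.e. roughly 125 billion tests


def parallel_efficiency(t_base, p_base, t, p):
    """Observed speedup divided by the ideal speedup when going from p_base to p processors."""
    speedup = t_base / t
    ideal = p / p_base
    return speedup / ideal


# Reported run times: 29 days on 8, 4 days on 64, 0.5 days on 512 processors.
print(parallel_efficiency(29, 8, 4, 64))     # 0.90625
print(parallel_efficiency(29, 8, 0.5, 512))  # 0.90625
```

Relative to the 8-processor run, both larger configurations retain about 91% of ideal linear speedup, consistent with the claim that the algorithm scales with the number of processors.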
Abstract:
BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information, we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large-scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
Abstract:
Background: Recently, regulatory T (Treg) cells have gained interest in the fields of immunopathology, transplantation and oncoimmunology. Here, we investigated the microRNA expression profile of human natural CD8+CD25+ Treg cells and the impact of microRNAs on molecules associated with immune regulation. Methods: We purified human natural CD8+ Treg cells and assessed the expression of FOXP3 and CTLA-4 by flow cytometry. We have also tested the ex vivo suppressive capacity of these cells in mixed leukocyte reactions. Using TaqMan low-density arrays and microRNA qPCR for validation, we could identify a microRNA 'signature' for CD8+CD25+FOXP3+CTLA-4+ natural Treg cells. We used the 'TargetScan' and 'miRBase' bioinformatics programs to identify potential target sites for these microRNAs in the 3'-UTR of important Treg cell-associated genes. Results: The human CD8+CD25+ natural Treg cell microRNA signature includes 10 differentially expressed microRNAs. We demonstrated an impact of this signature on Treg cell biology by showing specific regulation of FOXP3, CTLA-4 and GARP gene expression by microRNA using site-directed mutagenesis and a dual-luciferase reporter assay. Furthermore, we used microRNA transduction experiments to demonstrate that these microRNAs impacted their target genes in human primary Treg cells ex vivo. Conclusions: We are examining the biological relevance of this 'signature' by studying its impact on other important Treg cell-associated genes. These efforts could result in a better understanding of the regulation of Treg cell function and might reveal new targets for immunotherapy in immune disorders and cancer.
Abstract:
A haplotype is an m-long binary vector. The XOR-genotype of two haplotypes is the m-vector of their coordinate-wise XOR. We study the following problem: given a set of XOR-genotypes, reconstruct their haplotypes so that the set of resulting haplotypes can be mapped onto a perfect phylogeny (PP) tree. The question is motivated by studying population evolution in human genetics and is a variant of the PP haplotyping problem that has received intensive attention recently. Unlike the latter problem, in which the input is "full" genotypes, here we assume less informative input, which may be more economical to obtain experimentally. Building on ideas of Gusfield, we show how to solve the problem in polynomial time by a reduction to the graph realization problem. The actual haplotypes are not uniquely determined by the tree they map onto, and the tree itself may or may not be unique. We show that tree uniqueness implies uniquely determined haplotypes, up to inherent degrees of freedom, and give a sufficient condition for the uniqueness. To actually determine the haplotypes given the tree, additional information is necessary. We show that two or three full genotypes suffice to reconstruct all the haplotypes and present a linear algorithm for identifying those genotypes.
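The definition that opens this abstract is easy to state in code. The Python sketch below computes an XOR-genotype and shows why it is less informative than a full genotype; the function name and the example vectors are illustrative, and the reconstruction algorithm itself is not attempted here.

```python
def xor_genotype(h1, h2):
    """Coordinate-wise XOR of two haplotypes given as equal-length lists of 0/1."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length m")
    return [a ^ b for a, b in zip(h1, h2)]


# A 1 marks a site where the two haplotypes differ (a heterozygous site).
print(xor_genotype([0, 1, 1, 0], [1, 1, 0, 0]))  # [1, 0, 1, 0]

# Homozygous sites give 0 whether both haplotypes carry 0 or both carry 1,
# so distinct haplotype pairs can share the same XOR-genotype.
print(xor_genotype([0, 0], [0, 0]) == xor_genotype([1, 1], [1, 1]))  # True
```

The second example illustrates the "inherent degrees of freedom" mentioned in the abstract: the XOR-genotype alone cannot tell which allele a homozygous site carries.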
Abstract:
Long synthetic peptides (LSPs) have a variety of important clinical uses as synthetic vaccines and drugs. Techniques for peptide synthesis were revolutionized in the 1960s and 1980s, after which efficient techniques for purification and characterization of the product were developed. These improved techniques allowed the stepwise synthesis of increasingly longer products at a faster rate, greater purity, and lower cost for clinical use. A synthetic peptide approach, coupled with bioinformatics analysis of genomes, can tremendously expand the search for clinically relevant products. In this Review, we discuss efforts to develop a malaria vaccine from LSPs, among other clinically directed work.
Abstract:
Copy number variation (CNV) has recently gained considerable interest as a source of genetic variation likely to play a role in phenotypic diversity and evolution. Much effort has been put into the identification and mapping of regions that vary in copy number among seemingly normal individuals in humans and a number of model organisms, using bioinformatics or hybridization-based methods. These efforts have uncovered associations between copy number changes and complex diseases in whole-genome association studies, and have identified new genomic disorders. At the genome-wide scale, however, the functional impact of CNV remains poorly studied. Here we review the current catalogs of CNVs, their association with diseases and how they link genotype and phenotype. We describe initial evidence which revealed that genes in CNV regions are expressed at lower and more variable levels than genes mapping elsewhere, and also that CNV not only affects the expression of genes varying in copy number, but also has a global influence on the transcriptome. Further studies are warranted for complete cataloguing and fine mapping of CNVs, as well as to elucidate the different mechanisms by which they influence gene expression.
Abstract:
The recognition that colorectal cancer (CRC) is a heterogeneous disease in terms of clinical behaviour and response to therapy translates into an urgent need for robust molecular disease subclassifiers that can explain this heterogeneity beyond current parameters (MSI, KRAS, BRAF). Attempts to fill this gap are emerging. The Cancer Genome Atlas (TCGA) reported two main CRC groups, based on the incidence and spectrum of mutated genes, and another paper reported a subgroup defined by an EMT expression signature. We performed a prior-free analysis of CRC heterogeneity on 1113 CRC gene expression profiles and compared our findings with established molecular determinants and with clinical, histopathological and survival data. Unsupervised clustering based on gene modules allowed us to distinguish at least five different gene expression CRC subtypes, which we call surface crypt-like, lower crypt-like, CIMP-H-like, mesenchymal and mixed. A gene set enrichment analysis combined with a literature search of gene module members identified distinct biological motifs in different subtypes. The subtypes, which were not derived based on outcome, nonetheless showed differences in prognosis. Known gene copy number variations and mutations in key cancer-associated genes differed between subtypes, but the subtypes provided molecular information beyond that contained in these variables. Morphological features differed significantly between subtypes. The objective existence of the subtypes and their clinical and molecular characteristics were validated in an independent set of 720 CRC expression profiles. Our subtypes provide a novel perspective on the heterogeneity of CRC. The proposed subtypes should be further explored retrospectively on existing clinical trial datasets and, when sufficiently robust, be prospectively assessed for clinical relevance in terms of prognosis and treatment response predictive capacity.
Original microarray data were uploaded to the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) under Accession Nos E-MTAB-990 and E-MTAB-1026. © 2013 Swiss Institute of Bioinformatics. Journal of Pathology published by John Wiley & Sons Ltd on behalf of Pathological Society of Great Britain and Ireland.