931 results for 060102 Bioinformatics
Abstract:
Background Diet plays a role in the development of the immune system, and polyunsaturated fatty acids can modulate the expression of a variety of genes. Human milk contains conjugated linoleic acid (CLA), a fatty acid that seems to contribute to immune development. Indeed, recent studies carried out in our group in suckling animals have shown that immune function is enhanced after feeding them an 80:20 isomer mix composed of c9,t11 and t10,c12 CLA. However, little work has been done on the effects of CLA on gene expression, and even less regarding immune system development in early life. Results The expression profile of mesenteric lymph nodes from animals supplemented with CLA during gestation and suckling through dam's milk (Group A) or by oral gavage (Group B), supplemented just during suckling (Group C) and control animals (Group D) was determined with the aid of the specific GeneChip® Rat Genome 230 2.0 (Affymetrix). Bioinformatics analyses were performed using the GeneSpring GX software package v10.0.2 and led to the identification of 89 genes differentially expressed in all three dietary approaches. Generation of a biological association network revealed several genes, such as connective tissue growth factor (Ctgf), tissue inhibitor of metalloproteinase 1 (Timp1), galanin (Gal), synaptotagmin 1 (Syt1), growth factor receptor bound protein 2 (Grb2), actin gamma 2 (Actg2) and smooth muscle alpha actin (Acta2), as highly interconnected nodes of the resulting network. Gene underexpression was confirmed by Real-Time RT-PCR. Conclusions Ctgf, Timp1, Gal and Syt1, among others, are genes modulated by CLA supplementation that may have a role in mucosal immune responses in early life.
Abstract:
MOTIVATION: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought after qualities, but have thus far largely been overlooked. RESULTS: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. AVAILABILITY: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. CONTACT: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch.
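The abstract does not describe the numerical assay used to measure reproducibility. As a loose illustration only, the agreement between two clustering runs over the same sequences can be quantified with the Rand index, a standard measure that is swapped in here and is not necessarily what the authors used. The function name and the toy labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    in the same cluster, or both put them in different clusters.
    Illustrative stand-in only: not the assay from the abstract.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agreements = 0
    for i, j in pairs:
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agreements += 1
    return agreements / len(pairs)


# Hypothetical OTU assignments for four reads from two runs of one tool.
run_1 = [0, 0, 1, 1]
run_2 = [1, 1, 0, 0]  # same partition, different cluster names
print(rand_index(run_1, run_2))
```

Because the index compares pairs rather than label values, renaming clusters between runs does not lower the score; only genuine changes in the partition do.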
Abstract:
The human body is composed of a huge number of cells acting together in a concerted manner. The current understanding is that proteins perform most of the activities necessary to keep a cell alive. The DNA, on the other hand, stores in the genome the information on how to produce the different proteins. Regulating gene transcription is thus the first important step that can affect the life of a cell, modify its functions and its responses to the environment. Regulation is a complex operation that involves specialized proteins, the transcription factors. Transcription factors (TFs) can bind to DNA and activate the processes leading to the expression of genes into new proteins. Errors in this process may lead to disease. In particular, some transcription factors have been associated with a lethal pathological state, commonly known as cancer, characterized by uncontrolled cellular proliferation, invasion of healthy tissues and abnormal responses to stimuli. Understanding cancer-related regulatory programs is a difficult task, often involving several TFs interacting together and influencing each other's activity. This thesis presents new computational methodologies to study gene regulation. In addition, we present applications of our methods to the understanding of cancer-related regulatory programs. The understanding of transcriptional regulation is a major challenge. We address this difficult question by combining computational approaches with large collections of heterogeneous experimental data. In detail, we design signal processing tools to recover transcription factor binding sites on the DNA from genome-wide surveys such as chromatin immunoprecipitation assays on tiling arrays (ChIP-chip). We then use the localization of TF binding to explain expression levels of regulated genes. In this way we identify a regulatory synergy between two TFs, the oncogene C-MYC and SP1.
C-MYC and SP1 bind preferentially at promoters, and when SP1 binds next to C-MYC on the DNA, the nearby gene is strongly expressed. The association between the two TFs at promoters is reflected by the conservation of the binding sites across mammals and by the permissive underlying chromatin states; it represents an important control mechanism involved in cellular proliferation, and thereby in cancer. Secondly, we identify the characteristics of the target genes of the TF estrogen receptor alpha (hERα) and we study the influence of hERα in regulating transcription. Upon estrogen signaling, hERα binds to DNA to regulate transcription of its targets in concert with its co-factors. To overcome the scarcity of experimental data about the binding sites of other TFs that may interact with hERα, we conduct in silico analysis of the sequences underlying the ChIP sites using the collection of position weight matrices (PWMs) of the hERα partners FOXA1 and SP1. We combine ChIP-chip and ChIP-paired-end-diTags (ChIP-pet) data about hERα binding on DNA with the sequence information to explain gene expression levels in a large collection of cancer tissue samples and in studies of the response of cells to estrogen. We confirm that hERα binding sites are distributed throughout the genome. However, we distinguish between binding sites near promoters and binding sites along the transcripts. The first group shows weak binding of hERα and a high occurrence of SP1 motifs, in particular near estrogen-responsive genes. The second group shows strong binding of hERα and a significant correlation between the number of binding sites along a gene and the strength of gene induction in the presence of estrogen. Some binding sites of the second group also show the presence of FOXA1, but the role of this TF still needs to be investigated. Different mechanisms have been proposed to explain hERα-mediated induction of gene expression.
Our work supports the model of hERα activating gene expression from distal binding sites by interacting with promoter-bound TFs, such as SP1. hERα has been associated with survival rates of breast cancer patients, though explanatory models are still incomplete: this result is important to better understand how hERα can control gene expression. Thirdly, we address the difficult question of regulatory network inference. We tackle this problem by analyzing time series of biological measurements such as quantification of mRNA levels or protein concentrations. Our approach uses well-established penalized linear regression models in which we impose sparseness on the connectivity of the regulatory network. We extend this method by enforcing the coherence of the regulatory dependencies: a TF must behave coherently as an activator or as a repressor on all its targets. This requirement is implemented as constraints on the signs of the regressed coefficients in the penalized linear regression model. Our approach is better at reconstructing meaningful biological networks than previous methods based on penalized regression. The method is tested on the DREAM2 challenge of reconstructing a five-gene/TF regulatory network, obtaining the best performance in the "undirected signed excitatory" category. Thus, these bioinformatics methods, which are reliable, interpretable and fast enough to cover large biological datasets, have enabled us to better understand gene regulation in humans.
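The sign-constrained penalized regression described above can be sketched as follows. This is a minimal illustration, assuming an L1 (lasso) penalty solved by coordinate descent and assuming the sign of each regulator is already given; the abstract does not say how signs are chosen or which solver is used, and the function names, the toy data and the value of `lam` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sign_constrained_lasso(X, y, lam, signs, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 with sign constraints.

    signs[j] > 0 forces b[j] >= 0 (regulator j acts as an activator),
    signs[j] < 0 forces b[j] <= 0 (repressor), 0 leaves b[j] free.
    Assumed formulation for illustration, not the thesis implementation.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, kept up to date incrementally
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            # Project onto the allowed half-line for this regulator.
            if signs[j] > 0:
                new = max(new, 0.0)
            elif signs[j] < 0:
                new = min(new, 0.0)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy target gene driven by one activator (+2) and one repressor (-1).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
y = [2.0, -1.0, 1.0, 3.0]
print(sign_constrained_lasso(X, y, lam=0.01, signs=[1, -1]))
print(sign_constrained_lasso(X, y, lam=0.01, signs=[1, 1]))
```

In the second call the repressor is wrongly declared an activator, so its coefficient is clamped to zero instead of taking a negative value. To enforce coherence across targets, the same `signs` vector would be reused when regressing every target gene.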
Abstract:
BACKGROUND: Today, recognition and classification of sequence motifs and protein folds is a mature field, thanks to the availability of numerous comprehensive and easy to use software packages and web-based services. Recognition of structural motifs, by comparison, is less well developed and much less frequently used, possibly due to a lack of easily accessible and easy to use software. RESULTS: In this paper, we describe an extension of DeepView/Swiss-PdbViewer through which structural motifs may be defined and searched for in large protein structure databases, and we show that common structural motifs involved in stabilizing protein folds are present in evolutionarily and structurally unrelated proteins, also in deeply buried locations which are not obviously related to protein function. CONCLUSIONS: The possibility to define custom motifs and search for their occurrence in other proteins permits the identification of recurrent arrangements of residues that could have structural implications. The possibility to do so without having to maintain a complex software/hardware installation on site brings this technology to experts and non-experts alike.
Abstract:
MicroRNAs (miRNAs) are short non-coding RNA molecules that play regulatory roles by repressing translation or cleaving RNA transcripts. Although the number of verified human miRNAs is still expanding, only a few have been functionally described. However, emerging evidence suggests that altered regulation of miRNAs may be involved in the pathogenesis of cancers, and these genes are thought to function as both tumour suppressors and oncogenes. In our study, we examined by Real-Time PCR the expression of 156 mature miRNAs in colorectal cancer. The analysis by several bioinformatics algorithms of colorectal tumours and adjacent non-neoplastic tissues from patients and of colorectal cancer cell lines identified a group of 13 miRNAs whose expression is significantly altered in this tumour. The most significantly deregulated miRNAs were miR-31, miR-96, miR-133b, miR-135b, miR-145, and miR-183. In addition, the expression level of miR-31 was correlated with the stage of the CRC tumour. Our results suggest that the miRNA expression profile could have relevance to the biological and clinical behavior of colorectal neoplasia.
Abstract:
During my PhD, my aim was to provide new tools to increase our capacity to analyse gene expression patterns, and to study on a large-scale basis the evolution of gene expression in animals. Gene expression patterns (when and where a gene is expressed) are a key feature in understanding gene function, notably in development. It now appears clear that the evolution of developmental processes and of phenotypes is shaped by evolution both at the coding sequence level and at the gene expression level. Studying gene expression evolution in animals, with complex expression patterns over tissues and developmental time, is still challenging. No tools are available to routinely compare expression patterns between different species, with precision, and on a large-scale basis. Studies on gene expression evolution are therefore performed only on small gene datasets, or using imprecise descriptions of expression patterns. The aim of my PhD was thus to develop and use novel bioinformatics resources to study the evolution of gene expression. To this end, I developed the database Bgee (Base for Gene Expression Evolution). The approach of Bgee is to transform heterogeneous expression data (ESTs, microarrays, and in-situ hybridizations) into present/absent calls, and to annotate them to standard representations of anatomy and development of different species (anatomical ontologies). An extensive mapping between the anatomies of species is then developed based on hypotheses of homology. These precise annotations to anatomies, and this extensive mapping between species, are the major assets of Bgee, and have required the involvement of many co-workers over the years. My main personal contribution is the development and the management of both the Bgee database and the web application. Bgee is now on its ninth release, and includes an important gene expression dataset for 5 species (human, mouse, drosophila, zebrafish, Xenopus), with the most data from mouse, human and zebrafish.
Using these three species, I have conducted an analysis of gene expression evolution after duplication in vertebrates. Gene duplication is thought to be a major source of novelty in evolution, and to contribute to speciation. It has been suggested that the evolution of gene expression patterns might participate in the retention of duplicate genes. I performed a large-scale comparison of the expression patterns of hundreds of duplicated genes to their singleton ortholog in an outgroup, including both small- and large-scale duplicates, in three vertebrate species (human, mouse and zebrafish), and using highly accurate descriptions of expression patterns. My results showed unexpectedly high rates of de novo acquisition of expression domains after duplication (neofunctionalization), at least as high as the rates of partitioning of expression domains (subfunctionalization). I found differences in the evolution of expression of small- and large-scale duplicates, with small-scale duplicates more prone to neofunctionalization. Duplicates with neofunctionalization seemed to evolve under more relaxed selective pressure on the coding sequence. Finally, even with abundant and precise expression data, the majority fate I recovered was neither neo- nor subfunctionalization of expression domains, suggesting a major role for other mechanisms in duplicate gene retention.
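The comparison of duplicates against a singleton ortholog can be pictured with a small set-based sketch. It assumes that present calls are reduced to sets of anatomical structures and that the outgroup singleton stands in for the pre-duplication state; these simplifications, the function name and the toy structures are hypothetical, and the thesis may use different criteria.

```python
def classify_duplicate_pair(ancestral, copy_a, copy_b):
    """Label the fate of a duplicate pair from sets of expression domains.

    ancestral: domains where the outgroup singleton is expressed
               (used as a proxy for the pre-duplication pattern).
    copy_a, copy_b: domains where each duplicate is expressed.
    Simplified rule set for illustration only.
    """
    ancestral = frozenset(ancestral)
    copy_a = frozenset(copy_a)
    copy_b = frozenset(copy_b)

    # Any domain absent from the proxy ancestor counts as a de novo gain.
    gained = (copy_a | copy_b) - ancestral
    if gained:
        return "neofunctionalization"

    # Partitioning: each copy lost some ancestral domain, yet every
    # ancestral domain is still covered by at least one copy.
    lost_a = ancestral - copy_a
    lost_b = ancestral - copy_b
    if lost_a and lost_b and not (lost_a & lost_b):
        return "subfunctionalization"

    return "other"


print(classify_duplicate_pair({"brain", "liver"}, {"brain"}, {"liver"}))
print(classify_duplicate_pair({"brain"}, {"brain", "fin"}, {"brain"}))
```

Pairs that fall into neither class are returned as "other", which mirrors the observation above that the majority fate was neither neo- nor subfunctionalization.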
Abstract:
BACKGROUND: It is accepted that a woman's lifetime risk of developing breast cancer after menopause is reduced by early full-term pregnancy and multiparity. This phenomenon is thought to be associated with the development and differentiation of the breast during pregnancy. METHODS: In order to understand the underlying molecular mechanisms of pregnancy-induced breast cancer protection, we profiled and compared the transcriptomes of normal breast tissue biopsies from 71 parous (P) and 42 nulliparous (NP) healthy postmenopausal women using Affymetrix Human Genome U133 Plus 2.0 arrays. To validate the results, we performed real-time PCR and immunohistochemistry. RESULTS: We identified 305 differentially expressed probesets (208 distinct genes). Of these, 267 probesets were up-regulated and 38 down-regulated in parous breast samples; bioinformatics analysis using gene ontology enrichment revealed that up-regulated genes in the parous breast represented biological processes involving differentiation and development, anchoring of epithelial cells to the basement membrane, hemidesmosome and cell-substrate junction assembly, mRNA and RNA metabolic processes and RNA splicing machinery. The down-regulated genes represented biological processes that comprised cell proliferation, regulation of IGF-like growth factor receptor signaling, somatic stem cell maintenance, muscle cell differentiation and apoptosis. CONCLUSIONS: This study suggests that the differentiation of the breast imprints a genomic signature that is centered in the mRNA processing reactome. These findings indicate that pregnancy may induce a safeguard mechanism at the post-transcriptional level that maintains the fidelity of the transcriptional process.
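Gene ontology enrichment of the kind mentioned above is commonly assessed with a one-sided hypergeometric test. The abstract does not name the test or software used, so the sketch below is an assumption, and every count in the example is made up for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    population: genes on the array (background)
    annotated:  background genes carrying the GO term
    selected:   differentially expressed genes
    overlap:    selected genes carrying the GO term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes in total, 4 annotated to a term,
# 3 differentially expressed, 2 of which carry the term.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

In a real analysis one such p-value is computed per GO term and the resulting set is corrected for multiple testing.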
Abstract:
Centrifuge is a user-friendly system to simultaneously access Arabidopsis gene annotations and intra- and inter-organism sequence comparison data. The tool allows rapid retrieval of user-selected data for each annotated Arabidopsis gene providing, in any combination, data on the following features: predicted protein properties such as mass, pI, cellular location and transmembrane domains; SWISS-PROT annotations; Interpro domains; Gene Ontology records; verified transcription; BLAST matches to the proteomes of A.thaliana, Oryza sativa (rice), Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. The tool lends itself particularly well to the rapid analysis of contigs or of tens or hundreds of genes identified by high-throughput gene expression experiments. In these cases, a summary table of principal predicted protein features for all genes is given followed by more detailed reports for each individual gene. Centrifuge can also be used for single gene analysis or in a word search mode. AVAILABILITY: http://centrifuge.unil.ch/ CONTACT: edward.farmer@unil.ch.
Abstract:
One major methodological problem in analysis of sequence data is the determination of costs from which distances between sequences are derived. Although this problem is currently not optimally dealt with in the social sciences, it has some similarity with problems that have been solved in bioinformatics for three decades. In this article, the authors propose an optimization of substitution and deletion/insertion costs based on computational methods. The authors provide an empirical way of determining costs for cases, frequent in the social sciences, in which theory does not clearly promote one cost scheme over another. Using three distinct data sets, the authors tested the distances and cluster solutions produced by the new cost scheme in comparison with solutions based on cost schemes associated with other research strategies. The proposed method performs well compared with other cost-setting strategies, while it alleviates the justification problem of cost schemes.
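To make concrete how a cost scheme determines the distance between two sequences, here is a minimal optimal matching (weighted edit distance) sketch. It shows only how substitution and insertion/deletion costs enter the computation, not the authors' procedure for optimizing those costs; the states and cost values are hypothetical.

```python
def optimal_matching_distance(seq_a, seq_b, sub_cost, indel_cost):
    """Weighted edit distance between two state sequences.

    sub_cost maps frozenset({state_1, state_2}) to a substitution cost;
    identical states cost 0. indel_cost is the cost of one insertion
    or deletion.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost
    for j in range(1, cols):
        dist[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = seq_a[i - 1], seq_b[j - 1]
            substitution = 0.0 if a == b else sub_cost[frozenset((a, b))]
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,        # delete a
                dist[i][j - 1] + indel_cost,        # insert b
                dist[i - 1][j - 1] + substitution,  # match or substitute
            )
    return dist[-1][-1]


# Hypothetical states: E = employed, U = unemployed, S = in school.
costs = {
    frozenset("EU"): 2.0,
    frozenset("ES"): 1.0,
    frozenset("US"): 1.5,
}
print(optimal_matching_distance("EEU", "EES", costs, indel_cost=1.0))
print(optimal_matching_distance("EU", "UE", costs, indel_cost=1.0))
```

Changing `costs` or `indel_cost` changes which alignment is cheapest, and therefore the distances and the cluster solutions built on them, which is why the choice of cost scheme needs justification.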
Abstract:
Proteins located on the surface of the pathogenic malaria parasite Plasmodium falciparum are objects of intensive studies due to their important role in the invasion of human cells and the accessibility to host antibodies thus making these proteins attractive vaccine candidates. One of these proteins, merozoite surface protein 3 (MSP3) represents a leading component among vaccine candidates; however, little is known about its structure and function. Our biophysical studies suggest that the 40 residue C-terminal domain of MSP3 protein self-assembles into a four-stranded alpha-helical coiled coil structure where alpha-helices are packed "side-by-side". A bioinformatics analysis provides an extended list of known and putative proteins from different species of Plasmodium which have such MSP3-like C-terminal domains. This finding allowed us to extend some conclusions of our studies to a larger group of the malaria surface proteins. Possible structural and functional roles of these highly conserved oligomerization domains in the intact merozoite surface proteins are discussed.
Abstract:
Advances in the foundations of theoretical methods and the spectacular growth of computing power have made it possible to progress enormously toward the dream of the founders of chemistry, namely to be able to study the full range of chemical processes with computational methods. Theoretical chemistry is now completing the latest step: attempting to become the newest tool for understanding the chemical nature of living beings. This review aims to show how the methods of theoretical chemistry, originally developed to examine small molecules in the gas phase, have evolved to achieve the complex description of biological systems.
Abstract:
EMBnet is a consortium of collaborating bioinformatics groups located mainly within Europe (http://www.embnet.org). Each member country is represented by a 'node', a group responsible for the maintenance of local services for their users (e.g. education, training, software, database distribution, technical support, helpdesk). Among these services a web portal with links and access to locally developed and maintained software is essential and different for each node. Our web portal targets biomedical scientists in Switzerland and elsewhere, offering them access to a collection of important sequence analysis tools mirrored from other sites or developed locally. We describe here the Swiss EMBnet node web site (http://www.ch.embnet.org), which presents a number of original services not available anywhere else.
Abstract:
Previous studies in the lab of Dr. Liliane Michalik have shown that the nuclear hormone receptor Peroxisome Proliferator Activated Receptor beta/delta (PPARβ/δ) is an important regulator of skin homeostasis, being involved in the regulation of keratinocyte differentiation, inflammation, apoptosis, and mouse skin wound healing. Studies of PPARβ/δ knock-out mice have suggested a possible role for this receptor in cancer. However, contradictory observations on the role of PPARβ/δ in tumor growth have been published, depending on cellular contexts and biological models. Given the controversial role of PPARβ/δ in skin carcinoma development, the main aim of this PhD work has been to further explore the implication of PPARβ/δ in the skin response to UV and in skin tumor growth. This PhD dissertation is divided into four chapters. The first chapter describes the core part of the project, where I explored the changes in miRNA expression in the skin upon chronic UV irradiation of PPARβ/δ wild-type and knock-out mice. This analysis shed light on a miRNA-PPARβ/δ signature and also predicted that miR-21-3p (previously named miR-21*) is a key regulator of the PPARβ/δ-dependent UV response in the pre-lesional skin. Using acutely UV-irradiated mice, I further demonstrated that miR-21-3p is indirectly regulated by PPARβ/δ through activation of Transforming Growth Factor β-1 (TGFβ-1) under UV exposure. I also show that miR-21-3p is deregulated in human cutaneous squamous cell carcinoma. In cultured keratinocytes, application of a miR-21-3p mimic oligonucleotide sequence leads to the regulation of lipid metabolism-related pathways. In the second chapter, I demonstrate that the use of a combined mRNA/miRNA bioinformatics analysis leads to the discovery of important pathways involved in the PPARβ/δ-miRNA response of the skin to chronic UV irradiation. Indeed, I validated angiogenesis and lipid metabolism as important functions regulated by PPARβ/δ in this context.
In the third chapter, we demonstrate that PPARβ/δ knockout mice have a decreased incidence of cutaneous squamous cell carcinomas compared to wild-type mice and that PPARβ/δ directly activates the cSrc kinase gene. In the last chapter, we review novel insights into PPAR functions in keratinocytes and liver, with emphasis on PPARβ/δ but also on PPARα. In summary, this PhD study shows that i) PPARβ/δ is able to regulate biological function through regulation of miRNAs, and specifically through miR-21-3p, the passenger miRNA of the oncomiR miR-21, and that ii) the PPARβ/δ-dependent skin response to UV involves the regulation of angiogenesis and lipid metabolism. Furthermore, the bioinformatics study highlights the relevance of performing integrated mRNA and miRNA genome-wide studies in order to better screen mRNAs and/or miRNAs of interest in the biological context of diseases. - Previous studies in the laboratory of Dr. Liliane Michalik demonstrated that the nuclear receptor PPARβ/δ is an important regulator of skin homeostasis, being involved in the regulation of keratinocyte differentiation, inflammation, apoptosis and skin wound healing in the mouse. Studies of PPARβ/δ knock-out mice suggested a possible role for this receptor in cancer. However, conflicting observations have been published, suggesting a pro- or anti-cancer role depending on the tissue and cell type involved. Given this controversy over the role of PPARβ/δ in the development of skin cancers, the main goal of my research project was to explore further the role of PPARβ/δ in the skin response to UV and in cancer development. This thesis dissertation is divided into four parts.
The first part, which represents the core of my research, describes the discovery of the involvement of microRNAs (miRNAs) in the PPARβ/δ response to UV, and more specifically the involvement of the miRNA miR-21-3p (previously named miR-21*). By studying a model of mice acutely irradiated with UV, we show that the regulation of miR-21-3p is PPARβ/δ-dependent and that this regulation occurs through Transforming Growth Factor β-1 (TGFβ-1). In cultures of human keratinocytes, transfection of an oligonucleotide sequence similar to that of miR-21-3p (a mimic) shows the involvement of miR-21-3p in functions important for cancer development, such as lipid metabolism. In a second chapter, we show that a bioinformatics method combining messenger RNA and miRNA expression highlights important biological functions in the PPARβ/δ response to chronic irradiation. Angiogenesis, oxidative stress and lipid metabolism are among the functions regulated by PPARβ/δ in UV-irradiated skin. We also demonstrate the regulation of the Lpcat3 gene by PPARβ/δ in UV-irradiated skin as well as in human keratinocytes, suggesting a role for PPARβ/δ in membrane lipid remodelling. In a third part, we establish a link between the regulation of the Src oncogene and the activation of PPARβ/δ in cutaneous squamous cell carcinomas. Finally, in a fourth chapter, we review the latest research on the role of PPARβ/δ and PPARα in the liver and the skin. In summary, this thesis project advances research on the involvement of PPARβ/δ in the skin response to UV. For the first time, a link is established between this transcription factor and the regulation of microRNAs in the context of squamous cell carcinoma.
Until now overshadowed by miR-21-5p, miR-21-3p is in fact strongly increased both in a mouse model of UV irradiation and in human squamous cell carcinoma. New biological functions for PPARβ/δ were also highlighted in this work, such as the regulation of angiogenesis and of lipid metabolism in the skin. In addition, this dissertation underscores the value of combining laboratory work with bioinformatics.
Abstract:
BACKGROUND: The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms allowing one to obtain parameter posterior distributions based on simulations not requiring likelihood computations. RESULTS: Here we present ABCtoolbox, a series of open source programs to perform Approximate Bayesian Computations (ABC). It implements various ABC algorithms including rejection sampling, MCMC without likelihood, a Particle-based sampler and ABC-GLM. ABCtoolbox is bundled with, but not limited to, a program that allows parameter inference in a population genetics context and the simultaneous use of different types of markers with different ploidy levels. In addition, ABCtoolbox can also interact with most simulation and summary statistics computation programs. The usability of the ABCtoolbox is demonstrated by inferring the evolutionary history of two evolutionary lineages of Microtus arvalis. Using nuclear microsatellites and mitochondrial sequence data in the same estimation procedure enabled us to infer sex-specific population sizes and migration rates and to find that males show smaller population sizes but much higher levels of migration than females. CONCLUSION: ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis, from parameter sampling from prior distributions, data simulations, computation of summary statistics, estimation of posterior distributions, model choice, validation of the estimation procedure, and visualization of the results.
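Of the algorithms listed above, rejection sampling is the simplest and conveys the core idea of likelihood-free inference. The sketch below is a generic toy version in Python, not ABCtoolbox code or its interface: the model (a normal mean with known standard deviation), the uniform prior, the tolerance and all function names are assumptions chosen for illustration.

```python
import random
from statistics import mean


def abc_rejection(observed_stat, prior_sampler, simulator, summary,
                  tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary lies near the observed one.

    The accepted parameter values approximate the posterior without any
    likelihood evaluation.
    """
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        simulated = simulator(theta, rng)
        if abs(summary(simulated) - observed_stat) <= tolerance:
            accepted.append(theta)
    return accepted


def uniform_prior(rng):
    """Toy prior: the unknown mean is uniform on [-5, 5]."""
    return rng.uniform(-5.0, 5.0)


def simulate_sample(theta, rng, size=20):
    """Toy simulator: 20 normal observations with mean theta, sd 1."""
    return [rng.gauss(theta, 1.0) for _ in range(size)]


rng = random.Random(0)
posterior = abc_rejection(
    observed_stat=2.0,      # hypothetical observed sample mean
    prior_sampler=uniform_prior,
    simulator=simulate_sample,
    summary=mean,
    tolerance=0.2,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), mean(posterior))
```

A smaller tolerance gives a closer approximation to the posterior at the price of fewer accepted draws, which is the trade-off that motivates the more elaborate samplers named in the abstract.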
Abstract:
Motivation: Genome-wide association studies have become widely used tools to study the effects of genetic variants on complex diseases. While it is of great interest to extend existing analysis methods by considering interaction effects between pairs of loci, the large number of possible tests presents a significant computational challenge. The number of computations is further multiplied in the study of gene expression quantitative trait mapping, in which tests are performed for thousands of gene phenotypes simultaneously. Results: We present FastEpistasis, an efficient parallel solution extending the PLINK epistasis module, designed to test for epistasis effects when analyzing continuous phenotypes. Our results show that the algorithm scales with the number of processors and offers a reduction in computation time when several phenotypes are analyzed simultaneously. FastEpistasis is capable of testing the association of a continuous trait with all single nucleotide polymorphism (SNP) pairs from 500 000 SNPs, totaling 125 billion tests, in a population of 5000 individuals in 29, 4 or 0.5 days using 8, 64 or 512 processors, respectively.
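The figures quoted above can be checked with a few lines of arithmetic. The sketch counts unordered SNP pairs and derives the parallel efficiency implied by the reported run times, taking the 8-processor run as the baseline; the helper name is hypothetical and the efficiency definition is the usual speed-up divided by the increase in processor count.

```python
from math import comb

n_snps = 500_000
n_pairs = comb(n_snps, 2)  # unordered SNP pairs
print(n_pairs)  # 124999750000, i.e. about 125 billion tests

# Reported wall-clock days for each processor count.
run_days = {8: 29.0, 64: 4.0, 512: 0.5}


def efficiency_vs_baseline(processors, baseline=8):
    """Speed-up over the baseline run divided by the processor ratio."""
    speed_up = run_days[baseline] / run_days[processors]
    return speed_up / (processors / baseline)


for processors in (64, 512):
    print(processors, efficiency_vs_baseline(processors))  # 0.90625 each
```

Both larger runs retain about 91% efficiency relative to the 8-processor run, consistent with the statement that the algorithm scales with the number of processors.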