124 results for bioinformatics


Relevance:

10.00% 10.00%

Publisher:

Abstract:

MOTIVATION: High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. RESULTS: We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays.
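
The idea of filtering sites by read complexity can be illustrated with a minimal sketch. This is not the authors' method: the input format (one `(allele, start, strand)` tuple per read), the definition of clonal reads as reads sharing allele, start and strand, and the 0.5 cutoff are all assumptions made for illustration, and no significance test for the excess of clonal reads is attempted. The site is assumed to have at least one read.

```python
from collections import Counter

def site_summary(reads):
    """Summarize reads overlapping one heterozygous site.

    reads: list of (allele, start, strand) tuples (hypothetical input format).
    Reads sharing allele, start and strand are treated as clonal copies.
    """
    total = Counter(allele for allele, _, _ in reads)
    distinct = set(reads)
    unique = Counter(allele for allele, _, _ in distinct)
    complexity = len(distinct) / len(reads)  # fraction of non-clonal reads
    return total, unique, complexity

def keep_site(reads, min_complexity=0.5):
    """Keep a site only if its read complexity reaches the (arbitrary) cutoff."""
    return site_summary(reads)[2] >= min_complexity

# Toy site: eight clonal copies of one read inflate the count for allele A.
reads = [("A", 100, "+")] * 8 + [("A", 120, "-"), ("G", 105, "+"), ("G", 130, "-")]
total, unique, complexity = site_summary(reads)
print(dict(total), dict(unique), round(complexity, 2), keep_site(reads))
```

In this toy site the raw allele counts are 9 to 2, which looks allele-specific, but only 4 of the 11 reads are distinct, and after collapsing clonal copies the counts are 2 to 2, so the site is discarded.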

Relevance:

10.00% 10.00%

Publisher:

Abstract:

BACKGROUND: Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, DNA microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently high-throughput sequencing of cDNA (RNA-seq) has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-seq for differential expression analysis will increase rapidly. To exploit the possibilities and address the challenges posed by this relatively new type of data, a number of software packages have been developed especially for differential expression analysis of RNA-seq data. RESULTS: We conducted an extensive comparison of eleven methods for differential expression analysis of RNA-seq data. All methods are freely available within the R framework and take as input a matrix of counts, i.e. the number of reads mapping to each genomic feature of interest in each of a number of samples. We evaluate the methods based on both simulated data and real RNA-seq data. CONCLUSIONS: Very small sample sizes, which are still common in RNA-seq experiments, impose problems for all evaluated methods and any results obtained under such conditions should be interpreted with caution. For larger sample sizes, the methods combining a variance-stabilizing transformation with the 'limma' method for differential expression analysis perform well under many different conditions, as does the nonparametric SAMseq method.
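
To make the input format concrete, the sketch below builds a small count matrix (rows are genomic features, columns are samples) and applies a simple log2 counts-per-million transform. This is only an illustration of how raw counts are commonly put on a comparable scale; it is not one of the eleven evaluated methods, and the function name and the offsets (0.5 and 1) are assumptions chosen to avoid taking the logarithm of zero.

```python
import math

def log_cpm(counts, offset=0.5):
    """Log2 counts per million for a count matrix.

    counts: list of rows, one per genomic feature; each row holds one
    read count per sample. Library size is the column sum.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [
        [
            math.log2((row[s] + offset) / (lib_sizes[s] + 1.0) * 1e6)
            for s in range(n_samples)
        ]
        for row in counts
    ]

# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20],   # feature 1
    [90, 180],  # feature 2
]
transformed = log_cpm(counts)
print(transformed)
```

After the transform, the two samples have nearly identical values for each feature even though the raw counts differ by a factor of two, because the difference is explained by sequencing depth alone.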

Relevance:

10.00% 10.00%

Publisher:

Abstract:

SUMMARY: We present a tool designed for visualization of large-scale genetic and genomic data exemplified by results from genome-wide association studies. This software provides an integrated framework to facilitate the interpretation of SNP association studies in genomic context. Gene annotations can be retrieved from Ensembl, linkage disequilibrium data downloaded from HapMap and custom data imported in BED or WIG format. AssociationViewer integrates functionalities that enable the aggregation or intersection of data tracks. It implements an efficient cache system and allows the display of several very large-scale genomic datasets. AVAILABILITY: The Java code for AssociationViewer is distributed under the GNU General Public Licence and has been tested on Microsoft Windows XP, MacOSX and GNU/Linux operating systems. It is available from the SourceForge repository. This also includes Java webstart, documentation and example datafiles.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Parasites of the Leishmania Viannia subgenus are major causative agents of mucocutaneous leishmaniasis (MCL), a disease characterised by parasite dissemination (metastasis) from the original cutaneous lesion to form debilitating secondary lesions in the nasopharyngeal mucosa. We employed a protein profiling approach to identify potential metastasis factors in laboratory clones of L. (V.) guyanensis with stable phenotypes ranging from highly metastatic (M+) through infrequently metastatic (M+/M-) to non-metastatic (M-). Comparison of the soluble proteomes of promastigotes by two-dimensional electrophoresis revealed two abundant protein spots specifically associated with M+ and M+/M- clones (Met2 and Met3) and two others exclusively expressed in M- parasites (Met1 and Met4). The association between clinical disease phenotype and differential expression of Met1-Met4 was less clear in L. Viannia strains from mucosal (M+) or cutaneous (M-) lesions of patients. Identification of Met1-Met4 by biological mass spectrometry (LC-ES-MS/MS) and bioinformatics revealed that M+ and M- clones express distinct acidic and neutral isoforms of both elongation factor-1 subunit beta (EF-1beta) and cytosolic tryparedoxin peroxidase (TXNPx). This interchange of isoforms may relate to the mechanisms by which the activities of EF-1beta and TXNPx are modulated, and/or differential post-translational modification of the gene product(s). The multiple metabolic functions of EF-1 and TXNPx support the plausibility of their participation in parasite survival and persistence and thereby, metastatic disease. Both polypeptides are active in resistance to chemical and oxidant stress, providing a basis for further elucidation of the importance of antioxidant defence in the pathogenesis underlying MCL.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

To identify malaria antigens for vaccine development, we selected alpha-helical coiled coil domains of proteins predicted to be present in the parasite erythrocytic stage. The corresponding synthetic peptides are expected to mimic structurally "native" epitopes. Indeed, the 95 chemically synthesized peptides were all specifically recognized by human immune sera, though at varying prevalence. Peptide-specific antibodies were obtained both by affinity-purification from malaria immune sera and by immunization of mice. These antibodies did not show significant cross reactions, i.e., they were specific for the original peptide, reacted with native parasite proteins in infected erythrocytes, and several were active in inhibiting in vitro parasite growth. Circular dichroism studies indicated that the selected peptides assumed partial or high alpha-helical content. Thus, we demonstrate that the bioinformatics/chemical synthesis approach described here can lead to the rapid identification of molecules which target biologically active antibodies, thus identifying suitable vaccine candidates. This strategy can be, in principle, extended to vaccine discovery in a wide range of other pathogens.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Persistent infection induces an adaptive immune response that is mediated by T and B lymphocytes. Upon triggering with an antigen, these cells become activated and turn into fast expanding cells able to efficiently defend the host. Lymphocyte activation is controlled by a complex composed of CARMA1, BCL10 and MALT1 which regulates the NF-κB signaling pathway upon antigen triggering. Abnormally high expression or activity of either one of these three proteins can favor the development of lymphomas, while genetic defects in the pathway are associated with immunodeficiency. MALT1 was identified as a paracaspase sharing homology with other cysteine proteases, namely caspases and metacaspases. In order to be active, caspases need to dimerize. Based on their sequence similarity with MALT1, we hypothesized that dimerization might also be a mechanism of activation employed by MALT1. To test this hypothesis, we performed bioinformatics modelling based on the crystal structures of several caspases. Our model suggested that the MALT1 caspase-like domain can indeed form dimers. This finding was later confirmed by several published crystal structures of MALT1. In the dimer interface of our model, we noticed the presence of charged amino acids that could potentially form salt bridges and thereby hold both monomers together. Mutation of one of these residues, E549, into alanine completely blocked the catalytic activity of MALT1. Additionally, we provided evidence for a role of E549 in promoting the MALT1-dependent growth of cells derived from diffuse large B cell lymphoma (DLBCL) of the activated B cell-like (ABC) type. To our initial surprise, the E549A mutation showed only a partial defect in dimerization, indicating that additional residues are essential to form a stable dimer. 
The MALT1 crystal structures revealed a key function for E549 in stabilizing the catalytic site of the protease via its interaction with an arginine which is located next to the catalytically active cysteine. In an additional study, we discovered that MALT1 monoubiquitination is required for the catalytic activity of the protease. Interestingly, we found that the MALT1 dimer interface mutant E549A could not be monoubiquitinated. Based on these findings, we suggest that correct formation of the dimer interface is a prerequisite for monoubiquitination. In a second project, we discovered a novel target of the protease MALT1, the ribonuclease Regnase-1. It has been described that the RNase activity of Regnase-1 negatively regulates immune responses. We could show that in ABC DLBCL cell lines, Regnase-1 is not only cleaved by MALT1 but also phosphorylated, at least in part, by the inhibitor of κB kinase (IKK). Both regulations appear to restrain the RNase function of Regnase-1 and thereby allow the production of pro-survival proteins. In conclusion, our studies further highlight and explain the importance of the catalytic activity of MALT1 for the activation of lymphocytes and provide additional knowledge for the development of specific drugs targeting the catalytic activity of MALT1 for immunomodulation and treatment of lymphomas.
SUMMARY IN FRENCH (PhD Thesis, Katrin Cabalzar): A persistent infection induces an adaptive immune response through T and B lymphocytes. When they recognize the antigen, these cells are activated and multiply very rapidly to defend the host effectively. Lymphocyte activation is transmitted by a complex composed of three proteins, CARMA1, BCL10 and MALT1, which regulates the NF-κB signaling pathway when the antigen is recognized. Abnormally high expression or activity of any one of these three proteins can favor the development of lymphomas, whereas genetic defects in this signaling pathway are associated with immunodeficiency. MALT1 was identified as a paracaspase that shares homologous sequences with other cysteine proteases, such as caspases and metacaspases. To be active, caspases need to dimerize. Given their sequence similarity with MALT1, we hypothesized that dimerization could also be an activation mechanism used by MALT1. To test this hypothesis, we built a bioinformatics model from the crystal structures of several caspases. Our model suggested that the catalytic domain of MALT1 was indeed able to form dimers. This finding was later confirmed by publications showing dimeric crystal structures of MALT1. In the dimer interface of our model, we noticed the presence of charged amino acids that could form ionic bonds and thereby hold the two monomers together. Mutation of one of these residues, E549, to alanine completely inhibited the catalytic activity of MALT1. In addition, we demonstrated a role for E549 in the MALT1-dependent growth of cells derived from diffuse large B cell lymphoma (DLBCL) of the activated B cell (ABC) subtype. At first we were surprised to find that this mutation caused only a partial defect in dimerization, which indicates that additional amino acids are essential to form a stable dimer. The MALT1 crystal structures revealed a key role for E549 in stabilizing the catalytic site of the protease via its interaction with an arginine located next to the active-site cysteine.
In another study, we discovered that monoubiquitination of MALT1 is required for the catalytic activity of the protease. Notably, we found that the MALT1 dimer interface mutant E549A could not be monoubiquitinated. Based on these results, we suggest that correct formation of the dimer interface is a prerequisite for monoubiquitination. In a second project, we discovered a new target of the protease MALT1, the ribonuclease Regnase-1. It has been described that the RNase activity of Regnase-1 negatively regulates immune responses. We were able to show that in ABC DLBCL cell lines, Regnase-1 was not only cleaved by MALT1 but also phosphorylated, at least in part, by the inhibitor of κB kinase (IKK). Both regulations appear to suppress the RNase function of Regnase-1 and thereby allow the stabilization of certain messenger RNAs and the production of pro-survival proteins. In conclusion, our studies highlight the key role of MALT1 dimerization and explain the importance of the catalytic activity of MALT1 for lymphocyte activation. Our results thus provide additional knowledge for the development of specific drugs targeting the catalytic activity of MALT1, which could be useful for modifying immune responses and treating lymphomas.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Cancer omics data are being generated at an exponential rate and associated with clinical variables, and important findings can be extracted using bioinformatics approaches and then experimentally validated. Many of these findings are related to a specific class of non-coding RNA molecules called microRNAs (miRNAs), which are post-transcriptional regulators of mRNA expression. The related research field is quite heterogeneous: bioinformaticians, clinicians, statisticians and biologists, as well as data miners and engineers, collaborate to curate stored data and to act on new impulses coming from the output of the latest Next Generation Sequencing technologies. Here we review the main research findings on miRNA of the first 10 years in colon cancer research with an emphasis on possible uses in clinical practice. This review intends to provide a road map in the jungle of publications of miRNA in colorectal cancer, focusing on data availability and new ways to generate biologically relevant information out of these huge amounts of data.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

SUMMARY: ExpressionView is an R package that provides an interactive graphical environment to explore transcription modules identified in gene expression data. A sophisticated ordering algorithm is used to present the modules with the expression in a visually appealing layout that provides an intuitive summary of the results. From this overview, the user can select individual modules and access biologically relevant metadata associated with them. AVAILABILITY: http://www.unil.ch/cbg/ExpressionView. Screenshots, tutorials and sample data sets can be found on the ExpressionView web site.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Integrated approaches using different in vitro methods in combination with bioinformatics can (i) increase the success rate and speed of drug development; (ii) improve the accuracy of toxicological risk assessment; and (iii) increase our understanding of disease. Three-dimensional (3D) cell culture models are important building blocks of this strategy which has emerged during the last years. The majority of these models are organotypic, i.e., they aim to reproduce major functions of an organ or organ system. This implies in many cases that more than one cell type forms the 3D structure, and often matrix elements play an important role. This review summarizes the state of the art concerning commonalities of the different models. For instance, the theory of mass transport/metabolite exchange in 3D systems and the special analytical requirements for test endpoints in organotypic cultures are discussed in detail. In the next part, 3D model systems for selected organs--liver, lung, skin, brain--are presented and characterized in dedicated chapters. Also, 3D approaches to the modeling of tumors are presented and discussed. All chapters give a historical background, illustrate the large variety of approaches, and highlight up- and downsides as well as specific requirements. Moreover, they refer to the application in disease modeling, drug discovery and safety assessment. Finally, consensus recommendations indicate a roadmap for the successful implementation of 3D models in routine screening. It is expected that the use of such models will accelerate progress by reducing error rates and wrong predictions from compound testing.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A haplotype is an m-long binary vector. The XOR-genotype of two haplotypes is the m-vector of their coordinate-wise XOR. We study the following problem: Given a set of XOR-genotypes, reconstruct their haplotypes so that the set of resulting haplotypes can be mapped onto a perfect phylogeny (PP) tree. The question is motivated by studying population evolution in human genetics, and is a variant of the perfect phylogeny haplotyping problem that has received intensive attention recently. Unlike the latter problem, in which the input is "full" genotypes, here we assume less informative input, which may be more economical to obtain experimentally. Building on ideas of Gusfield, we show how to solve the problem in polynomial time, by a reduction to the graph realization problem. The actual haplotypes are not uniquely determined by the tree they map onto, and the tree itself may or may not be unique. We show that tree uniqueness implies uniquely determined haplotypes, up to inherent degrees of freedom, and give a sufficient condition for the uniqueness. To actually determine the haplotypes given the tree, additional information is necessary. We show that two or three full genotypes suffice to reconstruct all the haplotypes, and present a linear algorithm for identifying those genotypes.
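
The definition of an XOR-genotype can be made concrete with a short sketch. It only illustrates the definition and one simple reason why haplotypes cannot be read back from XOR-genotypes alone; it does not implement the reconstruction algorithm, and the function name and the example haplotypes are made up for illustration.

```python
def xor_genotype(h1, h2):
    """Coordinate-wise XOR of two binary haplotypes of equal length m."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length m")
    return tuple(a ^ b for a, b in zip(h1, h2))

# Two hypothetical haplotypes of length m = 5.
h1 = (0, 1, 1, 0, 1)
h2 = (0, 0, 1, 1, 1)
g = xor_genotype(h1, h2)
print(g)  # → (0, 1, 0, 1, 0); a 1 marks a site where the haplotypes differ

# Complementing both haplotypes at every site leaves the XOR-genotype
# unchanged, so different haplotype pairs can share one XOR-genotype.
h1_flipped = tuple(1 - a for a in h1)
h2_flipped = tuple(1 - b for b in h2)
same = xor_genotype(h1_flipped, h2_flipped) == g
print(same)  # → True
```

A full genotype would additionally distinguish the two homozygous states at sites where the XOR is 0, which is why the XOR input is less informative.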

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Evolution of proteins after whole-genome duplication
Gene and genome duplication are considered major mechanisms in the creation of new functions in genomes, or in the refinement of networks by the division of function among more genes. In animals, the best demonstrated whole genome duplication occurred at the origin of Teleost fishes. This makes fishes an ideal model to study the consequences of genome duplication, particularly since we have a good sampling of genome sequences, abundant functional information, and a very well studied outgroup: the tetrapods (including human). More specifically, I studied the consequences of duplication on proteins using evolutionary models to infer adaptive events. I analysed the influence of positive selection in vertebrate genes, by contrasting singleton genes and duplicated genes. The conclusion of the analyses was threefold: (i) positive selection affects diverse phylogenetic branches and diverse gene categories during vertebrate evolution; (ii) it concerns only a small proportion of sites (1%-5%); and (iii) whole genome duplication had no detectable impact on the prevalence of this positive selection. I also studied evolution at the amino acid level with different methods to detect functional shifts (covarion process and constant-but-different process). As in my previous research, I found similar numbers of functional shifts between duplicates and between orthologs. The accepted framework for studies of molecular evolution is that orthologs share the same function, whereas the function of paralogs diverges. This framework gives a special place to gene duplication in evolution, as the main mechanism for generating novelty. With my previous results showing that duplication and speciation are not so different, we investigated the literature to question the evidence for similar or divergent evolution of gene function after duplication relative to speciation genes.
This led us to propose a more rigorous design of future studies of gene duplication. Finally, based on my automated protocol, we built a database of positive selection in vertebrates' genes, Selectome. This database is freely available on the web and will help future evolutionary as well as biochemical studies.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

SUMMARY: A top scoring pair (TSP) classifier consists of a pair of variables whose relative ordering can be used for accurately predicting the class label of a sample. This classification rule has the advantage of being easily interpretable and more robust against technical variations in data, such as those due to different microarray platforms. Here we describe a parallel implementation of this classifier which significantly reduces the training time, and a number of extensions, including a multi-class approach, which has the potential to improve the classification performance. AVAILABILITY AND IMPLEMENTATION: Full C++ source code and R package Rgtsp are freely available from http://lausanne.isb-sib.ch/~vpopovic/research/. The implementation relies on existing OpenMP libraries.
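
The classification rule can be sketched in a few lines. The code below is a plain sequential, two-class illustration written for this listing, not the parallel C++ implementation or the Rgtsp package; the function names and toy data are assumptions. It scores each pair of variables (i, j) by how much the frequency of the ordering x[i] < x[j] differs between the two classes and keeps the best pair. Both classes are assumed to contain at least one sample.

```python
from itertools import combinations

def train_tsp(X, y):
    """Find the top scoring pair for samples X (lists of values) and 0/1 labels y.

    Returns (i, j, label_if_less): predict label_if_less when x[i] < x[j].
    """
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        # Frequency of the ordering x[i] < x[j] within each class.
        p0 = sum(x[i] < x[j] for x in class0) / len(class0)
        p1 = sum(x[i] < x[j] for x in class1) / len(class1)
        score = abs(p0 - p1)
        if best is None or score > best[0]:
            best = (score, i, j, 0 if p0 > p1 else 1)
    _, i, j, label_if_less = best
    return i, j, label_if_less

def predict_tsp(model, x):
    """Classify one sample from the relative ordering of the chosen pair."""
    i, j, label_if_less = model
    return label_if_less if x[i] < x[j] else 1 - label_if_less

# Toy data: in class 0 variable 0 is below variable 1, in class 1 it is above.
X = [[1, 5, 3], [2, 6, 1], [7, 3, 4], [8, 2, 9]]
y = [0, 0, 1, 1]
model = train_tsp(X, y)
print(model, predict_tsp(model, [3, 9, 0]), predict_tsp(model, [9, 1, 5]))
```

Because the rule depends only on which of two values is larger, it is unchanged by any monotone rescaling of a sample, which is one way to see the robustness to platform differences mentioned above. The loop over pairs is independent per pair, which is what makes the training step a natural candidate for parallelization.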

Relevance:

10.00% 10.00%

Publisher:

Abstract:

MOTIVATION: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought after qualities, but have thus far largely been overlooked. RESULTS: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. AVAILABILITY: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. CONTACT: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The human body is composed of a huge number of cells acting together in a concerted manner. The current understanding is that proteins perform most of the necessary activities in keeping a cell alive. The DNA, on the other hand, stores the information on how to produce the different proteins in the genome. Regulating gene transcription is the first important step that can thus affect the life of a cell, modify its functions and its responses to the environment. Regulation is a complex operation that involves specialized proteins, the transcription factors. Transcription factors (TFs) can bind to DNA and activate the processes leading to the expression of genes into new proteins. Errors in this process may lead to diseases. In particular, some transcription factors have been associated with a lethal pathological state, commonly known as cancer, associated with uncontrolled cellular proliferation, invasiveness of healthy tissues and abnormal responses to stimuli. Understanding cancer-related regulatory programs is a difficult task, often involving several TFs interacting together and influencing each other's activity. This thesis presents new computational methodologies to study gene regulation. In addition, we present applications of our methods to the understanding of cancer-related regulatory programs. The understanding of transcriptional regulation is a major challenge. We address this difficult question by combining computational approaches with large collections of heterogeneous experimental data. In detail, we design signal processing tools to recover transcription factor binding sites on the DNA from genome-wide surveys like chromatin immunoprecipitation assays on tiling arrays (ChIP-chip). We then use the location of TF binding to explain expression levels of regulated genes. In this way we identify a regulatory synergy between two TFs, the oncogene C-MYC and SP1. 
C-MYC and SP1 bind preferentially at promoters, and when SP1 binds next to C-MYC on the DNA, the nearby gene is strongly expressed. The association between the two TFs at promoters is reflected by the binding site conservation across mammals and by the permissive underlying chromatin states; it represents an important control mechanism involved in cellular proliferation, and thereby in cancer. Secondly, we identify the characteristics of TF estrogen receptor alpha (hERa) target genes and we study the influence of hERa in regulating transcription. hERa, upon hormone estrogen signaling, binds to DNA to regulate transcription of its targets in concert with its co-factors. To overcome the scarce experimental data about the binding sites of other TFs that may interact with hERa, we conduct in silico analysis of the sequences underlying the ChIP sites using the collection of position weight matrices (PWMs) of hERa partners, TFs FOXA1 and SP1. We combine ChIP-chip and ChIP-paired-end-diTags (ChIP-pet) data about hERa binding on DNA with the sequence information to explain gene expression levels in a large collection of cancer tissue samples and also in studies of the response of cells to estrogen. We confirm that hERa binding sites are distributed throughout the genome. However, we distinguish between binding sites near promoters and binding sites along the transcripts. The first group shows weak binding of hERa and high occurrence of SP1 motifs, in particular near estrogen responsive genes. The second group shows strong binding of hERa and significant correlation between the number of binding sites along a gene and the strength of gene induction in presence of estrogen. Some binding sites of the second group also show presence of FOXA1, but the role of this TF still needs to be investigated. Different mechanisms have been proposed to explain hERa-mediated induction of gene expression. 
Our work supports the model of hERa activating gene expression from distal binding sites by interacting with promoter-bound TFs, like SP1. hERa has been associated with survival rates of breast cancer patients, though explanatory models are still incomplete: this result is important to better understand how hERa can control gene expression. Thirdly, we address the difficult question of regulatory network inference. We tackle this problem by analyzing time series of biological measurements such as quantification of mRNA levels or protein concentrations. Our approach uses the well-established penalized linear regression models where we impose sparseness on the connectivity of the regulatory network. We extend this method by enforcing the coherence of the regulatory dependencies: a TF must coherently behave as an activator, or a repressor, on all its targets. This requirement is implemented as constraints on the signs of the regressed coefficients in the penalized linear regression model. Our approach is better at reconstructing meaningful biological networks than previous methods based on penalized regression. The method is tested on the DREAM2 challenge of reconstructing a five-gene/TF regulatory network, obtaining the best performance in the "undirected signed excitatory" category. Thus, these bioinformatics methods, which are reliable, interpretable and fast enough to cover large biological datasets, have enabled us to better understand gene regulation in humans.
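
The idea of putting sign constraints on a penalized linear regression can be sketched as follows. This is an illustrative coordinate-descent solver written for this listing, not the thesis implementation: it handles a single target gene, it assumes the sign of each regulator (+1 for activator, -1 for repressor) is supplied in advance, and the function name, the L1 penalty form and the toy data are all assumptions.

```python
def sign_constrained_lasso(X, y, signs, lam=0.0, n_iter=100):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by coordinate descent,
    with b[j] >= 0 where signs[j] == +1 and b[j] <= 0 where signs[j] == -1.

    X: list of rows (e.g. time points), one column per candidate regulator.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b for the starting point b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col)
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual, ignoring b[j] itself.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual))
            if signs[j] > 0:
                new = max(0.0, rho - lam) / z   # activator: clip at zero from below
            else:
                new = min(0.0, rho + lam) / z   # repressor: clip at zero from above
            delta = new - beta[j]
            residual = [r - v * delta for v, r in zip(col, residual)]
            beta[j] = new
    return beta

# Toy data: the target is exactly twice the first regulator.
X = [[1, 1], [2, -1], [3, 1], [4, -1]]
y = [2, 4, 6, 8]
as_activators = sign_constrained_lasso(X, y, signs=[+1, +1])
as_repressors = sign_constrained_lasso(X, y, signs=[-1, -1])
print(as_activators, as_repressors)
```

When both regulators are allowed to be activators, the fit recovers a coefficient of 2 for the first regulator and 0 for the second. When both are forced to be repressors, the first coefficient is pinned at 0, showing how the sign constraint removes dependencies that are incoherent with the assumed role of a TF.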

Relevance:

10.00% 10.00%

Publisher:

Abstract:

BACKGROUND: Today, recognition and classification of sequence motifs and protein folds is a mature field, thanks to the availability of numerous comprehensive and easy to use software packages and web-based services. Recognition of structural motifs, by comparison, is less well developed and much less frequently used, possibly due to a lack of easily accessible and easy to use software. RESULTS: In this paper, we describe an extension of DeepView/Swiss-PdbViewer through which structural motifs may be defined and searched for in large protein structure databases, and we show that common structural motifs involved in stabilizing protein folds are present in evolutionarily and structurally unrelated proteins, also in deeply buried locations which are not obviously related to protein function. CONCLUSIONS: The possibility to define custom motifs and search for their occurrence in other proteins permits the identification of recurrent arrangements of residues that could have structural implications. The possibility to do so without having to maintain a complex software/hardware installation on site brings this technology to experts and non-experts alike.