931 results for Bioinformatics
Abstract:
With a virus such as Human Immunodeficiency Virus (HIV) that has infected millions of people worldwide, and with many unaware that they are infected, it becomes vital to understand how the virus functions at the molecular level. Because there is currently no vaccine and no way to eradicate the virus from an infected person, any information about how the virus interacts with its host improves our understanding of HIV and brings scientists one step closer to being able to combat such a destructive virus. Thousands of HIV viruses have been sequenced and are available in many online databases for public use. Attributes that are linked to each sequence include the viral load within the host and how sick the patient is currently. Being able to predict a patient's stage of infection is valuable, as it could potentially aid in treatment options and proper medication use. Our approach of analyzing region-specific amino acid composition for select genes has been able to predict patient disease state with an accuracy of up to 85.4%. Moreover, we output a set of classification rules based on the sequence that may prove useful for diagnosing the expected clinical outcome of the infected patient.
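As a rough illustration of what a region-specific amino acid composition feature could look like, the following minimal Python sketch computes the fraction of each standard amino acid inside a window of a protein sequence. The function name, the window coordinates, and the toy sequence are assumptions made for illustration; the abstract does not describe the authors' actual feature extraction or classifier.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def region_composition(protein, start, end):
    """Return the fraction of each standard amino acid in protein[start:end]."""
    region = protein[start:end].upper()
    counts = Counter(ch for ch in region if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("region contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Invented toy sequence (hypothetical, not taken from any database).
features = region_composition("MKGAGSVLSGKELDRW", 0, 10)
```

A feature vector of this kind, computed per region and per gene, is the sort of input a rule-based classifier could consume.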
Abstract:
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. More recently, these methods have been used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence, which is then used to derive relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
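The abstract does not define the ICD transformation itself, so the sketch below is not the ICD method. It only illustrates the general idea of an alignment-free, Fourier-based comparison: each sequence is turned into four base-indicator signals, their spectral power is summed at a fixed set of frequencies, and two sequences are compared by the Euclidean distance between the resulting signatures. The function names, the choice of frequencies, and the length normalization are all assumptions.

```python
import cmath
import math

BASES = "ACGT"


def power_spectrum(seq, n_freqs=8):
    """Length-normalized spectral power of a DNA sequence at n_freqs frequencies."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    spectrum = []
    for k in range(1, n_freqs + 1):
        freq = 0.5 * k / n_freqs  # cycles per base, in (0, 0.5]
        power = 0.0
        for base in BASES:
            # Fourier coefficient of the 0/1 indicator signal for this base.
            coeff = sum(cmath.exp(-2j * math.pi * freq * t)
                        for t, ch in enumerate(seq) if ch == base)
            power += abs(coeff) ** 2
        spectrum.append(power / n)
    return spectrum


def spectral_distance(seq_a, seq_b, n_freqs=8):
    """Euclidean distance between two spectral signatures (no alignment needed)."""
    sa = power_spectrum(seq_a, n_freqs)
    sb = power_spectrum(seq_b, n_freqs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)))
```

Because the signature has a fixed length regardless of sequence length, a pairwise distance matrix built this way could be passed to any standard tree-building method.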
Abstract:
The spatio-temporal control of gene expression is fundamental to elucidate cell proliferation and deregulation phenomena in living systems. Novel approaches based on light-sensitive multiprotein complexes have recently been devised, showing promising perspectives for the noninvasive and reversible modulation of the DNA-transcriptional activity in vivo. This has lately been demonstrated in a striking way through the generation of the artificial protein construct light-oxygen-voltage (LOV)-tryptophan-activated protein (TAP), in which the LOV-2-Jα photoswitch of phototropin1 from Avena sativa (AsLOV2-Jα) has been ligated to the tryptophan-repressor (TrpR) protein from Escherichia coli. Although tremendous progress has been achieved on the generation of such protein constructs, a detailed understanding of their functioning as opto-genetical tools is still in its infancy. Here, we elucidate the early stages of the light-induced regulatory mechanism of LOV-TAP at the molecular level, using the noninvasive molecular dynamics simulation technique. More specifically, we find that Cys450-FMN-adduct formation in the AsLOV2-Jα-binding pocket after photoexcitation induces the cleavage of the peripheral Jα-helix from the LOV core, causing a change of its polarity and electrostatic attraction of the photoswitch onto the DNA surface. This goes along with the flexibilization through unfolding of a hairpin-like helix-loop-helix region interlinking the AsLOV2-Jα- and TrpR-domains, ultimately enabling the condensation of LOV-TAP onto the DNA surface. By contrast, in the dark state the AsLOV2-Jα photoswitch remains inactive and exerts a repulsive electrostatic force on the DNA surface. This leads to a distortion of the hairpin region, which finally relieves its tension by causing the disruption of LOV-TAP from the DNA.
Abstract:
In the modern life and medical sciences, major efforts are currently concentrated on creating artificial photoenzymes consisting of light-oxygen-voltage-sensitive (LOV) domains fused to a target enzyme. Such protein constructs possess great potential for controlling cell metabolism as well as gene function upon light stimulus. This has recently been impressively demonstrated by designing a novel artificial fusion protein, connecting the AsLOV2-Jα-photosensor from Avena sativa with the Rac1-GTPase (AsLOV2-Jα-Rac1), and by using it to control the motility of cancer cells from the HeLa line. Although tremendous progress has been achieved on the generation of such protein constructs, a detailed understanding of their signaling pathway after photoexcitation is still in its infancy. Here, we show through computer simulations of the AsLOV2-Jα-Rac1-photoenzyme that the early processes after formation of the Cys450-FMN-adduct involve the breakage of an H-bond between the carbonyl oxygen FMN-C4O and the amino group of Gln513, followed by a rotational reorientation of its sidechain. This initial event is followed by successive events including β-sheet tightening and transmission of torsional stress along the Iβ-sheet, which leads to the disruption of the Jα-helix from the N-terminal end. Finally, this process triggers the detachment of the AsLOV2-Jα-photosensor from the Rac1-GTPase, ultimately enabling the activation of Rac1 via binding of the effector protein PAK1.
Abstract:
BACKGROUND: Despite recent algorithmic and conceptual progress, the stoichiometric network analysis of large metabolic models remains a computationally challenging problem. RESULTS: SNA is an interactive, high-performance toolbox for analysing the possible steady state behaviour of metabolic networks by computing the generating and elementary vectors of their flux and conversion cones. It also supports analysing the steady states by linear programming. The toolbox is implemented mainly in Mathematica and returns numerically exact results. It is available under an open source license from: http://bioinformatics.org/project/?group_id=546. CONCLUSION: Thanks to its performance and modular design, SNA is demonstrably useful in analysing genome-scale metabolic networks. Further, the integration into Mathematica provides a very flexible environment for the subsequent analysis and interpretation of the results.
Abstract:
BACKGROUND: Pneumococcal meningitis is associated with high mortality (approximately 30%) and morbidity. Up to 50% of survivors are affected by neurological sequelae due to a wide spectrum of brain injury mainly affecting the cortex and hippocampus. Despite this significant disease burden, the genetic program that regulates the host response leading to brain damage as a consequence of bacterial meningitis is largely unknown. We used an infant rat model of pneumococcal meningitis to assess gene expression profiles in cortex and hippocampus at 22 and 44 hours after infection and in controls at 22 hours after mock-infection with saline. To analyze the biological significance of the data generated by Affymetrix DNA microarrays, a bioinformatics pipeline was used combining (i) a literature-profiling algorithm to cluster genes based on the vocabulary of abstracts indexed in MEDLINE (NCBI) and (ii) the self-organizing map (SOM), a clustering technique based on covariance in gene expression kinetics. RESULTS: Among 598 genes differentially regulated (change factor ≥ 1.5; p ≤ 0.05), 77% were automatically assigned to one of 11 functional groups with 94% accuracy. SOM disclosed six patterns of expression kinetics. Genes associated with growth control/neuroplasticity, signal transduction, cell death/survival, cytoskeleton, and immunity were generally upregulated. In contrast, genes related to neurotransmission and lipid metabolism were, on the whole, transiently downregulated. The majority of the genes associated with ionic homeostasis, neurotransmission, signal transduction and lipid metabolism were differentially regulated specifically in the hippocampus. Of the cell death/survival genes found to be continuously upregulated only in hippocampus, the majority are pro-apoptotic, while those continuously upregulated only in cortex are anti-apoptotic.
CONCLUSION: Temporal and spatial analysis of gene expression in experimental pneumococcal meningitis identified potential targets for therapy.
Abstract:
The advent of experimental techniques capable of probing biomolecules and cells at high levels of resolution has led to a rapid change in the methods used for the analysis of experimental molecular biology data. In this article we give an overview of visualization techniques and methods that can be used to assess various aspects of genomic data.
Abstract:
This article gives an overview of the methods used in the low-level analysis of gene expression data generated using DNA microarrays. This type of experiment makes it possible to determine relative levels of nucleic acid abundance in a set of tissues or cell populations for thousands of transcripts or loci simultaneously. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. This includes the design of probes, the experimental design, the image analysis of microarray scanned images, the normalization of fluorescence intensities, the assessment of the quality of microarray data and incorporation of quality information in subsequent analyses, the combination of information across arrays and across sets of experiments, the discovery and recognition of patterns in expression at the single gene and multiple gene levels, and the assessment of significance of these findings, given that the data contain substantial noise and thus random features. For all of these components, access to a flexible and efficient statistical computing environment is essential.
Abstract:
Motivation: Gene Set Enrichment Analysis (GSEA) has been developed recently to capture moderate but coordinated changes in the expression of sets of functionally related genes. We propose a number of extensions to GSEA, which use different statistics to describe the association between genes and the phenotype of interest. We make use of dimension reduction procedures, such as principal component analysis, to identify gene sets containing coordinated genes. We also address the problem of overlap among gene sets in this paper. Results: We applied our methods to data from a clinical trial in acute lymphoblastic leukemia (ALL) [1]. We identified interesting gene sets using different statistics. We find that gender may have effects on gene expression in addition to the phenotype effects. Investigating overlap among interesting gene sets indicates that overlap could alter the interpretation of the significant results.
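For orientation, the basic idea behind a GSEA-style score can be sketched as a running sum over a ranked gene list. The Python sketch below implements only the simple unweighted, Kolmogorov-Smirnov-like form and is not one of the extensions proposed in the abstract; the function name and the toy gene identifiers are assumptions, and the ranked list is assumed to contain no duplicates.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walks down the ranked list, stepping up at set members and down at
    non-members, and returns the signed maximum deviation from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking (hypothetical gene identifiers), most associated gene first.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranking, {"g1", "g2"})
```

A set concentrated at the top of the ranking scores close to +1, and one concentrated at the bottom scores close to -1. Significance is usually judged by recomputing the score under permutations.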
Abstract:
Background: The recent development of semi-automated techniques for staining and analyzing flow cytometry samples has presented new challenges. Quality control and quality assessment are critical when developing new high throughput technologies and their associated information services. Our experience suggests that significant bottlenecks remain in the development of high throughput flow cytometry methods for data analysis and display. In particular, data quality control and quality assessment are crucial steps in processing and analyzing high throughput flow cytometry data. Methods: We propose a variety of graphical exploratory data analytic tools for exploring ungated flow cytometry data. We have implemented a number of specialized functions and methods in the Bioconductor package rflowcyt. We demonstrate the use of these approaches by investigating two independent sets of high throughput flow cytometry data. Results: We found that graphical representations can reveal substantial non-biological differences in samples. Empirical cumulative distribution function plots and summary scatterplots were especially useful in the rapid identification of problems not identified by manual review. Conclusions: Graphical exploratory data analytic tools are quick and useful means of assessing data quality. We propose that the described visualizations should be used as quality assessment tools and, where possible, for quality control.
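The empirical cumulative distribution function mentioned above is simple enough to sketch directly. The Python example below does not use or reproduce the rflowcyt API; it is a stand-alone illustration, with assumed function names, of how the ECDFs of one channel in two samples could be compared numerically through their largest vertical gap.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function x -> fraction of observations in sample that are <= x."""
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("empty sample")

    def cdf(x):
        return bisect_right(data, x) / n

    return cdf


def max_ecdf_gap(sample_a, sample_b):
    """Largest vertical distance between the two ECDFs (0 = identical, 1 = disjoint)."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)
```

Replicate samples that should be biologically equivalent but show a large gap would be candidates for closer inspection.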
Abstract:
Understanding regulatory mechanisms in complex biological systems is an important challenge, in particular to understand disease mechanisms, and to discover new therapies and drugs. In this paper, we consider the important question of cellular regulation of phenotype. Using single gene deletion data, we address the problem of linking a phenotype to underlying functional roles in the organism and provide a sound computational and statistical paradigm that can be extended to address more complex experimental settings such as multiple deletions. We apply the proposed approaches to publicly available data sets to demonstrate strong evidence for the involvement of multi-protein complexes in the phenotypes studied.
Abstract:
For various reasons, it is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, etc. with the documents that describe and rely on them. This integration allows readers to both verify and adapt the statements in the documents. Authors can easily reproduce them in the future, and they can present the document's contents in a different medium, e.g. with interactive controls. This paper describes a software framework for authoring and distributing these integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations. The documents are dynamic in that the contents, including figures, tables, etc., can be recalculated each time a view of the document is generated. Our model treats a dynamic document as a master or "source" document from which one can generate different views in the form of traditional, derived documents for different audiences. We introduce the concept of a compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data, ...), and as a means for distributing, managing and updating the collection. The step from disseminating analyses via a compendium to reproducible research is a small one. By reproducible research, we mean research papers with accompanying software tools that allow the reader to directly reproduce the results and employ the methods that are presented in the research paper. Some of the issues involved in paradigms for the production, distribution and use of such reproducible research are discussed.
Abstract:
PURPOSE: Identification of a novel rhodopsin mutation in a family with retinitis pigmentosa and comparison of the clinical phenotype to a known mutation at the same amino acid position. METHODS: Screening for mutations in rhodopsin was performed in 78 patients with retinitis pigmentosa. All exons and flanking intronic regions were amplified by PCR, sequenced, and compared to the reference sequence derived from the National Center for Biotechnology Information (NCBI, Bethesda, MD) database. Patients were characterized clinically according to the results of best corrected visual acuity testing (BCVA), slit lamp examination (SLE), funduscopy, Goldmann perimetry (GP), dark adaptometry (DA), and electroretinography (ERG). Structural analyses of the rhodopsin protein were performed with the Swiss-Pdb Viewer program available on-line (http://www.expasy.org.spdvbv/ provided in the public domain by Swiss Institute of Bioinformatics, Geneva, Switzerland). RESULTS: A novel rhodopsin mutation (Gly90Val) was identified in a Swiss family of three generations. The pedigree indicated autosomal dominant inheritance. No additional mutation was found in this family in other autosomal dominant genes. The BCVA of affected family members ranged from 20/25 to 20/20. Fundus examination showed fine pigment mottling in patients of the third generation and well-defined bone spicules in patients of the second generation. GP showed concentric constriction. DA demonstrated monophasic cone adaptation only. ERG revealed severely reduced rod and cone signals. The clinical picture is compatible with retinitis pigmentosa. A previously reported amino acid substitution at the same position in rhodopsin leads to a phenotype resembling night blindness in mutation carriers, whereas patients reported in the current study showed the classic retinitis pigmentosa phenotype. The effect of different amino acid substitutions on the three-dimensional structure of rhodopsin was analyzed by homology modeling. Distinct distortions of position 90 (shifts in amino acids 112 and 113) and additional hydrogen bonds were found. CONCLUSIONS: Different amino acid substitutions at position 90 of rhodopsin can lead to night blindness or retinitis pigmentosa. The data suggest that the property of the substituted amino acid distinguishes between the phenotypes.
Abstract:
The last few years have seen the advent of high-throughput technologies to analyze various properties of the transcriptome and proteome of several organisms. The congruency of these different data sources, or lack thereof, can shed light on the mechanisms that govern cellular function. A central challenge for bioinformatics research is to develop a unified framework for combining the multiple sources of functional genomics information and testing associations between them, thus obtaining a robust and integrated view of the underlying biology. We present a graph theoretic approach to test the significance of the association between multiple disparate sources of functional genomics data by proposing two statistical tests, namely edge permutation and node label permutation tests. We demonstrate the use of the proposed tests by finding significant association between a Gene Ontology-derived "predictome" and data obtained from mRNA expression and phenotypic experiments for Saccharomyces cerevisiae. Moreover, we employ the graph theoretic framework to recast a surprising discrepancy presented in Giaever et al. (2002) between gene expression and knockout phenotype, using expression data from a different set of experiments.
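The abstract names a node label permutation test without giving its details, so the following Python sketch is only one plausible reading, under stated assumptions: both data sources are represented as undirected graphs over the same nodes, the association statistic is the number of shared edges, and the null distribution is obtained by randomly relabelling the nodes of one graph. Function names, the statistic, and the defaults are hypothetical.

```python
import random


def shared_edges(edges_a, edges_b):
    """Count undirected edges present in both edge lists."""
    set_a = {frozenset(e) for e in edges_a}
    set_b = {frozenset(e) for e in edges_b}
    return len(set_a & set_b)


def node_label_permutation_test(nodes, edges_a, edges_b, n_perm=1000, seed=0):
    """Return (observed shared-edge count, one-sided permutation p-value)."""
    rng = random.Random(seed)
    nodes = list(nodes)
    observed = shared_edges(edges_a, edges_b)
    at_least = 0
    for _ in range(n_perm):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(nodes, shuffled))
        # Relabelling keeps the structure of graph B but breaks its link to graph A.
        permuted = [(relabel[u], relabel[v]) for u, v in edges_b]
        if shared_edges(edges_a, permuted) >= observed:
            at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least + 1) / (n_perm + 1)


path = [(0, 1), (1, 2), (2, 3)]
observed, p_value = node_label_permutation_test(range(6), path, path, n_perm=200)
```

An edge permutation variant would instead redraw the edges of one graph at random while keeping the node labels fixed.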