973 results for Computational biology
Abstract:
The use of mutagenic drugs to drive HIV-1 past its error threshold presents a novel intervention strategy, as suggested by the quasispecies theory, that may be less susceptible to failure via viral mutation-induced emergence of drug resistance than current strategies. The error threshold of HIV-1, μc, however, is not known. Application of the quasispecies theory to determine μc poses significant challenges: whereas the quasispecies theory considers the asexual reproduction of an infinitely large population of haploid individuals, HIV-1 is diploid, undergoes recombination, and is estimated to have a small effective population size in vivo. We performed population genetics-based stochastic simulations of the within-host evolution of HIV-1 and estimated the structure of the HIV-1 quasispecies and μc. We found that with small mutation rates, the quasispecies was dominated by genomes with few mutations. Upon increasing the mutation rate, a sharp error catastrophe occurred where the quasispecies became delocalized in sequence space. Using parameter values that quantitatively captured data of viral diversification in HIV-1 patients, we estimated μc to be 7 × 10⁻⁵ to 1 × 10⁻⁴ substitutions/site/replication, ~2-6-fold higher than the natural mutation rate of HIV-1, suggesting that HIV-1 survives close to its error threshold and may be readily susceptible to mutagenic drugs. The latter estimate was weakly dependent on the within-host effective population size of HIV-1. With large population sizes and in the absence of recombination, our simulations converged to the quasispecies theory, bridging the gap between quasispecies theory and population genetics-based approaches to describing HIV-1 evolution. Further, μc increased with the recombination rate, rendering HIV-1 less susceptible to error catastrophe, thus elucidating an added benefit of recombination to HIV-1. Our estimate of μc may serve as a quantitative guideline for the use of mutagenic drugs against HIV-1.
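The stochastic simulation approach described above can be illustrated in miniature. The sketch below is not the authors' code; the genome length, population size, and single-peak fitness values are illustrative. It runs a Wright-Fisher-style simulation of a finite haploid population and sweeps the per-site mutation rate, showing how the population delocalizes in sequence space once the mutation rate crosses an error threshold.

```python
# Minimal sketch: Wright-Fisher-style evolution of a finite haploid population
# on a single-peak fitness landscape, swept over per-site mutation rates.
import numpy as np

rng = np.random.default_rng(0)
L, N, GENERATIONS = 50, 1000, 300          # genome length, population size, generations
MASTER_FITNESS, OTHER_FITNESS = 10.0, 1.0  # single-peak landscape (illustrative values)

def simulate(mu):
    """Return the mean fraction of mutated sites after GENERATIONS of evolution."""
    pop = np.zeros((N, L), dtype=np.int8)              # 0 = wild-type site, 1 = mutated site
    for _ in range(GENERATIONS):
        hamming = pop.sum(axis=1)
        fitness = np.where(hamming == 0, MASTER_FITNESS, OTHER_FITNESS)
        # Wright-Fisher resampling of parents proportional to fitness
        parents = rng.choice(N, size=N, p=fitness / fitness.sum())
        pop = pop[parents]
        # per-site mutation (bit flip) with probability mu
        flips = (rng.random((N, L)) < mu).astype(np.int8)
        pop = np.bitwise_xor(pop, flips)
    return pop.mean()

for mu in (1e-4, 1e-3, 1e-2, 5e-2, 1e-1):
    print(f"mu = {mu:.0e}  mean mutated fraction = {simulate(mu):.3f}")
```

Below the threshold the population stays localized near the master sequence (mean mutated fraction near 0); above it the fraction drifts toward 0.5, the delocalized regime.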
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
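As a rough illustration of the two computations the toolkit performs, the sketch below counts n-gram frequencies and evaluates add-alpha-smoothed n-gram-model perplexity in fixed windows over a toy sequence. It is not the suffix-array-based toolkit itself; the window size, smoothing, and toy "genome" are assumptions made for the example.

```python
# Minimal sketch: n-gram frequencies and windowed n-gram language-model perplexity
# over a DNA-like sequence.
from collections import Counter
import math

def ngram_counts(seq, n):
    """Frequencies of all n-grams in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def window_perplexity(window, counts_n, counts_ctx, n=3, alpha=1.0):
    """Per-symbol perplexity of a window under an add-alpha smoothed n-gram model."""
    log_prob, m = 0.0, 0
    for i in range(n - 1, len(window)):
        ctx, gram = window[i - n + 1:i], window[i - n + 1:i + 1]
        p = (counts_n.get(gram, 0) + alpha) / (counts_ctx.get(ctx, 0) + alpha * 4)
        log_prob += math.log2(p)
        m += 1
    return 2.0 ** (-log_prob / max(m, 1))

genome = "ACGT" * 2500 + "AAAT" * 2500          # toy sequence with two compositional regimes
counts3, counts2 = ngram_counts(genome, 3), ngram_counts(genome, 2)
WINDOW, STEP = 1000, 1000
for start in range(0, len(genome) - WINDOW + 1, STEP):
    w = genome[start:start + WINDOW]
    print(start, round(window_perplexity(w, counts3, counts2), 3))
```

Windows whose composition deviates from the genome-wide background stand out as perplexity peaks or dips, which is the basic signal used to flag candidate patterns.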
Abstract:
Flap dynamics of HIV-1 protease (HIV-pr) controls the entry of inhibitors and substrates to the active site. Dynamical models from previous simulations are not all consistent with each other, and not all are supported by the NMR results. In the present work, the effect of the force field on the dynamics of HIV-pr is investigated by MD simulations using three AMBER force fields: ff99, ff99SB, and ff03. The generalized order parameters for the amide backbone are calculated from the three force fields and compared with the NMR S² values. We found that the order parameters calculated with the ff99SB and ff03 force fields agree reasonably well with the NMR S² values, whereas the ff99-calculated values deviate the most from the NMR order parameters. The stereochemical geometry of the protein models from each force field also agrees well with the inferences drawn from the NMR S² values. However, between ff99SB and ff03, there are several differences, most notably in the loop regions. These loops are, in general, more flexible in the ff03 force field, which results in a larger active site cavity in simulations with the ff03 force field. The effect of this difference on computer-aided drug design against flexible receptors is discussed.
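The generalized order parameter referred to above has a standard estimator from simulation frames. A minimal sketch follows; it is independent of any particular MD package, and the random-walk "trajectory" is a stand-in for real, aligned simulation frames.

```python
# Minimal sketch: generalized order parameter S^2 for a backbone N-H bond from
# unit bond vectors sampled along a trajectory, using the second-rank tensor
# expression S^2 = (3/2) * sum_ij <u_i u_j>^2 - 1/2.
import numpy as np

def order_parameter(vectors):
    """S^2 from an (n_frames, 3) array of bond vectors (need not be normalized)."""
    u = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    avg = np.einsum("ti,tj->ij", u, u) / len(u)   # time-averaged outer products <u_i u_j>
    return 1.5 * np.sum(avg ** 2) - 0.5

rng = np.random.default_rng(1)
# toy trajectory: a bond vector wobbling around the z-axis (stand-in for MD frames)
traj = np.array([0.0, 0.0, 1.0]) + rng.normal(scale=0.15, size=(5000, 3))
print(f"S^2 = {order_parameter(traj):.3f}")       # near 1 for a rigid bond, lower when flexible
```

Residues in flexible loops give lower S² than those in rigid secondary structure, which is the quantity being compared against the NMR values above.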
Abstract:
This article is concerned with the evolution of haploid organisms that reproduce asexually. In a seminal piece of work, Eigen and coauthors proposed the quasispecies model in an attempt to understand such an evolutionary process. Their work has impacted antiviral treatment and vaccine design strategies. Yet, predictions of the quasispecies model are at best viewed as a guideline, primarily because it assumes an infinite population size, whereas realistic population sizes can be quite small. In this paper we consider a population genetics-based model aimed at understanding the evolution of such organisms with finite population sizes and present a rigorous study of the convergence and computational issues that arise therein. Our first result is structural and shows that, at any time during the evolution, as the population size tends to infinity, the distribution of genomes predicted by our model converges to that predicted by the quasispecies model. This justifies the continued use of the quasispecies model to derive guidelines for intervention. While the stationary state in the quasispecies model is readily obtained, exact computations in our model are prohibitive due to the explosion of the state space. Our second set of results is computational in nature and addresses this issue. We derive conditions on the parameters of evolution under which our stochastic model mixes rapidly. Further, for a class of widely used fitness landscapes we give a fast deterministic algorithm which computes the stationary distribution of our model. These computational tools are expected to serve as a framework for the modeling of strategies for the deployment of mutagenic drugs.
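For the deterministic, infinite-population limit that the finite-population model converges to, the stationary quasispecies distribution can be computed directly as the leading eigenvector of the mutation-selection matrix. The sketch below does this for a small binary sequence space with a single-peak fitness landscape; the parameters are illustrative, and this is the classical Eigen model rather than the paper's finite-population algorithm.

```python
# Minimal sketch: stationary quasispecies distribution as the leading eigenvector
# of W = Q @ diag(f), where Q holds per-replication mutation probabilities and f
# is a single-peak fitness landscape.
import numpy as np
from itertools import product

L, MU = 8, 0.05                                          # sequence length, per-site mutation rate
seqs = np.array(list(product((0, 1), repeat=L)))
d = (seqs[:, None, :] != seqs[None, :, :]).sum(axis=2)   # pairwise Hamming distances
Q = MU ** d * (1 - MU) ** (L - d)                        # mutation matrix (rows sum to 1)
f = np.where((seqs == 0).all(axis=1), 10.0, 1.0)         # single-peak fitness

W = Q @ np.diag(f)
vals, vecs = np.linalg.eig(W)
stationary = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
stationary /= stationary.sum()

# summarize the quasispecies by Hamming distance from the master sequence
dist_from_master = seqs.sum(axis=1)
for k in range(L + 1):
    print(k, round(stationary[dist_from_master == k].sum(), 4))
```

Raising MU toward the error threshold spreads this distribution away from the master sequence, which is the structural behaviour the finite-population model is shown to converge to.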
Abstract:
Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximizes the number of relevant objects retrieved at the absolute top of the list. The algorithm is a support vector style algorithm, but due to the different objective, it no longer leads to a quadratic programming problem. Instead, the dual optimization problem involves l1,∞ constraints; we solve this dual problem using the recent l1,∞ projection method of Quattoni et al. (2009). Our algorithm can be viewed as an l∞-norm extreme of the lp-norm based algorithm of Rudin (2009) (albeit in a support vector setting rather than a boosting setting); thus we refer to the algorithm as the 'Infinite Push'. Experiments on real-world data sets confirm the algorithm's focus on accuracy at the absolute top of the list.
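The quantity the Infinite Push targets, namely the number of relevant items ranked below the highest-scoring irrelevant item, is easy to write down. The sketch below computes it together with a hinge surrogate for a linear scorer; the helper functions and data are hypothetical illustrations, not the paper's l1,∞-constrained dual solver.

```python
# Minimal sketch: the "accuracy at the absolute top" quantity and a hinge
# surrogate of it for a linear scoring function.
import numpy as np

def infinite_push_risk(scores_pos, scores_neg):
    """Fraction of positives scored at or below the best-scoring negative."""
    worst_negative = np.max(scores_neg)
    return np.mean(scores_pos <= worst_negative)

def hinge_surrogate(w, X_pos, X_neg):
    """Max over negatives of summed hinge losses against all positives (linear scorer)."""
    s_pos, s_neg = X_pos @ w, X_neg @ w
    margins = 1.0 - (s_pos[:, None] - s_neg[None, :])      # shape (n_pos, n_neg)
    return np.max(np.sum(np.maximum(margins, 0.0), axis=0))

rng = np.random.default_rng(2)
X_pos = rng.normal(loc=+0.5, size=(50, 5))                 # toy relevant items
X_neg = rng.normal(loc=-0.5, size=(200, 5))                # toy irrelevant items
w = np.ones(5) / 5                                          # illustrative linear scorer
print(infinite_push_risk(X_pos @ w, X_neg @ w), hinge_surrogate(w, X_pos, X_neg))
```

Minimizing the max-over-negatives surrogate (rather than an average over all pairs) is what concentrates the optimization pressure at the very top of the ranked list.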
Abstract:
In systems biology, questions concerning the molecular and cellular makeup of an organism are of utmost importance, especially when trying to understand how unreliable components (genetic circuits, biochemical cascades, and ion channels, among others) enable reliable and adaptive behaviour. The repertoire and speed of biological computations are limited by thermodynamic or metabolic constraints: an example can be found in neurons, where fluctuations in biophysical states limit the information they can encode, with some 20-60% of the brain's total energy used for signalling purposes, either via action potentials or synaptic transmission. Here, we consider the imperatives for neurons to optimise computational and metabolic efficiency, wherein benefits and costs trade off against each other in the context of self-organised and adaptive behaviour. In particular, we try to link information theoretic (variational) and thermodynamic (Helmholtz) free-energy formulations of neuronal processing and show how they are related in a fundamental way through a complexity minimisation lemma.
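For reference, the variational free energy mentioned above admits the standard complexity-minus-accuracy decomposition. The notation below is the usual one for a recognition density q(ϑ) and generative model p(s, ϑ); this is textbook background, not the paper's specific lemma.

```latex
% Variational free energy: an upper bound on surprise that splits into
% complexity (divergence from the prior) minus accuracy (expected log likelihood).
F = \mathbb{E}_{q(\vartheta)}\!\left[\ln q(\vartheta) - \ln p(s,\vartheta)\right]
  = \underbrace{D_{\mathrm{KL}}\!\left[q(\vartheta)\,\|\,p(\vartheta)\right]}_{\text{complexity}}
  \;-\; \underbrace{\mathbb{E}_{q(\vartheta)}\!\left[\ln p(s \mid \vartheta)\right]}_{\text{accuracy}}
  \;\geq\; -\ln p(s).
```

It is the complexity term in this decomposition that links the information-theoretic formulation to metabolic cost, since reducing complexity reduces the work the system must do to update its states.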
Abstract:
Gene expression is the most fundamental biological process and is essential for phenotypic variation. It is regulated by various external (environment and evolution) and internal (genetic) factors. The level of gene expression depends on promoter architecture, along with other external factors. The presence of sequence motifs, such as transcription factor binding sites (TFBSs) and the TATA-box, or DNA methylation in vertebrates, has been implicated in the regulation of expression of some genes in eukaryotes, but a large number of genes lack these sequences. On the other hand, several experimental and computational studies have shown that promoter sequences possess some special structural properties, such as low stability, less bendability, low nucleosome occupancy, and more curvature, which are prevalent across all organisms. These structural features may play a role in transcription initiation and the regulation of gene expression. We have studied the relationship between the structural features of promoter DNA, promoter directionality and gene expression variability in S. cerevisiae. This relationship has been analyzed for seven different measures of gene expression variability, along with two different regulatory effect measures. We find that a few of the variability measures of gene expression are linked to DNA structural properties, nucleosome occupancy, TATA-box presence, and bidirectionality of promoter regions. Interestingly, gene responsiveness is most intimately correlated with DNA structural features and promoter architecture.
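The kind of comparison described above can be sketched as a rank correlation between promoter features and an expression-variability measure. The column names and synthetic data below are hypothetical; this illustrates only the form of the analysis, not the authors' pipeline or results.

```python
# Minimal sketch: rank-correlating promoter structural features with a measure
# of expression variability (here labelled "responsiveness").
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_genes = 2000
features = {
    "promoter_stability":   rng.normal(size=n_genes),
    "bendability":          rng.normal(size=n_genes),
    "nucleosome_occupancy": rng.normal(size=n_genes),
}
# synthetic variability measure loosely tied to one feature, for illustration only
responsiveness = 0.4 * features["promoter_stability"] + rng.normal(size=n_genes)

for name, values in features.items():
    rho, p = spearmanr(values, responsiveness)
    print(f"{name:22s} rho = {rho:+.2f}  p = {p:.1e}")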
Abstract:
A balance between excitatory and inhibitory synaptic currents is thought to be important for several aspects of information processing in cortical neurons in vivo, including gain control, bandwidth and receptive field structure. These factors will affect the firing rate of cortical neurons and their reliability, with consequences for their information coding and energy consumption. Yet how balanced synaptic currents contribute to the coding efficiency and energy efficiency of cortical neurons remains unclear. We used single compartment computational models with stochastic voltage-gated ion channels to determine whether synaptic regimes that produce balanced excitatory and inhibitory currents have specific advantages over other input regimes. Specifically, we compared models with only excitatory synaptic inputs to those with equal excitatory and inhibitory conductances, and with stronger inhibitory than excitatory conductances (i.e. approximately balanced synaptic currents). Using these models, we show that balanced synaptic currents evoke fewer spikes per second than excitatory inputs alone or equal excitatory and inhibitory conductances. However, spikes evoked by balanced synaptic inputs are more informative (bits/spike), so that spike trains evoked by all three regimes have similar information rates (bits/s). Consequently, because spikes dominate the energy consumption of our computational models, approximately balanced synaptic currents are also more energy efficient than other synaptic regimes. Thus, by producing fewer, more informative spikes, approximately balanced synaptic currents in cortical neurons can promote both coding efficiency and energy efficiency.
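A toy model suffices to illustrate why adding an inhibitory conductance that balances excitation lowers the evoked firing rate. The sketch below uses a conductance-driven leaky integrate-and-fire cell with noisy synaptic conductances and illustrative parameters; the study itself uses single-compartment models with stochastic voltage-gated channels, which this sketch does not reproduce, and it says nothing about bits/spike.

```python
# Minimal sketch: firing rates of a conductance-driven LIF cell under three
# synaptic regimes (excitation only, equal E/I, stronger inhibition).
import numpy as np

def run_lif(g_exc, g_inh, T=5.0, dt=1e-4, hold=20, seed=4):
    """Firing rate (spikes/s) with synaptic conductances redrawn every hold*dt seconds."""
    E_L, E_E, E_I = -70e-3, 0.0, -80e-3        # leak, excitatory, inhibitory reversal potentials (V)
    g_L, C = 10e-9, 200e-12                    # leak conductance (S), membrane capacitance (F)
    V_th, V_reset = -50e-3, -65e-3
    rng = np.random.default_rng(seed)
    V, spikes, gE, gI = E_L, 0, g_exc, g_inh
    for step in range(int(T / dt)):
        if step % hold == 0:                   # noisy synaptic drive, piecewise constant
            gE = g_exc * max(rng.normal(1.0, 0.5), 0.0)
            gI = g_inh * max(rng.normal(1.0, 0.5), 0.0)
        V += (g_L * (E_L - V) + gE * (E_E - V) + gI * (E_I - V)) * dt / C
        if V >= V_th:
            V, spikes = V_reset, spikes + 1
    return spikes / T

for label, gi in [("excitation only", 0e-9), ("equal E and I", 8e-9), ("balanced (I > E)", 14e-9)]:
    print(f"{label:18s} {run_lif(g_exc=8e-9, g_inh=gi):6.1f} spikes/s")
```

The inhibitory conductance pulls the mean voltage below threshold, so spikes occur only during favourable input fluctuations, which is the qualitative reason the balanced regime fires less.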
Abstract:
The most spectacular applications of crystallography are currently concerned with biological macromolecules like proteins and their assemblies. Macromolecular crystallography originated in England in the thirties of the last century, but definitive results began to appear only around 1960. Since then macromolecular crystallography has grown to become central to modern biology. India has a long tradition in crystallography, starting with the work of K. Banerjee in the thirties. In addition to their contributions to crystallography, G.N. Ramachandran and his colleagues gave India a head start in computational biology, molecular modeling and what we now call bioinformatics. However, attempts to initiate macromolecular crystallography in India started only in the seventies. The work got off the ground after the Department of Science and Technology handsomely supported the group at the Indian Institute of Science, Bangalore, in 1983. The Bangalore group was also recognized as a national nucleus for the development of the area in the country. Since then macromolecular crystallography, practiced in more than 30 institutions in the country, has grown to become an important component of scientific research in India. The articles in this issue provide a flavor of activities in the area in the country. The area is still in an expanding phase and is poised to scale greater heights.
Abstract:
Information is encoded in neural circuits using both graded and action potentials, converting between them within single neurons and successive processing layers. This conversion is accompanied by information loss and a drop in energy efficiency. We investigate the biophysical causes of this loss of information and efficiency by comparing spiking neuron models, containing stochastic voltage-gated Na+ and K+ channels, with generator potential and graded potential models lacking voltage-gated Na+ channels. We identify three causes of information loss in the generator potential that are the by-product of action potential generation: (1) the voltage-gated Na+ channels necessary for action potential generation increase intrinsic noise and (2) introduce non-linearities, and (3) the finite duration of the action potential creates a 'footprint' in the generator potential that obscures incoming signals. These three processes reduce information rates by ~50% in generator potentials, to ~3 times that of spike trains. Both generator potentials and graded potentials consume almost an order of magnitude less energy per second than spike trains. Because of their lower information rates, generator potentials are substantially less energy efficient than graded potentials. However, both are an order of magnitude more efficient than spike trains due to the higher energy costs and low information content of spikes, emphasizing that there is a two-fold cost of converting analogue to digital: information loss and cost inflation.
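Information rates for analogue signals such as generator and graded potentials are often bounded from signal and noise power spectra. A minimal sketch of that estimate follows, with a synthetic stimulus and noise and illustrative parameters; it is not the models or data of the study above, and in practice the noise spectrum is estimated from trial-to-trial variability.

```python
# Minimal sketch: spectral upper bound on an analogue information rate,
# R <= integral of log2(1 + S(f)/N(f)) df, from Welch power spectra.
import numpy as np
from scipy.signal import butter, lfilter, welch

fs, T = 2000.0, 30.0                              # sample rate (Hz), duration (s)
rng = np.random.default_rng(5)
n = int(fs * T)

b, a = butter(4, 100.0 / (fs / 2.0))              # band-limited "stimulus" below ~100 Hz
stimulus = lfilter(b, a, rng.standard_normal(n))
noise = 0.05 * rng.standard_normal(n)             # additive noise, e.g. channel noise

f, S = welch(stimulus, fs=fs, nperseg=4096)       # signal power spectral density
_, N = welch(noise, fs=fs, nperseg=4096)          # noise power spectral density
df = f[1] - f[0]
info_rate = np.sum(np.log2(1.0 + S / N)) * df     # bits per second (upper bound)
print(f"information rate bound ~ {info_rate:.0f} bits/s")
```

Increasing the noise power (as extra channel noise from Na+ channels would) directly lowers this bound, which is the kind of comparison used to quantify the information loss described above.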
Abstract:
In recent times, zebrafish has garnered a lot of popularity as a model organism for studying human cancers. Despite high evolutionary divergence from humans, zebrafish develops almost all types of human tumors when induced. However, the mechanistic details of tumor formation have remained largely unknown. The present study is aimed at an analysis of the repertoire of kinases in the zebrafish proteome to provide insights into various cellular components. Annotation using highly sensitive remote homology detection methods revealed a "substantial expansion" of the Ser/Thr/Tyr kinase family in zebrafish compared to humans, constituting over 3% of the proteome. Subsequent classification of the kinases into subfamilies revealed the presence of a large number of CAMK group kinases, with a massive representation of PIM kinases, which are important for cell cycle regulation and growth. Extensive sequence comparison between human and zebrafish PIM kinases revealed high conservation of functionally important residues, with a few organism-specific variations. There are about 300 PIM kinases in the zebrafish kinome, while the human genome codes for only about 500 kinases altogether. PIM kinases have been implicated in various human cancers and are currently being targeted to explore their therapeutic potential. Hence, an in-depth analysis of PIM kinases in zebrafish has opened up new avenues of research to verify the model organism status of zebrafish.
Abstract:
Network biology is conceptualized as an interdisciplinary field lying at the intersection of graph theory, statistical mechanics and biology. Great efforts have been made to promote the concept of network biology and its various applications in the life sciences.
Abstract:
Background: CpG islands (CGIs), clusters of CpG dinucleotides in GC-rich regions, are often located in the 5' end of genes and are considered gene markers. Hackenberg et al. (2006) recently developed a new algorithm, CpGcluster, which uses a completely different approach from previous algorithms.
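For context, a minimal sketch of the classical sliding-window CpG island criteria used by many earlier algorithms (Gardiner-Garden & Frommer thresholds: GC content ≥ 0.5, observed/expected CpG ≥ 0.6, length ≥ 200 bp) is shown below; it does not implement CpGcluster, and the toy sequence and window settings are illustrative.

```python
# Minimal sketch: classical window-based CpG island criteria applied along a sequence.
def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def obs_exp_cpg(seq):
    expected = seq.count("C") * seq.count("G") / len(seq)
    return seq.count("CG") / expected if expected > 0 else 0.0

def window_is_island(seq):
    return len(seq) >= 200 and gc_content(seq) >= 0.5 and obs_exp_cpg(seq) >= 0.6

def find_cgi_windows(genome, window=200, step=50):
    """Yield (start, end) of windows satisfying the classical CGI criteria."""
    for start in range(0, len(genome) - window + 1, step):
        w = genome[start:start + window]
        if window_is_island(w):
            yield start, start + window

toy = "AT" * 300 + "CG" * 200 + "AT" * 300        # toy sequence with one GC/CpG-rich block
hits = list(find_cgi_windows(toy))
print(f"{len(hits)} candidate CGI windows, first: {hits[0] if hits else None}")
```

Overlapping candidate windows would normally be merged into islands in a full implementation; the sketch stops at flagging windows that pass the thresholds.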