973 results for Computational biology


Relevance:

60.00%

Abstract:

In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression has been the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of gene expression measurements, relative to ad-hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications of microarrays are becoming more and more popular. In this paper we describe a preprocessing methodology for a technology designed for the identification of DNA sequence variants in specific genes or regions of the human genome that are associated with phenotypes of interest such as disease. In particular we describe methodology useful for preprocessing Affymetrix SNP chips and obtaining genotype calls with the preprocessed data. We demonstrate how our procedure improves existing approaches using data from three relatively large studies including one in which a large number of independent calls are available. Software implementing these ideas is available from the Bioconductor oligo package.

Relevance:

60.00%

Abstract:

The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade’s worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data demonstrate unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find GC-content has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here we describe statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization (CQN) algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content, and quantile normalization to correct for global distortions.
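
As a rough illustration of the quantile normalization step mentioned in the abstract above, here is a minimal sketch in plain Python. It covers only ordinary quantile normalization across samples. The regression on GC-content that CQN combines with it is not shown, and the function name and data layout are assumptions made for this example rather than anything taken from the paper or its software.

```python
def quantile_normalize(columns):
    """Force every sample (column) to share the same empirical distribution.

    `columns` is a list of equal-length lists, one list per sample.
    Hypothetical helper for illustration; ties are broken by position
    rather than averaged.
    """
    n = len(columns[0])
    # For each sample, the indices of its values from smallest to largest.
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(col[order[k]] for col, order in zip(columns, orders)) / len(columns)
        for k in range(n)
    ]
    normalized = []
    for order in orders:
        new_col = [0.0] * n
        # The k-th smallest value in this sample is replaced by reference[k].
        for k, idx in enumerate(order):
            new_col[idx] = reference[k]
        normalized.append(new_col)
    return normalized
```

After normalization every sample contains the same set of values, while the rank order of measurements within each sample is preserved.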

Relevance:

60.00%

Abstract:

Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity, have been associated with disease processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high-throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMMs) may be useful for improving genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve previous approaches that utilize an HMM framework for inference in high-throughput SNP arrays by integrating copy number, genotype calls, and the corresponding confidence scores when available. Using simulated data, we demonstrate how confidence scores control smoothing in a probabilistic framework. Software for fitting HMMs to SNP array data is available in the R package ICE.
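
To make the idea of HMM smoothing along neighboring markers concrete, the following is a generic Viterbi decoder in plain Python. It is a sketch under stated assumptions, not the model or code of the ICE package: the function name, the state indexing, and the input layout (per-marker log-likelihoods for each hidden state) are all hypothetical.

```python
def viterbi(obs_loglik, log_init, log_trans):
    """Most likely hidden-state path for a sequence of markers.

    obs_loglik[t][s]: log-likelihood of the data at marker t under state s
    log_init[s]:      log prior probability of state s at the first marker
    log_trans[p][s]:  log probability of moving from state p to state s
    All names and the input layout are assumptions for this illustration.
    """
    n_states = len(log_init)
    score = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at marker t
    for t in range(1, len(obs_loglik)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(
                score[best_prev] + log_trans[best_prev][s] + obs_loglik[t][s]
            )
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With sticky transitions, a single marker that only weakly favors a different state is smoothed over, whereas a marker with a very confident likelihood can still change the decoded state. This is one simple way to picture how confidence can control smoothing.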

Relevance:

60.00%

Abstract:

Genotyping platforms such as Affymetrix can be used to assess genotype-phenotype as well as copy number-phenotype associations at millions of markers. While genotyping algorithms are largely concordant when assessed on HapMap samples, tools to assess copy number changes are more variable and often discordant. One explanation for the discordance is that copy number estimates are susceptible to systematic differences between groups of samples that were processed at different times or by different labs. Analysis algorithms that do not adjust for batch effects are prone to spurious measures of association. The R package crlmm implements a multilevel model that adjusts for batch effects and provides allele-specific estimates of copy number. This paper illustrates a workflow for the estimation of allele-specific copy number, develops marker- and study-level summaries of batch effects, and demonstrates how the marker-level estimates can be integrated with complementary Bioconductor software for inferring regions of copy number gain or loss. All analyses are performed in the statistical environment R. A compendium for reproducing the analysis is available from the author’s website (http://www.biostat.jhsph.edu/~rscharpf/crlmmCompendium/index.html).

Relevance:

60.00%

Abstract:

The primary visual cortex (V1) is pre-wired to facilitate the extraction of behaviorally important visual features. Collinear edge detectors in V1, for instance, mutually enhance each other to improve the perception of lines against a noisy background. The same pre-wiring that facilitates line extraction, however, is detrimental when subjects have to discriminate the brightness of different line segments. How is it possible to improve in one task by unsupervised practicing, without getting worse in the other task? The classical view of perceptual learning is that practicing modulates the feedforward input stream through synaptic modifications onto or within V1. However, any rewiring of V1 would deteriorate other perceptual abilities different from the trained one. We propose a general neuronal model showing that perceptual learning can modulate top-down input to V1 in a task-specific way while feedforward and lateral pathways remain intact. Consistent with biological data, the model explains how context-dependent brightness discrimination is improved by a top-down recruitment of recurrent inhibition and a top-down induced increase of the neuronal gain within V1. Both the top-down modulation of inhibition and of neuronal gain are suggested to be universal features of cortical microcircuits which enable perceptual learning.

Relevance:

60.00%

Abstract:

An important problem in computational biology is finding the longest common subsequence (LCS) of two nucleotide sequences. This paper examines the correctness and performance of a recently proposed parallel LCS algorithm that uses successor tables and pruning rules to construct a list of sets from which an LCS can be easily reconstructed. Counterexamples are given for two pruning rules that were given with the original algorithm. Because of these errors, performance measurements originally reported cannot be validated. The work presented here shows that speedup can be reliably achieved by an implementation in Unified Parallel C that runs on an Infiniband cluster. This performance is partly facilitated by exploiting the software cache of the MuPC runtime system. In addition, this implementation achieved speedup without bulk memory copy operations and the associated programming complexity of message passing.
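
For readers unfamiliar with the LCS problem named in the abstract above, here is the standard serial dynamic-programming solution in Python. It is the textbook method, shown only to illustrate the problem. It is not the parallel successor-table algorithm the paper examines, and the function name is chosen for this example.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # table[i][j] = length of an LCS of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back through the table to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

This version takes time and memory proportional to the product of the two sequence lengths, which is the cost that parallel approaches aim to spread across processors.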

Relevance:

60.00%

Abstract:

Amyloids and prion proteins are clinically and biologically important beta-structures, whose supersecondary structures are difficult to determine by standard experimental or computational means. In addition, significant conformational heterogeneity is known or suspected to exist in many amyloid fibrils. Recent work has indicated the utility of pairwise probabilistic statistics in beta-structure prediction. We develop here a new strategy for beta-structure prediction, emphasizing the determination of beta-strands and pairs of beta-strands as fundamental units of beta-structure. Our program, BETASCAN, calculates likelihood scores for potential beta-strands and strand-pairs based on correlations observed in parallel beta-sheets. The program then determines the strands and pairs with the greatest local likelihood for all of the sequence's potential beta-structures. BETASCAN suggests multiple alternate folding patterns and assigns relative a priori probabilities based solely on amino acid sequence, probability tables, and pre-chosen parameters. The algorithm compares favorably with the results of previous algorithms (BETAPRO, PASTA, SALSA, TANGO, and Zyggregator) in beta-structure prediction and amyloid propensity prediction. Accurate prediction is demonstrated for experimentally determined amyloid beta-structures, for a set of known beta-aggregates, and for the parallel beta-strands of beta-helices, amyloid-like globular proteins. BETASCAN is able both to detect beta-strands with higher sensitivity and to detect the edges of beta-strands in a richly beta-like sequence. For two proteins (Abeta and Het-s), there exist multiple sets of experimental data implying contradictory structures; BETASCAN is able to detect each competing structure as a potential structure variant. The ability to correlate multiple alternate beta-structures to experiment opens the possibility of computational investigation of prion strains and structural heterogeneity of amyloid. 
BETASCAN is publicly accessible on the Web at http://betascan.csail.mit.edu.

Relevance:

60.00%

Abstract:

The success of combination antiretroviral therapy is limited by the evolutionary escape dynamics of HIV-1. We used Isotonic Conjunctive Bayesian Networks (I-CBNs), a class of probabilistic graphical models, to describe this process. We employed partial order constraints among viral resistance mutations, which give rise to a limited set of mutational pathways, and we modeled phenotypic drug resistance as monotonically increasing along any escape pathway. Using this model, the individualized genetic barrier (IGB) to each drug is derived as the probability of the virus not acquiring additional mutations that confer resistance. Drug-specific IGBs were combined to obtain the IGB to an entire regimen, which quantifies the virus' genetic potential for developing drug resistance under combination therapy. The IGB was tested as a predictor of therapeutic outcome using between 2,185 and 2,631 treatment change episodes of subtype B infected patients from the Swiss HIV Cohort Study Database, a large observational cohort. Using logistic regression, significant univariate predictors included most of the 18 drugs and single-drug IGBs, the IGB to the entire regimen, the expert rules-based genotypic susceptibility score (GSS), several individual mutations, and the peak viral load before treatment change. In the multivariate analysis, the only genotype-derived variables that remained significantly associated with virological success were GSS and, with 10-fold stronger association, IGB to regimen. When predicting suppression of viral load below 400 cps/ml, IGB outperformed GSS and also improved GSS-containing predictors significantly, but the difference was not significant for suppression below 50 cps/ml. Thus, the IGB to regimen is a novel data-derived predictor of treatment outcome that has potential to improve the interpretation of genotypic drug resistance tests.

Relevance:

60.00%

Abstract:

Most empirical and theoretical studies have shown that sex increases the rate of evolution, although evidence of sex constraining genomic and epigenetic variation and slowing down evolution also exists. Faster rates with sex have been attributed to new gene combinations, removal of deleterious mutations, and adaptation to heterogeneous environments. Slower rates with sex have been attributed to removal of major genetic rearrangements, the cost of finding a mate, vulnerability to predation, and exposure to sexually transmitted diseases. Whether sex speeds or slows evolution, the connection between reproductive mode, the evolutionary rate, and species diversity remains largely unexplored. Here we present a spatially explicit model of ecological and evolutionary dynamics based on DNA sequence change to study the connection between mutation, speciation, and the resulting biodiversity in sexual and asexual populations. We show that faster speciation can decrease the abundance of newly formed species and thus decrease long-term biodiversity. In this way, sex can reduce diversity relative to asexual populations, because it leads to a higher rate of production of new species, but with lower abundances. Our results show that reproductive mode and the mechanisms underlying it can alter the link between mutation, evolutionary rate, speciation and biodiversity and we suggest that a high rate of evolution may not be required to yield high biodiversity.

Relevance:

60.00%

Abstract:

Most empirical studies support a decline in speciation rates through time, although evidence for constant speciation rates also exists. Declining rates have been explained by invoking pre-existing niches, whereas constant rates have been attributed to non-adaptive processes such as sexual selection and mutation. Trends in speciation rate and the processes underlying it remain unclear, representing a critical information gap in understanding patterns of global diversity. Here we show that the temporal trend in the speciation rate can also be explained by frequency-dependent selection. We construct a frequency-dependent and DNA sequence-based model of speciation. We compare our model to empirical diversity patterns observed for cichlid fish and Darwin's finches, two classic systems for which speciation rates and richness data exist. Negative frequency-dependent selection both predicts well the declining speciation rate found in cichlid fish and explains their species richness. For groups like the Darwin's finches, in which speciation rates are constant and diversity is lower, speciation rate is better explained by a model without frequency-dependent selection. Our analysis shows that differences in diversity may be driven by incipient species abundance with frequency-dependent selection. Our results demonstrate that genetic-distance-based speciation and frequency-dependent selection are sufficient to explain the high diversity observed in natural systems and, importantly, predict decay through time in speciation rate in the absence of pre-existing niches.

Relevance:

60.00%

Abstract:

The β2 adrenergic receptor (β2AR) regulates smooth muscle relaxation in the vasculature and airways. Long- and short-acting β-agonists (LABAs/SABAs) are widely used in treatment of chronic obstructive pulmonary disorder (COPD) and asthma. Despite their widespread clinical use, we do not understand well the dominant β2AR regulatory pathways that are stimulated during therapy and bring about tachyphylaxis, which is the loss of drug effects. Thus, an understanding of how the β2AR responds to various β-agonists is crucial to their rational use. Towards that end we have developed deterministic models that explore the mechanism of drug-induced β2AR regulation. These mathematical models can be classified into three classes: (i) Six quantitative models of SABA-induced G protein coupled receptor kinase (GRK)-mediated β2AR regulation; (ii) Three phenomenological models of salmeterol (a LABA)-induced GRK-mediated β2AR regulation; and (iii) One semi-quantitative, unified model of SABA-induced GRK-, protein kinase A (PKA)-, and phosphodiesterase (PDE)-mediated regulation of β2AR signalling. The various models were constrained with all or some of the following experimental data: (i) GRK-mediated β2AR phosphorylation in response to various LABAs/SABAs; (ii) dephosphorylation of the GRK site on the β2AR; (iii) β2AR internalisation; (iv) β2AR recycling; (v) β2AR desensitisation; (vi) β2AR resensitisation; (vii) PKA-mediated β2AR phosphorylation in response to a SABA; and (viii) LABA/SABA induced cAMP profile ± PDE inhibitors. The models of GRK-mediated β2AR regulation show that plasma membrane dephosphorylation and recycling of the phosphorylated β2AR are required to reconcile with the measured dephosphorylation kinetics.
We further used a consensus model to predict the consequences of rapid pulsatile agonist stimulation and found that although resensitisation was rapid, the β2AR system retained the memory of prior stimuli and desensitised much more rapidly and strongly in response to subsequent stimuli. This could explain tachyphylaxis of SABAs over repeated use in rescue therapy of asthma patients. The LABA models show that the long action of salmeterol can be explained by decreased stability of the arrestin/β2AR/salmeterol complex. This could explain the long action of β-agonists used in maintenance therapy of asthma patients. Our consensus model of PKA/PDE/GRK-mediated β2AR regulation is being used to identify the dominant β2AR desensitisation pathways under different therapeutic regimens in human airway cells. In summary, our models represent a significant advance towards understanding agonist-specific β2AR regulation that will aid in a more rational use of the β2AR agonists in the treatment of asthma.