9 results for Statistical Model

in National Center for Biotechnology Information - NCBI


Relevance: 100.00%

Abstract:

A “most probable state” equilibrium statistical theory for random distributions of hetons in a closed basin is developed here in the context of two-layer quasigeostrophic models for the spreading phase of open-ocean convection. The theory depends only on bulk conserved quantities such as energy, circulation, and the range of values of potential vorticity in each layer. The simplest theory is formulated for a uniform cooling event over the entire basin that triggers a homogeneous random distribution of convective towers. For a small Rossby deformation radius typical for open-ocean convection sites, the most probable states that arise from this theory strongly resemble the saturated baroclinic states of the spreading phase of convection, with a stabilizing barotropic rim current and localized temperature anomaly.

Relevance: 80.00%

Abstract:

Structural genomics aims to solve a large number of protein structures that represent the protein space. Currently an exhaustive solution for all structures seems prohibitively expensive, so the challenge is to define a relatively small set of proteins with new, currently unknown folds. This paper presents a method that assigns each protein a probability of having an unsolved fold. The method makes extensive use of ProtoMap, a sequence-based classification, and SCOP, a structure-based classification. ProtoMap encodes the relationships among proteins in the protein space as a graph whose vertices correspond to 13,354 clusters of proteins. A representative fold for a cluster with at least one solved protein is determined after superposition of all SCOP (release 1.37) folds onto ProtoMap clusters. Distances within the ProtoMap graph are computed from each representative fold to the neighboring folds. The distribution of these distances is used to create a statistical model for distances among those folds that are already known and those that have yet to be discovered. The distributions of distances for solved and unsolved proteins differ significantly. This difference makes it possible to use Bayes' rule to derive a statistical estimate that any protein has a yet undetermined fold. Proteins that score the highest probability of representing a new fold constitute the target list for structural determination. Our predicted probabilities for unsolved proteins correlate very well with the proportion of new folds among recently solved structures (new SCOP 1.39 records) that are disjoint from our original training set.
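
The Bayes' rule step in this abstract can be illustrated with a minimal sketch. The abstract gives neither the fitted distance distributions nor the prior, so the Gaussian densities, their parameters, and the prior of 0.3 below are purely hypothetical placeholders, and `posterior_new_fold` is an illustrative name rather than anything from the paper.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_new_fold(d, prior_new, pdf_new, pdf_known):
    """Bayes' rule: P(new fold | graph distance d)."""
    numerator = prior_new * pdf_new(d)
    denominator = numerator + (1.0 - prior_new) * pdf_known(d)
    return numerator / denominator


# Hypothetical class-conditional densities (illustrative numbers only,
# not the distributions estimated in the paper).
def pdf_known(d):
    return gaussian_pdf(d, mean=2.0, sd=1.0)


def pdf_new(d):
    return gaussian_pdf(d, mean=5.0, sd=1.5)


for d in (1.0, 3.5, 6.0):
    print(d, posterior_new_fold(d, 0.3, pdf_new, pdf_known))
```

Under these assumed densities, proteins at larger graph distances from known folds receive a higher posterior probability of having a new fold, which is the ranking idea behind the target list.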

Relevance: 70.00%

Abstract:

We present statistical methods for analyzing replicated cDNA microarray expression data and report the results of a controlled experiment. The study was conducted to investigate inherent variability in gene expression data and the extent to which replication in an experiment produces more consistent and reliable findings. We introduce a statistical model to describe the probability that mRNA is contained in the target sample tissue, converted to probe, and ultimately detected on the slide. We also introduce a method to analyze the combined data from all replicates. Of the 288 genes considered in this controlled experiment, 32 would be expected to produce strong hybridization signals because of the known presence of repetitive sequences within them. Results based on individual replicates, however, show that there are 55, 36, and 58 highly expressed genes in replicates 1, 2, and 3, respectively. On the other hand, an analysis using the combined data from all 3 replicates reveals that only 2 of the 288 genes are incorrectly classified as expressed. Our experiment shows that any single microarray output is subject to substantial variability. By pooling data from replicates, we can provide a more reliable analysis of gene expression data. Therefore, we conclude that designing experiments with replications will greatly reduce misclassification rates. We recommend that at least three replicates be used in designing experiments that use cDNA microarrays, particularly when gene expression data from single specimens are being analyzed.

Relevance: 70.00%

Abstract:

A statistical modeling approach is proposed for use in searching large microarray data sets for genes that have a transcriptional response to a stimulus. The approach is unrestricted with respect to the timing, magnitude or duration of the response, or the overall abundance of the transcript. The statistical model makes an accommodation for systematic heterogeneity in expression levels. Corresponding data analyses provide gene-specific information, and the approach provides a means for evaluating the statistical significance of such information. To illustrate this strategy we have derived a model to depict the profile expected for a periodically transcribed gene and used it to look for budding yeast transcripts that adhere to this profile. Using objective criteria, this method identifies 81% of the known periodic transcripts and 1,088 genes that show significant periodicity in at least one of the three data sets analyzed. However, only one-quarter of these genes show significant oscillations in at least two data sets and can be classified as periodic with high confidence. The method provides estimates of the mean activation and deactivation times, induced and basal expression levels, and statistical measures of the precision of these estimates for each periodic transcript.

Relevance: 40.00%

Abstract:

Understanding the mechanism of protein secondary structure formation is an essential part of the protein-folding puzzle. Here, we describe a simple statistical mechanical model for the formation of a β-hairpin, the minimal structural element of the antiparallel β-pleated sheet. The model accurately describes the thermodynamic and kinetic behavior of a 16-residue, β-hairpin-forming peptide, successfully explaining its two-state behavior and apparent negative activation energy for folding. The model classifies structures according to their backbone conformation, defined by 15 pairs of dihedral angles, and is further simplified by considering only the 120 structures with contiguous stretches of native pairs of backbone dihedral angles. This single sequence approximation is tested by comparison with a more complete model that includes the 2^15 possible conformations and 15 × 2^15 possible kinetic transitions. Finally, we use the model to predict the equilibrium unfolding curves and kinetics for several variants of the β-hairpin peptide.
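
The state counts in this abstract can be checked with a short enumeration. The sketch assumes, as the counts suggest, that each of the 15 pairs of dihedral angles is treated as either native or non-native; the bitmask encoding and the helper name `is_single_stretch` are illustrative choices, not taken from the paper.

```python
N_PAIRS = 15  # pairs of backbone dihedral angles, from the abstract


def is_single_stretch(mask):
    """True if the native pairs (set bits) form one contiguous, non-empty run."""
    if mask == 0:
        return False
    shifted = mask // (mask & -mask)          # drop trailing zero bits
    return (shifted & (shifted + 1)) == 0     # remaining bits must all be ones


# Every pair is native (1) or non-native (0): 2^15 conformations.
total_states = 2 ** N_PAIRS

# Conformations kept by the single sequence approximation.
single_stretch_states = sum(is_single_stretch(m) for m in range(total_states))

# Each conformation can flip any one of its 15 pairs.
transitions = N_PAIRS * total_states

print(total_states, single_stretch_states, transitions)
```

The enumeration agrees with the closed form 15 × 16 / 2 for the number of contiguous stretches, which shows how strongly the single sequence approximation shrinks the state space.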

Relevance: 30.00%

Abstract:

The availability of complete genome sequences and mRNA expression data for all genes creates new opportunities and challenges for identifying DNA sequence motifs that control gene expression. An algorithm, “MobyDick,” is presented that decomposes a set of DNA sequences into the most probable dictionary of motifs or words. This method is applicable to any set of DNA sequences: for example, all upstream regions in a genome or all genes expressed under certain conditions. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter ones of various lengths, eliminating the need for a separate set of reference data to define probabilities. We have built a dictionary with 1,200 words for the 6,000 upstream regulatory regions in the yeast genome; the 500 most significant words (some with as few as 10 copies in all of the upstream regions) match 114 of 443 experimentally determined sites (a significance level of 18 standard deviations). When analyzing all of the genes up-regulated during sporulation as a group, we find many motifs in addition to the few previously identified by analyzing the expression subclusters individually. Applying MobyDick to the genes derepressed when the general repressor Tup1 is deleted, we find known as well as putative binding sites for its regulatory partners.
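
A probabilistic segmentation model of this kind can be sketched as a sum over every way of cutting a sequence into dictionary words. The code below is a generic dynamic-programming illustration under that assumption, not the MobyDick implementation; the toy dictionary, its probabilities, and the function name `segmentation_likelihood` are hypothetical.

```python
def segmentation_likelihood(seq, word_probs):
    """Sum, over all segmentations of seq into dictionary words,
    of the product of the word probabilities."""
    z = [0.0] * (len(seq) + 1)
    z[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for i in range(1, len(seq) + 1):
        for word, p in word_probs.items():
            k = len(word)
            # Extend every segmentation of the prefix that ends before this word.
            if k <= i and seq[i - k:i] == word:
                z[i] += p * z[i - k]
    return z[len(seq)]


# Toy dictionary (illustrative probabilities only).
toy_dictionary = {"A": 0.5, "T": 0.3, "AT": 0.2}

print(segmentation_likelihood("AT", toy_dictionary))
print(segmentation_likelihood("ATA", toy_dictionary))
```

For "AT" the sum has two terms, the split A|T and the single word AT, so a longer word is supported only to the extent that it appears more often than chance concatenation of its shorter parts would predict.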

Relevance: 30.00%

Abstract:

The present work develops and implements a biomathematical statement of how reciprocal connectivity drives stress-adaptive homeostasis in the corticotropic (hypothalamo-pituitary-adrenal) axis. In initial analyses with this interactive construct, we test six specific a priori hypotheses of mechanisms linking circadian (24-h) rhythmicity to pulsatile secretory output. This formulation offers a dynamic framework for later statistical estimation of unobserved in vivo neurohormone secretion and within-axis, dose-responsive interfaces in health and disease. Explication of the core dynamics of the stress-responsive corticotropic axis based on secure physiological precepts should help to unveil new biomedical hypotheses of stressor-specific system failure.

Relevance: 30.00%

Abstract:

A model of interdependent decision making has been developed to understand group differences in socioeconomic behavior such as nonmarital fertility, school attendance, and drug use. The statistical mechanical structure of the model illustrates how the physical sciences contain useful tools for the study of socioeconomic phenomena.

Relevance: 30.00%

Abstract:

A molecular model of poorly understood hydrophobic effects is heuristically developed using the methods of information theory. Because primitive hydrophobic effects can be tied to the probability of observing a molecular-sized cavity in the solvent, the probability distribution of the number of solvent centers in a cavity volume is modeled on the basis of the two moments available from the density and radial distribution of oxygen atoms in liquid water. The modeled distribution then yields the probability that no solvent centers are found in the cavity volume. This model is shown to account quantitatively for the central hydrophobic phenomena of cavity formation and association of inert gas solutes. The connection of information theory to statistical thermodynamics provides a basis for clarification of hydrophobic effects. The simplicity and flexibility of the approach suggest that it should permit applications to conformational equilibria of nonpolar solutes and hydrophobic residues in biopolymers.
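
One way to read the two-moment modeling step is as a maximum-entropy fit: among distributions over the number n of solvent centers in the cavity volume, choose the form p_n ∝ exp(λ1·n + λ2·n²) that reproduces a given mean and variance, then read off p_0. The sketch below assumes that form with a flat reference distribution, and the mean of 3.0 and variance of 2.0 are hypothetical inputs, not values from the paper.

```python
import math


def maxent_two_moment(mean, var, n_max=60, iters=50):
    """Fit p_n proportional to exp(l1*n + l2*n^2), n = 0..n_max,
    to a target mean and variance using Newton's method."""
    ns = range(n_max + 1)
    target1, target2 = mean, var + mean * mean
    # Start from the continuous Gaussian with the same mean and variance.
    l1, l2 = mean / var, -1.0 / (2.0 * var)
    p = []
    for _ in range(iters):
        expo = [l1 * n + l2 * n * n for n in ns]
        top = max(expo)
        w = [math.exp(e - top) for e in expo]  # shifted to avoid overflow
        z = sum(w)
        p = [x / z for x in w]
        m1 = sum(n * q for n, q in zip(ns, p))
        m2 = sum(n ** 2 * q for n, q in zip(ns, p))
        m3 = sum(n ** 3 * q for n, q in zip(ns, p))
        m4 = sum(n ** 4 * q for n, q in zip(ns, p))
        r1, r2 = target1 - m1, target2 - m2
        if abs(r1) < 1e-12 and abs(r2) < 1e-12:
            break
        # Jacobian of (<n>, <n^2>) with respect to (l1, l2) is a covariance matrix.
        a, b, d = m2 - m1 * m1, m3 - m1 * m2, m4 - m2 * m2
        det = a * d - b * b
        l1 += (d * r1 - b * r2) / det
        l2 += (a * r2 - b * r1) / det
    return p


# Hypothetical moments for a small cavity (illustrative only).
p = maxent_two_moment(mean=3.0, var=2.0)
p_empty = p[0]  # probability that no solvent center lies in the cavity
print(p_empty)
```

In this reading, a smaller `p_empty` corresponds to a cavity that is harder to form in the solvent.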