980 results for Statistical Distributions.


Relevance: 20.00%

Abstract:

A popular way to account for unobserved heterogeneity is to assume that the data are drawn from a finite mixture distribution. A barrier to using finite mixture models is that parameters that could previously be estimated in stages must now be estimated jointly: using mixture distributions destroys any additive separability of the log-likelihood function. We show, however, that an extension of the EM algorithm reintroduces additive separability, thus allowing one to estimate parameters sequentially during each maximization step. In establishing this result, we develop a broad class of estimators for mixture models. Returning to the likelihood problem, we show that, relative to full information maximum likelihood, our sequential estimator can generate large computational savings with little loss of efficiency.
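To make the E- and M-steps concrete, here is a minimal sketch of standard EM for a two-component univariate Gaussian mixture. This is the textbook algorithm, not the sequential estimator described above; the initialization scheme and two-component restriction are choices made for the example.

```python
import numpy as np

def norm_pdf(x, m, s):
    """Univariate normal density."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def em_two_gaussians(x, n_iter=200):
    """Standard EM for a two-component univariate Gaussian mixture.
    Illustrative textbook version, not the sequential estimator above."""
    pi = 0.5                                       # mixing weight of component 0
    mu = np.percentile(x, [25, 75]).astype(float)  # crude mean initialization
    sig = np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each observation
        d0 = pi * norm_pdf(x, mu[0], sig[0])
        d1 = (1.0 - pi) * norm_pdf(x, mu[1], sig[1])
        r = d0 / (d0 + d1)
        # M-step: weighted maximum-likelihood updates
        pi = r.mean()
        mu[0] = (r * x).sum() / r.sum()
        mu[1] = ((1 - r) * x).sum() / (1 - r).sum()
        sig[0] = np.sqrt((r * (x - mu[0]) ** 2).sum() / r.sum())
        sig[1] = np.sqrt(((1 - r) * (x - mu[1]) ** 2).sum() / (1 - r).sum())
    return pi, mu, sig
```

The point of the paper is that when each component density itself factors into stages, the weighted M-step above can be maximized sequentially rather than jointly.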

Relevance: 20.00%

Abstract:

The long-term soil carbon dynamics may be approximated by networks of linear compartments, permitting theoretical analysis of transit time (i.e., the total time spent by a molecule in the system) and age (the time elapsed since the molecule entered the system) distributions. We compute and compare these distributions for different network configurations, ranging from the simple individual compartment, to series and parallel linear compartments, feedback systems, and models assuming a continuous distribution of decay constants. We also derive the transit time and age distributions of some complex, widely used soil carbon models (the compartmental models CENTURY and Rothamsted, and the continuous-quality Q-Model), and discuss them in the context of long-term carbon sequestration in soils. We show how complex models including feedback loops and slow compartments have distributions with heavier tails than simpler models. Power law tails emerge when using continuous-quality models, indicating long retention times for an important fraction of soil carbon. The responsiveness of the soil system to changes in decay constants due to altered climatic conditions or plant species composition is found to be stronger when all compartments respond equally to the environmental change, and when the slower compartments are more sensitive than the faster ones or lose more carbon through microbial respiration. Copyright 2009 by the American Geophysical Union.
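For the simplest configurations the transit-time densities have closed forms; the sketch below compares a single compartment (exponential density) with two compartments in series (a convolution of exponentials), showing the heavier tail of the chained system. The decay constants are hypothetical values chosen for illustration, not parameters of the models named above.

```python
import numpy as np

# Closed-form transit-time densities for two simple configurations.
# The decay constants (per year) are hypothetical, for illustration only.
k = 0.5                 # single compartment
k1, k2 = 1.0, 0.25      # two compartments in series

t = np.linspace(0.0, 40.0, 4001)   # years
dt = t[1] - t[0]

# Single compartment: exponential density, mean transit time 1/k
p_single = k * np.exp(-k * t)

# Series of two compartments: hypoexponential density,
# mean transit time 1/k1 + 1/k2
p_series = k1 * k2 / (k1 - k2) * (np.exp(-k2 * t) - np.exp(-k1 * t))

mean_single = (t * p_single).sum() * dt
mean_series = (t * p_series).sum() * dt

# Tail mass beyond 10 years: the series system retains more carbon
# for long times, i.e. it has the heavier tail
tail_single = p_single[t >= 10].sum() * dt
tail_series = p_series[t >= 10].sum() * dt
```

Adding compartments (or feedback loops) lengthens the tail in exactly this way, which is the qualitative result the abstract reports.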

Relevance: 20.00%

Abstract:

We apply the transformation optical technique to modify or improve conventional refractive and gradient index optical imaging devices. In particular, when it is known that a detector will terminate the paths of rays over some surface, more freedom is available in the transformation approach, since the wave behavior over a large portion of the domain becomes unimportant. For the analyzed configurations, quasi-conformal and conformal coordinate transformations can be used, leading to simplified constitutive parameter distributions that, in some cases, can be realized with isotropic index; index-only media can be low-loss and have broad bandwidth. We apply a coordinate transformation to flatten a Maxwell fish-eye lens, forming a near-perfect relay lens; and also flatten the focal surface associated with a conventional refractive lens, such that the system exhibits an ultra-wide field-of-view with reduced aberration.

Relevance: 20.00%

Abstract:

BACKGROUND: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. RESULTS: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. CONCLUSION: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
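The quality scores the method builds on are Phred-scaled, so every base call carries an explicit error probability via P(error) = 10^(-Q/10). A minimal sketch of that mapping follows; the two-read helper is an illustrative toy, not the paper's full probabilistic error model.

```python
def phred_to_error_prob(q):
    """Probability that a base call with Phred quality q is wrong:
    P(error) = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)

def prob_both_correct(q1, q2):
    """Probability that two independent reads of the same position are
    both called correctly -- an illustrative building block only, not
    the paper's complete error model."""
    return (1.0 - phred_to_error_prob(q1)) * (1.0 - phred_to_error_prob(q2))
```

A quality-aware assembler weights each base by this probability rather than trusting the call outright, which is what makes inference from low-coverage traces feasible.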

Relevance: 20.00%

Abstract:

We develop a model for stochastic processes with random marginal distributions. Our model relies on a stick-breaking construction for the marginal distribution of the process, and introduces dependence across locations by using a latent Gaussian copula model as the mechanism for selecting the atoms. The resulting latent stick-breaking process (LaSBP) induces a random partition of the index space, with points closer in space having a higher probability of being in the same cluster. We develop an efficient and straightforward Markov chain Monte Carlo (MCMC) algorithm for computation and discuss applications in financial econometrics and ecology. This article has supplementary material online.
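The stick-breaking construction at the heart of such models can be sketched in a few lines. Below is the standard truncated Dirichlet-process version; the LaSBP's Gaussian-copula mechanism for selecting atoms across locations is not reproduced here, and the concentration parameter and truncation level are arbitrary choices.

```python
import numpy as np

def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking construction: v_k ~ Beta(1, alpha) and
    w_k = v_k * prod_{j<k} (1 - v_j). Generic Dirichlet-process version;
    the LaSBP's copula-based atom selection is not reproduced here."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    return v * remaining

rng = np.random.default_rng(0)
w = stick_breaking_weights(alpha=2.0, n_atoms=50, rng=rng)
```

Each weight is a fraction broken off the remaining "stick", so the weights are nonnegative and sum to (just under) one at any truncation level.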

Relevance: 20.00%

Abstract:

This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies generating many, very large datasets and requiring increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to vast speed-up and, critically, enable statistical analyses that presently will not be performed due to compute time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
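The mixture-model E-step is embarrassingly parallel across observations and components, which is what makes GPU implementations attractive. A CPU/NumPy sketch of the fully vectorized, numerically stable computation is below; the same per-array expressions (no per-observation loops) are the access pattern that ports directly to GPU array libraries. This is a generic illustration, not the paper's code.

```python
import numpy as np

def log_responsibilities(x, log_pi, mu, sigma):
    """Fully vectorized, numerically stable E-step for a univariate
    Gaussian mixture: one array expression per stage, no per-point
    loops -- the access pattern that maps onto GPU kernels."""
    # log N(x_i | mu_k, sigma_k) for all (i, k) pairs at once: shape (n, K)
    z = (x[:, None] - mu[None, :]) / sigma[None, :]
    log_phi = -0.5 * z ** 2 - np.log(sigma[None, :]) - 0.5 * np.log(2.0 * np.pi)
    log_w = log_pi[None, :] + log_phi
    # log-sum-exp normalization across components
    m = log_w.max(axis=1, keepdims=True)
    return log_w - (m + np.log(np.exp(log_w - m).sum(axis=1, keepdims=True)))
```

Working in log space with the log-sum-exp trick keeps the computation stable even with many components of very different scales.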

Relevance: 20.00%

Abstract:

BACKGROUND: Dropouts and missing data are nearly ubiquitous in obesity randomized controlled trials, threatening the validity and generalizability of conclusions. Herein, we meta-analytically evaluate the extent of missing data, the frequency with which various analytic methods are employed to accommodate dropouts, and the performance of multiple statistical methods. METHODOLOGY/PRINCIPAL FINDINGS: We searched PubMed and Cochrane databases (2000-2006) for articles published in English and manually searched bibliographic references. Articles of pharmaceutical randomized controlled trials with weight loss or weight gain prevention as major endpoints were included. Two authors independently reviewed each publication for inclusion. 121 articles met the inclusion criteria. Two authors independently extracted treatment, sample size, drop-out rates, study duration, and statistical method used to handle missing data from all articles and resolved disagreements by consensus. In the meta-analysis, drop-out rates were substantial, with the survival (non-dropout) rates being approximated by an exponential decay curve exp(-λt), where λ was estimated to be .0088 (95% bootstrap confidence interval: .0076 to .0100) and t represents time in weeks. The estimated drop-out rate at 1 year was 37%. Most studies used last observation carried forward as the primary analytic method to handle missing data. We also obtained 12 raw obesity randomized controlled trial datasets for empirical analyses. Analyses of raw randomized controlled trial data suggested that both mixed models and multiple imputation performed well, but that multiple imputation may be more robust when missing data are extensive. CONCLUSION/SIGNIFICANCE: Our analysis offers an equation for predictions of dropout rates useful for future study planning. Our raw data analyses suggest that multiple imputation is better than other methods for handling missing data in obesity randomized controlled trials, followed closely by mixed models. We suggest these methods supplant last observation carried forward as the primary method of analysis.
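The fitted retention curve gives a direct planning formula: with λ = 0.0088 per week, cumulative dropout after t weeks is 1 - exp(-λt). The function below simply restates the abstract's fitted model.

```python
import math

# Fitted retention (non-dropout) curve from the meta-analysis:
# S(t) = exp(-lambda * t), lambda = 0.0088 per week
# (95% bootstrap CI: 0.0076 to 0.0100), t in weeks.
LAMBDA = 0.0088

def expected_dropout(weeks, lam=LAMBDA):
    """Predicted cumulative dropout fraction after `weeks` of follow-up."""
    return 1.0 - math.exp(-lam * weeks)
```

At 52 weeks this gives about 0.37, matching the 37% one-year dropout rate reported above.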

Relevance: 20.00%

Abstract:

Twelve months of aerosol size distributions from 3 to 560 nm, measured using scanning mobility particle sizers, are presented with an emphasis on average number, surface, and volume distributions, and seasonal and diurnal variation. The measurements were made at the main sampling site of the Pittsburgh Air Quality Study from July 2001 to June 2002. These are supplemented with 5 months of size distribution data from 0.5 to 2.5 μm measured with a TSI aerosol particle sizer and 2 months of size distributions measured at an upwind rural sampling site. Measurements at the main site were made continuously under both low and ambient relative humidity. The average Pittsburgh number concentration (3–500 nm) is 22,000 cm⁻³ with an average mode size of 40 nm. Strong diurnal patterns in number concentrations are evident as a direct effect of the sources of particles (atmospheric nucleation, traffic, and other combustion sources). New particle formation from homogeneous nucleation is significant on 30-50% of study days and over a wide area (at least a hundred kilometers). Rural number concentrations are a factor of 2-3 lower (on average) than the urban values. Average measured distributions are different from model literature urban and rural size distributions. © 2004 Elsevier Ltd. All rights reserved.
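A single-mode lognormal built from the reported averages illustrates the shape of such a number-size distribution. The total concentration (22,000 cm⁻³) and mode (~40 nm) come from the abstract; the geometric standard deviation σg = 1.8 is an assumed value for the sketch, not a reported one.

```python
import numpy as np

# Single-mode lognormal number-size distribution using the reported
# Pittsburgh averages: N_total = 22,000 cm^-3, number mode ~40 nm.
# sigma_g = 1.8 is an ASSUMED geometric standard deviation.
N_total = 22_000.0     # cm^-3
d_g = 40.0             # nm, geometric mean (= mode of dN/dlnDp)
sigma_g = 1.8          # assumed

dp = np.logspace(np.log10(3.0), np.log10(560.0), 400)   # nm
ln_sg = np.log(sigma_g)
dNdlnD = (N_total / (np.sqrt(2.0 * np.pi) * ln_sg)
          * np.exp(-0.5 * ((np.log(dp) - np.log(d_g)) / ln_sg) ** 2))

# Integrating dN/dlnDp over ln(Dp) recovers (nearly) the total number
dln = np.diff(np.log(dp))
N_recovered = (0.5 * (dNdlnD[1:] + dNdlnD[:-1]) * dln).sum()
```

Real urban distributions are typically multi-modal (nucleation, Aitken, accumulation), so this one-mode curve is a simplification of what the SMPS data show.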

Relevance: 20.00%

Abstract:

Earth's surface is rapidly urbanizing, resulting in dramatic changes in the abundance, distribution and character of surface water features in urban landscapes. However, the scope and consequences of surface water redistribution at broad spatial scales are not well understood. We hypothesized that urbanization would lead to convergent surface water abundance and distribution: in other words, cities will gain or lose water such that they become more similar to each other than are their surrounding natural landscapes. Using a database of more than 1 million water bodies and 1 million km of streams, we compared the surface water of 100 US cities with their surrounding undeveloped land. We evaluated differences in areal (A_WB) and numeric (N_WB) densities of water bodies (lakes, wetlands, and so on), the morphological characteristics of water bodies (size), and the density (D_C) of surface flow channels (that is, streams and rivers). The variance of urban A_WB, N_WB, and D_C across the 100 metropolitan statistical areas (MSAs) decreased by 89%, 25%, and 71%, respectively, compared to undeveloped land. These data show that many cities are surface water poor relative to undeveloped land; however, in drier landscapes urbanization increases the occurrence of surface water. This convergence pattern strengthened with development intensity, such that high intensity urban development had an areal water body density 98% less than undeveloped lands. Urbanization appears to drive the convergence of hydrological features across the US, such that surface water distributions of cities are more similar to each other than to their surrounding landscapes. © 2014 The Author(s).

Relevance: 20.00%

Abstract:

A framework for adaptive and non-adaptive statistical compressive sensing is developed, where a statistical model replaces the standard sparsity model of classical compressive sensing. Within this framework we propose optimal task-specific sensing protocols jointly designed for classification and reconstruction. A two-step adaptive sensing paradigm is developed, where online sensing is applied to detect the signal class in the first step, followed by a reconstruction step adapted to the detected class and the observed samples. The approach is based on information theory, here tailored for Gaussian mixture models (GMMs), where an information-theoretic objective relating the sensed signals to a representation of the specific task of interest is maximized. Experimental results using synthetic signals, Landsat satellite attributes, and natural images of different sizes and with different noise levels show the improvements achieved using the proposed framework when compared to more standard sensing protocols. The underlying formulation can be applied beyond GMMs, at the price of higher mathematical and computational complexity. © 1991-2012 IEEE.
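A minimal version of the class-detection step can be sketched as Gaussian hypothesis testing on compressive measurements: each signal class induces a Gaussian on y = Ax + n, and the detector picks the class with the higher likelihood. Everything below (the two classes, dimensions, sensing matrix, and noise level) is hypothetical, chosen only to illustrate the mechanism, and the information-theoretic sensing design itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 8, 3          # signal dimension, number of compressive measurements
sigma_n = 0.05       # measurement noise std (assumed)

def random_cov(scale):
    """Hypothetical class-conditional covariance (a stand-in for a
    GMM component learned offline)."""
    B = rng.normal(size=(d, d)) * scale
    return B @ B.T / d + 1e-3 * np.eye(d)

covs = [random_cov(1.0), random_cov(0.2)]       # two signal classes
A = rng.normal(size=(m, d)) / np.sqrt(m)        # random sensing matrix

def classify(y):
    """MAP class under equal priors: y ~ N(0, A @ S_k @ A.T + sigma_n^2 I)."""
    scores = []
    for S in covs:
        C = A @ S @ A.T + sigma_n ** 2 * np.eye(m)
        _, logdet = np.linalg.slogdet(C)
        scores.append(-0.5 * (logdet + y @ np.linalg.solve(C, y)))
    return int(np.argmax(scores))
```

In the adaptive paradigm, the projections in A would themselves be optimized for discriminating the classes before the reconstruction step is tailored to the winner.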

Relevance: 20.00%

Abstract:

X-ray crystallography is the predominant method for obtaining atomic-scale information about biological macromolecules. Despite the success of the technique, obtaining well diffracting crystals still critically limits going from protein to structure. In practice, the crystallization process proceeds through knowledge-informed empiricism. Better physico-chemical understanding remains elusive because of the large number of variables involved, hence little guidance is available to systematically identify solution conditions that promote crystallization. To help determine relationships between macromolecular properties and their crystallization propensity, we have trained statistical models on samples for 182 proteins supplied by the Northeast Structural Genomics consortium. Gaussian processes, which capture trends beyond the reach of linear statistical models, distinguish between two main physico-chemical mechanisms driving crystallization. One is characterized by low levels of side chain entropy and has been extensively reported in the literature. The other identifies specific electrostatic interactions not previously described in the crystallization context. Because evidence for two distinct mechanisms can be gleaned both from crystal contacts and from solution conditions leading to successful crystallization, the model offers future avenues for optimizing crystallization screens based on partial structural information. The availability of crystallization data coupled with structural outcomes analyzed through state-of-the-art statistical models may thus guide macromolecular crystallization toward a more rational basis.
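To make "trends beyond the reach of linear statistical models" concrete, here is a minimal NumPy sketch of Gaussian-process regression with an RBF kernel. It is a generic illustration: the kernel choice, length scale, and noise level are arbitrary, and the paper's actual model over protein properties is not reproduced.

```python
import numpy as np

def gp_posterior_mean(X_train, y_train, X_test, length=1.0, noise=0.1):
    """Posterior mean of Gaussian-process regression with an RBF kernel.
    Generic sketch -- hyperparameters are arbitrary, not the paper's
    fitted model."""
    def rbf(A, B):
        # squared distances between all pairs of rows, then RBF kernel
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length ** 2)
    K = rbf(X_train, X_train) + noise ** 2 * np.eye(len(X_train))
    return rbf(X_test, X_train) @ np.linalg.solve(K, y_train)
```

Unlike a linear fit, the posterior mean bends to follow whatever smooth trend the data trace out, which is what lets such models separate distinct crystallization mechanisms.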

Relevance: 20.00%

Abstract:

Determination of copy number variants (CNVs) inferred in genome-wide single nucleotide polymorphism arrays has shown increasing utility in genetic variant disease associations. Several CNV detection methods are available, but differences in CNV call thresholds and characteristics exist. We evaluated the relative performance of seven methods: circular binary segmentation, CNVFinder, cnvPartition, gain and loss of DNA, Nexus algorithms, PennCNV and QuantiSNP. Tested data included real and simulated Illumina HumanHap550 data from the Singapore Cohort Study of the Risk Factors for Myopia (SCORM) and simulated data from Affymetrix 6.0 and platform-independent distributions. The normalized singleton ratio (NSR) is proposed as a metric for parameter optimization before enacting full analysis. We used 10 SCORM samples for optimizing parameter settings for each method and then evaluated method performance at optimal parameters using 100 SCORM samples. The statistical power, false positive rates, and receiver operating characteristic (ROC) curve residuals were evaluated by simulation studies. Optimal parameters, as determined by NSR and ROC curve residuals, were consistent across datasets. QuantiSNP outperformed other methods based on ROC curve residuals over most datasets. Nexus Rank and SNPRank have low specificity and high power. Nexus Rank calls oversized CNVs. PennCNV detects one of the fewest numbers of CNVs.

Relevance: 20.00%

Abstract:

Slowly compressed single crystals, bulk metallic glasses (BMGs), rocks, granular materials, and the Earth all deform via intermittent slips or "quakes". We find that although these systems span 12 decades in length scale, they all show the same scaling behavior for their slip size distributions and other statistical properties. Remarkably, the size distributions follow the same power law multiplied with the same exponential cutoff. The cutoff grows with applied force for materials spanning length scales from nanometers to kilometers. The tunability of the cutoff with stress reflects "tuned critical" behavior, rather than self-organized criticality (SOC), which would imply stress-independence. A simple mean field model for avalanches of slipping weak spots explains the agreement across scales. It predicts the observed slip-size distributions and the observed stress-dependent cutoff function. The results enable extrapolations from one scale to another, and from one force to another, across different materials and structures, from nanocrystals to earthquakes.
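The "power law times exponential cutoff" form, P(S) ∝ S^(-τ) exp(-S/S_max), can be explored numerically; raising the cutoff shifts probability mass into large slips, which is the stress-tuned behavior the abstract describes. Here τ = 1.5 is the mean-field avalanche exponent, and the S_max values used in the test are illustrative stand-ins for "low stress" and "high stress", not measured quantities.

```python
import numpy as np

# Slip-size density of the universal form P(S) ∝ S^(-tau) * exp(-S / S_max).
# tau = 1.5 is the mean-field avalanche exponent; S_max is the
# stress-dependent cutoff.
TAU = 1.5
S = np.logspace(-2, 4, 2000)   # slip sizes, arbitrary units

def tail_fraction(S_max, threshold=10.0, tau=TAU):
    """Fraction of slip events larger than `threshold` under cutoff S_max."""
    p = S ** (-tau) * np.exp(-S / S_max)
    w = 0.5 * (p[1:] + p[:-1]) * np.diff(S)   # trapezoid weights
    mids = 0.5 * (S[1:] + S[:-1])
    return w[mids > threshold].sum() / w.sum()
```

Under self-organized criticality the cutoff would be stress-independent; the observed growth of S_max with applied force is what distinguishes "tuned critical" behavior.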