5 resultados para distributed computation

em Duke University


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arise in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator ({\em message}) algorithm for solving these issues. The algorithm applies feature selection in parallel for each subset using regularized regression or Bayesian variable selection method, calculates the `median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves very minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments to show excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.

While sample space partitioning is useful in handling datasets with large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named {\em DECO} for distributed variable selection and parameter estimation. In {\em DECO}, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.

For datasets with both large sample sizes and high dimensionality, I propose a new "divided-and-conquer" framework {\em DEME} (DECO-message) by leveraging both the {\em DECO} and the {\em message} algorithm. The new framework first partitions the dataset in the sample space into row cubes using {\em message} and then partition the feature space of the cubes using {\em DECO}. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthezied via the {\em DECO} and {\em message} algorithm in a reverse order to produce the final output. The whole framework is extremely scalable.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We obtain an upper bound on the time available for quantum computation for a given quantum computer and decohering environment with quantum error correction implemented. First, we derive an explicit quantum evolution operator for the logical qubits and show that it has the same form as that for the physical qubits but with a reduced coupling strength to the environment. Using this evolution operator, we find the trace distance between the real and ideal states of the logical qubits in two cases. For a super-Ohmic bath, the trace distance saturates, while for Ohmic or sub-Ohmic baths, there is a finite time before the trace distance exceeds a value set by the user. © 2010 The American Physical Society.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies generating many, very large datasets and requiring increasingly high-dimensional mixture models with large numbers of mixture components.We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms software design can lead to vast speed-up and, critically, enable statistical analyses that presently will not be performed due to compute time limitations in traditional computational environments. Supplementalmaterials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The beta-adrenergic receptor kinase (beta ARK) phosphorylates the agonist-occupied beta-adrenergic receptor to promote rapid receptor uncoupling from Gs, thereby attenuating adenylyl cyclase activity. Beta ARK-mediated receptor desensitization may reflect a general molecular mechanism operative on many G-protein-coupled receptor systems and, particularly, synaptic neurotransmitter receptors. Two distinct cDNAs encoding beta ARK isozymes were isolated from rat brain and sequenced. The regional and cellular distributions of these two gene products, termed beta ARK1 and beta ARK2, were determined in brain by in situ hybridization and by immunohistochemistry at the light and electron microscopic levels. The beta ARK isozymes were found to be expressed primarily in neurons distributed throughout the CNS. Ultrastructurally, beta ARK1 and beta ARK2 immunoreactivities were present both in association with postsynaptic densities and, presynaptically, with axon terminals. The beta ARK isozymes have a regional and subcellular distribution consistent with a general role in the desensitization of synaptic receptors.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Hydrologic research is a very demanding application of fiber-optic distributed temperature sensing (DTS) in terms of precision, accuracy and calibration. The physics behind the most frequently used DTS instruments are considered as they apply to four calibration methods for single-ended DTS installations. The new methods presented are more accurate than the instrument-calibrated data, achieving accuracies on the order of tenths of a degree root mean square error (RMSE) and mean bias. Effects of localized non-uniformities that violate the assumptions of single-ended calibration data are explored and quantified. Experimental design considerations such as selection of integration times or selection of the length of the reference sections are discussed, and the impacts of these considerations on calibrated temperatures are explored in two case studies.