11 results for scalable parallel programming
at Duke University
Abstract:
This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies generating many, very large datasets and requiring increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to vast speed-up and, critically, enable statistical analyses that presently will not be performed due to compute time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
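The per-observation work in a mixture model is what makes it a natural fit for data-parallel hardware. The following is a minimal CPU sketch in NumPy, not the authors' GPU code: it assumes a univariate Gaussian mixture, and the function names `responsibilities` and `sample_labels` are hypothetical. Each row of the computation is independent of the others, which is the property a GPU implementation would exploit.

```python
import numpy as np

def responsibilities(x, weights, means, sds):
    """Posterior component probabilities per observation (each row sums to 1).

    Assumption: univariate Gaussian components; illustrative only.
    """
    x = np.asarray(x, dtype=float)[:, None]            # shape (n, 1)
    weights = np.asarray(weights, dtype=float)[None, :]
    means = np.asarray(means, dtype=float)[None, :]
    sds = np.asarray(sds, dtype=float)[None, :]
    log_pdf = (-0.5 * np.log(2.0 * np.pi) - np.log(sds)
               - 0.5 * ((x - means) / sds) ** 2)
    log_post = np.log(weights) + log_pdf
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def sample_labels(post, rng):
    """Draw one component label per observation by inverting the row-wise CDF."""
    cdf = np.cumsum(post, axis=1)
    u = rng.random(post.shape[0])[:, None]
    labels = (u > cdf).sum(axis=1)
    # Guard against rounding that leaves the last CDF entry slightly below 1.
    return np.minimum(labels, post.shape[1] - 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    post = responsibilities([0.0, 10.0], [0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
    print(sample_labels(post, rng))
```

In a Gibbs sampler, this labeling step is repeated at every iteration over all observations, so its cost grows with both the sample size and the number of components.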
Abstract:
The establishment of a conductive graphene-molecule-graphene junction is investigated through first-principles electronic structure calculations and quantum transport calculations. The junction consists of a conjugated molecule connecting two parallel graphene sheets. The effects of molecular electronic states, structure relaxation, and molecule-graphene contact on the conductance of the junction are explored. A conductance as large as 0.38 conductance quantum is found achievable with an appropriately oriented dithiophene bridge. This work elucidates the design principles of promising nanoelectronic devices based on conductive graphene-molecule-graphene junctions.
Abstract:
BACKGROUND: Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, which are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed. RESULTS: We have developed a CUDA-based implementation, permGPU, that employs graphics processing units in microarray association studies. We illustrate the performance and applicability of permGPU within the context of permutation resampling for a number of test statistics. An extensive simulation study demonstrates a dramatic increase in performance when using permGPU on an NVIDIA GTX 280 card compared to an optimized C/C++ solution running on a conventional Linux server. CONCLUSIONS: permGPU is available as an open-source stand-alone application and as an extension package for the R statistical environment. It provides a dramatic increase in performance for permutation resampling analysis in the context of microarray association studies. The current version offers six test statistics for carrying out permutation resampling analyses for binary, quantitative and censored time-to-event traits.
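Permutation resampling for a binary trait can be sketched as follows. This is an illustrative NumPy version, not permGPU's code or API: the function name `perm_pvalues` and the choice of statistic (absolute difference in group means) are assumptions made for the example. Each permutation is independent of the others, which is why the problem is embarrassingly parallel.

```python
import numpy as np

def perm_pvalues(expr, labels, n_perm=1000, seed=0):
    """Per-gene permutation p-values for a binary trait.

    expr   : array of shape (genes, samples)
    labels : boolean array of length samples (group membership)
    Assumption: the statistic is the absolute difference in group means.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)

    def stat(lab):
        return np.abs(expr[:, lab].mean(axis=1) - expr[:, ~lab].mean(axis=1))

    observed = stat(labels)
    exceed = np.zeros(expr.shape[0], dtype=int)
    for _ in range(n_perm):
        # Shuffle the trait labels and recompute the statistic for all genes.
        exceed += stat(rng.permutation(labels)) >= observed
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    expr = [[0, 1, 2, 3, 4, 100, 101, 102, 103, 104],
            [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]]
    labels = [False] * 5 + [True] * 5
    print(perm_pvalues(expr, labels))
```

The loop over permutations is the part a GPU or multi-core implementation would distribute across threads.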
Abstract:
The health of clergy is important, and clergy may find health programming tailored to them more effective. Little is known about existing clergy health programs. We contacted Protestant denominational headquarters and searched academic databases and the Internet. We identified 56 clergy health programs and categorized them into prevention and personal enrichment; counseling; marriage and family enrichment; peer support; congregational health; congregational effectiveness; denominational enrichment; insurance/strategic pension plans; and referral-based programs. Only 13 of the programs engaged in outcomes evaluation. Using the Socioecological Framework, we found that many programs support individual-level and institutional-level changes, but few programs support congregational-level changes. Outcome evaluation strategies and a central repository for information on clergy health programs are needed. © 2011 Springer Science+Business Media, LLC.
Abstract:
Alewife, Alosa pseudoharengus, populations occur in two discrete life-history variants, an anadromous form and a landlocked (freshwater resident) form. Landlocked populations display a consistent pattern of life-history divergence from anadromous populations, including earlier age at maturity, smaller adult body size, and reduced fecundity. In Connecticut (USA), dams constructed on coastal streams separate anadromous spawning runs from lake-resident landlocked populations. Here, we used sequence data from the mtDNA control region and allele frequency data from five microsatellite loci to ask whether coastal Connecticut landlocked alewife populations are independently evolved from anadromous populations or whether they share a common freshwater ancestor. We then used microsatellite data to estimate the timing of the divergence between anadromous and landlocked populations. Finally, we examined anadromous and landlocked populations for divergence in foraging morphology and used divergence time estimates to calculate the rate of evolution for foraging traits. Our results indicate that landlocked populations have evolved multiple times independently. Tests of population divergence and estimates of gene flow show that landlocked populations are genetically isolated, whereas anadromous populations exchange genes. These results support a 'phylogenetic raceme' model of landlocked alewife divergence, with anadromous populations forming an ancestral core from which landlocked populations independently diverged. Divergence time estimates suggest that landlocked populations diverged from a common anadromous ancestor no longer than 5000 years ago and perhaps as recently as 300 years ago, depending on the microsatellite mutation rate assumed. Examination of foraging traits reveals landlocked populations to have significantly narrower gapes and smaller gill raker spacings than anadromous populations, suggesting that they are adapted to foraging on smaller prey items. 
Estimates of evolutionary rates (in haldanes) indicate rapid evolution of foraging traits, possibly in response to changes in available resources.
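As a point of reference for the unit, a rate in haldanes is conventionally the change in the trait mean, expressed in pooled standard deviations, per generation. The sketch below assumes that convention and assumes the means are taken on a suitable scale (often log-transformed trait values); the function name and inputs are hypothetical, and the abstract does not give the exact computation used in the study.

```python
import math

def haldanes(mean_start, mean_end, pooled_sd, generations):
    """Evolutionary rate in haldanes: standardized change in mean per generation.

    Assumption: inputs are on the analysis scale already (e.g. log-transformed).
    """
    if pooled_sd <= 0 or generations <= 0:
        raise ValueError("pooled_sd and generations must be positive")
    return ((mean_end - mean_start) / pooled_sd) / generations

if __name__ == "__main__":
    # Hypothetical numbers: one pooled SD of change over 100 generations.
    print(math.isclose(haldanes(1.0, 1.5, 0.5, 100), 0.01))
```

Because the divergence time enters the denominator, the wide range of divergence estimates in the abstract translates directly into a wide range of possible rates.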
Abstract:
An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.
This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.
On-demand digital-print service is a representative enterprise requiring a powerful EIS. We use real-life data from Reischling Press, Inc. (RPI), a digital-print-service provider (PSP), to evaluate our optimization algorithms.
In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.
We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution time and process status prediction models integrate statistical methods with machine-learning algorithms. In addition to improving prediction accuracy compared to stand-alone machine-learning algorithms, this approach also provides a probabilistic estimate of the predicted status. An order generally consists of multiple series and parallel processes. We next introduce an order-fulfillment prediction model that combines advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce the enterprise late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis, and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.
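The decompose-predict-aggregate idea can be illustrated with a deliberately simple sketch. It assumes an additive series with a linear trend and a single known period, which is far simpler than the hierarchical models the thesis describes; the function names `decompose` and `forecast` are hypothetical.

```python
import numpy as np

def decompose(y, period):
    """Split a series into a linear trend, a periodic profile, and a residual."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)          # trend component
    trend = intercept + slope * t
    detrended = y - trend
    # Average the detrended values at each phase of the cycle.
    profile = np.array([detrended[i::period].mean() for i in range(period)])
    residual = detrended - profile[t % period]
    return (slope, intercept), profile, residual

def forecast(y, period, horizon):
    """Predict each component separately, then sum the component predictions."""
    (slope, intercept), profile, _ = decompose(y, period)
    t_future = np.arange(len(y), len(y) + horizon)
    trend_pred = intercept + slope * t_future
    seasonal_pred = profile[t_future % period]
    return trend_pred + seasonal_pred

if __name__ == "__main__":
    t = np.arange(30)
    y = 2.0 * t + np.array([0.0, 1.0, -1.0])[t % 3]
    print(forecast(y, period=3, horizon=3))
```

Modeling the components separately lets each one use a predictor suited to its time scale, and the final forecast is simply the sum of the component forecasts.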
In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use them as guidance for production management. It is expected to provide solutions for enterprises to increase reconfigurability, accomplish more automated procedures, and obtain data-driven recommendations or effective decisions.
Abstract:
Programmed death is often associated with a bacterial stress response. This behavior appears paradoxical, as it offers no benefit to the individual. This paradox can be explained if the death is 'altruistic': the killing of some cells can benefit the survivors through release of 'public goods'. However, the conditions where bacterial programmed death becomes advantageous have not been unambiguously demonstrated experimentally. Here, we determined such conditions by engineering tunable, stress-induced altruistic death in the bacterium Escherichia coli. Using a mathematical model, we predicted the existence of an optimal programmed death rate that maximizes population growth under stress. We further predicted that altruistic death could generate the 'Eagle effect', a counter-intuitive phenomenon where bacteria appear to grow better when treated with higher antibiotic concentrations. In support of these modeling insights, we experimentally demonstrated both the optimality in programmed death rate and the Eagle effect using our engineered system. Our findings fill a critical conceptual gap in the analysis of the evolution of bacterial programmed death, and have implications for the design of antibiotic treatment.
Abstract:
Activation of CD4+ T cells results in rapid proliferation and differentiation into effector and regulatory subsets. CD4+ effector T cell (Teff) (Th1 and Th17) and Treg subsets are metabolically distinct, yet the specific metabolic differences that modify T cell populations are uncertain. Here, we evaluated CD4+ T cell populations in murine models and determined that inflammatory Teffs maintain high expression of glycolytic genes and rely on high glycolytic rates, while Tregs are oxidative and require mitochondrial electron transport to proliferate, differentiate, and survive. Metabolic profiling revealed that pyruvate dehydrogenase (PDH) is a key bifurcation point between T cell glycolytic and oxidative metabolism. PDH function is inhibited by PDH kinases (PDHKs). PDHK1 was expressed in Th17 cells, but not Th1 cells, and at low levels in Tregs, and inhibition or knockdown of PDHK1 selectively suppressed Th17 cells and increased Tregs. This alteration in the CD4+ T cell populations was mediated in part through ROS, as N-acetyl cysteine (NAC) treatment restored Th17 cell generation. Moreover, inhibition of PDHK1 modulated immunity and protected animals against experimental autoimmune encephalomyelitis, decreasing Th17 cells and increasing Tregs. Together, these data show that CD4+ subsets utilize and require distinct metabolic programs that can be targeted to control specific T cell populations in autoimmune and inflammatory diseases.