909 resultados para parallel computation model
Resumo:
We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
Resumo:
In this paper we describe the development of a program that aims at the optimal integration of observed data in an oceanographic model describ
Resumo:
The paper investigates train scheduling problems when prioritised trains and non-prioritised trains are simultaneously traversed in a single-line rail network. In this case, no-wait conditions arise because the prioritised trains such as express passenger trains should traverse continuously without any interruption. In comparison, non-prioritised trains such as freight trains are allowed to enter the next section immediately if possible or to remain in a section until the next section on the routing becomes available, which is thought of as a relaxation of no-wait conditions. With thorough analysis of the structural properties of the No-Wait Blocking Parallel-Machine Job-Shop-Scheduling (NWBPMJSS) problem that is originated in this research, an innovative generic constructive algorithm (called NWBPMJSS_Liu-Kozan) is proposed to construct the feasible train timetable in terms of a given order of trains. In particular, the proposed NWBPMJSS_Liu-Kozan constructive algorithm comprises several recursively-used sub-algorithms (i.e. Best-Starting-Time-Determination Procedure, Blocking-Time-Determination Procedure, Conflict-Checking Procedure, Conflict-Eliminating Procedure, Tune-up Procedure and Fine-tune Procedure) to guarantee feasibility by satisfying the blocking, no-wait, deadlock-free and conflict-free constraints. A two-stage hybrid heuristic algorithm (NWBPMJSS_Liu-Kozan-BIH) is developed by combining the NWBPMJSS_Liu-Kozan constructive algorithm and the Best-Insertion-Heuristic (BIH) algorithm to find the preferable train schedule in an efficient and economical way. Extensive computational experiments show that the proposed methodology is promising because it can be applied as a standard and fundamental toolbox for identifying, analysing, modelling and solving real-world scheduling problems.
Resumo:
We present a novel approach for developing summary statistics for use in approximate Bayesian computation (ABC) algorithms using indirect infer- ence. We embed this approach within a sequential Monte Carlo algorithm that is completely adaptive. This methodological development was motivated by an application involving data on macroparasite population evolution modelled with a trivariate Markov process. The main objective of the analysis is to compare inferences on the Markov process when considering two di®erent indirect mod- els. The two indirect models are based on a Beta-Binomial model and a three component mixture of Binomials, with the former providing a better ¯t to the observed data.
Resumo:
These lecture notes describe the use and implementation of a framework in which mathematical as well as engineering optimisation problems can be analysed. The foundations of the framework and algorithms described -Hierarchical Asynchronous Parallel Evolutionary Algorithms (HAPEAs) - lie upon traditional evolution strategies and incorporate the concepts of a multi-objective optimisation, hierarchical topology, asynchronous evaluation of candidate solutions , parallel computing and game strategies. In a step by step approach, the numerical implementation of EAs and HAPEAs for solving multi criteria optimisation problems is conducted providing the reader with the knowledge to reproduce these hand on training in his – her- academic or industrial environment.
Resumo:
Approximate Bayesian Computation’ (ABC) represents a powerful methodology for the analysis of complex stochastic systems for which the likelihood of the observed data under an arbitrary set of input parameters may be entirely intractable – the latter condition rendering useless the standard machinery of tractable likelihood-based, Bayesian statistical inference [e.g. conventional Markov chain Monte Carlo (MCMC) simulation]. In this paper, we demonstrate the potential of ABC for astronomical model analysis by application to a case study in the morphological transformation of high-redshift galaxies. To this end, we develop, first, a stochastic model for the competing processes of merging and secular evolution in the early Universe, and secondly, through an ABC-based comparison against the observed demographics of massive (Mgal > 1011 M⊙) galaxies (at 1.5 < z < 3) in the Cosmic Assembly Near-IR Deep Extragalatic Legacy Survey (CANDELS)/Extended Groth Strip (EGS) data set we derive posterior probability densities for the key parameters of this model. The ‘Sequential Monte Carlo’ implementation of ABC exhibited herein, featuring both a self-generating target sequence and self-refining MCMC kernel, is amongst the most efficient of contemporary approaches to this important statistical algorithm. We highlight as well through our chosen case study the value of careful summary statistic selection, and demonstrate two modern strategies for assessment and optimization in this regard. Ultimately, our ABC analysis of the high-redshift morphological mix returns tight constraints on the evolving merger rate in the early Universe and favours major merging (with disc survival or rapid reformation) over secular evolution as the mechanism most responsible for building up the first generation of bulges in early-type discs.
Resumo:
Analytically or computationally intractable likelihood functions can arise in complex statistical inferential problems making them inaccessible to standard Bayesian inferential methods. Approximate Bayesian computation (ABC) methods address such inferential problems by replacing direct likelihood evaluations with repeated sampling from the model. ABC methods have been predominantly applied to parameter estimation problems and less to model choice problems due to the added difficulty of handling multiple model spaces. The ABC algorithm proposed here addresses model choice problems by extending Fearnhead and Prangle (2012, Journal of the Royal Statistical Society, Series B 74, 1–28) where the posterior mean of the model parameters estimated through regression formed the summary statistics used in the discrepancy measure. An additional stepwise multinomial logistic regression is performed on the model indicator variable in the regression step and the estimated model probabilities are incorporated into the set of summary statistics for model choice purposes. A reversible jump Markov chain Monte Carlo step is also included in the algorithm to increase model diversity for thorough exploration of the model space. This algorithm was applied to a validating example to demonstrate the robustness of the algorithm across a wide range of true model probabilities. Its subsequent use in three pathogen transmission examples of varying complexity illustrates the utility of the algorithm in inferring preference of particular transmission models for the pathogens.
Resumo:
There exists various suggestions for building a functional and a fault-tolerant large-scale quantum computer. Topological quantum computation is a more exotic suggestion, which makes use of the properties of quasiparticles manifest only in certain two-dimensional systems. These so called anyons exhibit topological degrees of freedom, which, in principle, can be used to execute quantum computation with intrinsic fault-tolerance. This feature is the main incentive to study topological quantum computation. The objective of this thesis is to provide an accessible introduction to the theory. In this thesis one has considered the theory of anyons arising in two-dimensional quantum mechanical systems, which are described by gauge theories based on so called quantum double symmetries. The quasiparticles are shown to exhibit interactions and carry quantum numbers, which are both of topological nature. Particularly, it is found that the addition of the quantum numbers is not unique, but that the fusion of the quasiparticles is described by a non-trivial fusion algebra. It is discussed how this property can be used to encode quantum information in a manner which is intrinsically protected from decoherence and how one could, in principle, perform quantum computation by braiding the quasiparticles. As an example of the presented general discussion, the particle spectrum and the fusion algebra of an anyon model based on the gauge group S_3 are explicitly derived. The fusion algebra is found to branch into multiple proper subalgebras and the simplest one of them is chosen as a model for an illustrative demonstration. The different steps of a topological quantum computation are outlined and the computational power of the model is assessed. It turns out that the chosen model is not universal for quantum computation. However, because the objective was a demonstration of the theory with explicit calculations, none of the other more complicated fusion subalgebras were considered. Studying their applicability for quantum computation could be a topic of further research.
Resumo:
Stochastic volatility models are of fundamental importance to the pricing of derivatives. One of the most commonly used models of stochastic volatility is the Heston Model in which the price and volatility of an asset evolve as a pair of coupled stochastic differential equations. The computation of asset prices and volatilities involves the simulation of many sample trajectories with conditioning. The problem is treated using the method of particle filtering. While the simulation of a shower of particles is computationally expensive, each particle behaves independently making such simulations ideal for massively parallel heterogeneous computing platforms. In this paper, we present our portable Opencl implementation of the Heston model and discuss its performance and efficiency characteristics on a range of architectures including Intel cpus, Nvidia gpus, and Intel Many-Integrated-Core (mic) accelerators.
Resumo:
Background Ankylosing spondylitis (AS) is an immune-mediated arthritis particularly targeting the spine and pelvis and is characterised by inflammation, osteoproliferation and frequently ankylosis. Current treatments that predominately target inflammatory pathways have disappointing efficacy in slowing disease progression. Thus, a better understanding of the causal association and pathological progression from inflammation to bone formation, particularly whether inflammation directly initiates osteoproliferation, is required. Methods The proteoglycan-induced spondylitis (PGISp) mouse model of AS was used to histopathologically map the progressive axial disease events, assess molecular changes during disease progression and define disease progression using unbiased clustering of semi-quantitative histology. PGISp mice were followed over a 24-week time course. Spinal disease was assessed using a novel semi-quantitative histological scoring system that independently evaluated the breadth of pathological features associated with PGISp axial disease, including inflammation, joint destruction and excessive tissue formation (osteoproliferation). Matrix components were identified using immunohistochemistry. Results Disease initiated with inflammation at the periphery of the intervertebral disc (IVD) adjacent to the longitudinal ligament, reminiscent of enthesitis, and was associated with upregulated tumor necrosis factor and metalloproteinases. After a lag phase, established inflammation was temporospatially associated with destruction of IVDs, cartilage and bone. At later time points, advanced disease was characterised by substantially reduced inflammation, excessive tissue formation and ectopic chondrocyte expansion. These distinct features differentiated affected mice into early, intermediate and advanced disease stages. Excessive tissue formation was observed in vertebral joints only if the IVD was destroyed as a consequence of the early inflammation. Ectopic excessive tissue was predominantly chondroidal with chondrocyte-like cells embedded within collagen type II- and X-rich matrix. This corresponded with upregulation of mRNA for cartilage markers Col2a1, sox9 and Comp. Osteophytes, though infrequent, were more prevalent in later disease. Conclusions The inflammation-driven IVD destruction was shown to be a prerequisite for axial disease progression to osteoproliferation in the PGISp mouse. Osteoproliferation led to vertebral body deformity and fusion but was never seen concurrent with persistent inflammation, suggesting a sequential process. The findings support that early intervention with anti-inflammatory therapies will be needed to limit destructive processes and consequently prevent progression of AS.
Resumo:
Although various strategies have been developed for scheduling parallel applications with independent tasks, very little work exists for scheduling tightly coupled parallel applications on cluster environments. In this paper, we compare four different strategies based on performance models of tightly coupled parallel applications for scheduling the applications on clusters. In addition to algorithms based on existing popular optimization techniques, we also propose a new algorithm called Box Elimination that searches the space of performance model parameters to determine the best schedule of machines. By means of real and simulation experiments, we evaluated the algorithms on single cluster and multi-cluster setups. We show that our Box Elimination algorithm generates up to 80% more efficient schedule than other algorithms. We also show that the execution times of the schedules produced by our algorithm are more robust against the performance modeling errors.
Resumo:
This paper deals with the development of a new model for the cooling process on the runout table of hot strip mills, The suitability of different numerical methods for the solution of the proposed model equation from the point of view of accuracy and computation time are studied, Parallel solutions for the model equation are proposed.
Resumo:
Workstation clusters equipped with high performance interconnect having programmable network processors facilitate interesting opportunities to enhance the performance of parallel application run on them. In this paper, we propose schemes where certain application level processing in parallel database query execution is performed on the network processor. We evaluate the performance of TPC-H queries executing on a high end cluster where all tuple processing is done on the host processor, using a timed Petri net model, and find that tuple processing costs on the host processor dominate the execution time. These results are validated using a small cluster. We therefore propose 4 schemes where certain tuple processing activity is offloaded to the network processor. The first 2 schemes offload the tuple splitting activity - computation to identify the node on which to process the tuples, resulting in an execution time speedup of 1.09 relative to the base scheme, but with I/O bus becoming the bottleneck resource. In the 3rd scheme in addition to offloading tuple processing activity, the disk and network interface are combined to avoid the I/O bus bottleneck, which results in speedups up to 1.16, but with high host processor utilization. Our 4th scheme where the network processor also performs apart of join operation along with the host processor, gives a speedup of 1.47 along with balanced system resource utilizations. Further we observe that the proposed schemes perform equally well even in a scaled architecture i.e., when the number of processors is increased from 2 to 64