967 results for Parallel or distributed processing


Relevance: 40.00%

Abstract:

The study of water flow and scalar transport in hydroelectric reservoirs is important for determining water quality during the initial filling stages and over the service life of the reservoir. In this context, a parallel 2D finite element code was implemented to solve the incompressible Navier-Stokes equations coupled with scalar transport, using the message-passing programming model, in order to run simulations on a computer cluster. The spatial discretization is based on the MINI element, which satisfies the Babuska-Brezzi (BB) conditions and therefore allows a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as preprocessing, solution and postprocessing, were implemented using the PETSc library. The resulting linear systems were solved using the discrete projection method with block LU factorization. To increase parallel performance in the solution of the linear systems, the static condensation method was employed to solve for the intermediate velocity at the vertices and at the centroid of the MINI element separately. The performance of the static condensation method was compared with that of solving the complete system. The tests showed that the static condensation method performs better for large problems, at the cost of higher memory usage. The performance of other parts of the code is also presented.
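The static condensation step mentioned in this abstract can be illustrated on a small dense system. This is a minimal sketch, not the PETSc implementation the abstract describes: the helper names `solve` and `condense_and_solve`, the split into vertex and bubble (centroid) unknowns, and the assumption that the bubble-bubble block is diagonal are all introduced here for illustration. The bubble unknowns are eliminated through a Schur complement, the smaller vertex system is solved, and the bubble values are then recovered by back-substitution.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b] in floating point.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def condense_and_solve(A_vv, A_vb, A_bv, d_bb, f_v, f_b):
    """Eliminate bubble unknowns, solve for vertex values, recover bubbles.

    d_bb is the diagonal of the bubble-bubble block (assumed diagonal here,
    since each bubble function lives on a single element).
    """
    nv, nb = len(f_v), len(f_b)
    # Schur complement S = A_vv - A_vb D^-1 A_bv and reduced right-hand side.
    S = [[A_vv[i][j] - sum(A_vb[i][k] * A_bv[k][j] / d_bb[k] for k in range(nb))
          for j in range(nv)] for i in range(nv)]
    g = [f_v[i] - sum(A_vb[i][k] * f_b[k] / d_bb[k] for k in range(nb))
         for i in range(nv)]
    u_v = solve(S, g)
    # Back-substitution: u_b = D^-1 (f_b - A_bv u_v).
    u_b = [(f_b[k] - sum(A_bv[k][j] * u_v[j] for j in range(nv))) / d_bb[k]
           for k in range(nb)]
    return u_v, u_b
```

The condensed system has only as many unknowns as there are vertices, but the Schur complement has to be formed and stored in addition to the original blocks, which is consistent with the abstract's remark about a speed-for-memory trade-off.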

Relevance: 40.00%

Abstract:

This work proposes a middleware, called DistributedCL, that makes parallel processing on distributed GPUs transparent. With the support of the DistributedCL middleware, an application written to use the OpenCL API can run in a distributed fashion, using remote GPUs, transparently and without any change to or recompilation of its code. The architecture proposed for the DistributedCL middleware is modular, with well-defined layers, and a prototype was built according to this architecture. The prototype employs several optimizations, including batched data transfer, asynchronous network communication and asynchronous OpenCL API calls. The prototype was evaluated using available benchmarks, and the CLBench benchmark was also developed to evaluate it as a function of data size. The performance of the prototype proved to be good and superior to that of similar proposals, with some results close to the ideal; the size of the data transmitted over the network was the main limiting factor.

Relevance: 40.00%

Abstract:

An optical fiber strain sensing technique, based on Brillouin Optical Time Domain Reflectometry (BOTDR), was used to obtain the full deformation profile of a secant pile wall during construction of an adjacent basement in London. Details of the installation of sensors as well as data processing are described. By installing optical fiber down opposite sides of the pile, the distributed strain profiles obtained can be used to give both the axial and lateral movements along the pile. Measurements obtained from the BOTDR were found in good agreement with inclinometer data from the adjacent piles. The relative merits of the two different techniques are discussed. © 2007 ASCE.
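The way strain readings from two fibers on opposite sides of a pile yield both axial and lateral movements can be sketched as follows. This is an illustrative reconstruction under simple beam-bending assumptions, not the processing used in the paper: the function names, the sign convention, the equal sample spacing `dz`, and the boundary conditions (zero displacement and zero rotation at the first sample) are all assumptions. The mean of the two strains gives the axial strain, their difference divided by the fiber spacing gives the curvature, and integration along the pile gives the movements.

```python
def cumulative_trapezoid(values, dz):
    """Running trapezoidal integral of equally spaced samples, starting at 0."""
    total, out = 0.0, [0.0]
    for a, b in zip(values, values[1:]):
        total += 0.5 * (a + b) * dz
        out.append(total)
    return out


def pile_movements(strain_a, strain_b, fiber_spacing, dz):
    """Return (axial_displacement, lateral_displacement) profiles along the pile.

    strain_a, strain_b: strain samples from the two opposite fibers.
    fiber_spacing: distance between the fibers across the pile section.
    dz: distance between consecutive samples along the pile.
    """
    # The average strain is the axial component; the bending part cancels.
    axial_strain = [0.5 * (a + b) for a, b in zip(strain_a, strain_b)]
    # The differential strain over the fiber spacing is the curvature.
    curvature = [(a - b) / fiber_spacing for a, b in zip(strain_a, strain_b)]
    axial_disp = cumulative_trapezoid(axial_strain, dz)
    rotation = cumulative_trapezoid(curvature, dz)      # first integration
    lateral_disp = cumulative_trapezoid(rotation, dz)   # second integration
    return axial_disp, lateral_disp
```

In practice the integration constants would come from a known fixity condition or from an independent measurement such as the inclinometer data mentioned in the abstract.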

Relevance: 40.00%

Abstract:

Cambridge Flow Solutions Ltd, Compass House, Vision Park, Cambridge, CB4 9AD, UK

Real-world simulation challenges are getting bigger: virtual aero-engines with multistage blade rows coupled with their secondary air systems & with fully featured geometry; environmental flows at meta-scales over resolved cities; synthetic battlefields. It is clear that the future of simulation is scalable, end-to-end parallelism. To address these challenges we have reported in a sequence of papers a series of inherently parallel building blocks based on the integration of a Level Set based geometry kernel with an octree-based cut-Cartesian mesh generator, RANS flow solver, post-processing and geometry management & editing. The cut-cells which characterize the approach are eliminated by exporting a body-conformal mesh driven by the underpinning Level Set and managed by mesh quality optimization algorithms; this permits third party flow solvers to be deployed. This paper continues this sequence by reporting & demonstrating two main novelties: variable depth volume mesh refinement enabling variable surface mesh refinement and a radical rework of the mesh generation into a bottom-up system based on Space Filling Curves. Also reported are the associated extensions to body-conformal mesh export. Everything is implemented in a scalable, parallel manner. As a practical demonstration, meshes of guaranteed quality are generated for a fully resolved, generic aircraft carrier geometry, a cooled disc brake assembly and a B747 in landing configuration. Copyright © 2009 by W.N.Dawes.
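As a rough illustration of how a space-filling curve can order and partition Cartesian cells for parallel work, the sketch below uses a Morton (Z-order) curve. The abstract does not say which curve the authors use, so the choice of Morton ordering, the function names `morton3d` and `partition_cells`, and the equal-count partitioning are all assumptions made for this example.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of integer cell indices x, y, z."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def partition_cells(cells, n_parts, bits=10):
    """Sort cells along the curve and cut the ordering into contiguous chunks."""
    ordered = sorted(cells, key=lambda c: morton3d(c[0], c[1], c[2], bits))
    size, extra = divmod(len(ordered), n_parts)
    parts, start = [], 0
    for p in range(n_parts):
        # The first `extra` parts take one more cell so sizes differ by at most 1.
        end = start + size + (1 if p < extra else 0)
        parts.append(ordered[start:end])
        start = end
    return parts
```

Because cells that are close along the curve tend to be close in space, each contiguous chunk forms a reasonably compact region, which keeps the boundary between partitions, and hence communication, small.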

Relevance: 40.00%

Abstract:

High-throughput DNA sequencing (HTS) instruments today are capable of generating millions of sequencing reads in a short period of time, and this represents a serious challenge to current bioinformatics pipelines in processing such an enormous amount of data in a fast and economical fashion. Modern graphics cards are powerful processing units that consist of hundreds of scalar processors working in parallel in order to handle the rendering of high-definition graphics in real time. It is this computational capability that we propose to harness in order to accelerate some of the time-consuming steps in analyzing data generated by the HTS instruments. We have developed BarraCUDA, a novel sequence mapping software that utilizes the parallelism of NVIDIA CUDA graphics cards to map sequencing reads to a particular location on a reference genome. While delivering a mapping fidelity similar to that of other mainstream programs, BarraCUDA is an order of magnitude faster in mapping throughput than its CPU counterparts. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the mapping throughput. BarraCUDA is designed to take advantage of the parallelism of GPUs to accelerate the mapping of millions of sequencing reads generated by HTS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available at http://seqbarracuda.sf.net

Relevance: 40.00%

Abstract:

We show that the sensor self-localization problem can be cast as a static parameter estimation problem for Hidden Markov Models and we implement fully decentralized versions of the Recursive Maximum Likelihood and on-line Expectation-Maximization algorithms to localize the sensor network simultaneously with target tracking. For linear Gaussian models, our algorithms can be implemented exactly using a distributed version of the Kalman filter and a novel message passing algorithm. The latter allows each node to compute the local derivatives of the likelihood or the sufficient statistics needed for Expectation-Maximization. In the non-linear case, a solution based on local linearization in the spirit of the Extended Kalman Filter is proposed. In numerical examples we demonstrate that the developed algorithms are able to learn the localization parameters. © 2012 IEEE.

Relevance: 40.00%

Abstract:

The interface of wet oxidized Al0.97Ga0.03As/GaAs in a distributed Bragg reflector (DBR) structure has been studied by means of transmission electron microscopy and Raman spectroscopy. As the oxidation time is extended, the oxide/GaAs interfaces are no longer abrupt. There is an amorphous film near the oxide/GaAs interface, which is Ga2O3 and is related to the prolonged heating. In the samples oxidized for 10 and 20 min, there are some fissures along the oxidized AlGaAs/GaAs interfaces. In the samples oxidized or in situ annealed for a long time, no such fissures are present, owing to the complete removal of the volatile products.

Relevance: 40.00%

Abstract:

One of the most important functions in individual development is the interaction and integration of the sensory inputs. Two competing theories, the deficiency theory and the compensatory theory, address the origin and nature of the changes in visual function observed after auditory deprivation. The deficiency theory proposes that integrative processes are essential for normal development. In contrast, the compensatory theory states that the loss of one sense may be met by greater reliance upon, and therefore enhancement of, the remaining senses. Given that hearing-impaired children's learning depends primarily on visual information, it is important to recognize how their visual attention differs from that of their hearing age-mates. Differences among age groups could exist in either selective or sustained attention. Studies 1 and 2 explored the development of selective and sustained attention in hearing-impaired and hearing students of average cognitive ability, ranging from 7-year-olds to college students. The analysis and discussion of the results draw on visual attention development as well as on the deficiency and compensatory theories. Building on the results of Studies 1 and 2, Studies 3 and 4 investigated the spatial distribution and control of visual attention in hearing-impaired and hearing students. The present work showed the following. Firstly, hearing-impaired and hearing participants had similar developmental trajectories of sustained attention. Children's sustained attention improved with age and peaked in adolescence. The hearing-impaired participants had sustained attention skills comparable to those of the matched hearing participants. In addition, the hearing-impaired participants were able to maintain their attention and vigilance on the current task throughout the observation period.
Secondly, group differences in visual attention development were found between hearing-impaired and hearing participants. In childhood, visual attention developed more slowly in the hearing-impaired children than in the hearing children. The selective attention skills of the hearing-impaired children were not comparable to those of the hearing children; however, their selective attention improved with age, so that in adulthood the hearing-impaired students showed a slight advantage in selective attention over the hearing students. Thirdly, hearing-impaired and hearing participants showed a similar spatial distribution of attention resources. In the low perceptual load condition, both groups suffered strong interference from the distractor at fixation. In contrast, in the high perceptual load condition, hearing-impaired adults suffered more interference from the peripheral distractor, which suggests that they allocated more attention resources to the peripheral field when faced with difficult tasks. Fourthly, both groups showed similar processing in the visual attention tasks. That is, both searched for a target defined by color alone in parallel, but searched serially when the target was defined by orientation or by a combination of color and orientation. Furthermore, the results indicated that the two groups control attention in similar ways. In summary, the present study showed that visual attention development depends on the integration of multimodal sensory information. Because input from the various senses interacts and is integrated, the loss of one sense has a negative impact on the intact senses at an early stage; however, with development and practice, the functions of the intact senses gradually improve.

Relevance: 40.00%

Abstract:

A vernier offset is detected at once among straight lines, and reaction times are almost independent of the number of simultaneously presented stimuli (distractors), indicating parallel processing of vernier offsets. Reaction times for identifying a vernier offset to one side among verniers offset to the opposite side increase with the number of distractors, indicating serial processing. Even deviations below a photoreceptor diameter can be detected at once. The visual system thus attains positional accuracy below the photoreceptor diameter simultaneously at different positions. I conclude that deviation from straightness, or change of orientation, is detected in parallel over the visual field. Discontinuities or gradients in orientation may represent an elementary feature of vision.

Relevance: 40.00%

Abstract:

This paper presents a model for the general flow in the neocortex. The basic process, called "sequence-seeking," is a search for a sequence of mappings or transformations, linking source and target representations. The search is bi-directional, "bottom-up" as well as "top-down," and it explores in parallel a large number of alternative sequences. This operation is implemented in a structure termed "counter streams," in which multiple sequences are explored along two separate, complementary pathways which seek to meet. The first part of the paper discusses the general sequence-seeking scheme and a number of related processes, such as the learning of successful sequences, context effects, and the use of "express lines" and partial matches. The second part discusses biological implications of the model in terms of connections within and between cortical areas. The model is compared with existing data, and a number of new predictions are proposed.

Relevance: 40.00%

Abstract:

This report describes Processor Coupling, a mechanism for controlling multiple ALUs on a single integrated circuit to exploit both instruction-level and inter-thread parallelism. A compiler statically schedules individual threads to discover available intra-thread instruction-level parallelism. The runtime scheduling mechanism interleaves threads, exploiting inter-thread parallelism to maintain high ALU utilization. ALUs are assigned to threads on a cycle-by-cycle basis, and several threads can be active concurrently. Simulation results show that Processor Coupling performs well on both single-threaded and multi-threaded applications. The experiments address the effects of memory latencies, function unit latencies, and communication bandwidth between function units.

Relevance: 40.00%

Abstract:

Parallel shared-memory machines with hundreds or thousands of processor-memory nodes have been built; in the future we will see machines with millions or even billions of nodes. Associated with such large systems is a new set of design challenges. Many problems must be addressed by an architecture in order for it to be successful; of these, we focus on three in particular. First, a scalable memory system is required. Second, the network messaging protocol must be fault-tolerant. Third, the overheads of thread creation, thread management and synchronization must be extremely low. This thesis presents the complete system design for Hamal, a shared-memory architecture which addresses these concerns and is directly scalable to one million nodes. Virtual memory and distributed objects are implemented in a manner that requires neither inter-node synchronization nor the storage of globally coherent translations at each node. We develop a lightweight fault-tolerant messaging protocol that guarantees message delivery and idempotence across a discarding network. A number of hardware mechanisms provide efficient support for massive multithreading and fine-grained synchronization. Experiments are conducted in simulation, using a trace-driven network simulator to investigate the messaging protocol and a cycle-accurate simulator to evaluate the Hamal architecture. We determine implementation parameters for the messaging protocol which optimize performance. A discarding network is easier to design and can be clocked at a higher rate, and we find that with this protocol its performance can approach that of a non-discarding network. Our simulations of Hamal demonstrate the effectiveness of its thread management and synchronization primitives. In particular, we find register-based synchronization to be an extremely efficient mechanism which can be used to implement a software barrier with a latency of only 523 cycles on a 512 node machine.
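The idea of guaranteed, idempotent delivery across a network that may discard messages can be sketched generically as follows. This is not the Hamal protocol itself, whose details the abstract does not give: the `Receiver` class, the `send_reliably` function, the use of message identifiers for duplicate suppression, and the `network_delivers` callback that models discards are all hypothetical. The sender retransmits until an acknowledgement gets through, and the receiver applies each message identifier at most once.

```python
class Receiver:
    """Applies each message at most once, even if it arrives repeatedly."""

    def __init__(self):
        self.seen = set()     # identifiers already applied
        self.applied = []     # payloads in the order they took effect

    def receive(self, msg_id, payload):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.applied.append(payload)
        # Always acknowledge, so a sender whose earlier ack was lost can stop.
        return msg_id


def send_reliably(receiver, msg_id, payload, network_delivers, max_attempts=100):
    """Retransmit until an acknowledgement arrives; return the attempts used.

    network_delivers() models the discarding network: it returns True when a
    packet gets through and False when the packet is dropped.
    """
    for attempt in range(1, max_attempts + 1):
        if not network_delivers():        # request discarded in transit
            continue
        ack = receiver.receive(msg_id, payload)
        if network_delivers() and ack == msg_id:   # ack survived the trip back
            return attempt
    raise RuntimeError("no acknowledgement after %d attempts" % max_attempts)
```

A real design would also need a way to retire old identifiers so that the receiver's state stays bounded; that part is omitted from this sketch.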

Relevance: 40.00%

Abstract:

Euterpe is a real-time computer system for the modeling of musical structures. It provides a formalism wherein familiar concepts of musical analysis may be readily expressed. This is verified by its application to the analysis of a wide variety of conventional forms of music: Gregorian chant, Mediaeval polyphony, Bach counterpoint, and sonata form. It may be of further assistance in real-time experiments in various techniques of thematic development. Finally, the system is endowed with sound-synthesis apparatus with which the user may prepare tapes for musical performances.

Relevance: 40.00%

Abstract:

Huelse, M, Barr, D R W, Dudek, P: Cellular Automata and non-static image processing for embodied robot systems on a massively parallel processor array. In: Adamatzky, A et al. (eds) AUTOMATA 2008, Theory and Applications of Cellular Automata. Luniver Press, 2008, pp. 504-510. Sponsorship: EPSRC