Biblioteca Digital

864 resultados para High performance processors

Mechanical Properties of Rice Husk Ash (RHA) - High strength Concrete

Relevância:

90.00% 90.00%

Publicador:

Resumo:

High strength and high performance concrete are being widely used all over the world. Most of the applications of high strength concrete have been found in high rise buildings, long span bridges etc. The potential of rice husk ash as a cement replacement material is well established .Earlier researches showed an improvement in mechanical properties of high strength concrete with finely ground RHA as a partial cement replacement material. A review of literature urges the need for optimizing the replacement level of cement with RHA for improved mechanical properties at optimum water binder ratio. This paper discusses the mechanical properties of RHA- High strength concrete at optimized conditions

Modification of Polypropylene/High Density Polyethylene Blend Using Nanokaolinite Clay and E-Glass Fibre: Preparation, Characterization and Micromechanical Modelling

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Upgrading two widely used standard plastics, polypropylene (PP) and high density polyethylene (HDPE), and generating a variety of useful engineering materials based on these blends have been the main objective of this study. Upgradation was effected by using nanomodifiers and/or fibrous modifiers. PP and HDPE were selected for modification due to their attractive inherent properties and wide spectrum of use. Blending is the engineered method of producing new materials with tailor made properties. It has the advantages of both the materials. PP has high tensile and flexural strength and the HDPE acts as an impact modifier in the resultant blend. Hence an optimized blend of PP and HDPE was selected as the matrix material for upgradation. Nanokaolinite clay and E-glass fibre were chosen for modifying PP/HDPE blend. As the first stage of the work, the mechanical, thermal, morphological, rheological, dynamic mechanical and crystallization characteristics of the polymer nanocomposites prepared with PP/HDPE blend and different surface modified nanokaolinite clay were analyzed. As the second stage of the work, the effect of simultaneous inclusion of nanokaolinite clay (both N100A and N100) and short glass fibres are investigated. The presence of nanofiller has increased the properties of hybrid composites to a greater extent than micro composites. As the last stage, micromechanical modeling of both nano and hybrid A composite is carried out to analyze the behavior of the composite under load bearing conditions. These theoretical analyses indicate that the polymer-nanoclay interfacial characteristics partially converge to a state of perfect interfacial bonding (Takayanagi model) with an iso-stress (Reuss IROM) response. In the case of hybrid composites the experimental data follows the trend of Halpin-Tsai model. This implies that matrix and filler experience varying amount of strain and interfacial adhesion between filler and matrix and also between the two fillers which play a vital role in determining the modulus of the hybrid composites.A significant observation from this study is that the requirement of higher fibre loading for efficient reinforcement of polymers can be substantially reduced by the presence of nanofiller together with much lower fibre content in the composite. Hybrid composites with both nanokaolinite clay and micron sized E-glass fibre as reinforcements in PP/HDPE matrix will generate a novel class of high performance, cost effective engineering material.

A User-Centric Perspective on Parallel Programming with Focus on OpenMP

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming has been a problem for a small niche only: engineers working on parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures. Parallel programming in our days becomes a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Different aims were identified in order to reach the objective: research the state of the art of parallel programming today, improve the education of software developers about the topic, and provide programmers with powerful abstractions to make their work easier. To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was setup to collect experiences and knowledge in the field of parallel programming - called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50.000 readers to date have read essays on the topic. For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two different research directions were pursued. The first one resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well-documented and can be used directly in programs, it enables developers to study the source code and learn from it, and it is possible for compiler writers to use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread-cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.

Performance Evaluation of the Scheme 86 and HP Precision Architecture

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The Scheme86 and the HP Precision Architectures represent different trends in computer processor design. The former uses wide micro-instructions, parallel hardware, and a low latency memory interface. The latter encourages pipelined implementation and visible interlocks. To compare the merits of these approaches, algorithms frequently encountered in numerical and symbolic computation were hand-coded for each architecture. Timings were done in simulators and the results were evaluated to determine the speed of each design. Based on these measurements, conclusions were drawn as to which aspects of each architecture are suitable for a high- performance computer.

Fat-Tree Routing for Transit

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The Transit network provides high-speed, low-latency, fault-tolerant interconnect for high-performance, multiprocessor computers. The basic connection scheme for Transit uses bidelta style, multistage networks to support up to 256 processors. Scaling to larger machines by simply extending the bidelta network topology will result in a uniform degradation of network latency between all processors. By employing a fat-tree network structure in larger systems, the network provides locality and universality properties which can help minimize the impact of scaling on network latency. This report details the topology and construction issues associated with integrating Transit routing technology into fat-tree interconnect topologies.

ADAM: A Decentralized Parallel Computer Architecture Featuring Fast Thread and Data Migration and a Uniform Hardware Abstraction

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.

Fast Sorting on a Distributed-Memory Architecture

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms.

Nested parallelism for multi-core HPC systems using Java

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism - messaging and threading - to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. Practicality of this approach is assessed by porting to Java a massively parallel structure formation code from Cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge it is the first time this kind of hybrid parallelism is demonstrated in a high performance Java application. (C) 2009 Elsevier Inc. All rights reserved.

Novel material combinations for enhanced infrared filter performance

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The introduction of non-toxic fluride compounds as direct replacements for Thorium Fluoride (ThF4) has renewed interest in the use of low index fluoride compounds in high performance infrared filters. This paper reports the results of an investigation into the effects of combining these low index materials, particularly Barium Fluoride (BaF2), with the high index material Lead Telluride (PbTe) in bandpass and edge filters. Infrared filter designs using conventional and the new material ombination are compared, and infrared filters using these material combinations have been manufactured and have been shown to suffer problems with residual stress. A possible solution to this problem utilising Zinc Sulphide (ZnS) layers with compensating compressive stress is discussed.

From saccades to smooth pursuit: real-time gaze control using motion feedback

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The authors present an active vision system which performs a surveillance task in everyday dynamic scenes. The system is based around simple, rapid motion processors and a control strategy which uses both position and velocity information. The surveillance task is defined in terms of two separate behavioral subsystems, saccade and smooth pursuit, which are demonstrated individually on the system. It is shown how these and other elementary responses to 2D motion can be built up into behavior sequences, and how judicious close cooperation between vision and control results in smooth transitions between the behaviors. These ideas are demonstrated by an implementation of a saccade to smooth pursuit surveillance system on a high-performance robotic hand/eye platform.

NeMo: a platform for neural modelling of spiking neurons using GPUs

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Simulating spiking neural networks is of great interest to scientists wanting to model the functioning of the brain. However, large-scale models are expensive to simulate due to the number and interconnectedness of neurons in the brain. Furthermore, where such simulations are used in an embodied setting, the simulation must be real-time in order to be useful. In this paper we present NeMo, a platform for such simulations which achieves high performance through the use of highly parallel commodity hardware in the form of graphics processing units (GPUs). NeMo makes use of the Izhikevich neuron model which provides a range of realistic spiking dynamics while being computationally efficient. Our GPU kernel can deliver up to 400 million spikes per second. This corresponds to a real-time simulation of around 40 000 neurons under biologically plausible conditions with 1000 synapses per neuron and a mean firing rate of 10 Hz.

Dynamic group communication for large-scale parallel data mining

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.

What is the right balance for performance and isolation with virtualization in HPC?

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The use of virtualization in high-performance computing (HPC) has been suggested as a means to provide tailored services and added functionality that many users expect from full-featured Linux cluster environments. The use of virtual machines in HPC can offer several benefits, but maintaining performance is a crucial factor. In some instances the performance criteria are placed above the isolation properties. This selective relaxation of isolation for performance is an important characteristic when considering resilience for HPC environments that employ virtualization. In this paper we consider some of the factors associated with balancing performance and isolation in configurations that employ virtual machines. In this context, we propose a classification of errors based on the concept of “error zones”, as well as a detailed analysis of the trade-offs between resilience and performance based on the level of isolation provided by virtualization solutions. Finally, a set of experiments are performed using different virtualization solutions to elucidate the discussion.

The development of a data-driven application benchmarking approach to performance modelling

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Performance modelling is a useful tool in the lifeycle of high performance scientific software, such as weather and climate models, especially as a means of ensuring efficient use of available computing resources. In particular, sufficiently accurate performance prediction could reduce the effort and experimental computer time required when porting and optimising a climate model to a new machine. In this paper, traditional techniques are used to predict the computation time of a simple shallow water model which is illustrative of the computation (and communication) involved in climate models. These models are compared with real execution data gathered on AMD Opteron-based systems, including several phases of the U.K. academic community HPC resource, HECToR. Some success is had in relating source code to achieved performance for the K10 series of Opterons, but the method is found to be inadequate for the next-generation Interlagos processor. The experience leads to the investigation of a data-driven application benchmarking approach to performance modelling. Results for an early version of the approach are presented using the shallow model as an example.

A multi-GPU algorithm for large-scale neuronal networks

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Large-scale simulations of parts of the brain using detailed neuronal models to improve our understanding of brain functions are becoming a reality with the usage of supercomputers and large clusters. However, the high acquisition and maintenance cost of these computers, including the physical space, air conditioning, and electrical power, limits the number of simulations of this kind that scientists can perform. Modern commodity graphical cards, based on the CUDA platform, contain graphical processing units (GPUs) composed of hundreds of processors that can simultaneously execute thousands of threads and thus constitute a low-cost solution for many high-performance computing applications. In this work, we present a CUDA algorithm that enables the execution, on multiple GPUs, of simulations of large-scale networks composed of biologically realistic Hodgkin-Huxley neurons. The algorithm represents each neuron as a CUDA thread, which solves the set of coupled differential equations that model each neuron. Communication among neurons located in different GPUs is coordinated by the CPU. We obtained speedups of 40 for the simulation of 200k neurons that received random external input and speedups of 9 for a network with 200k neurons and 20M neuronal connections, in a single computer with two graphic boards with two GPUs each, when compared with a modern quad-core CPU. Copyright (C) 2010 John Wiley & Sons, Ltd.

«
1
2
...
50
51
52
53
54
55
56
57
58
»