Digital Library

55 results for "Proposed architectures"

CUDA-for-clusters: a system for efficient execution of CUDA kernels on multi-core clusters

Abstract:

Rapid advancements in multi-core processor architectures, coupled with low-cost, low-latency, high-bandwidth interconnects, have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but they have yet to be widely adopted for performance-critical code, mainly because of the relatively immature software frameworks and the effort involved in rewriting existing code in a new language. In this paper, we motivate and describe our initial study exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates the execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, together with the growing popularity, support and stability of the CUDA software stack, makes CUDA a good candidate for a cluster programming language. CFC uses a mixture of source-to-source compiler transformations, a work-distribution runtime and a lightweight software distributed shared memory to manage parallel execution. Initial results on several standard CUDA benchmark programs show speedups of up to 7.5X on a cluster with 8 nodes, opening up an interesting direction for further research.
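The abstract does not detail how CFC's work-distribution runtime splits a kernel across nodes. As a hedged illustration only, the Python sketch below assumes the simplest policy: a 1-D grid of independent thread blocks is divided into contiguous chunks, one per node; each simulated node runs its blocks on a private copy of the output; and only the elements its blocks wrote are merged back, standing in for the software distributed shared memory. The names `partition_blocks`, `vector_add_block` and `run_on_cluster` are hypothetical, and the nodes are simulated sequentially in a single process.

```python
from typing import List


def partition_blocks(num_blocks: int, num_nodes: int) -> List[range]:
    """Split block indices 0..num_blocks-1 into contiguous, near-equal chunks."""
    base, extra = divmod(num_blocks, num_nodes)
    chunks, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)  # first `extra` nodes get one more block
        chunks.append(range(start, start + size))
        start += size
    return chunks


def vector_add_block(block_idx: int, block_dim: int, a, b, out) -> None:
    """Emulate one CUDA thread block computing out[i] = a[i] + b[i]."""
    for thread_idx in range(block_dim):
        i = block_idx * block_dim + thread_idx  # blockIdx.x * blockDim.x + threadIdx.x
        if i < len(out):  # bounds check, as in a typical CUDA kernel
            out[i] = a[i] + b[i]


def run_on_cluster(a, b, block_dim: int, num_nodes: int) -> list:
    """Simulate nodes one after another; merge only the elements each node's blocks wrote."""
    n = len(a)
    num_blocks = (n + block_dim - 1) // block_dim  # ceiling division, like a CUDA grid size
    result = [0] * n
    for blocks in partition_blocks(num_blocks, num_nodes):
        local = [0] * n  # this node's private copy of the output array
        for blk in blocks:
            vector_add_block(blk, block_dim, a, b, local)
        for blk in blocks:  # write back only this node's slices
            lo, hi = blk * block_dim, min((blk + 1) * block_dim, n)
            result[lo:hi] = local[lo:hi]
    return result


print(run_on_cluster([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], block_dim=2, num_nodes=2))
# [11, 22, 33, 44, 55]
```

Contiguous chunks are a natural first choice because CUDA makes no guarantee about the order in which thread blocks run, so distributing them across nodes is safe for kernels whose blocks do not communicate; atomics and irregular writes, which a real runtime must handle, are ignored here.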

Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities

Abstract:

The twin demands of energy efficiency and higher performance on DRAM are strongly emphasized in multicore architectures. A variety of schemes have been proposed to address either the latency or the energy consumption of DRAMs. These schemes typically require non-trivial hardware changes and end up improving latency at the cost of energy, or vice versa. One specific DRAM performance problem in multicores is that interleaved accesses from different cores can degrade row-buffer locality. In this paper, based on the temporal and spatial locality characteristics of memory accesses, we propose reorganizing the existing single large row buffer in a DRAM bank into multiple sub-row buffers (MSRB). This reorganization not only improves row hit rates, and hence the average memory latency, but also reduces the energy consumed by the DRAM. The first major contribution of this work is proposing such a reorganization without requiring any significant changes to the existing, widely accepted DRAM specifications. Our proposed reorganization improves weighted speedup by 35.8%, 14.5% and 21.6% in quad-, eight- and sixteen-core workloads, respectively, along with 42%, 28% and 31% reductions in DRAM energy. The proposed MSRB organization also enables the management of multiple row buffers at the memory-controller level. Because the memory controller is aware of the behaviour of individual cores, it can implement coordinated buffer allocation schemes for different cores that take program behaviour into account. We demonstrate two such schemes, Fairness Oriented Allocation and Performance Oriented Allocation, which show the flexibility that memory controllers can exploit in our MSRB organization to improve overall performance and/or fairness. Further, the MSRB organization enables additional opportunities for DRAM intra-bank parallelism and for selective early precharging of the LRU row buffer to further reduce memory access latencies. These two optimizations together provide an additional 5.9% performance improvement.
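The locality problem described above, interleaved accesses from different cores evicting each other's open rows, can be illustrated with a toy single-bank model. This Python sketch is a simplification and not the paper's MSRB design or its allocation schemes: it assumes a row of `row_size` columns split into `num_buffers` equal segments, with each sub-row buffer caching one (row, segment) pair under LRU replacement, and compares its hit count with that of one full-row buffer. The function names and the access trace are hypothetical.

```python
from collections import OrderedDict
from typing import List, Tuple

Access = Tuple[int, int]  # (row, column) within a single DRAM bank


def single_row_buffer_hits(trace: List[Access]) -> int:
    """Conventional bank: one open row; any access to a different row is a miss."""
    open_row, hits = None, 0
    for row, _col in trace:
        if row == open_row:
            hits += 1
        else:
            open_row = row  # precharge the old row, activate the new one
    return hits


def sub_row_buffer_hits(trace: List[Access], row_size: int, num_buffers: int) -> int:
    """Toy MSRB model: each buffer caches one (row, segment) pair, replaced LRU."""
    seg_width = row_size // num_buffers  # columns per sub-row segment
    buffers = OrderedDict()  # keys kept in least- to most-recently-used order
    hits = 0
    for row, col in trace:
        key = (row, col // seg_width)
        if key in buffers:
            hits += 1
            buffers.move_to_end(key)  # mark as most recently used
        else:
            if len(buffers) == num_buffers:
                buffers.popitem(last=False)  # evict the least recently used buffer
            buffers[key] = None
    return hits


# Two cores interleave sequential accesses to rows 1 and 2 of the same bank.
trace = [(1, 0), (2, 0), (1, 1), (2, 1), (1, 2), (2, 2), (1, 3), (2, 3)]
print(single_row_buffer_hits(trace))                          # 0
print(sub_row_buffer_hits(trace, row_size=8, num_buffers=4))  # 4
```

In this trace the single row buffer never hits because every access switches rows, while four sub-row buffers keep both cores' active segments resident and hit on half of the accesses.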

Hydrogen bond seen, halogen bond defined and carbon bond proposed: intermolecular bonding, a field that is maturing!

Tripodal Bile Acid Architectures Based on a Triarylphosphine Oxide Core Obtained by Copper-Catalysed [1,3]-Dipolar Cycloaddition: Synthesis and Preliminary Aggregation Studies

Abstract:

We report the synthesis and aggregation behaviour of new water-soluble, bile acid-derived tripodal architectures based on a core derived from triphenylphosphine oxide. We employed the well-established copper-catalysed [1,3]-dipolar cycloaddition (CuAAC) to construct these tripodal molecules. The aggregation behaviour of these molecules in aqueous media was studied by several analytical methods, including dye solubilisation, dynamic light scattering, NMR and AFM. These molecular architectures also help clarify the influence of the nature of the bile acid backbone and of the configuration at the steroid C-3 position; to the best of our knowledge, this has not been reported in the literature. The unique gelation properties of the -derivatives were explained through molecular modelling studies, and the mechanical behaviour of these gels was studied by rheology experiments.

Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory

Abstract:

Programming for parallel architectures that do not have a shared address space is extremely difficult because of the need for explicit communication between the memories of different compute devices. Heterogeneous systems with CPUs and multiple GPUs, and distributed-memory clusters, are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that achieves the minimum communication volume in the vast majority of cases. The techniques are applicable to any sequence of affine loop nests and work on top of any choice of loop transformations, parallelization and computation placement. The generated data movement code minimizes the volume of communication for a given configuration of these. We combine powerful static analyses based on the polyhedral compiler framework with the lightweight runtime routines they generate to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over the state of the art, translating into a mean execution-time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over the state of the art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.
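The underlying goal, sending a consumer only the data it will actually read rather than everything a producer wrote, can be shown on a toy example. The Python sketch below is not the paper's polyhedral dependence-partitioning scheme; it assumes a 1-D array block-partitioned across devices and a 3-point stencil, and compares a naive scheme (each device sends all of its written elements to every other device) with one that sends only the intersection of a producer's writes with a consumer's reads. All names are hypothetical, and `n` is assumed to be divisible by `num_devices`.

```python
def owned(device: int, n: int, num_devices: int) -> set:
    """Indices written by `device` under a block partition (n divisible by num_devices)."""
    chunk = n // num_devices
    return set(range(device * chunk, (device + 1) * chunk))


def reads(device: int, n: int, num_devices: int) -> set:
    """Indices read by `device` for a 3-point stencil over its owned block."""
    needed = set()
    for i in owned(device, n, num_devices):
        needed.update(j for j in (i - 1, i, i + 1) if 0 <= j < n)
    return needed


def comm_volume(n: int, num_devices: int) -> tuple:
    """Return (naive, minimal) element counts sent between distinct devices."""
    naive = minimal = 0
    for src in range(num_devices):
        written = owned(src, n, num_devices)
        for dst in range(num_devices):
            if dst == src:
                continue
            naive += len(written)  # naive: ship every written element to every device
            minimal += len(written & reads(dst, n, num_devices))  # only what dst reads
    return naive, minimal


print(comm_volume(16, 4))  # (48, 6)
```

For 16 elements on 4 devices, the naive scheme moves 48 elements, while the intersection-based scheme moves only the 6 halo elements exchanged between neighbouring blocks.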

Parallel Flow-Sensitive Pointer Analysis by Graph-Rewriting

Abstract:

Precise pointer analysis is a problem of interest to both the compiler and the program verification communities. Flow sensitivity is an important dimension of pointer analysis that affects the precision of the final result. Scaling flow-sensitive pointer analysis to millions of lines of code is a major challenge. Recently, staged flow-sensitive pointer analysis has been proposed, which exploits a sparse representation of program code created by a staged analysis. In this paper, we formulate staged flow-sensitive pointer analysis as a graph-rewriting problem. Graph rewriting has already been used for flow-insensitive analysis; however, formulating flow-sensitive pointer analysis as a graph-rewriting problem introduces additional challenges due to the nature of flow sensitivity. We implement our parallel algorithm using Intel Threading Building Blocks and demonstrate considerable scaling (up to 2.6x) with 8 threads on a set of 10 benchmarks. Compared to the sequential implementation of staged flow-sensitive analysis, a single-threaded execution of our implementation performs better on 8 of the benchmarks.

Design and morphology control of a thiophene derivative through electrospraying using various solvents

Abstract:

In the present work, electrospraying of an organic molecule is carried out using various solvents, yielding fibril structures along with a range of distinct morphologies. Solvent characteristics play a major role in determining the morphology of the organic material. A thiophene derivative, 7,9-di(thiophen-2-yl)-8H-cyclopenta[a]acenaphthylen-8-one (DTCPA), with a donor-acceptor-donor (DAD) architecture, is used to study this solvent effect. Seven solvents with decreasing vapour pressure are selected for the experiments. Electrospraying is conducted at a solution concentration of 1.5 wt% and a constant applied voltage of 15 kV. A gradual transformation in the morphology of the electrospun product, from spiked spheres to spikes only, is observed. A mechanism describing this transformation is proposed based on electron micrograph and XRD analyses. These data indicate that the morphological change is due to the synergistic effect of both the vapour pressure and the dielectric constant of the solvents. Through reasonable control of the crystallite size and morphology, together with the proposed transformation mechanism, this study presents electrospraying as a prospective method for designing architectures in organic electronics.

Router Attack toward NoC-enabled MPSoC and Monitoring Countermeasures against such Threat

Abstract:

The growing number of applications and processing units in modern Multiprocessor Systems-on-Chips (MPSoCs) comes along with reduced time to market. Different IP cores can come from different vendors and have different trust levels, but they typically use a Network-on-Chip (NoC) as their communication infrastructure. An MPSoC can have multiple Trusted Execution Environments (TEEs). Apart from performance, power and area, robust and secure system design is also gaining importance in MPSoC research. To build a secure system, the designer must know in advance all possible attacks on the respective system (MPSoC). In this paper, we survey the possible attack scenarios on present-day MPSoCs and investigate a new attack scenario, namely a router attack targeting the NoC architecture. We show the validity of this attack by analyzing different present-day NoC architectures and show that they are all vulnerable to this type of attack. By launching a router attack, an attacker can easily control the whole chip, which makes it a very serious issue. Both routing-table-based and routing-logic-based routers are vulnerable to such attacks. In this paper, we address attacks on routing tables. We propose different monitoring-based countermeasures against routing-table-based router attacks in an MPSoC with multiple TEEs. Synthesis results show that the proposed countermeasures, viz. Runtime-monitor, Restart-monitor, Intermediate manager and Auditor, occupy 26.6%, 22%, 0.2% and 12.2%, respectively, of the area of a routing-table-based router. In addition, we propose an Ejection address checker and a Local monitoring module inside the router, which increase router area by 3.4% and 10.6%, respectively. Simulation results are also given, which show the effectiveness of the proposed monitoring-based countermeasures.
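To make the monitoring idea concrete, the Python sketch below models two checks loosely in the spirit of the Auditor and Runtime-monitor named above: a periodic audit that compares the live routing table against a trusted golden copy, and a per-packet check of a single forwarding decision. It is a software illustration under stated assumptions (a routing table represented as a destination-to-port dictionary, and a golden copy captured at a trusted point such as secure boot), not the paper's hardware countermeasures; the class and method names are hypothetical.

```python
from typing import Dict, List


class RoutingTableMonitor:
    """Hypothetical software model of routing-table monitoring (not the paper's hardware)."""

    def __init__(self, trusted_table: Dict[int, str]):
        self._golden = dict(trusted_table)  # copy captured at a trusted point, e.g. secure boot

    def audit(self, live_table: Dict[int, str]) -> List[int]:
        """Auditor-style check: destinations whose live entry differs from the golden copy."""
        dests = set(self._golden) | set(live_table)  # also catches added or deleted entries
        return sorted(d for d in dests if live_table.get(d) != self._golden.get(d))

    def check_route(self, dest: int, port: str) -> bool:
        """Runtime-monitor-style check of a single forwarding decision."""
        return self._golden.get(dest) == port


table = {0: "LOCAL", 1: "EAST", 2: "SOUTH", 3: "EAST"}
monitor = RoutingTableMonitor(table)
table[2] = "WEST"  # simulated attack: a tampered entry misroutes traffic for node 2
print(monitor.audit(table))            # [2]
print(monitor.check_route(2, "WEST"))  # False
```

The audit catches tampering only at the moment it runs, whereas the per-packet check catches a misroute as it happens; this mirrors, in a simplified way, the trade-off between periodic auditing and continuous runtime monitoring.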

Impact of lifetime control on the threshold of quantum dot lasers

Abstract:

Despite significant improvements in their properties as emitters, colloidal quantum dots have had little success as materials for laser applications. Gain in most colloidal systems is short-lived and must compete with biexcitonic decay. This has necessitated the use of short-pulsed lasers to pump quantum dots to the thresholds needed for amplified spontaneous emission or lasing. Continuous-wave pumping of gain, which is possible in some inorganic phosphors, has therefore remained a very distant possibility for quantum dots. Here, we demonstrate that trilayer heterostructures could provide optimal conditions for demonstrating continuous-wave lasing in colloidal materials. The design considerations for these materials are discussed in terms of a kinetic model. The electronic structure of the proposed dot architectures is modeled within effective mass theory.

Conceptual design of three-dimensional scaffolds of powder-based materials for bone tissue engineering applications

Abstract:

Purpose - The purpose of this paper is to investigate the possibility of constructing tissue-engineered bone repair scaffolds with pore size distributions using rapid prototyping techniques. Design/methodology/approach - The fabrication of porous scaffolds with complex porous architectures represents a major challenge in tissue engineering, and design approaches that mimic the complex pore shapes and the spatial distribution of pore sizes of natural hard tissue remain unexplored. In this context, this work evaluates the potential of the three-dimensional printing process for scaffold fabrication and describes some innovative designs of homogeneously porous and gradient-porous scaffolds; such designs have wider implications in the field of bone tissue engineering. Findings - The present work discusses various biomedically relevant design strategies with spatial/radial gradients in pore size, as well as with different pore sizes and pore geometries. Originality/value - One of the important implications of the proposed novel design scheme would be the development of porous bioactive/biodegradable composites with gradients in pore size, porosity and composition, and with spatially distributed biochemical stimuli, so that stem cells loaded into the scaffolds would develop into complex tissues such as those at the bone-cartilage interface.
