985 resultados para intel processor


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Many meteorological phenomena occur at different locations simultaneously. These phenomena vary temporally and spatially. It is essential to track these multiple phenomena for accurate weather prediction. Efficient analysis require high-resolution simulations which can be conducted by introducing finer resolution nested simulations, nests at the locations of these phenomena. Simultaneous tracking of these multiple weather phenomena requires simultaneous execution of the nests on different subsets of the maximum number of processors for the main weather simulation. Dynamic variation in the number of these nests require efficient processor reallocation strategies. In this paper, we have developed strategies for efficient partitioning and repartitioning of the nests among the processors. As a case study, we consider an application of tracking multiple organized cloud clusters in tropical weather systems. We first present a parallel data analysis algorithm to detect such clouds. We have developed a tree-based hierarchical diffusion method which reallocates processors for the nests such that the redistribution cost is less. We achieve this by a novel tree reorganization approach. We show that our approach exhibits up to 25% lower redistribution cost and 53% lesser hop-bytes than the processor reallocation strategy that does not consider the existing processor allocation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A Field Programmable Gate Array (FPGA) based hardware accelerator for multi-conductor parasitic capacitance extraction, using Method of Moments (MoM), is presented in this paper. Due to the prohibitive cost of solving a dense algebraic system formed by MoM, linear complexity fast solver algorithms have been developed in the past to expedite the matrix-vector product computation in a Krylov sub-space based iterative solver framework. However, as the number of conductors in a system increases leading to a corresponding increase in the number of right-hand-side (RHS) vectors, the computational cost for multiple matrix-vector products present a time bottleneck, especially for ill-conditioned system matrices. In this work, an FPGA based hardware implementation is proposed to parallelize the iterative matrix solution for multiple RHS vectors in a low-rank compression based fast solver scheme. The method is applied to accelerate electrostatic parasitic capacitance extraction of multiple conductors in a Ball Grid Array (BGA) package. Speed-ups up to 13x over equivalent software implementation on an Intel Core i5 processor for dense matrix-vector products and 12x for QR compressed matrix-vector products is achieved using a Virtex-6 XC6VLX240T FPGA on Xilinx's ML605 board.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The correctness of a hard real-time system depends its ability to meet all its deadlines. Existing real-time systems use either a pure real-time scheduler or a real-time scheduler embedded as a real-time scheduling class in the scheduler of an operating system (OS). Existing implementations of schedulers in multicore systems that support real-time and non-real-time tasks, permit the execution of non-real-time tasks in all the cores with priorities lower than those of real-time tasks, but interrupts and softirqs associated with these non-real-time tasks can execute in any core with priorities higher than those of real-time tasks. As a result, the execution overhead of real-time tasks is quite large in these systems, which, in turn, affects their runtime. In order that the hard real-time tasks can be executed in such systems with minimal interference from other Linux tasks, we propose, in this paper, an integrated scheduler architecture, called SchedISA, which aims to considerably reduce the execution overhead of real-time tasks in these systems. In order to test the efficacy of the proposed scheduler, we implemented partitioned earliest deadline first (P-EDF) scheduling algorithm in SchedISA on Linux kernel, version 3.8, and conducted experiments on Intel core i7 processor with eight logical cores. We compared the execution overhead of real-time tasks in the above implementation of SchedISA with that in SCHED_DEADLINE's P-EDF implementation, which concurrently executes real-time and non-real-time tasks in Linux OS in all the cores. The experimental results show that the execution overhead of real-time tasks in the above implementation of SchedISA is considerably less than that in SCHED_DEADLINE. We believe that, with further refinement of SchedISA, the execution overhead of real-time tasks in SchedISA can be reduced to a predictable maximum, making it suitable for scheduling hard real-time tasks without affecting the CPU share of Linux tasks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this article, a Field Programmable Gate Array (FPGA)-based hardware accelerator for 3D electromagnetic extraction, using Method of Moments (MoM) is presented. As the number of nets or ports in a system increases, leading to a corresponding increase in the number of right-hand-side (RHS) vectors, the computational cost for multiple matrix-vector products presents a time bottleneck in a linear-complexity fast solver framework. In this work, an FPGA-based hardware implementation is proposed toward a two-level parallelization scheme: (i) matrix level parallelization for single RHS and (ii) pipelining for multiple-RHS. The method is applied to accelerate electrostatic parasitic capacitance extraction of multiple nets in a Ball Grid Array (BGA) package. The acceleration is shown to be linearly scalable with FPGA resources and speed-ups over 10x against equivalent software implementation on a 2.4GHz Intel Core i5 processor is achieved using a Virtex-6 XC6VLX240T FPGA on Xilinx's ML605 board with the implemented design operating at 200MHz clock frequency. (c) 2016 Wiley Periodicals, Inc. Microwave Opt Technol Lett 58:776-783, 2016

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Computer generated holography is an extremely demanding and complex task when it comes to providing realistic reconstructions with full parallax, occlusion, and shadowing. We present an algorithm designed for data-parallel computing on modern graphics processing units to alleviate the computational burden. We apply Gaussian interpolation to create a continuous surface representation from discrete input object points. The algorithm maintains a potential occluder list for each individual hologram plane sample to keep the number of visibility tests to a minimum.We experimented with two approximations that simplify and accelerate occlusion computation. It is observed that letting several neighboring hologramplane samples share visibility information on object points leads to significantly faster computation without causing noticeable artifacts in the reconstructed images. Computing a reduced sample set via nonuniform sampling is also found to be an effective acceleration technique. © 2009 Optical Society of America.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Desde que se inventó el primer ordenador, uno de los objetivos ha sido que el ordenador fuese capaz de ejecutar más y más rápido, para poder así solucionar problemas más complejos. La primera solución fue aumentar la potencia de los procesadores, pero las limitaciones físicas impuestas por la velocidad de los componentes electrónicos han obligado a buscar otras formas de mejorar el rendimiento. Desde entonces, ha habido muchos tipos de tecnologías para aumentar el rendimiento como los multiprocesadores, las arquitecturas MIMD… pero nosotros analizaremos la arquitectura SIMD. Este tipo de procesadores fue muy usado en los supercomputadores de los años 80 y 90, pero el progreso de los microprocesadores hizo que esta tecnología quedara en un segundo plano. Hoy en día la todos los procesadores tienen arquitecturas que implementan las instrucciones SIMD (Single Instruction, Multiple Data). En este documento estudiaremos las tecnologías de SIMD de Intel SSE, AVX y AVX2 para ver si realmente usando el procesador vectorial con las instrucciones SIMD, se obtiene alguna mejora de rendimiento. Hay que tener en cuenta que AVX solo está disponible desde 2011 y AVX2 no ha estado disponible hasta el 2013, por lo tanto estaremos trabajando con nuevas tecnologías. Además este tipo de tecnologías tiene el futuro asegurado, al anunciar Intel su nueva tecnología, AVX- 512 para 2015.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A compact two-step modified-signed-digit arithmetic-logic array processor is proposed. When the reference digits are programmed, both addition and subtraction can be performed by the same binary logic operations regardless of the sign of the input digits. The optical implementation and experimental demonstration with an electron-trapping device are shown. Each digit is encoded by a single pixel, and no polarization is included. Any combinational logic can be easily performed without optoelectronic and electro-optic conversions of the intermediate results. The system is compact, general purpose, simple to align, and has a high signal-to-noise ratio. (C) 1999 Optical Society of America.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Based on birefringence, a building-block stacking technique is suggested in this paper. A solid-state optical morphological processor module is thus developed, which is an integration of a beam array generator submodule, an optical connector submodule, and a Pockels readout optical modulator. It is shown that the technique is compact in construction, simple for fabrication, and insensitive to the environment.

Relevância:

20.00% 20.00%

Publicador: