965 results for Application specific architectures


Relevance: 90.00%

Abstract:

We describe a high-level design method to synthesize multi-phase regular arrays. The method is based on deriving component designs using classical regular (or systolic) array synthesis techniques and composing these separately evolved component designs into a unified global design. Similarity transformations are applied to component designs in the composition stage in order to align data flow between the phases of the computation. Three transformations are considered: rotation, reflection and translation. The technique is aimed at the design of hardware components for high-throughput embedded systems applications, and we demonstrate this by deriving a multi-phase regular array for the 2-D DCT algorithm, which is widely used in many video communications applications.
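
The composition stage can be pictured with a small sketch. Assuming each component array is described by the integer coordinates (i, j) of its processing elements (PEs) and by the direction vectors of its data streams, rotation and reflection act on both, while translation moves PEs but leaves flow directions unchanged; rotation and reflection are therefore what re-orient data flow between phases. The function names are hypothetical and this is not the authors' formulation.

```python
from typing import Iterable, List, Tuple

Vec = Tuple[int, int]  # (i, j) PE coordinate or data-flow direction vector


def rotate90(v: Vec, times: int = 1) -> Vec:
    """Rotate by 90 degrees counter-clockwise (i horizontal, j vertical), `times` times."""
    i, j = v
    for _ in range(times % 4):
        i, j = -j, i
    return (i, j)


def reflect(v: Vec) -> Vec:
    """Mirror about the j axis by negating the i component."""
    i, j = v
    return (-i, j)


def linear_part(v: Vec, rot: int, mirror: bool) -> Vec:
    """Rotation followed by an optional reflection; shared by PEs and flow directions."""
    w = rotate90(v, rot)
    return reflect(w) if mirror else w


def transform_pes(pes: Iterable[Vec], rot: int = 0, mirror: bool = False,
                  offset: Vec = (0, 0)) -> List[Vec]:
    """Place a component array: rotate, optionally reflect, then translate every PE."""
    out = []
    for pe in pes:
        i, j = linear_part(pe, rot, mirror)
        out.append((i + offset[0], j + offset[1]))
    return out


def transform_flow(direction: Vec, rot: int = 0, mirror: bool = False) -> Vec:
    """Flow directions are free vectors, so translation does not change them."""
    return linear_part(direction, rot, mirror)


# A hypothetical 3-PE row whose data move in the +j direction...
row = [(0, 0), (0, 1), (0, 2)]
# ...rotated once and shifted so that it occupies non-negative coordinates.
print(transform_pes(row, rot=1, offset=(2, 0)))   # [(2, 0), (1, 0), (0, 0)]
print(transform_flow((0, 1), rot=1))              # (-1, 0)
```

After the rotation the row's data flow runs along -i instead of +j, which is the kind of re-orientation the composition stage uses to line up the output of one phase with the input of the next.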

Relevance: 90.00%

Abstract:

Simulating spiking neural networks is of great interest to scientists wanting to model the functioning of the brain. However, large-scale models are expensive to simulate due to the number and interconnectedness of neurons in the brain. Furthermore, where such simulations are used in an embodied setting, the simulation must be real-time in order to be useful. In this paper we present NeMo, a platform for such simulations which achieves high performance through the use of highly parallel commodity hardware in the form of graphics processing units (GPUs). NeMo makes use of the Izhikevich neuron model which provides a range of realistic spiking dynamics while being computationally efficient. Our GPU kernel can deliver up to 400 million spikes per second. This corresponds to a real-time simulation of around 40 000 neurons under biologically plausible conditions with 1000 synapses per neuron and a mean firing rate of 10 Hz.
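
The throughput figure can be checked with simple arithmetic: 40,000 neurons firing at 10 Hz emit 400,000 spikes per second, and delivering each spike to 1,000 synapses gives 4 x 10^8 synaptic events per second, matching the quoted 400 million when each delivery is counted. The sketch below performs that calculation and integrates the Izhikevich model in its standard published form (v' = 0.04v^2 + 5v + 140 - u + I, u' = a(bv - u), with reset v <- c, u <- u + d once v reaches 30 mV), using the regular-spiking parameters and the two 0.5 ms sub-steps for v found in Izhikevich's reference code. It is a plain Python illustration, not NeMo's GPU kernel or API.

```python
def required_spike_deliveries(neurons: int, synapses_per_neuron: int, rate_hz: float) -> float:
    """Synaptic spike deliveries needed per second of simulated time."""
    return neurons * synapses_per_neuron * rate_hz


def izhikevich_step(v: float, u: float, current: float,
                    a: float = 0.02, b: float = 0.2,
                    c: float = -65.0, d: float = 8.0) -> tuple:
    """Advance one neuron by 1 ms and return (v, u, fired).

    Defaults are the regular-spiking parameters; v is integrated with two
    0.5 ms Euler sub-steps and u with a single 1 ms step."""
    fired = v >= 30.0
    if fired:                       # spike: reset membrane potential, bump recovery
        v, u = c, u + d
    for _ in range(2):
        v += 0.5 * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
    u += a * (b * v - u)
    return v, u, fired


def count_spikes(current: float, ms: int = 1000) -> int:
    """Simulate one neuron with constant input current for `ms` milliseconds."""
    v, u = -65.0, 0.2 * -65.0
    spikes = 0
    for _ in range(ms):
        v, u, fired = izhikevich_step(v, u, current)
        spikes += int(fired)
    return spikes


print(required_spike_deliveries(40_000, 1_000, 10))   # 400000000
print(count_spikes(0.0))                               # 0 (settles at rest)
print(count_spikes(10.0))                              # tonic firing, count > 0
```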

Relevance: 90.00%

Abstract:

The complexity of current and emerging high-performance architectures provides users with options about how best to use the available resources, but makes predicting performance challenging. In this work a benchmark-driven performance modelling approach is outlined that is appropriate for modern multicore architectures. The approach is demonstrated by constructing a model of a simple shallow water code on a Cray XE6 system from application-specific benchmarks that illustrate precisely how architectural characteristics impact performance. The model is found to recreate observed scaling behaviour up to 16K cores, and is used to predict optimal rank-core affinity strategies, exemplifying the type of problem such a model can be used for.

Relevance: 90.00%

Abstract:

The complexity of current and emerging architectures provides users with options about how best to use the available resources, but makes predicting performance challenging. In this work a benchmark-driven model is developed for a simple shallow water code on a Cray XE6 system, to explore how deployment choices such as domain decomposition and core affinity affect performance. The resource sharing present in modern multi-core architectures adds various levels of heterogeneity to the system. Shared resources often include cache, memory, network controllers and, in some cases, floating-point units (as in the AMD Bulldozer), which means that access time depends on the mapping of application tasks and on each core's location within the system. Heterogeneity increases further with the use of hardware accelerators such as GPUs and the Intel Xeon Phi, where many specialist cores are attached to general-purpose cores. This trend towards shared resources and non-uniform cores is expected to continue into the exascale era. The complexity of these systems means that many runtime scenarios are possible, and it has been found that under-populating nodes, altering the domain decomposition and using non-standard task-to-core mappings can dramatically alter performance. Discovering this, however, is often a process of trial and error. To better inform this process, a performance model was developed for a simple regular grid-based kernel code, shallow. The code comprises two distinct types of work: loop-based array updates and nearest-neighbour halo exchanges. Separate performance models were developed for each part, both based on a similar methodology. Application-specific benchmarks were run to measure performance for different problem sizes under different execution scenarios. These results were then fed into a performance model that derives resource usage for a given deployment scenario, interpolating between results as necessary.
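
The interpolation step can be sketched as follows. Assuming benchmark tables of compute time per time step (indexed by local grid points) and halo-exchange time per message (indexed by message size) have been measured for one execution scenario, a prediction for a given decomposition is the interpolated compute time plus the interpolated cost of four halo messages. All numbers are invented for illustration, piecewise-linear interpolation is an assumed choice, and giving every rank four neighbours (as with periodic boundaries) is a simplification; this is not the published model.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation over sorted xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_left(xs, x)  # guarantees xs[k-1] < x <= xs[k]
    x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical benchmark tables for ONE scenario (e.g. fully populated nodes).
COMPUTE_POINTS = [64 * 64, 128 * 128, 256 * 256, 512 * 512]  # local grid points
COMPUTE_SECONDS = [0.0004, 0.0016, 0.0070, 0.0300]           # per time step
HALO_BYTES = [512, 1024, 2048, 4096, 8192]                   # per message
HALO_SECONDS = [10e-6, 14e-6, 22e-6, 38e-6, 70e-6]           # per message


def predict_step(nx: int, ny: int, px: int, py: int, bytes_per_point: int = 8) -> float:
    """Predicted seconds per time step for an nx x ny grid on a px x py rank grid."""
    lx, ly = nx / px, ny / py  # local subdomain extent
    compute = interp(lx * ly, COMPUTE_POINTS, COMPUTE_SECONDS)
    # Two messages along each subdomain edge length (one halo row of one field).
    halo = (2 * interp(lx * bytes_per_point, HALO_BYTES, HALO_SECONDS)
            + 2 * interp(ly * bytes_per_point, HALO_BYTES, HALO_SECONDS))
    return compute + halo


def best_decomposition(nx: int, ny: int, ranks: int) -> Tuple[int, int]:
    """Try every px x py factorisation of `ranks` and return the fastest."""
    options = [(px, ranks // px) for px in range(1, ranks + 1) if ranks % px == 0]
    return min(options, key=lambda d: predict_step(nx, ny, *d))


print(best_decomposition(1024, 1024, 16))  # (4, 4)
```

With these invented tables every 16-rank decomposition has the same compute cost, so the square 4 x 4 layout wins purely on halo size; with real tables for several scenarios (for example, under-populated nodes) the same search could compare scenarios as well.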

Relevance: 90.00%

Abstract:

The increasing complexity of applications has demanded ever more flexible hardware capable of higher performance. Traditional hardware solutions have not been successful in meeting these application constraints. General-purpose processors are inherently flexible, since they perform many different tasks; however, they cannot reach the performance of application-specific devices. Conversely, because application-specific devices perform only a few tasks, they achieve high performance but offer less flexibility. Reconfigurable architectures emerged as an alternative to these traditional approaches and have become an area of growing interest over the last decades. The purpose of this paradigm is to modify the device's behavior according to the application, making it possible to balance flexibility and performance while meeting application constraints. This work presents the design and implementation of a coarse-grained hybrid reconfigurable architecture for stream-based applications. The architecture, named RoSA, consists of reconfigurable logic attached to a processor. Its goal is to exploit the instruction-level parallelism of data-flow-intensive applications to accelerate their execution on the reconfigurable logic. Instruction-level parallelism is extracted at compile time, so this work also presents an optimization phase for the RoSA architecture to be included in the GCC compiler. To design the architecture, this work also presents a methodology based on datapath hardware reuse, named RoSE. RoSE views the reconfigurable units through reusability levels, which saves area and simplifies the datapath. The architecture was implemented in a hardware description language (VHDL) and validated through simulation and prototyping. Performance was characterized with a set of benchmarks, which showed a speedup of 11x on the execution of some applications.
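
As an illustration of what compile-time ILP extraction looks for, the sketch below groups the operations of a small data-flow graph into as-soon-as-possible (ASAP) levels: operations in the same level have no dependences on one another and could, in principle, be issued in parallel on reconfigurable units. This is a generic textbook technique applied to a hypothetical example graph, not the RoSA GCC optimization phase itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def asap_levels(ops: List[str], deps: List[Tuple[str, str]]) -> Dict[int, List[str]]:
    """Group ops by ASAP level (0 = no producers) using Kahn's topological sort.

    `deps` holds (producer, consumer) pairs; ops that share a level are
    mutually independent."""
    preds: Dict[str, List[str]] = {op: [] for op in ops}
    succs: Dict[str, List[str]] = {op: [] for op in ops}
    for src, dst in deps:
        preds[dst].append(src)
        succs[src].append(dst)
    indegree = {op: len(preds[op]) for op in ops}
    ready = [op for op in ops if indegree[op] == 0]
    level = {op: 0 for op in ready}
    visited = 0
    while ready:
        op = ready.pop()
        visited += 1
        for s in succs[op]:
            # A consumer sits one level below its latest-finishing producer.
            level[s] = max(level.get(s, 0), level[op] + 1)
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if visited != len(ops):
        raise ValueError("dependence graph has a cycle")
    groups: Dict[int, List[str]] = defaultdict(list)
    for op in ops:
        groups[level[op]].append(op)
    return dict(groups)


# Hypothetical kernel: y = a*b + c*d; z = y - e*f
ops = ["m1", "m2", "m3", "add", "sub"]      # m1=a*b, m2=c*d, m3=e*f
deps = [("m1", "add"), ("m2", "add"), ("add", "sub"), ("m3", "sub")]
print(asap_levels(ops, deps))  # {0: ['m1', 'm2', 'm3'], 1: ['add'], 2: ['sub']}
```

Five operations fit into three levels here, so the three multiplications are the parallelism such a pass would try to map onto separate functional units.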

Relevance: 90.00%

Abstract:

This paper investigates properties of integer programming models for a class of production planning problems. The models are developed within a decision support system to advise a sales team on the products on which to focus their efforts in gaining new orders in the short term. The products generally require processing on several manufacturing cells and involve precedence relationships. The cells are already (partially) committed to products for stock and to satisfying existing orders, and therefore only the residual capacity of each cell in each time period of the planning horizon is considered. Determining production recommendations for the sales team that make use of residual capacities is a nontrivial optimization problem. Solving such models is computationally demanding, and techniques for speeding up solution times are highly desirable. An integer programming model is developed, and various preprocessing techniques are investigated and evaluated. In addition, a number of cutting plane approaches have been applied. The performance of these approaches, both general and application-specific, is examined.
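
The structure of such a model can be shown on a toy instance. In the sketch below each candidate product has a value and a fixed route of (cell, period offset, hours) operations, where the offsets encode precedence; the decision is whether to recommend each product and in which start period, subject to the residual capacity of every cell in every period. All data are hypothetical, and the instance is solved by brute-force enumeration purely for clarity: the number of combinations grows exponentially, which is why realistic instances call for an integer programming solver and benefit from preprocessing and cutting planes.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical residual capacity (hours) of each cell in each period.
RESIDUAL: Dict[Tuple[str, int], float] = {
    ("A", 0): 4, ("A", 1): 4, ("A", 2): 4,
    ("B", 0): 3, ("B", 1): 3, ("B", 2): 3,
}
PERIODS = 3

# Hypothetical products: (value, route). Each route step is
# (cell, period offset from the start, hours); the later offset on cell B
# models the precedence "processed on A before B".
PRODUCTS: Dict[str, Tuple[float, List[Tuple[str, int, float]]]] = {
    "P1": (10, [("A", 0, 3), ("B", 1, 2)]),
    "P2": (7, [("A", 0, 2), ("B", 1, 3)]),
    "P3": (5, [("B", 0, 2)]),
}


def feasible(plan: Dict[str, int]) -> bool:
    """Check that the chosen start periods fit within the residual capacities."""
    used: Dict[Tuple[str, int], float] = {}
    for name, start in plan.items():
        for cell, offset, hours in PRODUCTS[name][1]:
            key = (cell, start + offset)
            if key not in RESIDUAL:  # route would run past the planning horizon
                return False
            used[key] = used.get(key, 0) + hours
            if used[key] > RESIDUAL[key]:
                return False
    return True


def best_recommendation() -> Tuple[float, Dict[str, int]]:
    """Enumerate 'not recommended' or 'start in period t' for every product."""
    names = list(PRODUCTS)
    options: List[Optional[int]] = [None] + list(range(PERIODS))
    best_value, best_plan = 0.0, {}
    for choice in product(options, repeat=len(names)):
        plan = {n: s for n, s in zip(names, choice) if s is not None}
        if feasible(plan):
            value = sum(PRODUCTS[n][0] for n in plan)
            if value > best_value:
                best_value, best_plan = value, plan
    return best_value, best_plan


value, plan = best_recommendation()
print(value, plan)
```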

Relevance: 90.00%

Abstract:

We propose an asymmetric multi-processor SoC architecture featuring a master CPU running uClinux and multiple loosely coupled slave CPUs running real-time threads assigned by the master CPU. Real-time SoC architectures often demand a compromise between a generic platform for different applications and application-specific customizations that meet performance requirements. Our proposed architecture offers a generic platform running a conventional embedded operating system, providing a traditional software-oriented development approach, while multiple slave CPUs act as dedicated, independent execution units for real-time threads, running in parallel with the master CPU to meet performance requirements. In this paper, the architecture is described, including the application/threading development environment. The performance of the architecture is also analysed with several standard benchmark routines.

Relevance: 90.00%

Abstract:

In-Motes Bins is an agent-based, real-time In-Motes application developed for sensing light and temperature variations in an environment. In-Motes is a mobile agent middleware that facilitates the rapid deployment of adaptive applications in Wireless Sensor Networks (WSNs). In-Motes Bins is based on injecting mobile agents into the WSN; these agents can migrate or clone according to specific rules and perform application-specific tasks. Using In-Motes, we were able to create and rapidly deploy our application on a WSN consisting of 10 MICA2 motes. Our application was tested in a wine store for a period of four months. In this paper we present the In-Motes Bins application and provide a detailed evaluation of its implementation. © 2007 IEEE.

Relevance: 80.00%

Abstract:

Recent advances in the embedded systems world have led to more complex systems with application-specific blocks (IP cores): System on Chip (SoC) devices. A good example of these complex devices can be found in cell phones, which can contain image processing cores, communication cores, memory card cores and others. The need to increase system processing performance at the lowest possible power has led to the concept of the Multiprocessor System on Chip (MSoC), in which the execution of multiple tasks can be distributed across several processors. This thesis addresses the creation of a synthesizable multiprocessing system to be placed in an FPGA device, providing good flexibility to tailor the system to a specific application. The multiprocessing system will be built with the synthesizable 32-bit SPARC V8-compliant LEON3 processor.

Relevance: 80.00%

Abstract:

Traditional Real-Time Operating Systems (RTOS) are not designed to accommodate application-specific requirements. They address the general case, and the application must coexist with any limitations imposed by that design. For modern real-time applications, this limits the quality of service offered to the end user. Research in this field has shown that it is possible to develop dynamic systems in which adaptation is the key to success. However, adaptation requires full knowledge of the system state. To address this, we propose a framework to gather data and interact with the operating system, extending the traditional POSIX trace model with a partial reflective model. This combination preserves the semantics of the trace mechanism while creating a powerful platform for developing new dynamic systems, with little impact on the system and without complex changes to the kernel source code.

Relevance: 80.00%

Abstract:

IEEE International Conference on Cyber Physical Systems, Networks and Applications (CPSNA'15), Hong Kong, China.

Relevance: 80.00%

Abstract:

Poster at the 12th European Conference on Wireless Sensor Networks (EWSN 2015), 9-11 February 2015, Porto, Portugal, pp. 24-25.

Relevance: 80.00%

Abstract:

The vision of the Internet of Things (IoT) includes large and dense deployments of interconnected smart sensing and monitoring devices. Such vast deployments necessitate the collection and processing of large volumes of measurement data. However, collecting all the measured data from individual devices on such a scale may be impractical and time-consuming. Moreover, processing these measurements requires complex algorithms to extract useful information. Thus, it becomes imperative to devise distributed information processing mechanisms that identify application-specific features in a timely manner and with low overhead. In this article, we present a feature extraction mechanism for dense networks that takes advantage of dominance-based medium access control (MAC) protocols to (i) efficiently obtain global extrema of the sensed quantities, (ii) extract local extrema, and (iii) detect the boundaries of events, by using simple transforms that nodes apply to their local data. We extend our results to a large dense network with multiple broadcast domains (MBD). We discuss and compare two approaches for addressing the challenges of MBD, and we show through extensive evaluations that our proposed distributed MBD approach is fast and efficient at retrieving the most valuable measurements, independent of the number of sensor nodes in the network.
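
How a dominance-based MAC yields a global extremum can be illustrated by simulating bitwise arbitration in the style of CAN-like protocols: every node transmits its value most-significant bit first, a dominant bit on the channel overrides recessive ones, and a node that sent a recessive bit but hears a dominant one withdraws. Here '1' is assumed to be dominant so that the maximum wins, and the minimum follows from transmitting complemented values. The number of arbitration rounds equals the bit width regardless of how many nodes contend. This is an idealized single-broadcast-domain model, not the article's protocol implementation.

```python
from typing import List, Tuple


def dominance_max(values: List[int], bits: int) -> Tuple[int, List[int]]:
    """Simulate bitwise arbitration where '1' is the dominant bit.

    Returns the value every node hears on the channel and the indices of the
    nodes that never withdrew (the holders of the maximum)."""
    assert all(0 <= v < (1 << bits) for v in values)
    contenders = list(range(len(values)))
    heard = 0
    for b in range(bits - 1, -1, -1):        # most-significant bit first
        channel = 0
        for i in contenders:
            channel |= (values[i] >> b) & 1   # any dominant bit wins the slot
        heard = (heard << 1) | channel
        if channel:                          # nodes that sent recessive withdraw
            contenders = [i for i in contenders if (values[i] >> b) & 1]
    return heard, contenders


def dominance_min(values: List[int], bits: int) -> int:
    """Minimum via the same arbitration applied to complemented values."""
    mask = (1 << bits) - 1
    heard, _ = dominance_max([mask - v for v in values], bits)
    return mask - heard


readings = [5, 12, 9, 12]
print(dominance_max(readings, bits=4))   # (12, [1, 3])
print(dominance_min(readings, bits=4))   # 5
```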

Relevance: 80.00%

Abstract:

Doctoral thesis, Doctoral Programme in Electronics and Computer Engineering.

Relevance: 80.00%

Abstract:

In state-of-the-art electronic devices intended for communication or automatic control functions, digital signal processing algorithms mapped to hardware have come to occupy a central place. In other words, the state of the art in communications and control can be summarized as algorithms based on digital signal processing. Digital implementations of these algorithms have long been studied in computing. However, although the increasing complexity of modern algorithms makes attractive performance achievable in specific applications, it also imposes operating-speed constraints that have motivated designing high-performance architectures directly in hardware. In this context, electronic circuits based on programmable logic, mainly those based on FPGAs (Field-Programmable Gate Arrays), yield highly reliable performance measurements that provide the necessary step toward the electronic design of application-specific circuits, “ASIC-VLSI” (Application Specific Integrated Circuit - Very Large Scale Integration). This project analyzes the design and implementation of electronic architectures for digital signal processing, with the aim of obtaining real measurements of the behavior of the wireless channel and its influence on trajectory estimation and control in unmanned aerial vehicles (UAVs). To this end, it proposes analyzing a hybrid device based on microcontrollers and FPGA circuits and, on that same device, implementing a trajectory-control algorithm that keeps a fixed point at the center of the frame of a video camera on board a UAV, while remaining efficient in terms of operating speed, size and power consumption.
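
The centering objective can be sketched as a proportional control law on the pixel error between the tracked point and the frame center. The frame size, gain, rate limit and the simplified plant (each rad/s of commanded rate moves the point a fixed number of pixels per second back toward the center) are all hypothetical, and the sketch is written in Python for readability rather than for the hybrid microcontroller/FPGA implementation the project proposes.

```python
from typing import Tuple


def _clamp(x: float, limit: float) -> float:
    """Saturate x to the interval [-limit, limit]."""
    return max(-limit, min(limit, x))


def centering_command(target_px: Tuple[float, float],
                      frame: Tuple[int, int] = (640, 480),
                      gain: float = 0.005,
                      max_rate: float = 1.0) -> Tuple[float, float]:
    """Proportional law: (yaw_rate, pitch_rate) in rad/s from the pixel error."""
    cx, cy = frame[0] / 2, frame[1] / 2
    ex, ey = target_px[0] - cx, target_px[1] - cy
    return _clamp(gain * ex, max_rate), _clamp(gain * ey, max_rate)


def simulate(target: Tuple[float, float] = (400.0, 300.0), steps: int = 50,
             dt: float = 0.1, px_per_rad: float = 500.0) -> Tuple[float, float]:
    """Hypothetical kinematic plant: rotating toward the point shrinks its pixel error."""
    x, y = target
    for _ in range(steps):
        wx, wy = centering_command((x, y))
        x -= px_per_rad * wx * dt
        y -= px_per_rad * wy * dt
    return x, y


print(simulate())  # converges toward the frame center (320, 240)
```

With these numbers and no saturation, each step multiplies the error by 1 - 500 x 0.1 x 0.005 = 0.75, so the point settles geometrically; a larger gain would converge faster but, in this discrete model, overshoots once the factor becomes negative.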