135 resultados para Multiprocessor


Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents the use of a multiprocessor architecture for the performance improvement of tomographic image reconstruction. Image reconstruction in computed tomography (CT) is an intensive task for single-processor systems. We investigate the filtered image reconstruction suitability based on DSPs organized for parallel processing and its comparison with the Message Passing Interface (MPI) library. The experimental results show that the speedups observed for both platforms were increased in the same direction of the image resolution. In addition, the execution time to communication time ratios (Rt/Rc) as a function of the sample size have shown a narrow variation for the DSP platform in comparison with the MPI platform, which indicates its better performance for parallel image reconstruction.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Genetic algorithms are commonly used to solve combinatorial optimizationproblems. The implementation evolves using genetic operators (crossover, mutation,selection, etc.). Anyway, genetic algorithms like some other methods have parameters(population size, probabilities of crossover and mutation) which need to be tune orchosen.In this paper, our project is based on an existing hybrid genetic algorithmworking on the multiprocessor scheduling problem. We propose a hybrid Fuzzy-Genetic Algorithm (FLGA) approach to solve the multiprocessor scheduling problem.The algorithm consists in adding a fuzzy logic controller to control and tunedynamically different parameters (probabilities of crossover and mutation), in anattempt to improve the algorithm performance. For this purpose, we will design afuzzy logic controller based on fuzzy rules to control the probabilities of crossoverand mutation. Compared with the Standard Genetic Algorithm (SGA), the resultsclearly demonstrate that the FLGA method performs significantly better.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In academia, it is common to create didactic processors, facing practical disciplines in the area of Hardware Computer and can be used as subjects in software platforms, operating systems and compilers. Often, these processors are described without ISA standard, which requires the creation of compilers and other basic software to provide the hardware / software interface and hinder their integration with other processors and devices. Using reconfigurable devices described in a HDL language allows the creation or modification of any microarchitecture component, leading to alteration of the functional units of data path processor as well as the state machine that implements the control unit even as new needs arise. In particular, processors RISP enable modification of machine instructions, allowing entering or modifying instructions, and may even adapt to a new architecture. This work, as the object of study addressing educational soft-core processors described in VHDL, from a proposed methodology and its application on two processors with different complexity levels, shows that it s possible to tailor processors for a standard ISA without causing an increase in the level hardware complexity, ie without significant increase in chip area, while its level of performance in the application execution remains unchanged or is enhanced. The implementations also allow us to say that besides being possible to replace the architecture of a processor without changing its organization, RISP processor can switch between different instruction sets, which can be expanded to toggle between different ISAs, allowing a single processor become adaptive hybrid architecture, which can be used in embedded systems and heterogeneous multiprocessor environments

Relevância:

10.00% 10.00%

Publicador:

Resumo:

It bet on the next generation of computers as architecture with multiple processors and/or multicore processors. In this sense there are challenges related to features interconnection, operating frequency, the area on chip, power dissipation, performance and programmability. The mechanism of interconnection and communication it was considered ideal for this type of architecture are the networks-on-chip, due its scalability, reusability and intrinsic parallelism. The networks-on-chip communication is accomplished by transmitting packets that carry data and instructions that represent requests and responses between the processing elements interconnected by the network. The transmission of packets is accomplished as in a pipeline between the routers in the network, from source to destination of the communication, even allowing simultaneous communications between pairs of different sources and destinations. From this fact, it is proposed to transform the entire infrastructure communication of network-on-chip, using the routing mechanisms, arbitration and storage, in a parallel processing system for high performance. In this proposal, the packages are formed by instructions and data that represent the applications, which are executed on routers as well as they are transmitted, using the pipeline and parallel communication transmissions. In contrast, traditional processors are not used, but only single cores that control the access to memory. An implementation of this idea is called IPNoSys (Integrated Processing NoC System), which has an own programming model and a routing algorithm that guarantees the execution of all instructions in the packets, preventing situations of deadlock, livelock and starvation. This architecture provides mechanisms for input and output, interruption and operating system support. As proof of concept was developed a programming environment and a simulator for this architecture in SystemC, which allows configuration of various parameters and to obtain several results to evaluate it

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The constant increase of complexity in computer applications demands the development of more powerful hardware support for them. With processor's operational frequency reaching its limit, the most viable solution is the use of parallelism. Based on parallelism techniques and the progressive growth in the capacity of transistors integration in a single chip is the concept of MPSoCs (Multi-Processor System-on-Chip). MPSoCs will eventually become a cheaper and faster alternative to supercomputers and clusters, and applications developed for these high performance systems will migrate to computers equipped with MP-SoCs containing dozens to hundreds of computation cores. In particular, applications in the area of oil and natural gas exploration are also characterized by the high processing capacity required and would benefit greatly from these high performance systems. This work intends to evaluate a traditional and complex application of the oil and gas industry known as reservoir simulation, developing a solution with integrated computational systems in a single chip, with hundreds of functional unities. For this, as the STORM (MPSoC Directory-Based Platform) platform already has a shared memory model, a new distributed memory model were developed. Also a message passing library has been developed folowing MPI standard

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The increasingly request for processing power during last years has pushed integrated circuit industry to look for ways of providing even more processing power with less heat dissipation, power consumption, and chip area. This goal has been achieved increasing the circuit clock, but since there are physical limits of this approach a new solution emerged as the multiprocessor system on chip (MPSoC). This approach demands new tools and basic software infrastructure to take advantage of the inherent parallelism of these architectures. The oil exploration industry has one of its firsts activities the project decision on exploring oil fields, those decisions are aided by reservoir simulations demanding high processing power, the MPSoC may offer greater performance if its parallelism can be well used. This work presents a proposal of a micro-kernel operating system and auxiliary libraries aimed to the STORM MPSoC platform analyzing its influence on the problem of reservoir simulation

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Software transaction memory (STM) systems have been used as an approach to improve performance, by allowing the concurrent execution of atomic blocks. However, under high-contention workloads, STM-based systems can considerably degrade performance, as transaction conflict rate increases. Contention management policies have been used as a way to select which transaction to abort when a conflict occurs. In general, contention managers are not capable of avoiding conflicts, as they can only select which transaction to abort and the moment it should restart. Since contention managers act only after a conflict is detected, it becomes harder to effectively increase transaction throughput. More proactive approaches have emerged, aiming at predicting when a transaction is likely to abort, postponing its execution. Nevertheless, most of the proposed proactive techniques are limited, as they do not replace the doomed transaction by another or, when they do, they rely on the operating system for that, having little or no control on which transaction to run. This article proposes LUTS, a lightweight user-level transaction scheduler. Unlike other techniques, LUTS provides the means for selecting another transaction to run in parallel, thus improving system throughput. We discuss LUTS design and propose a dynamic conflict-avoidance heuristic built around its scheduling capabilities. Experimental results, conducted with the STAMP and STMBench7 benchmark suites, running on TinySTM and SwissTM, show how our conflict-avoidance heuristic can effectively improve STM performance on high contention applications. © 2012 Springer Science+Business Media, LLC.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Providing support for multimedia applications on low-power mobile devices remains a significant research challenge. This is primarily due to two reasons: • Portable mobile devices have modest sizes and weights, and therefore inadequate resources, low CPU processing power, reduced display capabilities, limited memory and battery lifetimes as compared to desktop and laptop systems. • On the other hand, multimedia applications tend to have distinctive QoS and processing requirementswhichmake themextremely resource-demanding. This innate conflict introduces key research challenges in the design of multimedia applications and device-level power optimization. Energy efficiency in this kind of platforms can be achieved only via a synergistic hardware and software approach. In fact, while System-on-Chips are more and more programmable thus providing functional flexibility, hardwareonly power reduction techniques cannot maintain consumption under acceptable bounds. It is well understood both in research and industry that system configuration andmanagement cannot be controlled efficiently only relying on low-level firmware and hardware drivers. In fact, at this level there is lack of information about user application activity and consequently about the impact of power management decision on QoS. Even though operating system support and integration is a requirement for effective performance and energy management, more effective and QoSsensitive power management is possible if power awareness and hardware configuration control strategies are tightly integratedwith domain-specificmiddleware services. The main objective of this PhD research has been the exploration and the integration of amiddleware-centric energymanagement with applications and operating-system. We choose to focus on the CPU-memory and the video subsystems, since they are the most power-hungry components of an embedded system. A second main objective has been the definition and implementation of software facilities (like toolkits, API, and run-time engines) in order to improve programmability and performance efficiency of such platforms. Enhancing energy efficiency and programmability ofmodernMulti-Processor System-on-Chips (MPSoCs) Consumer applications are characterized by tight time-to-market constraints and extreme cost sensitivity. The software that runs on modern embedded systems must be high performance, real time, and even more important low power. Although much progress has been made on these problems, much remains to be done. Multi-processor System-on-Chip (MPSoC) are increasingly popular platforms for high performance embedded applications. This leads to interesting challenges in software development since efficient software development is a major issue for MPSoc designers. An important step in deploying applications on multiprocessors is to allocate and schedule concurrent tasks to the processing and communication resources of the platform. The problem of allocating and scheduling precedenceconstrained tasks on processors in a distributed real-time system is NP-hard. There is a clear need for deployment technology that addresses thesemulti processing issues. This problem can be tackled by means of specific middleware which takes care of allocating and scheduling tasks on the different processing elements and which tries also to optimize the power consumption of the entire multiprocessor platform. This dissertation is an attempt to develop insight into efficient, flexible and optimalmethods for allocating and scheduling concurrent applications tomultiprocessor architectures. It is a well-known problem in literature: this kind of optimization problems are very complex even in much simplified variants, therefore most authors propose simplified models and heuristic approaches to solve it in reasonable time. Model simplification is often achieved by abstracting away platform implementation ”details”. As a result, optimization problems become more tractable, even reaching polynomial time complexity. Unfortunately, this approach creates an abstraction gap between the optimization model and the real HW-SW platform. The main issue with heuristic or, more in general, with incomplete search is that they introduce an optimality gap of unknown size. They provide very limited or no information on the distance between the best computed solution and the optimal one. The goal of this work is to address both abstraction and optimality gaps, formulating accurate models which accounts for a number of ”non-idealities” in real-life hardware platforms, developing novel mapping algorithms that deterministically find optimal solutions, and implementing software infrastructures required by developers to deploy applications for the targetMPSoC platforms. Energy Efficient LCDBacklightAutoregulation on Real-LifeMultimediaAp- plication Processor Despite the ever increasing advances in Liquid Crystal Display’s (LCD) technology, their power consumption is still one of the major limitations to the battery life of mobile appliances such as smart phones, portable media players, gaming and navigation devices. There is a clear trend towards the increase of LCD size to exploit the multimedia capabilities of portable devices that can receive and render high definition video and pictures. Multimedia applications running on these devices require LCD screen sizes of 2.2 to 3.5 inches andmore to display video sequences and pictures with the required quality. LCD power consumption is dependent on the backlight and pixel matrix driving circuits and is typically proportional to the panel area. As a result, the contribution is also likely to be considerable in future mobile appliances. To address this issue, companies are proposing low power technologies suitable for mobile applications supporting low power states and image control techniques. On the research side, several power saving schemes and algorithms can be found in literature. Some of them exploit software-only techniques to change the image content to reduce the power associated with the crystal polarization, some others are aimed at decreasing the backlight level while compensating the luminance reduction by compensating the user perceived quality degradation using pixel-by-pixel image processing algorithms. The major limitation of these techniques is that they rely on the CPU to perform pixel-based manipulations and their impact on CPU utilization and power consumption has not been assessed. This PhDdissertation shows an alternative approach that exploits in a smart and efficient way the hardware image processing unit almost integrated in every current multimedia application processors to implement a hardware assisted image compensation that allows dynamic scaling of the backlight with a negligible impact on QoS. The proposed approach overcomes CPU-intensive techniques by saving system power without requiring either a dedicated display technology or hardware modification. Thesis Overview The remainder of the thesis is organized as follows. The first part is focused on enhancing energy efficiency and programmability of modern Multi-Processor System-on-Chips (MPSoCs). Chapter 2 gives an overview about architectural trends in embedded systems, illustrating the principal features of new technologies and the key challenges still open. Chapter 3 presents a QoS-driven methodology for optimal allocation and frequency selection for MPSoCs. The methodology is based on functional simulation and full system power estimation. Chapter 4 targets allocation and scheduling of pipelined stream-oriented applications on top of distributed memory architectures with messaging support. We tackled the complexity of the problem by means of decomposition and no-good generation, and prove the increased computational efficiency of this approach with respect to traditional ones. Chapter 5 presents a cooperative framework to solve the allocation, scheduling and voltage/frequency selection problem to optimality for energyefficient MPSoCs, while in Chapter 6 applications with conditional task graph are taken into account. Finally Chapter 7 proposes a complete framework, called Cellflow, to help programmers in efficient software implementation on a real architecture, the Cell Broadband Engine processor. The second part is focused on energy efficient software techniques for LCD displays. Chapter 8 gives an overview about portable device display technologies, illustrating the principal features of LCD video systems and the key challenges still open. Chapter 9 shows several energy efficient software techniques present in literature, while Chapter 10 illustrates in details our method for saving significant power in an LCD panel. Finally, conclusions are drawn, reporting the main research contributions that have been discussed throughout this dissertation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Single core capabilities have reached their maximum clock speed; new multicore architectures provide an alternative way to tackle this issue instead. The design of decoding applications running on top of these multicore platforms and their optimization to exploit all system computational power is crucial to obtain best results. Since the development at the integration level of printed circuit boards are increasingly difficult to optimize due to physical constraints and the inherent increase in power consumption, development of multiprocessor architectures is becoming the new Holy Grail. In this sense, it is crucial to develop applications that can run on the new multi-core architectures and find out distributions to maximize the potential use of the system. Today most of commercial electronic devices, available in the market, are composed of embedded systems. These devices incorporate recently multi-core processors. Task management onto multiple core/processors is not a trivial issue, and a good task/actor scheduling can yield to significant improvements in terms of efficiency gains and also processor power consumption. Scheduling of data flows between the actors that implement the applications aims to harness multi-core architectures to more types of applications, with an explicit expression of parallelism into the application. On the other hand, the recent development of the MPEG Reconfigurable Video Coding (RVC) standard allows the reconfiguration of the video decoders. RVC is a flexible standard compatible with MPEG developed codecs, making it the ideal tool to integrate into the new multimedia terminals to decode video sequences. With the new versions of the Open RVC-CAL Compiler (Orcc), a static mapping of the actors that implement the functionality of the application can be done once the application executable has been generated. This static mapping must be done for each of the different cores available on the working platform. It has been chosen an embedded system with a processor with two ARMv7 cores. This platform allows us to obtain the desired tests, get as much improvement results from the execution on a single core, and contrast both with a PC-based multiprocessor system. Las posibilidades ofrecidas por el aumento de la velocidad de la frecuencia de reloj de sistemas de un solo procesador están siendo agotadas. Las nuevas arquitecturas multiprocesador proporcionan una vía de desarrollo alternativa en este sentido. El diseño y optimización de aplicaciones de descodificación de video que se ejecuten sobre las nuevas arquitecturas permiten un mejor aprovechamiento y favorecen la obtención de mayores rendimientos. Hoy en día muchos de los dispositivos comerciales que se están lanzando al mercado están integrados por sistemas embebidos, que recientemente están basados en arquitecturas multinúcleo. El manejo de las tareas de ejecución sobre este tipo de arquitecturas no es una tarea trivial, y una buena planificación de los actores que implementan las funcionalidades puede proporcionar importantes mejoras en términos de eficiencia en el uso de la capacidad de los procesadores y, por ende, del consumo de energía. Por otro lado, el reciente desarrollo del estándar de Codificación de Video Reconfigurable (RVC), permite la reconfiguración de los descodificadores de video. RVC es un estándar flexible y compatible con anteriores codecs desarrollados por MPEG. Esto hace de RVC el estándar ideal para ser incorporado en los nuevos terminales multimedia que se están comercializando. Con el desarrollo de las nuevas versiones del compilador específico para el desarrollo de lenguaje RVC-CAL (Orcc), en el que se basa MPEG RVC, el mapeo estático, para entornos basados en multiprocesador, de los actores que integran un descodificador es posible. Se ha elegido un sistema embebido con un procesador con dos núcleos ARMv7. Esta plataforma nos permitirá llevar a cabo las pruebas de verificación y contraste de los conceptos estudiados en este trabajo, en el sentido del desarrollo de descodificadores de video basados en MPEG RVC y del estudio de la planificación y mapeo estático de los mismos.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In recent years a lot of research has been invested in parallel processing of numerical applications. However, parallel processing of Symbolic and AI applications has received less attention. This paper presents a system for parallel symbolic computitig, narned ACE, based on the logic programming paradigm. ACE is a computational model for the full Prolog language, capable of exploiting Or-parall< lism and Independent And-parallelism. In this paper vve focus on the implementation of the and-parallel part of the ACE system (ralled &ACE) on a shared memory multiprocessor, d< scribing its organization, some optimizations, and presenting some performance figures, proving the abilhy of &ACE to efficiently exploit parallelism.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

An approximate analytic model of a shared memory multiprocessor with a Cache Only Memory Architecture (COMA), the busbased Data Difussion Machine (DDM), is presented and validated. It describes the timing and interference in the system as a function of the hardware, the protocols, the topology and the workload. Model results have been compared to results from an independent simulator. The comparison shows good model accuracy specially for non-saturated systems, where the errors in response times and device utilizations are independent of the number of processors and remain below 10% in 90% of the simulations. Therefore, the model can be used as an average performance prediction tool that avoids expensive simulations in the design of systems with many processors.