966 results for Multi-cluster processor


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Multi-core processing is a design philosophy that has become mainstream in scientific and engineering applications. The increasing performance and gate capacity of recent FPGA devices allow complex logic systems to be implemented on a single programmable device. Using VHDL, we present an implementation of a multi-core processor built on the PLASMA IP core, which is based on most of the MIPS I ISA. We give an overview of the processor architecture and share the execution results.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The aim of this work is to evaluate the single-event effect (SEE) sensitivity of a multi-core processor that implements ECC and parity in its cache memories. Two different application scenarios are studied. The first configures the multi-core in Asymmetric Multi-Processing mode running a memory-bound application, whereas the second uses the Symmetric Multi-Processing mode running a CPU-bound application. The experiments were validated through radiation ground testing performed with 14 MeV neutrons on the Freescale P2041 multi-core, manufactured in 45 nm SOI technology. A deep analysis of the errors observed in the cache memories was carried out in order to reveal vulnerabilities in the cache protection mechanisms. Critical zones such as tag addresses were affected during the experiments. In addition, the results show that the sensitivity strongly depends on the application and on the multi-processing mode used.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Integrated master's dissertation in Industrial Electronics and Computer Engineering

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Computer clusters are generally used to obtain high performance in the execution of parallel applications. Their use has grown significantly over the years, and they now account for almost 60% of the 500 fastest machines in the world. Although clusters are widely used, monitoring the resources of these machines is considered a complex task. This complexity stems from the fact that many different software and hardware configurations can be characterized as a cluster. Because of this diversity, a cluster administrator often needs more than one monitoring tool to gather enough information to make decisions about problems that may be occurring in the cluster. Another situation that shows the complexity of monitoring arises when a developer of parallel applications needs information about the application's execution environment in order to better understand its behavior. Running parallel applications in multi-cluster and grid environments, together with the need for information external to the application, is yet another situation that requires monitoring. In all of these situations, there are multiple independent data sources that may hold related or complementary information. The goal of this work is to propose a data integration model that can adapt to different information sources and produce integrated information that can be visualized jointly by some tool. The model is based on offline debugging of parallel applications and is divided into two stages: data collection and subsequent integration of the information.
A prototype based on this integration model is described in this work. The prototype uses as information sources the cluster monitoring tools Ganglia and Performance Co-Pilot, the application tracing libraries DECK and MPI, and an instrumentation of the Linux operating system that records the context switches of a set of processes. Pajé is the tool chosen for the integrated visualization of the information. The results of the data integration performed by the prototype are grouped into three types: debugging of DECK applications, debugging of MPI applications, and cluster monitoring. The text closes with some conclusions and contributions of this work, as well as suggestions for future work.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism (messaging and threading) to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. The practicality of this approach is assessed by porting to Java a massively parallel structure formation code from cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge, this is the first time this kind of hybrid parallelism has been demonstrated in a high-performance Java application. (C) 2009 Elsevier Inc. All rights reserved.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Breakthrough advances in microprocessor technology and efficient power management have altered the course of processor development with the emergence of multi-core processor technology, which delivers a higher level of processing. The use of many-core technology has boosted the computing power provided by clusters of workstations or SMPs, providing large computational power at an affordable cost using solely commodity components. Different implementations of message-passing libraries and system software (including operating systems) are installed in such cluster and multi-cluster computing systems. To guarantee correct execution of a message-passing parallel application in a computing environment other than the one for which it was originally developed, the application code must be reviewed. In this paper, a hybrid communication interfacing strategy is proposed to execute a parallel application on a group of computing nodes belonging to different clusters or multi-clusters (the computing systems may be running different operating systems and MPI implementations), interconnected with public or private IP addresses, and responding interchangeably to user execution requests. Experimental results from running benchmark parallel applications demonstrate the feasibility and effectiveness of the proposed strategy.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The IEEE 802.15.4/ZigBee protocols are a promising technology for Wireless Sensor Networks (WSNs). This paper shares our experience with the implementation and use of these protocols and related technologies in WSNs. We present the problems and challenges we have faced in implementing an IEEE 802.15.4/ZigBee stack for TinyOS from a twofold perspective: limitations of the IEEE 802.15.4/ZigBee protocol standards (ambiguities and open issues) and technological limitations (hardware and software). Concerning the former, we address the challenges of building scalable and synchronized multi-cluster ZigBee networks that provide a trade-off between timeliness and energy efficiency. On the latter, we highlight implementation problems in terms of hardware, timer handling and operating system limitations. We also report on our experience with experimental test-beds, namely on physical-layer aspects such as coexistence problems between IEEE 802.15.4 and 802.11 radio channels.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Poster presented at the Work in Progress Session, 28th GI/ITG International Conference on Architecture of Computing Systems (ARCS 2015), 24 to 26 March 2015, Porto, Portugal.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Dissertation submitted for the degree of Master in Informatics Engineering

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Transactional memory (TM) is a new synchronization mechanism devised to simplify parallel programming, thereby helping programmers to unleash the power of current multicore processors. Although software implementations of TM (STM) have been extensively analyzed in terms of runtime performance, little attention has been paid to an equally important constraint faced by nearly all computer systems: energy consumption. In this work we conduct a comprehensive study of energy and runtime tradeoffs in software transactional memory systems. We characterize the behavior of three state-of-the-art lock-based STM algorithms, along with three different conflict resolution schemes. As a result of this characterization, we propose a DVFS-based technique that can be integrated into the resolution policies so as to improve the energy-delay product (EDP). Experimental results show that our DVFS-enhanced policies are indeed beneficial for applications with high contention levels. Improvements of up to 59% in EDP can be observed in this scenario, with an average EDP reduction of 16% across the STAMP workloads. © 2012 IEEE.
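
Note on the metric: the energy-delay product named in this abstract is conventionally the product of the energy consumed and the execution time. The symbols below are notation assumed here for illustration and are not taken from the paper; the two ratios simply restate the percentages quoted in the abstract.

```latex
% Assumed notation: E = energy consumed by a run, T = its execution time.
\mathrm{EDP} = E \cdot T
% Reported average reduction of 16% across the STAMP workloads:
\frac{\mathrm{EDP}_{\text{DVFS}}}{\mathrm{EDP}_{\text{baseline}}} = 1 - 0.16 = 0.84
% Reported best case, an improvement of up to 59%:
\frac{\mathrm{EDP}_{\text{DVFS}}}{\mathrm{EDP}_{\text{baseline}}} = 1 - 0.59 = 0.41
```

Because EDP multiplies the two quantities, a policy that lowers energy while lengthening execution time only improves EDP if the energy saving outweighs the slowdown.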

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The efficient emulation of a many-core architecture is a challenging task. Each core could be emulated through a dedicated thread, and such threads would be interleaved on either a single-core or a multi-core processor, but the high number of context switches would result in unacceptable performance. To support this kind of application, the GPU's computational power is exploited by scheduling the emulation threads on the GPU cores. This presents a non-trivial divergence issue, since GPU computational power is offered through SIMD processing elements that are forced to synchronously execute the same instruction on different memory portions. Thus, a new emulation technique is introduced in order to overcome this limitation: instead of providing a routine for each ISA opcode, the emulator mimics the behavior of the micro-architecture level, where instructions are data that a single routine takes as input. Our new technique has been implemented and compared with the classic emulation approach, in order to investigate the possibility of a hybrid solution.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The constant evolution of portable multimedia devices over the last decade means that a wide variety of devices capable of playing multimedia content is available today. Consequently, playing such content on these terminals requires processors that can sustain a high computational load, as video decoding and presentation tasks demand. However, a powerful processor running at high frequencies drains the battery quickly, and since the target is portable devices, battery life becomes a matter of particular importance. This problem has become one of the main research lines of the GDEM research group (Grupo de Diseño Electrónico y Microelectrónico). In this line of work, the group studies how to optimize energy consumption in portable terminals by reducing the user's quality of experience in exchange for greater terminal autonomy. Achieving that reduction in quality of experience requires a video coding standard that allows it. The GDEM research group has experience with the scalable video standard H.264/SVC, which allows the quality of experience to be degraded according to the needs and characteristics of the device. More specifically, a scalable video embeds different versions of the original video that can be decoded at different resolutions, frame rates and qualities (spatial, temporal and quality scalability, respectively), allowing fast and very flexible adaptation.
With the H.264/SVC standard selected for the video tasks, the work proposes using MPlayer, an open-source video player into which a scalable video decoder called OpenSVC has been integrated. Finally, the portable device used is the BeagleBoard development platform, an embedded system based on the OMAP3530 processor, which allows the clock frequency and supply voltage to be changed dynamically, thereby reducing the terminal's consumption. This processor in turn integrates a general-purpose processor (ARM Cortex-A8) and a digital signal processor (DSP TMS320C64+™). Because of the high computational load of scalable video decoding and the ARM's poor optimization for data processing, the work proposes running MPlayer on the ARM and assigning the decoding task to the DSP, in order to reduce consumption and thus extend the battery life of the embedded system on which the developed application runs. Once this integration is complete, the decoder hosted on the DSP will be characterized through a series of performance measurements, and the results will be compared with those obtained when decoding is performed on the ARM alone. ABSTRACT In recent years, portable multimedia terminals have gradually evolved, so that a wide range of devices able to play multimedia content is now readily available to everyone. Consequently, those multimedia terminals must have high-performance processors to play that content, because the coding and decoding tasks demand a high computational load. However, a powerful processor running at high frequencies implies higher battery consumption, and this issue has become one of the most important problems in the development cycle of a portable terminal.
The optimization of power/energy consumption in multimedia terminals has become one of the most significant lines of work in the Electronic and Microelectronic Research Group of the Universidad Politécnica de Madrid. In particular, the group is researching how to reduce the user's Quality of Experience (QoE) in exchange for increased battery life. Reducing the QoE requires a video coding standard that allows this operation. H.264/SVC allows the QoE to be reduced according to the needs and characteristics of the terminal. Specifically, a scalable video contains different versions of the original video embedded in a single video stream, and each of them can be decoded at different resolutions, frame rates and qualities (spatial, temporal and quality scalability, respectively). Once the video coding standard is selected, a multimedia player with support for scalable video is needed. MPlayer has been proposed as the multimedia player, since its characteristics (open source, enormous flexibility and a scalable video decoder called OpenSVC) are the most suitable for the aims of this Master Thesis. Lastly, the embedded system BeagleBoard, based on the multi-core processor OMAP3530, will be the development platform used in this project. The multimedia terminal architecture is based on a commercial chip with a General Purpose Processor (GPP – ARM Cortex A8) and a Digital Signal Processor (DSP, TMS320C64+™). Moreover, the OMAP3530 processor can modify its operating frequency and supply voltage dynamically in order to reduce the power consumption of the embedded system. The main goal of this Master Thesis is therefore to integrate the multimedia player, MPlayer, executed on the GPP, with the scalable video decoder, OpenSVC, executed on the DSP, in order to distribute the computational load of the scalable video decoding task and to reduce the power consumption of the terminal.
Once the integration is accomplished, the performance of the OpenSVC decoder executed on the DSP will be measured using different combinations of scalability values. The results will be compared with scalable video decoding performed on the GPP, in order to show how poorly optimized this kind of architecture is for decoding tasks in contrast to the DSP architecture.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

As part of their digital transformation, many organizations are adopting new technologies to support the development, deployment and management of their microservice-based architectures in cloud environments and across cloud providers. In this scenario, service and event meshes are emerging as dynamic, configurable infrastructure layers that facilitate complex interactions and the management of microservice-based applications and cloud services. The goal of this work is to analyze open-source mesh solutions (Istio, Linkerd, Apache EventMesh) from a performance standpoint when they are used to manage communication between microservice-based workflow applications within the cloud environment. To this end, a system was built to deploy each of the components within a single cluster and in a multi-cluster environment. Metrics collection and summarization were carried out with a custom system compatible with the Prometheus data format. The tests allowed us to evaluate the performance of each component along with its effectiveness. Overall, while the maturity of the tested service mesh implementations could be confirmed, the event mesh solution we used appeared to be a technology that is not yet mature, owing to numerous operational problems.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

Recent research in multi-agent systems incorporates fault tolerance concepts, but does not explore the extension and implementation of such ideas for large-scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed into sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information in the event of a predicted core/processor failure and to complete the task successfully. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and by implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Context. Cluster properties can be more distinctly studied in pairs of clusters, where we expect the effects of interactions to be strong. Aims. We here discuss the properties of the double cluster Abell 1758 at a redshift z ~ 0.279. These clusters show strong evidence for merging. Methods. We analyse the optical properties of the North and South cluster of Abell 1758 based on deep imaging obtained with the Canada-France-Hawaii Telescope (CFHT) archive Megaprime/Megacam camera in the g' and r' bands, covering a total region of about 1.05 x 1.16 deg², or 16.1 x 17.6 Mpc². Our X-ray analysis is based on archive XMM-Newton images. Numerical simulations were performed using an N-body algorithm to treat the dark-matter component, a semi-analytical galaxy-formation model for the evolution of the galaxies and a grid-based hydrodynamic code with a piecewise parabolic method (PPM) scheme for the dynamics of the intra-cluster medium. We computed galaxy luminosity functions (GLFs) and 2D temperature and metallicity maps of the X-ray gas, which we then compared to the results of our numerical simulations. Results. The GLFs of Abell 1758 North are well fit by Schechter functions in the g' and r' bands, but with a small excess of bright galaxies, particularly in the r' band; their faint-end slopes are similar in both bands. In contrast, the GLFs of Abell 1758 South are not well fit by Schechter functions: excesses of bright galaxies are seen in both bands; the faint end of the GLF is not very well defined in g'. The GLF computed from our numerical simulations assuming a halo mass-luminosity relation agrees with those derived from the observations. From the X-ray analysis, the most striking features are structures in the metal distribution. We found two elongated regions of high metallicity in Abell 1758 North with two peaks towards the centre. In contrast, Abell 1758 South shows a deficit of metals in its central regions.
Comparing observational results to those derived from numerical simulations, we could mimic the most prominent features present in the metallicity map and propose an explanation for the dynamical history of the cluster. We found in particular that in the metal-rich elongated regions of the North cluster, winds had been more efficient than ram-pressure stripping in transporting metal-enriched gas to the outskirts. Conclusions. We confirm the merging structure of the North and South clusters, both at optical and X-ray wavelengths.