47 resultados para parallel architecture
Resumo:
La tolerancia a fallos es una línea de investigación que ha adquirido una importancia relevante con el aumento de la capacidad de cómputo de los súper-computadores actuales. Esto es debido a que con el aumento del poder de procesamiento viene un aumento en la cantidad de componentes que trae consigo una mayor cantidad de fallos. Las estrategias de tolerancia a fallos actuales en su mayoría son centralizadas y estas no escalan cuando se utiliza una gran cantidad de procesos, dado que se requiere sincronización entre todos ellos para realizar las tareas de tolerancia a fallos. Además la necesidad de mantener las prestaciones en programas paralelos es crucial, tanto en presencia como en ausencia de fallos. Teniendo en cuenta lo citado, este trabajo se ha centrado en una arquitectura tolerante a fallos descentralizada (RADIC – Redundant Array of Distributed and Independant Controllers) que busca mantener las prestaciones iniciales y garantizar la menor sobrecarga posible para reconfigurar el sistema en caso de fallos. La implementación de esta arquitectura se ha llevado a cabo en la librería de paso de mensajes denominada Open MPI, la misma es actualmente una de las más utilizadas en el mundo científico para la ejecución de programas paralelos que utilizan una plataforma de paso de mensajes. Las pruebas iniciales demuestran que el sistema introduce mínima sobrecarga para llevar a cabo las tareas correspondientes a la tolerancia a fallos. MPI es un estándar por defecto fail-stop, y en determinadas implementaciones que añaden cierto nivel de tolerancia, las estrategias más utilizadas son coordinadas. En RADIC cuando ocurre un fallo el proceso se recupera en otro nodo volviendo a un estado anterior que ha sido almacenado previamente mediante la utilización de checkpoints no coordinados y la relectura de mensajes desde el log de eventos. Durante la recuperación, las comunicaciones con el proceso en cuestión deben ser retrasadas y redirigidas hacia la nueva ubicación del proceso. Restaurar procesos en un lugar donde ya existen procesos sobrecarga la ejecución disminuyendo las prestaciones, por lo cual en este trabajo se propone la utilización de nodos spare para la recuperar en ellos a los procesos que fallan, evitando de esta forma la sobrecarga en nodos que ya tienen trabajo. En este trabajo se muestra un diseño propuesto para gestionar de un modo automático y descentralizado la recuperación en nodos spare en un entorno Open MPI y se presenta un análisis del impacto en las prestaciones que tiene este diseño. Resultados iniciales muestran una degradación significativa cuando a lo largo de la ejecución ocurren varios fallos y no se utilizan spares y sin embargo utilizándolos se restablece la configuración inicial y se mantienen las prestaciones.
Resumo:
Performance prediction and application behavior modeling have been the subject of exten- sive research that aim to estimate applications performance with an acceptable precision. A novel approach to predict the performance of parallel applications is based in the con- cept of Parallel Application Signatures that consists in extract an application most relevant parts (phases) and the number of times they repeat (weights). Executing these phases in a target machine and multiplying its exeuction time by its weight an estimation of the application total execution time can be made. One of the problems is that the performance of an application depends on the program workload. Every type of workload affects differently how an application performs in a given system and so affects the signature execution time. Since the workloads used in most scientific parallel applications have dimensions and data ranges well known and the behavior of these applications are mostly deterministic, a model of how the programs workload affect its performance can be obtained. We create a new methodology to model how a program’s workload affect the parallel application signature. Using regression analysis we are able to generalize each phase time execution and weight function to predict an application performance in a target system for any type of workload within predefined range. We validate our methodology using a synthetic program, benchmarks applications and well known real scientific applications.
Resumo:
It is well known that image processing requires a huge amount of computation, mainly at low level processing where the algorithms are dealing with a great number of data-pixel. One of the solutions to estimate motions involves detection of the correspondences between two images. For normalised correlation criteria, previous experiments shown that the result is not altered in presence of nonuniform illumination. Usually, hardware for motion estimation has been limited to simple correlation criteria. The main goal of this paper is to propose a VLSI architecture for motion estimation using a matching criteria more complex than Sum of Absolute Differences (SAD) criteria. Today hardware devices provide many facilities for the integration of more and more complex designs as well as the possibility to easily communicate with general purpose processors
Resumo:
All-optical label swapping (AOLS) forms a key technology towards the implementation of all-optical packet switching nodes (AOPS) for the future optical Internet. The capital expenditures of the deployment of AOLS increases with the size of the label spaces (i.e. the number of used labels), since a special optical device is needed for each recognized label on every node. Label space sizes are affected by the way in which demands are routed. For instance, while shortest-path routing leads to the usage of fewer labels but high link utilization, minimum interference routing leads to the opposite. This paper studies all-optical label stacking (AOLStack), which is an extension of the AOLS architecture. AOLStack aims at reducing label spaces while easing the compromise with link utilization. In this paper, an integer lineal program is proposed with the objective of analyzing the softening of the aforementioned trade-off due to AOLStack. Furthermore, a heuristic aiming at finding good solutions in polynomial-time is proposed as well. Simulation results show that AOLStack either a) reduces the label spaces with a low increase in the link utilization or, similarly, b) uses better the residual bandwidth to decrease the number of labels even more
Resumo:
Es tracta d'un projecte que proposa una aplicació per al calibratge automàtic de models P-sistema. Per a fer-ho primer es farà un estudi sobre els models P-sistema i el procediment seguit pels investigadors per desenvolupar aquest tipus de models. Es desenvoluparà una primera solució sèrie per al problema, i s'analitzaran els seus punts febles. Seguidament es proposarà una versió paral·lela que millori significativament el temps d'execució, tot mantenint una alta eficiència i escalabilitat.
Resumo:
The mutual information of independent parallel Gaussian-noise channels is maximized, under an average power constraint, by independent Gaussian inputs whose power is allocated according to the waterfilling policy. In practice, discrete signalling constellations with limited peak-to-average ratios (m-PSK, m-QAM, etc) are used in lieu of the ideal Gaussian signals. This paper gives the power allocation policy that maximizes the mutual information over parallel channels with arbitrary input distributions. Such policy admits a graphical interpretation, referred to as mercury/waterfilling, which generalizes the waterfilling solution and allows retaining some of its intuition. The relationship between mutual information of Gaussian channels and nonlinear minimum mean-square error proves key to solving the power allocation problem.
Resumo:
This paper presents a webservice architecture for Statistical Machine Translation aimed at non-technical users. A workfloweditor allows a user to combine different webservices using a graphical user interface. In the current state of this project,the webservices have been implemented for a range of sentential and sub-sententialaligners. The advantage of a common interface and a common data format allows the user to build workflows exchanging different aligners.
Resumo:
Estudi sobre la millora de rendiment (en temps d’execució) al'algorisme de gràfics Fast Multipath Radiosity Using Hierarchical Subscenes gràcies a l’execució paral•lela especulada que ens permet obtenir el motor d'especulació per a clústers desenvolupat en el grup de recerca BCDS de la Universitat de Girona
Resumo:
We address the problem of scheduling a multiclass $M/M/m$ queue with Bernoulli feedback on $m$ parallel servers to minimize time-average linear holding costs. We analyze the performance of a heuristic priority-index rule, which extends Klimov's optimal solution to the single-server case: servers select preemptively customers with larger Klimov indices. We present closed-form suboptimality bounds (approximate optimality) for Klimov's rule, which imply that its suboptimality gap is uniformly bounded above with respect to (i) external arrival rates, as long as they stay within system capacity;and (ii) the number of servers. It follows that its relativesuboptimality gap vanishes in a heavy-traffic limit, as external arrival rates approach system capacity (heavy-traffic optimality). We obtain simpler expressions for the special no-feedback case, where the heuristic reduces to the classical $c \mu$ rule. Our analysis is based on comparing the expected cost of Klimov's ruleto the value of a strong linear programming (LP) relaxation of the system's region of achievable performance of mean queue lengths. In order to obtain this relaxation, we derive and exploit a new set ofwork decomposition laws for the parallel-server system. We further report on the results of a computational study on the quality of the $c \mu$ rule for parallel scheduling.
Resumo:
Análisis de desarrollo paralelo CUDA en lenguajes Java y Python, utilizando JCuda, RootBeer, PyCuda y Anaconda Accelerate.
Resumo:
This paper presents SiMR, a simulator of the Rudimentary Machine designed to be used in a first course of computer architecture of Software Engineering and Computer Engineering programmes. The Rudimentary Machine contains all the basic elements in a RISC computer, and SiMR allows editing, assembling and executing programmes for this processor. SiMR is used at the Universitat Oberta de Catalunya as one of the most important resources in the Virtual Computing Architecture and Organisation Laboratory, since students work at home with the simulator and reports containing their work are automatically generated to be evaluated by lecturers. The results obtained from a survey show that most of the students consider SiMR as a highly necessary or even an indispensable resource to learn the basic concepts about computer architecture.
Resumo:
A partir de maig de 2003, per iniciativa del Vicerectorat adjunt d’Edificacions de la UPC, el Centre Interdisciplinari de Tecnologia, Innovació i Educació per a la Sostenibilitat (CITIES) treballa en l’elaboració i la implantació del Pla d’Eficiència en el Consum de Recursos (PECR), amb l’objectiu d’establir polítiques i definir línees d’actuació per a l’estalvi i l’eficiència en el consum dels recursos energètics i d’ aigua en els edificis de la UPC.El PECR contempla, en una de les primeres fases, la realització d’avaluacions energètiques en les edificacions de la UPC per valorar l’estat actual dels edificis i poder establir uns indicadors del seu comportament energètic a partir dels quals establir els objectius d’estalvi i d’eficiència. Per fer aquestes avaluacions, es va crear una línea de projectes finals de carrera (PFC) per a estudiants de l’Escola Politècnica Superior de l’Edificació de Barcelona (EPSEB), sota la coordinació de professors tutors de diferents departaments y amb la col•laboració indispensable de totes les unitats de recolzament de la UPC.Aquesta publicació és la ponència presentada al IV Congrès "Sustainable City", a Tallinn, en el que es va presentar aquest projecte com a una eina de treball amb l'objectiu de reduir l'impacte ambiental dels edificis universitaris en les ciutats.