984 results for parallel application
Abstract:
Performance prediction and application behavior modeling have been the subject of extensive research that aims to estimate application performance with acceptable precision. A novel approach to predicting the performance of parallel applications is based on the concept of Parallel Application Signatures, which consists of extracting an application's most relevant parts (phases) and the number of times they repeat (weights). By executing these phases on a target machine and multiplying each phase's execution time by its weight, an estimate of the application's total execution time can be made. One of the problems is that the performance of an application depends on the program workload. Every type of workload affects differently how an application performs on a given system, and so affects the signature execution time. Since the workloads used in most scientific parallel applications have well-known dimensions and data ranges, and the behavior of these applications is mostly deterministic, a model of how a program's workload affects its performance can be obtained. We create a new methodology to model how a program's workload affects the parallel application signature. Using regression analysis, we are able to generalize each phase's execution time and weight function to predict an application's performance on a target system for any type of workload within a predefined range. We validate our methodology using a synthetic program, benchmark applications, and well-known real scientific applications.
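The signature idea described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each phase's execution time and weight vary linearly with a single scalar workload size; the abstract only says "regression analysis", so the linear form, the function names, and the sample measurements are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_total_time(time_models, weight_models, workload):
    """Sum over phases of (predicted phase time) * (predicted weight)."""
    total = 0.0
    for (t_slope, t_icpt), (w_slope, w_icpt) in zip(time_models, weight_models):
        phase_time = t_slope * workload + t_icpt
        weight = w_slope * workload + w_icpt
        total += phase_time * weight
    return total


# Hypothetical measurements at three workload sizes for two phases.
workloads = [100, 200, 300]
phase_times = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]   # seconds per execution
phase_weights = [[10, 20, 30], [4, 4, 4]]          # repetitions

time_models = [fit_line(workloads, ts) for ts in phase_times]
weight_models = [fit_line(workloads, ws) for ws in phase_weights]

# Estimate for an unmeasured workload inside the measured range.
estimate = predict_total_time(time_models, weight_models, 250)
```

With a fixed workload, the same computation reduces to the plain signature estimate: the sum of each phase's measured time multiplied by its weight.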
Abstract:
The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged and various parallel and distributed environments have been designed and implemented. Each of the environments, including hardware and software, has unique strengths and weaknesses. There is no single parallel environment that can be identified as the best environment for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of different aspects of parallel computing. For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify their suitability for different parallel applications. Due to the parallel and distributed nature of the environments, the networks connecting the processors in these environments were investigated with respect to their performance characteristics. Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced. The application-specific information is data about the workload extracted from an application, which is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize the application-specific information to further refine their scheduling properties. A more accurate description of the workload is especially important in cases where the work units are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results obtained show that the additional information regarding the workload has a positive impact on the performance of applications.
Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm combines the Message Passing Interface (MPI) with threads to provide a methodology for writing parallel applications that efficiently utilize the available resources and minimize overhead. MPIT allows communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm. The scheduling algorithm is executed by the communication thread, so scheduling does not affect the execution of the parallel application. Performance results show that MPIT achieves considerable improvements over conventional MPI applications.
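The overlap of communication and computation through a dedicated thread can be sketched as follows. This is not the MPIT implementation and uses no MPI calls: a standard-library queue stands in for the message-passing layer, and every name here is hypothetical.

```python
import queue
import threading


def communication_thread(outbox, delivered):
    """Drain the outbox until a None sentinel arrives.

    Appending to `delivered` is a stand-in for an actual message send.
    """
    while True:
        message = outbox.get()
        if message is None:
            break
        delivered.append(message)


def run(work_units):
    outbox = queue.Queue()
    delivered = []
    comm = threading.Thread(target=communication_thread,
                            args=(outbox, delivered))
    comm.start()
    for unit in work_units:
        result = unit * unit      # computation on the main thread
        outbox.put(result)        # hand off; the main thread does not wait
    outbox.put(None)              # tell the communication thread to stop
    comm.join()
    return delivered


results = run([1, 2, 3, 4])
```

Because the compute loop only enqueues results, it never blocks on a send; in the paradigm described above, the same dedicated thread would also be the place where the scheduling algorithm runs.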
Abstract:
The recent emergence of a new generation of mobile application marketplaces has changed business in the mobile ecosystems. The marketplaces have gathered over a million applications from hundreds of thousands of application developers and publishers. Thus, software ecosystems, consisting of developers, consumers and the orchestrator, have emerged as a part of the mobile ecosystem. This dissertation addresses the new challenges faced by mobile application developers in the new ecosystems through empirical methods. Using the theories of two-sided markets and business ecosystems as the basis, the thesis assesses monetization and value creation in the market as well as the impact of electronic Word-of-Mouth (eWOM) and developer multihoming (i.e., publishing on more than one platform) in the ecosystems. The data for the study was collected by web crawling from the three biggest marketplaces: Apple App Store, Google Play and Windows Phone Store. The dissertation consists of six individual articles. The results of the studies show a gap in monetization among the studied applications, while a majority of applications are produced by small or micro-enterprises. The study finds only weak support for the impact of eWOM on the sales of an application in the studied ecosystem. Finally, the study reveals a clear difference in multihoming rates between the top application developers and the rest. This has, as discussed in the thesis, an impact on future market analyses: it seems that the smart device market can sustain several parallel application marketplaces.
Abstract:
The performance benefit of Grid systems comes from different strategies, among which partitioning applications into parallel tasks is the most important. However, in most cases the gain from partitioning is dampened by synchronization overhead, mainly due to the high variability of the completion times of the different tasks, which, in turn, is due to the large heterogeneity of Grid nodes. For this reason, it is important to have models that capture the performance of such systems. In this paper we describe a queueing-network-based performance model able to accurately analyze Grid architectures, and we use the model to study a real parallel application executed in a Grid. The proposed model improves on classical modelling techniques and highlights the impact of resource heterogeneity and network latency on application performance.
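The effect of node heterogeneity on synchronization overhead can be seen in a deliberately simple fork-join calculation. This is not the queueing-network model from the paper; it is only a sketch that assumes each task runs on its own node, that all tasks must finish before the application proceeds, and that node speeds are known constants. All names and numbers are hypothetical.

```python
def fork_join_overhead(work, speeds):
    """Return (makespan, total idle time) for one synchronization step.

    Each task i needs work[i] units on a node of speed speeds[i]; every
    node waits at the barrier until the slowest task completes.
    """
    times = [w / s for w, s in zip(work, speeds)]
    makespan = max(times)
    idle = sum(makespan - t for t in times)
    return makespan, idle


equal_work = [10, 10, 10, 10]
homogeneous = fork_join_overhead(equal_work, [1.0, 1.0, 1.0, 1.0])
heterogeneous = fork_join_overhead(equal_work, [1.0, 1.0, 0.5, 2.0])
```

With identical nodes no time is lost at the barrier, while in the heterogeneous case the slowest node sets the step time and the other nodes accumulate idle time.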
Abstract:
Ne bis in idem, understood as a procedural guarantee in the EU, assumes different features in the AFSJ and in European competition law. Despite having a common origin (being, in both sectors, the result of the case law of the same jurisdictional organ), its components are quite distinct in each area of integration. In the AFSJ, the content of bis and idem is broader and aimed at a greater protection of individuals. Its axiological ground is based on freedom of movement and human dignity, whereas in European competition law it is closely linked to the defence rights of legal persons and the concept of criminal punishment of anticompetitive sanctions as interpreted by the ECHR's jurisprudence. In European competition law, ne bis in idem is limited by the systemic framework of competition law and the need to ensure parallel application of both European and national laws. Nonetheless, the absence of a compulsory mechanism to allocate jurisdiction in the EU (both in the AFSJ and in the field of antitrust law) demands a common axiological framework. In this context, ne bis in idem must be understood as a defence right based on equity and proportionality. As far as its international dimension is concerned, ne bis in idem also lacks an erga omnes effect and is not considered to be a rule of ius cogens. Consequently, the model which the ECJ has built regarding the application of ne bis in idem in transnational and supranational contexts should be replicated by other courts through cross-fertilization, in order to internationalize that procedural guarantee and broaden its scope of application.
Abstract:
Fault tolerance has become a major issue for computer and software engineers because the occurrence of faults increases the cost of using a parallel computer. RADIC is a fault tolerance architecture for message passing systems which is transparent, decentralized, flexible and scalable. This master's thesis presents the methodology used to implement the RADIC architecture over Open MPI, a well-known and widely used message passing library. This implementation keeps the characteristics of the RADIC architecture. To validate the implementation we executed a synthetic ping program, and to evaluate its performance we used the NAS Parallel Benchmarks. The results show that the performance of the RADIC architecture depends on the communication pattern of the parallel application being run. Furthermore, our implementation shows that the RADIC architecture can be implemented over an existing message passing library.
Abstract:
Resource management in multi-core processors has gained importance with the evolution of applications and architectures. But this management is very complex. For example, the same parallel application executed multiple times with the same input data, on a single multi-core node, can have highly variable execution times. Multiple hardware and software factors affect performance. The way in which hardware resources (compute and memory) are assigned to processes or threads, possibly from several applications competing with each other, is fundamental in determining this performance. The difference between allocating resources without knowing the application's true needs and allocating them with a specific goal is growing. The best way to perform this allocation is automatically, with minimal intervention from the programmer. It is important to note that the way in which an application runs on an architecture is not necessarily the most suitable one, and this situation can be improved through proper management of the available resources. Appropriate resource management can offer advantages both to the application developer and to the computing environment where the application runs, allowing a larger number of applications to run with the same amount of resources. Likewise, this resource management would not require changes to the application or to its operating strategy. In order to propose resource management policies, the behavior of compute-intensive and memory-intensive applications was analyzed. This analysis was carried out by studying the following parameters: placement among cores, the need to use shared memory, the size of the input load, the distribution of data within the processor, and the work granularity.
Our goal is to identify how these parameters influence execution efficiency, to identify bottlenecks, and to propose possible improvements. Another proposal is to adapt the strategies already used by the scheduler in order to obtain better results.
Abstract:
Currently there are many parallel/distributed applications in which SPMD is the most widely used paradigm. Obtaining good performance in a parallel application of this type is one of the main challenges, given the large number of existing applications. This goal is not easy to achieve, since there is a wide variety of hardware configurations, and the nature of the problems can also vary, as can the way they are implemented. Consequently, if the software/hardware combination is not properly considered, problems may arise that are inherent to an iterative application without a defined control hierarchy under this paradigm. In SPMD, all processes execute the same code but compute a different section of the input data. One solution to a possible performance problem is to propose a load balancing strategy that evens out the computation among the different processes. In this work we analyze the CG benchmark with heterogeneous loads in order to detect possible performance problems in a real application. One factor that determines performance in this application is the number of nonzero elements contained in the matrix section assigned to each process. We determine that it is possible to define a load balancing strategy that can be implemented dynamically, and we demonstrate experimentally that the application's performance can be improved significantly with this strategy.
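One way to balance load by nonzero count rather than by row count is sketched below. This is not the strategy evaluated in the work, which is dynamic; it is a static greedy split of contiguous row blocks, assuming the per-row nonzero counts are already known. The function name and the sample counts are hypothetical.

```python
def partition_rows_by_nonzeros(nnz_per_row, num_procs):
    """Split rows into contiguous blocks with roughly equal nonzero totals.

    Returns a list of (start_row, end_row) half-open ranges, one per block.
    """
    total = sum(nnz_per_row)
    blocks = []
    start = 0
    running = 0
    for row, nnz in enumerate(nnz_per_row):
        running += nnz
        next_rank = len(blocks) + 1
        # Close the current block once it reaches its cumulative share.
        if next_rank < num_procs and running >= total * next_rank / num_procs:
            blocks.append((start, row + 1))
            start = row + 1
    blocks.append((start, len(nnz_per_row)))
    return blocks


nnz = [8, 1, 1, 1, 1, 2, 1, 1]   # one dense row followed by sparse rows
blocks = partition_rows_by_nonzeros(nnz, 2)
loads = [sum(nnz[a:b]) for a, b in blocks]
```

For this example an even split by rows would give the two processes 11 and 5 nonzeros, whereas the split by nonzeros assigns the dense row to one process and the remaining rows to the other.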
Abstract:
In this paper, we present a distributed computing framework for problems characterized by a highly irregular search tree, for which no reliable workload prediction is available. The framework is based on a peer-to-peer computing environment and dynamic load balancing. The system allows for dynamic resource aggregation, does not depend on any specific meta-computing middleware and is suitable for large-scale, multi-domain, heterogeneous environments, such as computational Grids. Dynamic load balancing policies based on global statistics are known to provide optimal load balancing performance, while randomized techniques provide high scalability. The proposed method combines both advantages and adopts distributed job pools and a randomized polling technique. The framework has been successfully adopted in a parallel search algorithm for subgraph mining and evaluated on a molecular compounds dataset. The parallel application has shown good scalability and close-to-linear speedup in a distributed network of workstations.
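Distributed job pools with randomized polling can be illustrated with a small sequential simulation. It is a sketch only and not the framework from the paper: workers take turns in a loop instead of running concurrently, an idle worker polls one random peer per turn, and it takes half of that peer's jobs (rounded down). These rules and all names are assumptions made for the example.

```python
import random
from collections import deque


def run_with_random_polling(initial_pools, seed=0):
    """Simulate workers that process local jobs and poll random peers when idle.

    Returns, for each worker, the list of jobs it ended up processing.
    """
    rng = random.Random(seed)
    pools = [deque(jobs) for jobs in initial_pools]
    done = [[] for _ in pools]
    while any(pools):
        for me, pool in enumerate(pools):
            if pool:
                done[me].append(pool.popleft())   # process one local job
            else:
                victim = rng.randrange(len(pools))
                if victim != me:
                    # Take half of the victim's jobs, rounded down, so a
                    # worker holding a single job always keeps it.
                    for _ in range(len(pools[victim]) // 2):
                        pool.append(pools[victim].pop())
    return done


processed = run_with_random_polling([list(range(12)), [], [], []])
all_jobs = sorted(job for worker in processed for job in worker)
```

No worker needs global load statistics here, which is the scalability argument for randomized techniques; the simulation only checks that every job is processed exactly once, wherever it ends up.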
Abstract:
With the transition to multicore processors almost complete, the parallel processing community is seeking efficient ways to port legacy message passing applications to shared memory and multicore processors. MPJ Express is our reference implementation of Message Passing Interface (MPI)-like bindings for the Java language. Starting with the current release, the MPJ Express software can be configured in two modes: the multicore mode and the cluster mode. In the multicore mode, parallel Java applications execute on shared memory or multicore processors. In the cluster mode, Java applications parallelized using MPJ Express can be executed on distributed memory platforms like compute clusters and clouds. The multicore device has been implemented using Java threads in order to satisfy the two main design goals of portability and performance. We also discuss the challenges of integrating the multicore device into the MPJ Express software. This turned out to be a challenging task because the parallel application executes in a single JVM in the multicore mode. In contrast, in the cluster mode the parallel user application executes in multiple JVMs. Due to these inherent architectural differences between the two modes, the MPJ Express runtime is modified to ensure correct semantics of the parallel program. Towards the end, we compare the performance of MPJ Express (multicore mode) with other C and Java message passing libraries (including mpiJava, MPJ/Ibis, MPICH2, and MPJ Express in cluster mode) on shared memory and multicore processors. We found that MPJ Express performs significantly better in the multicore mode than in the cluster mode. Moreover, the MPJ Express software also performs better than other Java messaging libraries, including mpiJava and MPJ/Ibis, when used in the multicore mode on shared memory or multicore processors.
We also demonstrate the effectiveness of the MPJ Express multicore device in Gadget-2, which is a massively parallel astrophysics N-body simulation code.
Abstract:
In 2006 the Route load balancing algorithm was proposed and compared to other techniques aimed at optimizing process allocation in grid environments. This algorithm schedules tasks of parallel applications considering computer neighborhoods (where the distance is defined by the network latency). Route presents good results for large environments, although there are cases where neighbors have neither enough computational capacity nor a communication system capable of serving the application. In those situations Route migrates tasks until they stabilize in a grid area with enough resources. This migration may take a long time, which reduces overall performance. In order to improve this stabilization time, this paper proposes RouteGA (Route with Genetic Algorithm support), which considers historical information on parallel application behavior as well as computer capacities and load to optimize the scheduling. This information is extracted using monitors and summarized in a knowledge base used to quantify the occupation of tasks. Afterwards, this information is used to parameterize a genetic algorithm responsible for optimizing the task allocation. Results confirm that RouteGA outperforms the load balancing carried out by the original Route, which had previously outperformed other scheduling algorithms from the literature.
Abstract:
This work analyzes cabinet formation in the Government of the State of Espírito Santo in the period 1995-2014. To this end, it starts from the debate on Brazilian coalition presidentialism and its applications at the subnational level, reinforcing the importance of case studies and comparative studies. The political trajectory of Espírito Santo is reviewed, highlighting the period of crisis in the 1990s and the institutional turnaround that took place in the early 2000s. The composition of the Legislative Assembly in the period is also highlighted, given its importance for understanding the relations between the Executive and the Legislature. A database was built with all Secretaries of State of the period, along with their respective party affiliations, according to data from the Tribunal Superior Eleitoral (TSE). Thus, the party composition of the cabinet can be compared with the size of the party benches in the Legislature. To analyze cabinet proportionality, this study uses the Coalescence Rate of Amorim Neto (2000) and applies the G Index suggested by Avelino, Biderman and Silva (2011). In addition to the traditional use of secretaries' party affiliation as a proxy for determining a political element in the cabinet, a new criterion is also proposed and applied in parallel, which considers party affiliation together with a prior candidacy as indicative of a political secretary. The two criteria show different results, and the fact that most of the cabinets formed were not majority cabinets suggests that in Espírito Santo the distribution of first-tier government posts is not the main bargaining chip in agreements between the Executive and the Legislature.
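The cabinet coalescence rate is usually written as 1 - (1/2) Σ |S_i - M_i|, where S_i is party i's share of the legislative seats held by the cabinet parties and M_i is its share of cabinet posts. The sketch below computes it under that reading; how the study itself treats non-partisan secretaries is not stated in the abstract, so they are simply left out here, and the party labels and counts are hypothetical.

```python
def coalescence_rate(seats, posts):
    """Return 1 - 0.5 * sum(|S_i - M_i|) over all parties.

    seats: legislative seats per cabinet party.
    posts: cabinet posts per party (partisan secretaries only).
    """
    parties = set(seats) | set(posts)
    total_seats = sum(seats.values())
    total_posts = sum(posts.values())
    deviation = sum(
        abs(seats.get(p, 0) / total_seats - posts.get(p, 0) / total_posts)
        for p in parties
    )
    return 1 - deviation / 2


seats = {"A": 10, "B": 6, "C": 4}
proportional = coalescence_rate(seats, {"A": 5, "B": 3, "C": 2})
single_party = coalescence_rate(seats, {"A": 10})
```

A value of 1 means posts are distributed in exact proportion to seats, and the value falls as the distribution of posts departs from the distribution of seats.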
Abstract:
Recent years have seen growing acceptance and adoption of parallel processing, both for high-performance scientific computing and for general-purpose applications. This acceptance has been favored mainly by the development of massively parallel processing (MPP) environments and of distributed computing. A point that distributed systems and MPP architectures have in common is the notion of message passing, which allows communication between processes. A message-passing environment basically consists of a communication library that acts as an extension of the programming languages used to write parallel applications, such as C, C++ and Fortran. In the development of parallel applications, a fundamental aspect is the analysis of their performance. Several metrics can be used in this analysis: execution time, efficiency in the use of the processing elements, and scalability of the application with respect to an increase in the number of processors or in the size of the problem instance being treated. Establishing models or mechanisms that allow this analysis can be a rather complicated task, considering the parameters and degrees of freedom involved in the implementation of a parallel application. One alternative that has been found is the use of tools for collecting and visualizing performance data, which allow the user to identify bottlenecks and sources of inefficiency in an application. For efficient visualization, it is necessary to identify and collect data about the execution of the application, a stage called instrumentation.
This work first presents a study of the main techniques used to collect performance data, followed by a detailed analysis of the main available tools that can be used on Beowulf-type cluster parallel architectures running Linux on the x86 platform, with communication libraries based on MPI (Message Passing Interface), such as LAM and MPICH. This analysis is validated on parallel applications that address the problem of training perceptron-type neural networks using backpropagation. The conclusions obtained show the potential and ease of use of the analyzed tools.
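The metrics and the instrumentation stage mentioned above can be illustrated with a short sketch. It uses the standard definitions of speedup (serial time divided by parallel time) and efficiency (speedup divided by the number of processors), plus a hypothetical timing decorator as a very simple form of instrumentation; none of this corresponds to the tools analyzed in the work.

```python
import time
from collections import defaultdict
from functools import wraps

timings = defaultdict(list)   # function name -> list of elapsed seconds


def instrumented(func):
    """Record the wall-clock time of every call to func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__name__].append(time.perf_counter() - start)
    return wrapper


def speedup(t_serial, t_parallel):
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, num_procs):
    return speedup(t_serial, t_parallel) / num_procs


@instrumented
def training_step(n):
    """Placeholder workload standing in for one step of an application."""
    return sum(i * i for i in range(n))
```

For example, an application that takes 120 s serially and 20 s on 8 processors has a speedup of 6 and an efficiency of 0.75; the gap to the ideal speedup of 8 is what collection and visualization tools help to explain.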
Abstract:
Assigning cells to switches in a cellular mobile network is known to be an NP-hard optimization problem. A practical alternative for solving this type of problem is therefore the use of heuristic methods, because they can find a good solution in a very satisfactory computational time. This paper proposes a Beam Search method to solve the cell assignment problem in cellular mobile networks. Some modifications to this algorithm are also presented, which allow its parallel application. Computational results obtained from several tests confirm the effectiveness of this approach, which provides good solutions for large-scale problems.
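A generic beam search for an assignment problem of this kind can be sketched as follows. The cost model is a simplification chosen for the example (a fixed cost per cell and switch pair and a maximum number of cells per switch) and is not the formulation used in the paper; the function name and data are hypothetical.

```python
def beam_search_assign(cost, capacity, beam_width):
    """Assign cells to switches one cell at a time, keeping the best partial
    assignments at each level.

    cost[c][s]: cost of assigning cell c to switch s.
    capacity[s]: maximum number of cells switch s can take.
    Returns (total_cost, assignment_tuple), or None if no feasible
    assignment survives.
    """
    beam = [(0, ())]
    for cell_costs in cost:
        candidates = []
        for total, assignment in beam:
            for switch, switch_cost in enumerate(cell_costs):
                if assignment.count(switch) < capacity[switch]:
                    candidates.append((total + switch_cost,
                                       assignment + (switch,)))
        candidates.sort()
        beam = candidates[:beam_width]   # prune to the beam width
    return beam[0] if beam else None


cost = [[1, 4], [2, 3], [5, 1]]
best = beam_search_assign(cost, capacity=[2, 2], beam_width=2)
```

The candidates generated at each level are independent of one another, so they can be expanded and evaluated in parallel, which is one natural route to a parallel variant.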
Abstract:
The increased capacity to integrate transistors has made it possible to develop complete systems, with several components, on a single chip; these are called SoCs (Systems-on-Chip). However, the interconnection subsystem can limit the scalability of SoCs, as buses do, or can be an ad hoc solution, as bus hierarchies are. Thus, the ideal interconnection subsystem for SoCs is the Network-on-Chip (NoC). NoCs allow simultaneous point-to-point channels between components and can be reused in other projects. However, NoCs can increase design complexity, chip area and dissipated power. It is therefore necessary either to change the way they are used or to change the development paradigm. Thus, a NoC-based system is proposed in which applications are described as packages and executed in each router between source and destination, without traditional processors. To execute applications regardless of the number of instructions and of the NoC dimensions, the spiral complement algorithm was developed, which finds another destination until all instructions have been executed. The objective is therefore to study the viability of developing such a system, called the IPNoSys system. In this study, a cycle-accurate tool was developed in SystemC to simulate the system executing applications, which were implemented in a package description language also developed for this study. Through the simulation tool, several results were obtained that could be used to evaluate the system's performance. The methodology used to describe an application consists of transforming the high-level application into a data-flow graph, which becomes one or more packages. This methodology was used in three applications: a counter, a 2D DCT and a floating-point addition. The counter was used to evaluate a deadlock solution and to execute a parallel application. The DCT was used for comparison with the STORM platform.
Finally, the floating-point addition aimed to evaluate the efficiency of a software routine that performs an instruction not implemented in hardware. The simulation results confirm the viability of developing the IPNoSys system. They showed that it is possible to execute applications described in packages, sequentially or in parallel, without interruptions caused by deadlock, and also showed that IPNoSys is more efficient in execution time than the STORM platform.