963 resultados para load balancing algorithm
Resumo:
Processor virtualization for process migration in distributed parallel computing systems has formed a significant component of research on load balancing. In contrast, the potential of processor virtualization for fault tolerance has been addressed minimally. The work reported in this paper is motivated towards extending concepts of processor virtualization towards ‘intelligent cores’ as a means to achieve fault tolerance in distributed parallel computing systems. Intelligent cores are an abstraction of the hardware processing cores, with the incorporation of cognitive capabilities, on which parallel tasks can be executed and migrated. When a processing core executing a task is predicted to fail the task being executed is proactively transferred onto another core. A parallel reduction algorithm incorporating concepts of intelligent cores is implemented on a computer cluster using Adaptive MPI and Charm ++. Preliminary results confirm the feasibility of the approach.
Resumo:
Global communicationrequirements andloadimbalanceof someparalleldataminingalgorithms arethe major obstacles to exploitthe computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication costin parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operationwhichhinders thescalabilityoftheapproach.Thisworkstudiesadifferentparallelformulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature ofthe centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means ofmulti-dimensional binary searchtrees. The approachcanalso be extended to accommodate an approximation error which allows a further reduction ofthe communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element
Resumo:
Network reconfiguration in distribution systems can be carried out by changing the status of sectionalizing switches and it is usually done for loss minimization and load balancing. In this paper it is presented an heuristic algorithm that accomplishes network reconfiguration for operation planning in order to obtain a configuration set whose configurations have the smallest active losses on its feeders. To obtain the configurations, it is used an approached radial load flow method and an heuristic proceeding based on maximum limit of voltage drop on feeders. Results are presented for three hypothetical systems largely known whose data are available in literature and a real system with 135 busses. In addition, it is used a fast and robust load flow which decreases the computational effort.
Resumo:
In this work, the planning of secondary distribution circuits is approached as a mixed integer nonlinear programming problem (MINLP). In order to solve this problem, a dedicated evolutionary algorithm (EA) is proposed. This algorithm uses a codification scheme, genetic operators, and control parameters, projected and managed to consider the specific characteristics of the secondary network planning. The codification scheme maps the possible solutions that satisfy the requirements in order to obtain an effective and low-cost projected system-the conductors' adequate dimensioning, load balancing among phases, and the transformer placed at the center of the secondary system loads. An effective algorithm for three-phase power flow is used as an auxiliary methodology of the EA for the calculation of the fitness function proposed for solutions of each topology. Results for two secondary distribution circuits are presented, whereas one presents radial topology and the other a weakly meshed topology. © 2005 IEEE.
Resumo:
Pós-graduação em Engenharia Elétrica - FEIS
Resumo:
The interactions among three important issues involved in the implementation of logic programs in parallel (goal scheduling, precedence, and memory management) are discussed. A simplified, parallel memory management model and an efficient, load-balancing goal scheduling strategy are presented. It is shown how, for systems which support "don't know" non-determinism, special care has to be taken during goal scheduling if the space recovery characteristics of sequential systems are to be preserved. A solution based on selecting only "newer" goals for execution is described, and an algorithm is proposed for efficiently maintaining and determining precedence relationships and variable ages across parallel goals. It is argued that the proposed schemes and algorithms make it possible to extend the storage performance of sequential systems to parallel execution without the considerable overhead previously associated with it. The results are applicable to a wide class of parallel and coroutining systems, and they represent an efficient alternative to "all heap" or "spaghetti stack" allocation models.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
This paper introduces a joint load balancing and hotspot mitigation protocol for mobile ad-hoc network (MANET) termed by us as 'load_energy balance + hotspot mitigation protocol (LEB+HM)'. We argue that although ad-hoc wireless networks have limited network resources - bandwidth and power, prone to frequent link/node failures and have high security risk; existing ad hoc routing protocols do not put emphasis on maintaining robust link/node, efficient use of network resources and on maintaining the security of the network. Typical route selection metrics used by existing ad hoc routing protocols are shortest hop, shortest delay, and loop avoidance. These routing philosophy have the tendency to cause traffic concentration on certain regions or nodes, leading to heavy contention, congestion and resource exhaustion which in turn may result in increased end-to-end delay, packet loss and faster battery power depletion, degrading the overall performance of the network. Also in most existing on-demand ad hoc routing protocols intermediate nodes are allowed to send route reply RREP to source in response to a route request RREQ. In such situation a malicious node can send a false optimal route to the source so that data packets sent will be directed to or through it, and tamper with them as wish. It is therefore desirable to adopt routing schemes which can dynamically disperse traffic load, able to detect and remove any possible bottlenecks and provide some form of security to the network. In this paper we propose a combine adaptive load_energy balancing and hotspot mitigation scheme that aims at evenly distributing network traffic load and energy, mitigate against any possible occurrence of hotspot and provide some form of security to the network. This combine approach is expected to yield high reliability, availability and robustness, that best suits any dynamic and scalable ad hoc network environment. Dynamic source routing (DSR) was use as our underlying protocol for the implementation of our algorithm. Simulation comparison of our protocol to that of original DSR shows that our protocol has reduced node/link failure, even distribution of battery energy, and better network service efficiency.
Resumo:
A wireless mesh network is a mesh network implemented over a wireless network system such as wireless LANs. Wireless Mesh Networks(WMNs) are promising for numerous applications such as broadband home networking, enterprise networking, transportation systems, health and medical systems, security surveillance systems, etc. Therefore, it has received considerable attention from both industrial and academic researchers. This dissertation explores schemes for resource management and optimization in WMNs by means of network routing and network coding.^ In this dissertation, we propose three optimization schemes. (1) First, a triple-tier optimization scheme is proposed for load balancing objective. The first tier mechanism achieves long-term routing optimization, and the second tier mechanism, using the optimization results obtained from the first tier mechanism, performs the short-term adaptation to deal with the impact of dynamic channel conditions. A greedy sub-channel allocation algorithm is developed as the third tier optimization scheme to further reduce the congestion level in the network. We conduct thorough theoretical analysis to show the correctness of our design and give the properties of our scheme. (2) Then, a Relay-Aided Network Coding scheme called RANC is proposed to improve the performance gain of network coding by exploiting the physical layer multi-rate capability in WMNs. We conduct rigorous analysis to find the design principles and study the tradeoff in the performance gain of RANC. Based on the analytical results, we provide a practical solution by decomposing the original design problem into two sub-problems, flow partition problem and scheduling problem. (3) Lastly, a joint optimization scheme of the routing in the network layer and network coding-aware scheduling in the MAC layer is introduced. We formulate the network optimization problem and exploit the structure of the problem via dual decomposition. We find that the original problem is composed of two problems, routing problem in the network layer and scheduling problem in the MAC layer. These two sub-problems are coupled through the link capacities. We solve the routing problem by two different adaptive routing algorithms. We then provide a distributed coding-aware scheduling algorithm. According to corresponding experiment results, the proposed schemes can significantly improve network performance.^
Resumo:
The reverse time migration algorithm (RTM) has been widely used in the seismic industry to generate images of the underground and thus reduce the risk of oil and gas exploration. Its widespread use is due to its high quality in underground imaging. The RTM is also known for its high computational cost. Therefore, parallel computing techniques have been used in their implementations. In general, parallel approaches for RTM use a coarse granularity by distributing the processing of a subset of seismic shots among nodes of distributed systems. Parallel approaches with coarse granularity for RTM have been shown to be very efficient since the processing of each seismic shot can be performed independently. For this reason, RTM algorithm performance can be considerably improved by using a parallel approach with finer granularity for the processing assigned to each node. This work presents an efficient parallel algorithm for 3D reverse time migration with fine granularity using OpenMP. The propagation algorithm of 3D acoustic wave makes up much of the RTM. Different load balancing were analyzed in order to minimize possible losses parallel performance at this stage. The results served as a basis for the implementation of other phases RTM: backpropagation and imaging condition. The proposed algorithm was tested with synthetic data representing some of the possible underground structures. Metrics such as speedup and efficiency were used to analyze its parallel performance. The migrated sections show that the algorithm obtained satisfactory performance in identifying subsurface structures. As for the parallel performance, the analysis clearly demonstrate the scalability of the algorithm achieving a speedup of 22.46 for the propagation of the wave and 16.95 for the RTM, both with 24 threads.
Resumo:
A method is outlined for optimising graph partitions which arise in mapping un- structured mesh calculations to parallel computers. The method employs a combination of iterative techniques to both evenly balance the workload and minimise the number and volume of interprocessor communications. They are designed to work efficiently in parallel as well as sequentially and when combined with a fast direct partitioning technique (such as the Greedy algorithm) to give an initial partition, the resulting two-stage process proves itself to be both a powerful and flexible solution to the static graph-partitioning problem. The algorithms can also be used for dynamic load-balancing and a clustering technique can additionally be employed to speed up the whole process. Experiments indicate that the resulting parallel code can provide high quality partitions, independent of the initial partition, within a few seconds.
Resumo:
Network Virtualization is a key technology for the Future Internet, allowing the deployment of multiple independent virtual networks that use resources of the same basic infrastructure. An important challenge in the dynamic provision of virtual networks resides in the optimal allocation of physical resources (nodes and links) to requirements of virtual networks. This problem is known as Virtual Network Embedding (VNE). For the resolution of this problem, previous research has focused on designing algorithms based on the optimization of a single objective. On the contrary, in this work we present a multi-objective algorithm, called VNE-MO-ILP, for solving dynamic VNE problem, which calculates an approximation of the Pareto Front considering simultaneously resource utilization and load balancing. Experimental results show evidences that the proposed algorithm is better or at least comparable to a state-of-the-art algorithm. Two performance metrics were simultaneously evaluated: (i) Virtual Network Request Acceptance Ratio and (ii) Revenue/Cost Relation. The size of test networks used in the experiments shows that the proposed algorithm scales well in execution times, for networks of 84 nodes
Resumo:
Os sistemas de tempo real modernos geram, cada vez mais, cargas computacionais pesadas e dinâmicas, começando-se a tornar pouco expectável que sejam implementados em sistemas uniprocessador. Na verdade, a mudança de sistemas com um único processador para sistemas multi- processador pode ser vista, tanto no domínio geral, como no de sistemas embebidos, como uma forma eficiente, em termos energéticos, de melhorar a performance das aplicações. Simultaneamente, a proliferação das plataformas multi-processador transformaram a programação paralela num tópico de elevado interesse, levando o paralelismo dinâmico a ganhar rapidamente popularidade como um modelo de programação. A ideia, por detrás deste modelo, é encorajar os programadores a exporem todas as oportunidades de paralelismo através da simples indicação de potenciais regiões paralelas dentro das aplicações. Todas estas anotações são encaradas pelo sistema unicamente como sugestões, podendo estas serem ignoradas e substituídas, por construtores sequenciais equivalentes, pela própria linguagem. Assim, o modo como a computação é na realidade subdividida, e mapeada nos vários processadores, é da responsabilidade do compilador e do sistema computacional subjacente. Ao retirar este fardo do programador, a complexidade da programação é consideravelmente reduzida, o que normalmente se traduz num aumento de produtividade. Todavia, se o mecanismo de escalonamento subjacente não for simples e rápido, de modo a manter o overhead geral em níveis reduzidos, os benefícios da geração de um paralelismo com uma granularidade tão fina serão meramente hipotéticos. Nesta perspetiva de escalonamento, os algoritmos que empregam uma política de workstealing são cada vez mais populares, com uma eficiência comprovada em termos de tempo, espaço e necessidades de comunicação. Contudo, estes algoritmos não contemplam restrições temporais, nem outra qualquer forma de atribuição de prioridades às tarefas, o que impossibilita que sejam diretamente aplicados a sistemas de tempo real. Além disso, são tradicionalmente implementados no runtime da linguagem, criando assim um sistema de escalonamento com dois níveis, onde a previsibilidade, essencial a um sistema de tempo real, não pode ser assegurada. Nesta tese, é descrita a forma como a abordagem de work-stealing pode ser resenhada para cumprir os requisitos de tempo real, mantendo, ao mesmo tempo, os seus princípios fundamentais que tão bons resultados têm demonstrado. Muito resumidamente, a única fila de gestão de processos convencional (deque) é substituída por uma fila de deques, ordenada de forma crescente por prioridade das tarefas. De seguida, aplicamos por cima o conhecido algoritmo de escalonamento dinâmico G-EDF, misturamos as regras de ambos, e assim nasce a nossa proposta: o algoritmo de escalonamento RTWS. Tirando partido da modularidade oferecida pelo escalonador do Linux, o RTWS é adicionado como uma nova classe de escalonamento, de forma a avaliar na prática se o algoritmo proposto é viável, ou seja, se garante a eficiência e escalonabilidade desejadas. Modificar o núcleo do Linux é uma tarefa complicada, devido à complexidade das suas funções internas e às fortes interdependências entre os vários subsistemas. Não obstante, um dos objetivos desta tese era ter a certeza que o RTWS é mais do que um conceito interessante. Assim, uma parte significativa deste documento é dedicada à discussão sobre a implementação do RTWS e à exposição de situações problemáticas, muitas delas não consideradas em teoria, como é o caso do desfasamento entre vários mecanismo de sincronização. Os resultados experimentais mostram que o RTWS, em comparação com outro trabalho prático de escalonamento dinâmico de tarefas com restrições temporais, reduz significativamente o overhead de escalonamento através de um controlo de migrações, e mudanças de contexto, eficiente e escalável (pelo menos até 8 CPUs), ao mesmo tempo que alcança um bom balanceamento dinâmico da carga do sistema, até mesmo de uma forma não custosa. Contudo, durante a avaliação realizada foi detetada uma falha na implementação do RTWS, pela forma como facilmente desiste de roubar trabalho, o que origina períodos de inatividade, no CPU em questão, quando a utilização geral do sistema é baixa. Embora o trabalho realizado se tenha focado em manter o custo de escalonamento baixo e em alcançar boa localidade dos dados, a escalonabilidade do sistema nunca foi negligenciada. Na verdade, o algoritmo de escalonamento proposto provou ser bastante robusto, não falhando qualquer meta temporal nas experiências realizadas. Portanto, podemos afirmar que alguma inversão de prioridades, causada pela sub-política de roubo BAS, não compromete os objetivos de escalonabilidade, e até ajuda a reduzir a contenção nas estruturas de dados. Mesmo assim, o RTWS também suporta uma sub-política de roubo determinística: PAS. A avaliação experimental, porém, não ajudou a ter uma noção clara do impacto de uma e de outra. No entanto, de uma maneira geral, podemos concluir que o RTWS é uma solução promissora para um escalonamento eficiente de tarefas paralelas com restrições temporais.
Resumo:
Many-core platforms based on Network-on-Chip (NoC [Benini and De Micheli 2002]) present an emerging technology in the real-time embedded domain. Although the idea to group the applications previously executed on separated single-core devices, and accommodate them on an individual many-core chip offers various options for power savings, cost reductions and contributes to the overall system flexibility, its implementation is a non-trivial task. In this paper we address the issue of application mapping onto a NoCbased many-core platform when considering fundamentals and trends of current many-core operating systems, specifically, we elaborate on a limited migrative application model encompassing a message-passing paradigm as a communication primitive. As the main contribution, we formulate the problem of real-time application mapping, and propose a three-stage process to efficiently solve it. Through analysis it is assured that derived solutions guarantee the fulfilment of posed time constraints regarding worst-case communication latencies, and at the same time provide an environment to perform load balancing for e.g. thermal, energy, fault tolerance or performance reasons.We also propose several constraints regarding the topological structure of the application mapping, as well as the inter- and intra-application communication patterns, which efficiently solve the issues of pessimism and/or intractability when performing the analysis.
Resumo:
High-level parallel languages offer a simple way for application programmers to specify parallelism in a form that easily scales with problem size, leaving the scheduling of the tasks onto processors to be performed at runtime. Therefore, if the underlying system cannot efficiently execute those applications on the available cores, the benefits will be lost. In this paper, we consider how to schedule highly heterogenous parallel applications that require real-time performance guarantees on multicore processors. The paper proposes a novel scheduling approach that combines the global Earliest Deadline First (EDF) scheduler with a priority-aware work-stealing load balancing scheme, which enables parallel realtime tasks to be executed on more than one processor at a given time instant. Experimental results demonstrate the better scalability and lower scheduling overhead of the proposed approach comparatively to an existing real-time deadline-oriented scheduling class for the Linux kernel.