Biblioteca Digital

962 resultados para Fault tolerance

Real-Time Application Mapping for Many-Cores Using a Limited Migrative Model

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Many-core platforms are an emerging technology in the real-time embedded domain. These devices offer various options for power savings, cost reductions and contribute to the overall system flexibility, however, issues such as unpredictability, scalability and analysis pessimism are serious challenges to their integration into the aforementioned area. The focus of this work is on many-core platforms using a limited migrative model (LMM). LMM is an approach based on the fundamental concepts of the multi-kernel paradigm, which is a promising step towards scalable and predictable many-cores. In this work, we formulate the problem of real-time application mapping on a many-core platform using LMM, and propose a three-stage method to solve it. An extended version of the existing analysis is used to assure that derived mappings (i) guarantee the fulfilment of timing constraints posed on worst-case communication delays of individual applications, and (ii) provide an environment to perform load balancing for e.g. energy/thermal management, fault tolerance and/or performance reasons.

O Futuro dos Centros de Dados

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Atualmente, as Tecnologias de Informação (TI) são cada vez mais vitais dentro das organizações. As TI são o motor de suporte do negócio. Para grande parte das organizações, o funcionamento e desenvolvimento das TI têm como base infraestruturas dedicadas (internas ou externas) denominadas por Centro de Dados (CD). Nestas infraestruturas estão concentrados os equipamentos de processamento e armazenamento de dados de uma organização, por isso, são e serão cada vez mais desafiadas relativamente a diversos fatores tais como a escalabilidade, disponibilidade, tolerância à falha, desempenho, recursos disponíveis ou disponibilizados, segurança, eficiência energética e inevitavelmente os custos associados. Com o aparecimento das tecnologias baseadas em computação em nuvem e virtualização, abrese todo um leque de novas formas de endereçar os desafios anteriormente descritos. Perante este novo paradigma, surgem novas oportunidades de consolidação dos CD que podem representar novos desafios para os gestores de CD. Por isso, é no mínimo irrealista para as organizações simplesmente eliminarem os CD ou transforma-los segundo os mais altos padrões de qualidade. As organizações devem otimizar os seus CD, contudo um projeto eficiente desta natureza, com capacidade para suportar as necessidades impostas pelo mercado, necessidades dos negócios e a velocidade da evolução tecnológica, exigem soluções complexas e dispendiosas tanto para a sua implementação como a sua gestão. É neste âmbito que surge o presente trabalho. Com o objetivo de estudar os CD inicia-se um estudo sobre esta temática, onde é detalhado o seu conceito, evolução histórica, a sua topologia, arquitetura e normas existentes que regem os mesmos. Posteriormente o estudo detalha algumas das principais tendências condicionadoras do futuro dos CD. Explorando o conhecimento teórico resultante do estudo anterior, desenvolve-se uma metodologia de avaliação dos CD baseado em critérios de decisão. O estudo culmina com uma análise sobre uma nova solução tecnológica e a avaliação de três possíveis cenários de implementação: a primeira baseada na manutenção do atual CD; a segunda baseada na implementação da nova solução em outro CD em regime de hosting externo; e finalmente a terceira baseada numa implementação em regime de IaaS.

Une architecture parallèle distribuée et tolérante aux pannes pour le protocole interdomaine BGP au cœur de l’Internet

Relevância:

60.00% 60.00%

Publicador:

Resumo:

L’augmentation du nombre d’usagers de l’Internet a entraîné une croissance exponentielle dans les tables de routage. Cette taille prévoit l’atteinte d’un million de préfixes dans les prochaines années. De même, les routeurs au cœur de l’Internet peuvent facilement atteindre plusieurs centaines de connexions BGP simultanées avec des routeurs voisins. Dans une architecture classique des routeurs, le protocole BGP s’exécute comme une entité unique au sein du routeur. Cette architecture comporte deux inconvénients majeurs : l’extensibilité (scalabilité) et la fiabilité. D’un côté, la scalabilité de BGP est mesurable en termes de nombre de connexions et aussi par la taille maximale de la table de routage que l’interface de contrôle puisse supporter. De l’autre côté, la fiabilité est un sujet critique dans les routeurs au cœur de l’Internet. Si l’instance BGP s’arrête, toutes les connexions seront perdues et le nouvel état de la table de routage sera propagé tout au long de l’Internet dans un délai de convergence non trivial. Malgré la haute fiabilité des routeurs au cœur de l’Internet, leur résilience aux pannes est augmentée considérablement et celle-ci est implantée dans la majorité des cas via une redondance passive qui peut limiter la scalabilité du routeur. Dans cette thèse, on traite les deux inconvénients en proposant une nouvelle approche distribuée de BGP pour augmenter sa scalabilité ainsi que sa fiabilité sans changer la sémantique du protocole. L’architecture distribuée de BGP proposée dans la première contribution est faite pour satisfaire les deux contraintes : scalabilité et fiabilité. Ceci est accompli en exploitant adéquatement le parallélisme et la distribution des modules de BGP sur plusieurs cartes de contrôle. Dans cette contribution, les fonctionnalités de BGP sont divisées selon le paradigme « maître-esclave » et le RIB (Routing Information Base) est dupliqué sur plusieurs cartes de contrôle. Dans la deuxième contribution, on traite la tolérance aux pannes dans l’architecture élaborée dans la première contribution en proposant un mécanisme qui augmente la fiabilité. De plus, nous prouvons analytiquement dans cette contribution qu’en adoptant une telle architecture distribuée, la disponibilité de BGP sera augmentée considérablement versus une architecture monolithique. Dans la troisième contribution, on propose une méthode de partitionnement de la table de routage que nous avons appelé DRTP pour diviser la table de BGP sur plusieurs cartes de contrôle. Cette contribution vise à augmenter la scalabilité de la table de routage et la parallélisation de l’algorithme de recherche (Best Match Prefix) en partitionnant la table de routage sur plusieurs nœuds physiquement distribués.

Rezentralisierung: Server-Based Computing in der Praxis einer Universitätsbibliothek

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Der Beitrag beschreibt die Ein- und Durchführung einer Server-basierten Computerinfrastruktur in einer Universitätsbibliothek. Beschrieben wird das so genannte MetaFrame-DV-Konzept der Universitätsbibliothek Kassel, das das dortige Informationsmanagement in den letzten vier Jahren initiiert, konzipiert und umgesetzt hat. Hierbei werden nunmehr nicht mehr nur Applikationsserver z.B. für das CD-Angebot eingesetzt, sondern sämtliche ca. 200 Mitarbeiter- und Funktionsarbeitsplätze über eine Citrix MetaFrame-Installation serverseitig betreut. Besonderes Augenmerk gilt in diesem Beitrag der Konfiguration, der praktischen Administration und den täglichen Arbeitsbedingungen an den Bibliotheksmitarbeiterarbeitsplätzen.

ADAM: A Decentralized Parallel Computer Architecture Featuring Fast Thread and Data Migration and a Uniform Hardware Abstraction

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.

On the maximal connected component of hypercube with faulty vertices (II)

Relevância:

60.00% 60.00%

Publicador:

Resumo:

evaluating the fault tolerance of an interconnection network, it is essential to estimate the size of a maximal connected component of the network at the presence of faulty processors. Hypercube is one of the most popular interconnection networks. In this paper, we prove that for ngreater than or equal to6, an n-dimensional cube with a set F of at most (4n-10) failing processors has a component of size greater than or equal to2"-\F-3. This result demonstrates the superiority of hypercube in terms of the fault tolerance.

A lower bound on the size of k-neighborhood in generalized cubes

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The determination of the minimum size of a k-neighborhood (i.e., a neighborhood of a set of k nodes) in a given graph is essential in the analysis of diagnosability and fault tolerance of multicomputer systems. The generalized cubes include the hypercube and most hypercube variants as special cases. In this paper, we present a lower bound on the size of a k-neighborhood in n-dimensional generalized cubes, where 2n + 1 <= k <= 3n - 2. This lower bound is tight in that it is met by the n-dimensional hypercube. Our result is an extension of two previously known results. (c) 2005 Elsevier Inc. All rights reserved.

An MPI-based implementation of intelligent agents on clusters

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Recent research in multi-agent systems incorporate fault tolerance concepts, but does not explore the extension and implementation of such ideas for large scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed to sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information during the event of a predicted core/processor failure and for successfully completing the task. The feasibility of the approach is validated by implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.

Real-time identification and modelling in pervasive mining

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper introduces an architecture for identifying and modelling in real-time at a copper mine using new technologies as M2M and cloud computing with a server in the cloud and an Android client inside the mine. The proposed design brings up pervasive mining, a system with wider coverage, higher communication efficiency, better fault-tolerance, and anytime anywhere availability. This solution was designed for a plant inside the mine which cannot tolerate interruption and for which their identification in situ, in real time, is an essential part of the system to control aspects such as instability by adjusting their corresponding parameters without stopping the process.

Formalization of an architectural model for exception handling coordination based on CA action concepts

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Architectures based on Coordinated Atomic action (CA action) concepts have been used to build concurrent fault-tolerant systems. This conceptual model combines concurrent exception handling with action nesting to provide a general mechanism for both enclosing interactions among system components and coordinating forward error recovery measures. This article presents an architectural model to guide the formal specification of concurrent fault-tolerant systems. This architecture provides built-in Communicating Sequential Processes (CSPs) and predefined channels to coordinate exception handling of the user-defined components. Hence some safety properties concerning action scoping and concurrent exception handling can be proved by using the FDR (Failure Divergence Refinement) verification tool. As a result, a formal and general architecture supporting software fault tolerance is ready to be used and proved as users define components with normal and exceptional behaviors. (C) 2010 Elsevier B.V. All rights reserved.

Application execution management on the InteGrade opportunistic grid middleware

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The InteGrade project is a multi-university effort to build a novel grid computing middleware based on the opportunistic use of resources belonging to user workstations. The InteGrade middleware currently enables the execution of sequential, bag-of-tasks, and parallel applications that follow the BSP or the MPI programming models. This article presents the lessons learned over the last five years of the InteGrade development and describes the solutions achieved concerning the support for robust application execution. The contributions cover the related fields of application scheduling, execution management, and fault tolerance. We present our solutions, describing their implementation principles and evaluation through the analysis of several experimental results. (C) 2010 Elsevier Inc. All rights reserved.

A token-based independent update protocol for managing replicated objects

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents the design and evaluation of a token-based protocol supporting independent updates of replicated objects. The paper makes three major contributions. Firstly, a token is used to simplify the control of update propagation and ordering. Secondly a partial commit state is introduced to support independent updates and to enhance the efficiency of transaction processing. Thirdly, a data partition methodology based on priorities is adopted to reduce update conflicts. We evaluate our protocol in comparison of two existing replication-control protocols in terms of transaction processing efficiency.

The development of an efficient checkpointing facility exploiting operating systems services of the GENESIS cluster operating system

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Recent research efforts of parallel processing on non-dedicated clusters have focused on high execution performance, parallelism management, transparent access to resources, and making clusters easy to use. However, as a collection of independent computers used by multiple users, clusters are susceptible to failure. This paper shows the development of a coordinated checkpointing facility for the GENESIS cluster operating system. This facility was developed by exploiting existing operating system services. High performance and low overheads are achieved by allowing the processes of a parallel application to continue executing during the creation of checkpoints, while maintaining low demands on cluster resources by using coordinated checkpointing.

Robust parallel job scheduling infrastructure for service-oriented grid computing systems

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Recent trends in grid computing development is moving towards a service-oriented architecture. With the momentum gaining for the service-oriented grid computing systems, the issue of deploying support for integrated scheduling and fault-tolerant approaches becomes paramount importance. To this end, we propose a scalable framework that loosely couples the dynamic job scheduling approach with the hybrid replications approach to schedule jobs efficiently while at the same time providing fault-tolerance. The novelty of the proposed framework is that it uses passive replication approach under high system load and active replication approach under low system loads. The switch between these two replication methods is also done dynamically and transparently.

An efficient reliable architecture for application layer anycast service

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Anycast is a new service in IPv6, and there are some open issues about the anycast service. In this paper, we focus on efficient and reliable aspects of application layer anycast. We apply the requirement based probing routing algorithm to replace the previous period based probing routingalgorithm for anycast resolvers. We employ the twin server model among the anycast servers, therefore, try to present a reliable service in the Internet environment. Our theoretical analysis shows that the proposed architecture works well, and it offers a more efficient routing performance and fault tolerance capability.

«
1
2
...
6
7
8
9
10
11
12
...
64
65
»