919 resultados para Network-on-Chip (NoC)
Resumo:
6th International Real-Time Scheduling Open Problems Seminar (RTSOPS 2015), Lund, Sweden.
Resumo:
The scale down of transistor technology allows microelectronics manufacturers such as Intel and IBM to build always more sophisticated systems on a single microchip. The classical interconnection solutions based on shared buses or direct connections between the modules of the chip are becoming obsolete as they struggle to sustain the increasing tight bandwidth and latency constraints that these systems demand. The most promising solution for the future chip interconnects are the Networks on Chip (NoC). NoCs are network composed by routers and channels used to inter- connect the different components installed on the single microchip. Examples of advanced processors based on NoC interconnects are the IBM Cell processor, composed by eight CPUs that is installed on the Sony Playstation III and the Intel Teraflops pro ject composed by 80 independent (simple) microprocessors. On chip integration is becoming popular not only in the Chip Multi Processor (CMP) research area but also in the wider and more heterogeneous world of Systems on Chip (SoC). SoC comprehend all the electronic devices that surround us such as cell-phones, smart-phones, house embedded systems, automotive systems, set-top boxes etc... SoC manufacturers such as ST Microelectronics , Samsung, Philips and also Universities such as Bologna University, M.I.T., Berkeley and more are all proposing proprietary frameworks based on NoC interconnects. These frameworks help engineers in the switch of design methodology and speed up the development of new NoC-based systems on chip. In this Thesis we propose an introduction of CMP and SoC interconnection networks. Then focusing on SoC systems we propose: • a detailed analysis based on simulation of the Spidergon NoC, a ST Microelectronics solution for SoC interconnects. The Spidergon NoC differs from many classical solutions inherited from the parallel computing world. Here we propose a detailed analysis of this NoC topology and routing algorithms. Furthermore we propose aEqualized a new routing algorithm designed to optimize the use of the resources of the network while also increasing its performance; • a methodology flow based on modified publicly available tools that combined can be used to design, model and analyze any kind of System on Chip; • a detailed analysis of a ST Microelectronics-proprietary transport-level protocol that the author of this Thesis helped developing; • a simulation-based comprehensive comparison of different network interface designs proposed by the author and the researchers at AST lab, in order to integrate shared-memory and message-passing based components on a single System on Chip; • a powerful and flexible solution to address the time closure exception issue in the design of synchronous Networks on Chip. Our solution is based on relay stations repeaters and allows to reduce the power and area demands of NoC interconnects while also reducing its buffer needs; • a solution to simplify the design of the NoC by also increasing their performance and reducing their power and area consumption. We propose to replace complex and slow virtual channel-based routers with multiple and flexible small Multi Plane ones. This solution allows us to reduce the area and power dissipation of any NoC while also increasing its performance especially when the resources are reduced. This Thesis has been written in collaboration with the Advanced System Technology laboratory in Grenoble France, and the Computer Science Department at Columbia University in the city of New York.
Resumo:
Demands for functionality enhancements, cost reductions and power savings clearly suggest the introduction of multiand many-core platforms in real-time embedded systems. However, when compared to uni-core platforms, the manycores experience additional problems, namely the lack of scalable coherence mechanisms and the necessity to perform migrations. These problems have to be addressed before such systems can be considered for integration into the realtime embedded domain. We have devised several agreement protocols which solve some of the aforementioned issues. The protocols allow the applications to plan and organise their future executions both temporally and spatially (i.e. when and where the next job will be executed). Decisions can be driven by several factors, e.g. load balancing, energy savings and thermal issues. All presented protocols are analytically described, with the particular emphasis on their respective real-time behaviours and worst-case performance. The underlying assumptions are based on the multi-kernel model and the message-passing paradigm, which constitutes the communication between the interacting instances.
Resumo:
As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi- Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on- Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.
Resumo:
Nel documento vengono principalmente trattati i principali meccanismi per il controllo di flusso per le NoC. Vengono trattati vari schemi di switching, gli stessi schemi associati all'introduzione dei Virtual Channel, alcuni low-level flow control, e due soluzioni per gli end-to-end flow control: Credit Based e CTC (STMicroelectronics). Nel corso della trattazione vengono presentate alcune possibili modifiche a CTC per incrementarne le prestazioni mantenendo la scalabilità che lo contraddistingue: queste sono le "back-to-back request" e "multiple incoming connections". Infine vengono introdotti alcune soluzioni per l'implementazione della qualità di servizio per le reti su chip. Proprio per il supporto al QoS viene introdotto CTTC: una versione di CTC con il supporto alla Time Division Multiplexing su rete Spidergon.
Resumo:
Many-core platforms based on Network-on-Chip (NoC [Benini and De Micheli 2002]) present an emerging technology in the real-time embedded domain. Although the idea to group the applications previously executed on separated single-core devices, and accommodate them on an individual many-core chip offers various options for power savings, cost reductions and contributes to the overall system flexibility, its implementation is a non-trivial task. In this paper we address the issue of application mapping onto a NoCbased many-core platform when considering fundamentals and trends of current many-core operating systems, specifically, we elaborate on a limited migrative application model encompassing a message-passing paradigm as a communication primitive. As the main contribution, we formulate the problem of real-time application mapping, and propose a three-stage process to efficiently solve it. Through analysis it is assured that derived solutions guarantee the fulfilment of posed time constraints regarding worst-case communication latencies, and at the same time provide an environment to perform load balancing for e.g. thermal, energy, fault tolerance or performance reasons.We also propose several constraints regarding the topological structure of the application mapping, as well as the inter- and intra-application communication patterns, which efficiently solve the issues of pessimism and/or intractability when performing the analysis.
Resumo:
“Many-core” systems based on a Network-on-Chip (NoC) architecture offer various opportunities in terms of performance and computing capabilities, but at the same time they pose many challenges for the deployment of real-time systems, which must fulfill specific timing requirements at runtime. It is therefore essential to identify, at design time, the parameters that have an impact on the execution time of the tasks deployed on these systems and the upper bounds on the other key parameters. The focus of this work is to determine an upper bound on the traversal time of a packet when it is transmitted over the NoC infrastructure. Towards this aim, we first identify and explore some limitations in the existing recursive-calculus-based approaches to compute the Worst-Case Traversal Time (WCTT) of a packet. Then, we extend the existing model by integrating the characteristics of the tasks that generate the packets. For this extended model, we propose an algorithm called “Branch and Prune” (BP). Our proposed method provides tighter and safe estimates than the existing recursive-calculus-based approaches. Finally, we introduce a more general approach, namely “Branch, Prune and Collapse” (BPC) which offers a configurable parameter that provides a flexible trade-off between the computational complexity and the tightness of the computed estimate. The recursive-calculus methods and BP present two special cases of BPC when a trade-off parameter is 1 or ∞, respectively. Through simulations, we analyze this trade-off, reason about the implications of certain choices, and also provide some case studies to observe the impact of task parameters on the WCTT estimates.
Resumo:
Rapid ongoing evolution of multiprocessors will lead to systems with hundreds of processing cores integrated in a single chip. An emerging challenge is the implementation of reliable and efficient interconnection between these cores as well as other components in the systems. Network-on-Chip is an interconnection approach which is intended to solve the performance bottleneck caused by traditional, poorly scalable communication structures such as buses. However, a large on-chip network involves issues related to congestion problems and system control, for instance. Additionally, faults can cause problems in multiprocessor systems. These faults can be transient faults, permanent manufacturing faults, or they can appear due to aging. To solve the emerging traffic management, controllability issues and to maintain system operation regardless of faults a monitoring system is needed. The monitoring system should be dynamically applicable to various purposes and it should fully cover the system under observation. In a large multiprocessor the distances between components can be relatively long. Therefore, the system should be designed so that the amount of energy-inefficient long-distance communication is minimized. This thesis presents a dynamically clustered distributed monitoring structure. The monitoring is distributed so that no centralized control is required for basic tasks such as traffic management and task mapping. To enable extensive analysis of different Network-on-Chip architectures, an in-house SystemC based simulation environment was implemented. It allows transaction level analysis without time consuming circuit level implementations during early design phases of novel architectures and features. The presented analysis shows that the dynamically clustered monitoring structure can be efficiently utilized for traffic management in faulty and congested Network-on-Chip-based multiprocessor systems. The monitoring structure can be also successfully applied for task mapping purposes. Furthermore, the analysis shows that the presented in-house simulation environment is flexible and practical tool for extensive Network-on-Chip architecture analysis.
Resumo:
In accordance with the Moore's law, the increasing number of on-chip integrated transistors has enabled modern computing platforms with not only higher processing power but also more affordable prices. As a result, these platforms, including portable devices, work stations and data centres, are becoming an inevitable part of the human society. However, with the demand for portability and raising cost of power, energy efficiency has emerged to be a major concern for modern computing platforms. As the complexity of on-chip systems increases, Network-on-Chip (NoC) has been proved as an efficient communication architecture which can further improve system performances and scalability while reducing the design cost. Therefore, in this thesis, we study and propose energy optimization approaches based on NoC architecture, with special focuses on the following aspects. As the architectural trend of future computing platforms, 3D systems have many bene ts including higher integration density, smaller footprint, heterogeneous integration, etc. Moreover, 3D technology can signi cantly improve the network communication and effectively avoid long wirings, and therefore, provide higher system performance and energy efficiency. With the dynamic nature of on-chip communication in large scale NoC based systems, run-time system optimization is of crucial importance in order to achieve higher system reliability and essentially energy efficiency. In this thesis, we propose an agent based system design approach where agents are on-chip components which monitor and control system parameters such as supply voltage, operating frequency, etc. With this approach, we have analysed the implementation alternatives for dynamic voltage and frequency scaling and power gating techniques at different granularity, which reduce both dynamic and leakage energy consumption. Topologies, being one of the key factors for NoCs, are also explored for energy saving purpose. A Honeycomb NoC architecture is proposed in this thesis with turn-model based deadlock-free routing algorithms. Our analysis and simulation based evaluation show that Honeycomb NoCs outperform their Mesh based counterparts in terms of network cost, system performance as well as energy efficiency.
Resumo:
Electronic applications are currently developed under the reuse-based paradigm. This design methodology presents several advantages for the reduction of the design complexity, but brings new challenges for the test of the final circuit. The access to embedded cores, the integration of several test methods, and the optimization of the several cost factors are just a few of the several problems that need to be tackled during test planning. Within this context, this thesis proposes two test planning approaches that aim at reducing the test costs of a core-based system by means of hardware reuse and integration of the test planning into the design flow. The first approach considers systems whose cores are connected directly or through a functional bus. The test planning method consists of a comprehensive model that includes the definition of a multi-mode access mechanism inside the chip and a search algorithm for the exploration of the design space. The access mechanism model considers the reuse of functional connections as well as partial test buses, cores transparency, and other bypass modes. The test schedule is defined in conjunction with the access mechanism so that good trade-offs among the costs of pins, area, and test time can be sought. Furthermore, system power constraints are also considered. This expansion of concerns makes it possible an efficient, yet fine-grained search, in the huge design space of a reuse-based environment. Experimental results clearly show the variety of trade-offs that can be explored using the proposed model, and its effectiveness on optimizing the system test plan. Networks-on-chip are likely to become the main communication platform of systemson- chip. Thus, the second approach presented in this work proposes the reuse of the on-chip network for the test of the cores embedded into the systems that use this communication platform. A power-aware test scheduling algorithm aiming at exploiting the network characteristics to minimize the system test time is presented. The reuse strategy is evaluated considering a number of system configurations, such as different positions of the cores in the network, power consumption constraints and number of interfaces with the tester. Experimental results show that the parallelization capability of the network can be exploited to reduce the system test time, whereas area and pin overhead are strongly minimized. In this manuscript, the main problems of the test of core-based systems are firstly identified and the current solutions are discussed. The problems being tackled by this thesis are then listed and the test planning approaches are detailed. Both test planning techniques are validated for the recently released ITC’02 SoC Test Benchmarks, and further compared to other test planning methods of the literature. This comparison confirms the efficiency of the proposed methods.
Resumo:
Com as recentes tecnologias de fabricação é possível integrar milhões de transistores em um único chip, permitindo a criação dos chamados System-on-Chip (SoCs), que integram em um único chip um grande número de componentes (tipicamente blocos reutilizáveis conhecidos por núcleos). Quanto mais complexos forem estes sistemas, melhores técnicas de projeto serão necessárias para também reduzir o tempo e custo do projeto. Uma destas técnicas, chamada de Network-on-Chip (NoC), permite melhorar a performance da comunicação entre os núcleos e, ao mesmo tempo, fornecer uma plataforma de comunicação escalável e que pode ser reutilizada para um grande número de sistemas. Uma NoC pode ser definida como uma estrutura de roteadores e canais ponto-a-ponto que interconectam os núcleos de um sistema, provendo o suporte de comunicação entre eles. Os dados são transmitidos pela rede na forma de mensagens, que podem ser divididas em unidades menores chamadas de pacote. Uma das desvantagens desta plataforma de comunicação é o impacto na área do sistema causado pelos roteadores. Dentro deste contexto, este trabalho apresenta uma arquitetura de roteador de baixo custo, com o objetivo de permitir o uso de NoCs em sistemas onde a área do roteador representará um grande impacto no custo do sistema. A arquitetura deste roteador, chamado de Tonga, é baseada em um roteador chamado RASoC, um soft-core para SoCs. Nesta dissertação será apresentada também uma rede heterogênea, baseada na rede SoCIN, e composta por dois tipos de roteadores – RASoC e Tonga. Estes roteadores visam diferentes objetivos: Rasoc alcança uma maior performance comparada ao Tonga, mas ocupa área consideravelmente maior. Potencialmente, uma NoC heterogênea otimizada pode ser desenvolvida combinando estes roteadores, procurando o melhor compromisso entre área e latência. Os modelos desenvolvidos permitem a estimativa de área e do desempenho das arquiteturas de comunicação propostas e são apresentados resultados de performance para algumas aplicações.
Resumo:
It bet on the next generation of computers as architecture with multiple processors and/or multicore processors. In this sense there are challenges related to features interconnection, operating frequency, the area on chip, power dissipation, performance and programmability. The mechanism of interconnection and communication it was considered ideal for this type of architecture are the networks-on-chip, due its scalability, reusability and intrinsic parallelism. The networks-on-chip communication is accomplished by transmitting packets that carry data and instructions that represent requests and responses between the processing elements interconnected by the network. The transmission of packets is accomplished as in a pipeline between the routers in the network, from source to destination of the communication, even allowing simultaneous communications between pairs of different sources and destinations. From this fact, it is proposed to transform the entire infrastructure communication of network-on-chip, using the routing mechanisms, arbitration and storage, in a parallel processing system for high performance. In this proposal, the packages are formed by instructions and data that represent the applications, which are executed on routers as well as they are transmitted, using the pipeline and parallel communication transmissions. In contrast, traditional processors are not used, but only single cores that control the access to memory. An implementation of this idea is called IPNoSys (Integrated Processing NoC System), which has an own programming model and a routing algorithm that guarantees the execution of all instructions in the packets, preventing situations of deadlock, livelock and starvation. This architecture provides mechanisms for input and output, interruption and operating system support. As proof of concept was developed a programming environment and a simulator for this architecture in SystemC, which allows configuration of various parameters and to obtain several results to evaluate it
Resumo:
The increase of capacity to integrate transistors permitted to develop completed systems, with several components, in single chip, they are called SoC (System-on-Chip). However, the interconnection subsystem cans influence the scalability of SoCs, like buses, or can be an ad hoc solution, like bus hierarchy. Thus, the ideal interconnection subsystem to SoCs is the Network-on-Chip (NoC). The NoCs permit to use simultaneous point-to-point channels between components and they can be reused in other projects. However, the NoCs can raise the complexity of project, the area in chip and the dissipated power. Thus, it is necessary or to modify the way how to use them or to change the development paradigm. Thus, a system based on NoC is proposed, where the applications are described through packages and performed in each router between source and destination, without traditional processors. To perform applications, independent of number of instructions and of the NoC dimensions, it was developed the spiral complement algorithm, which finds other destination until all instructions has been performed. Therefore, the objective is to study the viability of development that system, denominated IPNoSys system. In this study, it was developed a tool in SystemC, using accurate cycle, to simulate the system that performs applications, which was implemented in a package description language, also developed to this study. Through the simulation tool, several result were obtained that could be used to evaluate the system performance. The methodology used to describe the application corresponds to transform the high level application in data-flow graph that become one or more packages. This methodology was used in three applications: a counter, DCT-2D and float add. The counter was used to evaluate a deadlock solution and to perform parallel application. The DCT was used to compare to STORM platform. Finally, the float add aimed to evaluate the efficiency of the software routine to perform a unimplemented hardware instruction. The results from simulation confirm the viability of development of IPNoSys system. They showed that is possible to perform application described in packages, sequentially or parallelly, without interruptions caused by deadlock, and also showed that the execution time of IPNoSys is more efficient than the STORM platform
Resumo:
Alongside the advances of technologies, embedded systems are increasingly present in our everyday. Due to increasing demand for functionalities, many tasks are split among processors, requiring more efficient communication architectures, such as networks on chip (NoC). The NoCs are structures that have routers with channel point-to-point interconnect the cores of system on chip (SoC), providing communication. There are several networks on chip in the literature, each with its specific characteristics. Among these, for this work was chosen the Integrated Processing System NoC (IPNoSyS) as a network on chip with different characteristics compared to general NoCs, because their routing components also accumulate processing function, ie, units have functional able to execute instructions. With this new model, packets are processed and routed by the router architecture. This work aims at improving the performance of applications that have repetition, since these applications spend more time in their execution, which occurs through repeated execution of his instructions. Thus, this work proposes to optimize the runtime of these structures by employing a technique of instruction-level parallelism, in order to optimize the resources offered by the architecture. The applications are tested on a dedicated simulator and the results compared with the original version of the architecture, which in turn, implements only packet level parallelism