919 resultados para Network-on-chip
Resumo:
The digital electronic market development is founded on the continuous reduction of the transistors size, to reduce area, power, cost and increase the computational performance of integrated circuits. This trend, known as technology scaling, is approaching the nanometer size. The lithographic process in the manufacturing stage is increasing its uncertainty with the scaling down of the transistors size, resulting in a larger parameter variation in future technology generations. Furthermore, the exponential relationship between the leakage current and the threshold voltage, is limiting the threshold and supply voltages scaling, increasing the power density and creating local thermal issues, such as hot spots, thermal runaway and thermal cycles. In addiction, the introduction of new materials and the smaller devices dimension are reducing transistors robustness, that combined with high temperature and frequently thermal cycles, are speeding up wear out processes. Those effects are no longer addressable only at the process level. Consequently the deep sub-micron devices will require solutions which will imply several design levels, as system and logic, and new approaches called Design For Manufacturability (DFM) and Design For Reliability. The purpose of the above approaches is to bring in the early design stages the awareness of the device reliability and manufacturability, in order to introduce logic and system able to cope with the yield and reliability loss. The ITRS roadmap suggests the following research steps to integrate the design for manufacturability and reliability in the standard CAD automated design flow: i) The implementation of new analysis algorithms able to predict the system thermal behavior with the impact to the power and speed performances. ii) High level wear out models able to predict the mean time to failure of the system (MTTF). iii) Statistical performance analysis able to predict the impact of the process variation, both random and systematic. The new analysis tools have to be developed beside new logic and system strategies to cope with the future challenges, as for instance: i) Thermal management strategy that increase the reliability and life time of the devices acting to some tunable parameter,such as supply voltage or body bias. ii) Error detection logic able to interact with compensation techniques as Adaptive Supply Voltage ASV, Adaptive Body Bias ABB and error recovering, in order to increase yield and reliability. iii) architectures that are fundamentally resistant to variability, including locally asynchronous designs, redundancy, and error correcting signal encodings (ECC). The literature already features works addressing the prediction of the MTTF, papers focusing on thermal management in the general purpose chip, and publications on statistical performance analysis. In my Phd research activity, I investigated the need for thermal management in future embedded low-power Network On Chip (NoC) devices.I developed a thermal analysis library, that has been integrated in a NoC cycle accurate simulator and in a FPGA based NoC simulator. The results have shown that an accurate layout distribution can avoid the onset of hot-spot in a NoC chip. Furthermore the application of thermal management can reduce temperature and number of thermal cycles, increasing the systemreliability. Therefore the thesis advocates the need to integrate a thermal analysis in the first design stages for embedded NoC design. Later on, I focused my research in the development of statistical process variation analysis tool that is able to address both random and systematic variations. The tool was used to analyze the impact of self-timed asynchronous logic stages in an embedded microprocessor. As results we confirmed the capability of self-timed logic to increase the manufacturability and reliability. Furthermore we used the tool to investigate the suitability of low-swing techniques in the NoC system communication under process variations. In this case We discovered the superior robustness to systematic process variation of low-swing links, which shows a good response to compensation technique as ASV and ABB. Hence low-swing is a good alternative to the standard CMOS communication for power, speed, reliability and manufacturability. In summary my work proves the advantage of integrating a statistical process variation analysis tool in the first stages of the design flow.
Resumo:
In questa tesi sono stati apportati due importanti contributi nel campo degli acceleratori embedded many-core. Abbiamo implementato un runtime OpenMP ottimizzato per la gestione del tasking model per sistemi a processori strettamente accoppiati in cluster e poi interconnessi attraverso una network on chip. Ci siamo focalizzati sulla loro scalabilità e sul supporto di task di granularità fine, come è tipico nelle applicazioni embedded. Il secondo contributo di questa tesi è stata proporre una estensione del runtime di OpenMP che cerca di prevedere la manifestazione di errori dati da fenomeni di variability tramite una schedulazione efficiente del carico di lavoro.
Resumo:
The performance, energy efficiency and cost improvements due to traditional technology scaling have begun to slow down and present diminishing returns. Underlying reasons for this trend include fundamental physical limits of transistor scaling, the growing significance of quantum effects as transistors shrink, and a growing mismatch between transistors and interconnects regarding size, speed and power. Continued Moore's Law scaling will not come from technology scaling alone, and must involve improvements to design tools and development of new disruptive technologies such as 3D integration. 3D integration presents potential improvements to interconnect power and delay by translating the routing problem into a third dimension, and facilitates transistor density scaling independent of technology node. Furthermore, 3D IC technology opens up a new architectural design space of heterogeneously-integrated high-bandwidth CPUs. Vertical integration promises to provide the CPU architectures of the future by integrating high performance processors with on-chip high-bandwidth memory systems and highly connected network-on-chip structures. Such techniques can overcome the well-known CPU performance bottlenecks referred to as memory and communication wall. However the promising improvements to performance and energy efficiency offered by 3D CPUs does not come without cost, both in the financial investments to develop the technology, and the increased complexity of design. Two main limitations to 3D IC technology have been heat removal and TSV reliability. Transistor stacking creates increases in power density, current density and thermal resistance in air cooled packages. Furthermore the technology introduces vertical through silicon vias (TSVs) that create new points of failure in the chip and require development of new BEOL technologies. Although these issues can be controlled to some extent using thermal-reliability aware physical and architectural 3D design techniques, high performance embedded cooling schemes, such as micro-fluidic (MF) cooling, are fundamentally necessary to unlock the true potential of 3D ICs. A new paradigm is being put forth which integrates the computational, electrical, physical, thermal and reliability views of a system. The unification of these diverse aspects of integrated circuits is called Co-Design. Independent design and optimization of each aspect leads to sub-optimal designs due to a lack of understanding of cross-domain interactions and their impacts on the feasibility region of the architectural design space. Co-Design enables optimization across layers with a multi-domain view and thus unlocks new high-performance and energy efficient configurations. Although the co-design paradigm is becoming increasingly necessary in all fields of IC design, it is even more critical in 3D ICs where, as we show, the inter-layer coupling and higher degree of connectivity between components exacerbates the interdependence between architectural parameters, physical design parameters and the multitude of metrics of interest to the designer (i.e. power, performance, temperature and reliability). In this dissertation we present a framework for multi-domain co-simulation and co-optimization of 3D CPU architectures with both air and MF cooling solutions. Finally we propose an approach for design space exploration and modeling within the new Co-Design paradigm, and discuss the possible avenues for improvement of this work in the future.
Resumo:
Monitoring and tracking of IP traffic flows are essential for network services (i.e. packet forwarding). Packet header lookup is the main part of flow identification by determining the predefined matching action for each incoming flow. In this paper, an improved header lookup and flow rule update solution is investigated. A detailed study of several well-known lookup algorithms reveals that searching individual packet header field and combining the results achieve high lookup speed and flexibility. The proposed hybrid lookup architecture is comprised of various lookup algorithms, which are selected based on the user applications and system requirements.
Resumo:
The discrete-time neural network proposed by Hopfield can be used for storing and recognizing binary patterns. Here, we investigate how the performance of this network on pattern recognition task is altered when neurons are removed and the weights of the synapses corresponding to these deleted neurons are divided among the remaining synapses. Five distinct ways of distributing such weights are evaluated. We speculate how this numerical work about synaptic compensation may help to guide experimental studies on memory rehabilitation interventions.
Resumo:
The emergence of smartphones with Wireless LAN (WiFi) network interfaces brought new challenges to application developers. The expected increase of users connectivity will impact their expectations for example on the performance of background applications. Unfortunately, the number and breadth of the studies on the new patterns of user mobility and connectivity that result from the emergence of smartphones is still insufficient to support this claim. This paper contributes with preliminary results on a large scale study of the usage pattern of about 49000 devices and 31000 users who accessed at least one access point of the eduroam WiFi network on the campuses of the Lisbon Polytechnic Institute. Results confirm that the increasing number of smartphones resulted in significant changes to the pattern of use, with impact on the amount of traffic and users connection time.
Resumo:
The lanthanide binuclear helicate [Eu(2)(L(C2(CO(2)H)))(3)] is coupled to avidin to yield a luminescent bioconjugate EuB1 (Q = 9.3%, tau((5)D(0)) = 2.17 ms). MALDI/TOF mass spectrometry confirms the covalent binding of the Eu chelate and UV-visible spectroscopy allows one to determine a luminophore/protein ratio equal to 3.2. Bio-affinity assays involving the recognition of a mucin-like protein expressed on human breast cancer MCF-7 cells by a biotinylated monoclonal antibody 5D10 to which EuB1 is attached via avidin-biotin coupling demonstrate that (i) avidin activity is little affected by the coupling reaction and (ii) detection limits obtained by time-resolved (TR) luminescence with EuB1 and a commercial Eu-avidin conjugate are one order of magnitude lower than those of an organic conjugate (FITC-streptavidin). In the second part of the paper, conditions for growing MCF-7 cells in 100-200 microm wide microchannels engraved in PDMS are established; we demonstrate that EuB1 can be applied as effectively on this lab-on-a-chip device for the detection of tumour-associated antigens as on MCF-7 cells grown in normal culture vials. In order to exploit the versatility of the ligand used for self-assembling [Ln(2)(L(C2(CO(2)H)))(3)] helicates, which sensitizes the luminescence of both Eu(III) and Tb(III) ions, a dual on-chip assay is proposed in which estrogen receptors (ERs) and human epidermal growth factor receptors (Her2/neu) can be simultaneously detected on human breast cancer tissue sections. The Ln helicates are coupled to two secondary antibodies: ERs are visualized by red-emitting EuB4 using goat anti-mouse IgG and Her2/neu receptors by green-emitting TbB5 using goat anti-rabbit IgG. The fact that the assay is more than 6 times faster and requires 5 times less reactants than conventional immunohistochemical assays provides essential advantages over conventional immunohistochemistry for future clinical biomarker detection.
Resumo:
The study examines the internationalisation process of a contemporary SME firm and explores the impact of its business network on this development. The objective of the study is to understand SME internationalisation and its dynamics from a network perspective. The purpose of this research project is to describe and explore the development process of a firm and its business network by identifying the changes, critical events and influence factors that form this development. It is a qualitative case study, which focuses on a Finnish focal firm and its respective business network as it expands into the Greek market. It is a longitudinal research process, which covers a period of time from 1994 to 2004. The empirical study concentrates on the paper trading and converting business. The study builds on the network theory and the framework provided by Johanson and Mattsson's (1988) model on network internationalisation. The incremental internationalisation theories and network theories form the theoretical focus. The research project is organised according to a process view. The focal firm evolves from a domestically-oriented small subsidiary into an internationally experienced company, which has activities in several market areas and numerous business networks in various market segments and product categories. The findings illustrate the importance of both the domestic and foreign business network context in a firm's internationalisation process. The results of the study suggest theoretical modifications on a firm's internationalisation process by broadening the perspective and incorporating the strategic context of a firm. The findings suggest that internationalisation process is a non-linear process, which does not have a deterministic order in its development. The findings emphasise the significance of relational networks, both managerial and entrepreneurial, for establishing position in foreign markets. It implies that a firm's evolution is significantly influenced by its business network and by critical events. Business networks gain coherence due to common goals and they use accumulated capabilities to exploit market opportunities. The business network sets constraints and provides opportunities, which makes the related decision making strategically important. The firm co-evolves with its business network. The research project provides an instrumental case study with a description of an SME internationalisation process. It contributes to existing knowledge by illustrating dynamics in an international business network and by pinpointing the importance of suppliers, customers, partners, ownerships and competition to the internationalisation process.
Resumo:
Com o advento dos processos submicrônicos, a capacidade de integração de transistores tem atingido níveis que possibilitam a construção de um sistema completo em uma única pastilha de silício. Esses sistemas, denominados sistemas integrados, baseiam-se no reuso de blocos previamente projetados e verificados, os quais são chamados de núcleos ou blocos de propriedade intelectual. Os sistemas integrados atuais incluem algumas poucas dezenas de núcleos, os quais são interconectados por meio de arquiteturas de comunicação baseadas em estruturas dedicadas de canais ponto-a-ponto ou em estruturas reutilizáveis constituídas por canais multiponto, denominadas barramentos. Os futuros sistemas integrados irão incluir de dezenas a centenas de núcleos em um mesmo chip com até alguns bilhões de transistores, sendo que, para atender às pressões do mercado e amortizar os custos de projeto entre vários sistemas, é importante que todos os seus componentes sejam reutilizáveis, incluindo a arquitetura de comunicação. Das arquiteturas utilizadas atualmente, o barramento é a única que oferece reusabilidade. Porém, o seu desempenho em comunicação e o seu consumo de energia degradam com o crescimento do sistema. Para atender aos requisitos dos futuros sistemas integrados, uma nova alternativa de arquitetura de comunicação tem sido proposta na comunidade acadêmica. Essa arquitetura, denominada rede-em-chip, baseia-se nos conceitos utilizados nas redes de interconexão para computadores paralelos. Esta tese se situa nesse contexto e apresenta uma arquitetura de rede-em-chip e um conjunto de modelos para a avaliação de área e desempenho de arquiteturas de comunicação para sistemas integrados. A arquitetura apresentada é denominada SoCIN (System-on-Chip Interconnection Network) e apresenta como diferencial o fato de poder ser dimensionada de modo a atender a requisitos de custo e desempenho da aplicação alvo. Os modelos desenvolvidos permitem a estimativa em alto nível da área em silício e do desempenho de arquiteturas de comunicação do tipo barramento e rede-em-chip. São apresentados resultados que demonstram a efetividade das redes-em-chip e indicam as condições que definem a aplicabilidade das mesmas.
Resumo:
Multi-Processor SoC (MPSOC) design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With a number of on-chip blocks presently ranging in the tens, and quickly approaching the hundreds, the novel issue of how to best provide on-chip communication resources is clearly felt. Scaling down of process technologies has increased process and dynamic variations as well as transistor wearout. Because of this, delay variations increase and impact the performance of the MPSoCs. The interconnect architecture inMPSoCs becomes a single point of failure as it connects all other components of the system together. A faulty processing element may be shut down entirely, but the interconnect architecture must be able to tolerate partial failure and variations and operate with performance, power or latency overhead. This dissertation focuses on techniques at different levels of abstraction to face with the reliability and variability issues in on-chip interconnection networks. By showing the test results of a GALS NoC testchip this dissertation motivates the need for techniques to detect and work around manufacturing faults and process variations in MPSoCs’ interconnection infrastructure. As a physical design technique, we propose the bundle routing framework as an effective way to route the Network on Chips’ global links. For architecture-level design, two cases are addressed: (I) Intra-cluster communication where we propose a low-latency interconnect with variability robustness (ii) Inter-cluster communication where an online functional testing with a reliable NoC configuration are proposed. We also propose dualVdd as an orthogonal way of compensating variability at the post-fabrication stage. This is an alternative strategy with respect to the design techniques, since it enforces the compensation at post silicon stage.
Resumo:
In this paper the model of an Innovative Monitoring Network involving properly connected nodes to develop an Information and Communication Technology (ICT) solution for preventive maintenance of historical centres from early warnings is proposed. It is well known that the protection of historical centres generally goes from a large-scale monitoring to a local one and it could be supported by a unique ICT solution. More in detail, the models of a virtually organized monitoring system could enable the implementation of automated analyses by presenting various alert levels. An adequate ICT solution tool would allow to define a monitoring network for a shared processing of data and results. Thus, a possible retrofit solution could be planned for pilot cases shared among the nodes of the network on the basis of a suitable procedure utilizing a retrofit catalogue. The final objective would consist in providing a model of an innovative tool to identify hazards, damages and possible retrofit solutions for historical centres, assuring an easy early warning support for stakeholders. The action could proactively target the needs and requirements of users, such as decision makers responsible for damage mitigation and safeguarding of cultural heritage assets.
Resumo:
Today, modern System-on-a-Chip (SoC) systems have grown rapidly due to the increased processing power, while maintaining the size of the hardware circuit. The number of transistors on a chip continues to increase, but current SoC designs may not be able to exploit the potential performance, especially with energy consumption and chip area becoming two major concerns. Traditional SoC designs usually separate software and hardware. Thus, the process of improving the system performance is a complicated task for both software and hardware designers. The aim of this research is to develop hardware acceleration workflow for software applications. Thus, system performance can be improved with constraints of energy consumption and on-chip resource costs. The characteristics of software applications can be identified by using profiling tools. Hardware acceleration can have significant performance improvement for highly mathematical calculations or repeated functions. The performance of SoC systems can then be improved, if the hardware acceleration method is used to accelerate the element that incurs performance overheads. The concepts mentioned in this study can be easily applied to a variety of sophisticated software applications. The contributions of SoC-based hardware acceleration in the hardware-software co-design platform include the following: (1) Software profiling methods are applied to H.264 Coder-Decoder (CODEC) core. The hotspot function of aimed application is identified by using critical attributes such as cycles per loop, loop rounds, etc. (2) Hardware acceleration method based on Field-Programmable Gate Array (FPGA) is used to resolve system bottlenecks and improve system performance. The identified hotspot function is then converted to a hardware accelerator and mapped onto the hardware platform. Two types of hardware acceleration methods – central bus design and co-processor design, are implemented for comparison in the proposed architecture. (3) System specifications, such as performance, energy consumption, and resource costs, are measured and analyzed. The trade-off of these three factors is compared and balanced. Different hardware accelerators are implemented and evaluated based on system requirements. 4) The system verification platform is designed based on Integrated Circuit (IC) workflow. Hardware optimization techniques are used for higher performance and less resource costs. Experimental results show that the proposed hardware acceleration workflow for software applications is an efficient technique. The system can reach 2.8X performance improvements and save 31.84% energy consumption by applying the Bus-IP design. The Co-processor design can have 7.9X performance and save 75.85% energy consumption.
Resumo:
Today, modern System-on-a-Chip (SoC) systems have grown rapidly due to the increased processing power, while maintaining the size of the hardware circuit. The number of transistors on a chip continues to increase, but current SoC designs may not be able to exploit the potential performance, especially with energy consumption and chip area becoming two major concerns. Traditional SoC designs usually separate software and hardware. Thus, the process of improving the system performance is a complicated task for both software and hardware designers. The aim of this research is to develop hardware acceleration workflow for software applications. Thus, system performance can be improved with constraints of energy consumption and on-chip resource costs. The characteristics of software applications can be identified by using profiling tools. Hardware acceleration can have significant performance improvement for highly mathematical calculations or repeated functions. The performance of SoC systems can then be improved, if the hardware acceleration method is used to accelerate the element that incurs performance overheads. The concepts mentioned in this study can be easily applied to a variety of sophisticated software applications. The contributions of SoC-based hardware acceleration in the hardware-software co-design platform include the following: (1) Software profiling methods are applied to H.264 Coder-Decoder (CODEC) core. The hotspot function of aimed application is identified by using critical attributes such as cycles per loop, loop rounds, etc. (2) Hardware acceleration method based on Field-Programmable Gate Array (FPGA) is used to resolve system bottlenecks and improve system performance. The identified hotspot function is then converted to a hardware accelerator and mapped onto the hardware platform. Two types of hardware acceleration methods – central bus design and co-processor design, are implemented for comparison in the proposed architecture. (3) System specifications, such as performance, energy consumption, and resource costs, are measured and analyzed. The trade-off of these three factors is compared and balanced. Different hardware accelerators are implemented and evaluated based on system requirements. 4) The system verification platform is designed based on Integrated Circuit (IC) workflow. Hardware optimization techniques are used for higher performance and less resource costs. Experimental results show that the proposed hardware acceleration workflow for software applications is an efficient technique. The system can reach 2.8X performance improvements and save 31.84% energy consumption by applying the Bus-IP design. The Co-processor design can have 7.9X performance and save 75.85% energy consumption.
Resumo:
The first topic analyzed in the thesis will be Neural Architecture Search (NAS). I will focus on two different tools that I developed, one to optimize the architecture of Temporal Convolutional Networks (TCNs), a convolutional model for time-series processing that has recently emerged, and one to optimize the data precision of tensors inside CNNs. The first NAS proposed explicitly targets the optimization of the most peculiar architectural parameters of TCNs, namely dilation, receptive field, and the number of features in each layer. Note that this is the first NAS that explicitly targets these networks. The second NAS proposed instead focuses on finding the most efficient data format for a target CNN, with the granularity of the layer filter. Note that applying these two NASes in sequence allows an "application designer" to minimize the structure of the neural network employed, minimizing the number of operations or the memory usage of the network. After that, the second topic described is the optimization of neural network deployment on edge devices. Importantly, exploiting edge platforms' scarce resources is critical for NN efficient execution on MCUs. To do so, I will introduce DORY (Deployment Oriented to memoRY) -- an automatic tool to deploy CNNs on low-cost MCUs. DORY, in different steps, can manage different levels of memory inside the MCU automatically, offload the computation workload (i.e., the different layers of a neural network) to dedicated hardware accelerators, and automatically generates ANSI C code that orchestrates off- and on-chip transfers with the computation phases. On top of this, I will introduce two optimized computation libraries that DORY can exploit to deploy TCNs and Transformers on edge efficiently. I conclude the thesis with two different applications on bio-signal analysis, i.e., heart rate tracking and sEMG-based gesture recognition.
Resumo:
Miniaturized flying robotic platforms, called nano-drones, have the potential to revolutionize the autonomous robots industry sector thanks to their very small form factor. The nano-drones’ limited payload only allows for a sub-100mW microcontroller unit for the on-board computations. Therefore, traditional computer vision and control algorithms are too computationally expensive to be executed on board these palm-sized robots, and we are forced to rely on artificial intelligence to trade off accuracy in favor of lightweight pipelines for autonomous tasks. However, relying on deep learning exposes us to the problem of generalization since the deployment scenario of a convolutional neural network (CNN) is often composed by different visual cues and different features from those learned during training, leading to poor inference performances. Our objective is to develop and deploy and adaptation algorithm, based on the concept of latent replays, that would allow us to fine-tune a CNN to work in new and diverse deployment scenarios. To do so we start from an existing model for visual human pose estimation, called PULPFrontnet, which is used to identify the pose of a human subject in space through its 4 output variables, and we present the design of our novel adaptation algorithm, which features automatic data gathering and labeling and on-device deployment. We therefore showcase the ability of our algorithm to adapt PULP-Frontnet to new deployment scenarios, improving the R2 scores of the four network outputs, with respect to an unknown environment, from approximately [−0.2, 0.4, 0.0,−0.7] to [0.25, 0.45, 0.2, 0.1]. Finally we demonstrate how it is possible to fine-tune our neural network in real time (i.e., under 76 seconds), using the target parallel ultra-low power GAP 8 System-on-Chip on board the nano-drone, and we show how all adaptation operations can take place using less than 2mWh of energy, a small fraction of the available battery power.