912 resultados para design space exploration
Resumo:
The performance, energy efficiency and cost improvements due to traditional technology scaling have begun to slow down and present diminishing returns. Underlying reasons for this trend include fundamental physical limits of transistor scaling, the growing significance of quantum effects as transistors shrink, and a growing mismatch between transistors and interconnects regarding size, speed and power. Continued Moore's Law scaling will not come from technology scaling alone, and must involve improvements to design tools and development of new disruptive technologies such as 3D integration. 3D integration presents potential improvements to interconnect power and delay by translating the routing problem into a third dimension, and facilitates transistor density scaling independent of technology node. Furthermore, 3D IC technology opens up a new architectural design space of heterogeneously-integrated high-bandwidth CPUs. Vertical integration promises to provide the CPU architectures of the future by integrating high performance processors with on-chip high-bandwidth memory systems and highly connected network-on-chip structures. Such techniques can overcome the well-known CPU performance bottlenecks referred to as memory and communication wall. However the promising improvements to performance and energy efficiency offered by 3D CPUs does not come without cost, both in the financial investments to develop the technology, and the increased complexity of design. Two main limitations to 3D IC technology have been heat removal and TSV reliability. Transistor stacking creates increases in power density, current density and thermal resistance in air cooled packages. Furthermore the technology introduces vertical through silicon vias (TSVs) that create new points of failure in the chip and require development of new BEOL technologies. Although these issues can be controlled to some extent using thermal-reliability aware physical and architectural 3D design techniques, high performance embedded cooling schemes, such as micro-fluidic (MF) cooling, are fundamentally necessary to unlock the true potential of 3D ICs. A new paradigm is being put forth which integrates the computational, electrical, physical, thermal and reliability views of a system. The unification of these diverse aspects of integrated circuits is called Co-Design. Independent design and optimization of each aspect leads to sub-optimal designs due to a lack of understanding of cross-domain interactions and their impacts on the feasibility region of the architectural design space. Co-Design enables optimization across layers with a multi-domain view and thus unlocks new high-performance and energy efficient configurations. Although the co-design paradigm is becoming increasingly necessary in all fields of IC design, it is even more critical in 3D ICs where, as we show, the inter-layer coupling and higher degree of connectivity between components exacerbates the interdependence between architectural parameters, physical design parameters and the multitude of metrics of interest to the designer (i.e. power, performance, temperature and reliability). In this dissertation we present a framework for multi-domain co-simulation and co-optimization of 3D CPU architectures with both air and MF cooling solutions. Finally we propose an approach for design space exploration and modeling within the new Co-Design paradigm, and discuss the possible avenues for improvement of this work in the future.
Resumo:
In this thesis, we deal with the design of experiments in the drug development process, focusing on the design of clinical trials for treatment comparisons (Part I) and the design of preclinical laboratory experiments for proteins development and manufacturing (Part II). In Part I we propose a multi-purpose design methodology for sequential clinical trials. We derived optimal allocations of patients to treatments for testing the efficacy of several experimental groups by also taking into account ethical considerations. We first consider exponential responses for survival trials and we then present a unified framework for heteroscedastic experimental groups that encompasses the general ANOVA set-up. The very good performance of the suggested optimal allocations, in terms of both inferential and ethical characteristics, are illustrated analytically and through several numerical examples, also performing comparisons with other designs proposed in the literature. Part II concerns the planning of experiments for processes composed of multiple steps in the context of preclinical drug development and manufacturing. Following the Quality by Design paradigm, the objective of the multi-step design strategy is the definition of the manufacturing design space of the whole process and, as we consider the interactions among the subsequent steps, our proposal ensures the quality and the safety of the final product, by enabling more flexibility and process robustness in the manufacturing.
Resumo:
As the development of integrated circuit technology continues to follow Moore’s law the complexity of circuits increases exponentially. Traditional hardware description languages such as VHDL and Verilog are no longer powerful enough to cope with this level of complexity and do not provide facilities for hardware/software codesign. Languages such as SystemC are intended to solve these problems by combining the powerful expression of high level programming languages and hardware oriented facilities of hardware description languages. To fully replace older languages in the desing flow of digital systems SystemC should also be synthesizable. The devices required by modern high speed networks often share the same tight constraints for e.g. size, power consumption and price with embedded systems but have also very demanding real time and quality of service requirements that are difficult to satisfy with general purpose processors. Dedicated hardware blocks of an application specific instruction set processor are one way to combine fast processing speed, energy efficiency, flexibility and relatively low time-to-market. Common features can be identified in the network processing domain making it possible to develop specialized but configurable processor architectures. One such architecture is the TACO which is based on transport triggered architecture. The architecture offers a high degree of parallelism and modularity and greatly simplified instruction decoding. For this M.Sc.(Tech) thesis, a simulation environment for the TACO architecture was developed with SystemC 2.2 using an old version written with SystemC 1.0 as a starting point. The environment enables rapid design space exploration by providing facilities for hw/sw codesign and simulation and an extendable library of automatically configured reusable hardware blocks. Other topics that are covered are the differences between SystemC 1.0 and 2.2 from the viewpoint of hardware modeling, and compilation of a SystemC model into synthesizable VHDL with Celoxica Agility SystemC Compiler. A simulation model for a processor for TCP/IP packet validation was designed and tested as a test case for the environment.
Resumo:
With the shift towards many-core computer architectures, dataflow programming has been proposed as one potential solution for producing software that scales to a varying number of processor cores. Programming for parallel architectures is considered difficult as the current popular programming languages are inherently sequential and introducing parallelism is typically up to the programmer. Dataflow, however, is inherently parallel, describing an application as a directed graph, where nodes represent calculations and edges represent a data dependency in form of a queue. These queues are the only allowed communication between the nodes, making the dependencies between the nodes explicit and thereby also the parallelism. Once a node have the su cient inputs available, the node can, independently of any other node, perform calculations, consume inputs, and produce outputs. Data ow models have existed for several decades and have become popular for describing signal processing applications as the graph representation is a very natural representation within this eld. Digital lters are typically described with boxes and arrows also in textbooks. Data ow is also becoming more interesting in other domains, and in principle, any application working on an information stream ts the dataflow paradigm. Such applications are, among others, network protocols, cryptography, and multimedia applications. As an example, the MPEG group standardized a dataflow language called RVC-CAL to be use within reconfigurable video coding. Describing a video coder as a data ow network instead of with conventional programming languages, makes the coder more readable as it describes how the video dataflows through the different coding tools. While dataflow provides an intuitive representation for many applications, it also introduces some new problems that need to be solved in order for data ow to be more widely used. The explicit parallelism of a dataflow program is descriptive and enables an improved utilization of available processing units, however, the independent nodes also implies that some kind of scheduling is required. The need for efficient scheduling becomes even more evident when the number of nodes is larger than the number of processing units and several nodes are running concurrently on one processor core. There exist several data ow models of computation, with different trade-offs between expressiveness and analyzability. These vary from rather restricted but statically schedulable, with minimal scheduling overhead, to dynamic where each ring requires a ring rule to evaluated. The model used in this work, namely RVC-CAL, is a very expressive language, and in the general case it requires dynamic scheduling, however, the strong encapsulation of dataflow nodes enables analysis and the scheduling overhead can be reduced by using quasi-static, or piecewise static, scheduling techniques. The scheduling problem is concerned with nding the few scheduling decisions that must be run-time, while most decisions are pre-calculated. The result is then an, as small as possible, set of static schedules that are dynamically scheduled. To identify these dynamic decisions and to find the concrete schedules, this thesis shows how quasi-static scheduling can be represented as a model checking problem. This involves identifying the relevant information to generate a minimal but complete model to be used for model checking. The model must describe everything that may affect scheduling of the application while omitting everything else in order to avoid state space explosion. This kind of simplification is necessary to make the state space analysis feasible. For the model checker to nd the actual schedules, a set of scheduling strategies are de ned which are able to produce quasi-static schedulers for a wide range of applications. The results of this work show that actor composition with quasi-static scheduling can be used to transform data ow programs to t many different computer architecture with different type and number of cores. This in turn, enables dataflow to provide a more platform independent representation as one application can be fitted to a specific processor architecture without changing the actual program representation. Instead, the program representation is in the context of design space exploration optimized by the development tools to fit the target platform. This work focuses on representing the dataflow scheduling problem as a model checking problem and is implemented as part of a compiler infrastructure. The thesis also presents experimental results as evidence of the usefulness of the approach.
Resumo:
Electronic applications are currently developed under the reuse-based paradigm. This design methodology presents several advantages for the reduction of the design complexity, but brings new challenges for the test of the final circuit. The access to embedded cores, the integration of several test methods, and the optimization of the several cost factors are just a few of the several problems that need to be tackled during test planning. Within this context, this thesis proposes two test planning approaches that aim at reducing the test costs of a core-based system by means of hardware reuse and integration of the test planning into the design flow. The first approach considers systems whose cores are connected directly or through a functional bus. The test planning method consists of a comprehensive model that includes the definition of a multi-mode access mechanism inside the chip and a search algorithm for the exploration of the design space. The access mechanism model considers the reuse of functional connections as well as partial test buses, cores transparency, and other bypass modes. The test schedule is defined in conjunction with the access mechanism so that good trade-offs among the costs of pins, area, and test time can be sought. Furthermore, system power constraints are also considered. This expansion of concerns makes it possible an efficient, yet fine-grained search, in the huge design space of a reuse-based environment. Experimental results clearly show the variety of trade-offs that can be explored using the proposed model, and its effectiveness on optimizing the system test plan. Networks-on-chip are likely to become the main communication platform of systemson- chip. Thus, the second approach presented in this work proposes the reuse of the on-chip network for the test of the cores embedded into the systems that use this communication platform. A power-aware test scheduling algorithm aiming at exploiting the network characteristics to minimize the system test time is presented. The reuse strategy is evaluated considering a number of system configurations, such as different positions of the cores in the network, power consumption constraints and number of interfaces with the tester. Experimental results show that the parallelization capability of the network can be exploited to reduce the system test time, whereas area and pin overhead are strongly minimized. In this manuscript, the main problems of the test of core-based systems are firstly identified and the current solutions are discussed. The problems being tackled by this thesis are then listed and the test planning approaches are detailed. Both test planning techniques are validated for the recently released ITC’02 SoC Test Benchmarks, and further compared to other test planning methods of the literature. This comparison confirms the efficiency of the proposed methods.
Resumo:
The focus of this thesis is to discuss the development and modeling of an interface architecture to be employed for interfacing analog signals in mixed-signal SOC. We claim that the approach that is going to be presented is able to achieve wide frequency range, and covers a large range of applications with constant performance, allied to digital configuration compatibility. Our primary assumptions are to use a fixed analog block and to promote application configurability in the digital domain, which leads to a mixed-signal interface. The use of a fixed analog block avoids the performance loss common to configurable analog blocks. The usage of configurability on the digital domain makes possible the use of all existing tools for high level design, simulation and synthesis to implement the target application, with very good performance prediction. The proposed approach utilizes the concept of frequency translation (mixing) of the input signal followed by its conversion to the ΣΔ domain, which makes possible the use of a fairly constant analog block, and also, a uniform treatment of input signal from DC to high frequencies. The programmability is performed in the ΣΔ digital domain where performance can be closely achieved according to application specification. The interface performance theoretical and simulation model are developed for design space exploration and for physical design support. Two prototypes are built and characterized to validate the proposed model and to implement some application examples. The usage of this interface as a multi-band parametric ADC and as a two channels analog multiplier and adder are shown. The multi-channel analog interface architecture is also presented. The characterization measurements support the main advantages of the approach proposed.
Resumo:
Virtual platforms are of paramount importance for design space exploration and their usage in early software development and verification is crucial. In particular, enabling accurate and fast simulation is specially useful, but such features are usually conflicting and tradeoffs have to be made. In this paper we describe how we integrated TLM communication mechanisms into a state-of-the-art, cycle-accurate, MPSoC simulation platform. More specifically, we show how we adapted ArchC fast functional instruction set simulators to the MPARM platform in order to achieve both fast simulation speed and accuracy. Our implementation led to a much faster hybrid platform, reaching speedups of up to 2.9 and 2.1x on average with negligible impact on power estimation accuracy (average 3.26% and 2.25% of standard deviation). © 2011 IEEE.
Resumo:
During the last few decades an unprecedented technological growth has been at the center of the embedded systems design paramount, with Moore’s Law being the leading factor of this trend. Today in fact an ever increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite the extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space exploration needed to find the best design has exploded and hardware designers are in fact facing the problem of a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they are facing with the huge complexity of both hardware and software systems. In this thesis two different research works on Virtual Platforms are presented: the first one is intended for the hardware developer, to easily allow complex cycle accurate simulations of many-core SoCs. The second work exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of an increased simulation speed. The term Virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer to achieve the maximum possible performance of an application, by hiding the complexity of the underlying hardware. 2) to efficiently exploit the high parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis is focused on virtualization techniques with the goal to mitigate, and overtake when possible, some of the challenges introduced by the many-core design paradigm.
Resumo:
Nowadays the rise of non-recurring engineering (NRE) costs associated with complexity is becoming a major factor in SoC design, limiting both scaling opportunities and the flexibility advantages offered by the integration of complex computational units. The introduction of embedded programmable elements can represent an appealing solution, able both to guarantee the desired flexibility and upgradabilty and to widen the SoC market. In particular embedded FPGA (eFPGA) cores can provide bit-level optimization for those applications which benefits from synthesis, paying on the other side in terms of performance penalties and area overhead with respect to standard cell ASIC implementations. In this scenario this thesis proposes a design methodology for a synthesizable programmable device designed to be embedded in a SoC. A soft-core embedded FPGA (eFPGA) is hence presented and analyzed in terms of the opportunities given by a fully synthesizable approach, following an implementation flow based on Standard-Cell methodology. A key point of the proposed eFPGA template is that it adopts a Multi-Stage Switching Network (MSSN) as the foundation of the programmable interconnects, since it can be efficiently synthesized and optimized through a standard cell based implementation flow, ensuring at the same time an intrinsic congestion-free network topology. The evaluation of the flexibility potentialities of the eFPGA has been performed using different technology libraries (STMicroelectronics CMOS 65nm and BCD9s 0.11μm) through a design space exploration in terms of area-speed-leakage tradeoffs, enabled by the full synthesizability of the template. Since the most relevant disadvantage of the adopted soft approach, compared to a hardcore, is represented by a performance overhead increase, the eFPGA analysis has been made targeting small area budgets. The generation of the configuration bitstream has been obtained thanks to the implementation of a custom CAD flow environment, and has allowed functional verification and performance evaluation through an application-aware analysis.
Resumo:
La partición hardware/software es una etapa clave dentro del proceso de co-diseño de los sistemas embebidos. En esta etapa se decide qué componentes serán implementados como co-procesadores de hardware y qué componentes serán implementados en un procesador de propósito general. La decisión es tomada a partir de la exploración del espacio de diseño, evaluando un conjunto de posibles soluciones para establecer cuál de estas es la que mejor balance logra entre todas las métricas de diseño. Para explorar el espacio de soluciones, la mayoría de las propuestas, utilizan algoritmos metaheurísticos; destacándose los Algoritmos Genéticos, Recocido Simulado. Esta decisión, en muchos casos, no es tomada a partir de análisis comparativos que involucren a varios algoritmos sobre un mismo problema. En este trabajo se presenta la aplicación de los algoritmos: Escalador de Colinas Estocástico y Escalador de Colinas Estocástico con Reinicio, para resolver el problema de la partición hardware/software. Para validar el empleo de estos algoritmos se presenta la aplicación de este algoritmo sobre un caso de estudio, en particular la partición hardware/software de un codificador JPEG. En todos los experimentos es posible apreciar que ambos algoritmos alcanzan soluciones comparables con las obtenidas por los algoritmos utilizados con más frecuencia.
Resumo:
Robots are needed to perform important field tasks such as hazardous material clean-up, nuclear site inspection, and space exploration. Unfortunately their use is not widespread due to their long development times and high costs. To make them practical, a modular design approach is proposed. Prefabricated modules are rapidly assembled to give a low-cost system for a specific task. This paper described the modular design problem for field robots and the application of a hierarchical selection process to solve this problem. Theoretical analysis and an example case study are presented. The theoretical analysis of the modular design problem revealed the large size of the search space. It showed the advantages of approaching the design on various levels. The hierarchical selection process applies physical rules to reduce the search space to a computationally feasible size and a genetic algorithm performs the final search in a greatly reduced space. This process is based on the observation that simple physically based rules can eliminate large sections of the design space to greatly simplify the search. The design process is applied to a duct inspection task. Five candidate robots were developed. Two of these robots are evaluated using detailed physical simulation. It is shown that the more obvious solution is not able to complete the task, while the non-obvious asymmetric design develop by the process is successful.
Resumo:
Self-stabilization is a property of a distributed system such that, regardless of the legitimacy of its current state, the system behavior shall eventually reach a legitimate state and shall remain legitimate thereafter. The elegance of self-stabilization stems from the fact that it distinguishes distributed systems by a strong fault tolerance property against arbitrary state perturbations. The difficulty of designing and reasoning about self-stabilization has been witnessed by many researchers; most of the existing techniques for the verification and design of self-stabilization are either brute-force, or adopt manual approaches non-amenable to automation. In this dissertation, we first investigate the possibility of automatically designing self-stabilization through global state space exploration. In particular, we develop a set of heuristics for automating the addition of recovery actions to distributed protocols on various network topologies. Our heuristics equally exploit the computational power of a single workstation and the available parallelism on computer clusters. We obtain existing and new stabilizing solutions for classical protocols like maximal matching, ring coloring, mutual exclusion, leader election and agreement. Second, we consider a foundation for local reasoning about self-stabilization; i.e., study the global behavior of the distributed system by exploring the state space of just one of its components. It turns out that local reasoning about deadlocks and livelocks is possible for an interesting class of protocols whose proof of stabilization is otherwise complex. In particular, we provide necessary and sufficient conditions – verifiable in the local state space of every process – for global deadlock- and livelock-freedom of protocols on ring topologies. Local reasoning potentially circumvents two fundamental problems that complicate the automated design and verification of distributed protocols: (1) state explosion and (2) partial state information. Moreover, local proofs of convergence are independent of the number of processes in the network, thereby enabling our assertions about deadlocks and livelocks to apply on rings of arbitrary sizes without worrying about state explosion.
Resumo:
2000 Mathematics Subject Classification: 78A50
Resumo:
A Work Project, presented as part of the requirements for the Award of a Masters Degree in Management from the NOVA – School of Business and Economics
Resumo:
This research addresses the problem of creating interactive experiences to encourage people to explore spaces. Besides the obvious spaces to visit, such as museums or art galleries, spaces that people visit can be, for example, a supermarket or a restaurant. As technology evolves, people become more demanding in the way they use it and expect better forms of interaction with the space that surrounds them. Interaction with the space allows information to be transmitted to the visitors in a friendly way, leading visitors to explore it and gain knowledge. Systems to provide better experiences while exploring spaces demand hardware and software that is not in the reach of every space owner either because of the cost or inconvenience of the installation, that can damage artefacts or the space environment. We propose a system adaptable to the spaces, that uses a video camera network and a wi-fi network present at the space (or that can be installed) to provide means to support interactive experiences using the visitor’s mobile device. The system is composed of an infrastructure (called vuSpot), a language grammar used to describe interactions at a space (called XploreDescription), a visual tool used to design interactive experiences (called XploreBuilder) and a tool used to create interactive experiences (called urSpace). By using XploreBuilder, a tool built of top of vuSpot, a user with little or no experience in programming can define a space and design interactive experiences. This tool generates a description of the space and of the interactions at that space (that complies with the XploreDescription grammar). These descriptions can be given to urSpace, another tool built of top of vuSpot, that creates the interactive experience application. With this system we explore new forms of interaction and use mobile devices and pico projectors to deliver additional information to the users leading to the creation of interactive experiences. The several components are presented as well as the results of the respective user tests, which were positive. The design and implementation becomes cheaper, faster, more flexible and, since it does not depend on the knowledge of a programming language, accessible for the general public.